I have been learning a lot about Erlang recently, and I stumbled upon a
strange behaviour: printing strings containing non-ASCII characters,
e.g. "μs", using io:format and the ~ts control sequence behaves
differently depending on the way the program is run:
- In rebar3 shell, rebar3 run, or a release running with a console: "μs".
- With a release running in foreground: "\x{3BC}s".
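For reference, the behaviour can be reproduced with something like the minimal sketch below. The module name unicode_print and its functions are made up for illustration; the string is built from code points only so that the example does not depend on the encoding of the source file.

```erlang
-module(unicode_print).
-export([label/0, print/0]).

%% Hypothetical helper: returns the string "μs" as a list of code points.
%% 16#3BC is the code point of the Greek small letter mu.
label() ->
    [16#3BC, $s].

%% Prints the string with the ~ts control sequence. What actually reaches
%% the terminal depends on the encoding of the standard IO device.
print() ->
    io:format("~ts~n", [label()]).
```

Calling unicode_print:print() in each of the environments listed above should be enough to see the two different outputs.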
I first checked the shell environment, but os:getenv("LANG") correctly
returns en_US.UTF-8 in both cases. I then suspected the Erlang VM was
initialized with a different +pc flag in console mode, but adding
+pc unicode to vm.args did not change anything.
Finally I ended up on the very detailed IO protocol documentation. Printing
the IO configuration of the standard output, obtained with io:getopts(),
yields different results depending on the environment: in interactive
environments and with rebar3 run, the encoding parameter is set to unicode,
but it is set to latin1 in releases running in foreground.
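A small helper along these lines makes the check easy to repeat. This is only a sketch: the module name io_encoding and the function current/0 are hypothetical, and the only library calls it relies on are io:getopts/0 and proplists:get_value/2.

```erlang
-module(io_encoding).
-export([current/0]).

%% Hypothetical helper: returns the encoding of the standard IO device
%% (for example unicode or latin1), or the error tuple reported by
%% io:getopts/0 if the options cannot be read.
current() ->
    case io:getopts() of
        Opts when is_list(Opts) ->
            proplists:get_value(encoding, Opts);
        {error, _Reason} = Error ->
            Error
    end.
```

Printing the result with io:format("~p~n", [io_encoding:current()]) early during startup shows which encoding the current environment ended up with.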
As it turns out, configuring the standard IO device to output UTF-8 encoded
strings is as simple as calling io:setopts([{encoding, unicode}]) when the
application starts. This is definitely something I will add to all my
applications from now on.
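In practice, this could look like the sketch below, where the call sits at the top of the application callback. The module name my_app is a placeholder, and the module doubles as an empty supervisor only to keep the example in a single file; a real application would normally start its own supervisor module instead.

```erlang
-module(my_app).
-behaviour(application).
-behaviour(supervisor).

-export([start/2, stop/1]).   % application callbacks
-export([init/1]).            % supervisor callback

start(_StartType, _StartArgs) ->
    %% Ask the standard IO device to use UTF-8 before anything is printed.
    %% The result is ignored here (assumption: a device that refuses the
    %% option should not prevent the application from starting).
    _ = io:setopts([{encoding, unicode}]),
    supervisor:start_link(?MODULE, []).

stop(_State) ->
    ok.

%% Placeholder supervisor with no children, for illustration only.
init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 1, period => 5},
    {ok, {SupFlags, []}}.
```

Putting the call first in start/2 means the encoding is set before any child process gets a chance to print.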