I have been learning a lot about Erlang recently, and I stumbled upon a
strange behaviour: printing strings containing non-ASCII characters,
e.g. "μs", using io:format and the ~ts control sequence behaves
differently depending on the way the program is run:
- In rebar3 shell, rebar3 run, or a release running with a console: "μs".
- With a release running in foreground: "\x{3BC}s".
I first checked the shell environment, but os:getenv("LANG") correctly
returns en_US.UTF-8 in both cases. I then suspected the Erlang VM was
initialized with a different +pc flag in console mode, but adding +pc unicode to vm.args did not change anything.
Finally, I ended up on the very detailed IO protocol documentation.
Printing the IO configuration of the standard output, obtained with
io:getopts(), yields different results depending on the environment: in
interactive environments and with rebar3 run, the encoding parameter is
set to unicode, but it is set to latin1 in releases running in foreground.
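The check described above can be sketched as follows. This is a minimal example, not code from the application itself: the module name io_encoding_check and its two functions are hypothetical, and only io:getopts/0, io:format/2 and proplists:get_value/2 come from the standard library.

```erlang
-module(io_encoding_check).
-export([stdout_encoding/0, report/0]).

%% Returns the encoding reported by the standard IO device
%% (typically unicode or latin1), undefined if the device does not
%% report an encoding option, or {error, Reason} if the options
%% cannot be read.
stdout_encoding() ->
    case io:getopts() of
        {error, Reason} -> {error, Reason};
        Opts when is_list(Opts) -> proplists:get_value(encoding, Opts)
    end.

%% Prints the current encoding, then a test string made of the code
%% points U+03BC (Greek small letter mu) and $s, using ~ts.
report() ->
    io:format("encoding: ~p~n", [stdout_encoding()]),
    io:format("~ts~n", [[16#3BC, $s]]).
```

Calling io_encoding_check:report() in each environment should make the difference visible: the second line is either the readable string or its escaped form, depending on the encoding printed on the first line.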
As it turns out, configuring the standard IO device to output UTF-8-encoded
strings is as simple as calling io:setopts([{encoding, unicode}]) when the
application starts. This is definitely something I will add to all my
applications from now on.
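
As a rough sketch of how this could be packaged, the call can live in a small helper. The module name unicode_io and the function ensure_unicode/0 are made up for illustration; the only real API used is io:setopts/1.

```erlang
-module(unicode_io).
-export([ensure_unicode/0]).

%% Asks the standard IO device to use unicode encoding.
%% Returns ok, or {error, Reason} if the device rejects the option.
ensure_unicode() ->
    io:setopts([{encoding, unicode}]).
```

One way to use it, assuming a standard OTP application, would be to call ok = unicode_io:ensure_unicode() at the top of the application's start/2 callback, before the supervision tree is started. Matching on ok makes startup fail loudly if the device refuses the option; whether that is desirable depends on the application.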