15 Mar

A lesson in character encodings and Docker containers

I recently had to run some commands in a MySQL Docker container.

In particular, the container was based on the image built from https://github.com/docker-library/mysql/blob/eeb0c33dfcad3db46a0dfb24c352d2a1601c7667/5.7/Dockerfile.

I started a Bash shell inside the container with docker exec -it mysql /bin/bash and then opened the MySQL command-line client:

mysql -uroot -p

The root password was a complex password generated by a password manager. It contained a £ symbol.

I had trouble pasting it in when mysql requested the password.

I decided to set a shell variable instead, PASS='complex-password', so that I could start mysql using:

mysql -uroot -p"$PASS"

However, when the paste reached the £ symbol in the password, the shell inserted a # at the beginning of the line and then continued on a new line.

For example, if the password were super£pass, pasting the assignment PASS='super£pass' produced this in the shell:

root@<container-id>:/# # PASS='super
root@<container-id>:/# pass'

I couldn’t work out what was going on, so I ended up posting a question on Stack Overflow. Fortunately, someone answered it and pointed me in the right direction.

I’ve repeated the explanation here as well as the remedy.

The crux of the issue was that the terminal I was connecting to the Docker container from encoded characters as UTF-8, whereas the Bash shell inside the container was running in the default POSIX locale, whose character encoding is ASCII.

This meant that when I typed a £ symbol in my terminal, the two bytes C2 A3 (the UTF-8 encoding of £) were sent into the container and interpreted as two separate characters.

Inside the container, Bash (through Readline, the library that handles its line editing) was interpreting characters with the high bit set as characters typed with the Meta key held down.
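The byte claim is easy to check. Below is a minimal Bash sketch, assuming the script file itself is saved as UTF-8; hex_bytes is a hypothetical helper name used only for illustration. It dumps the raw bytes of £ and shows that Bash in the C/POSIX locale counts them as two characters.

```bash
#!/usr/bin/env bash
# Dump the raw bytes of a string as space-separated hex.
# hex_bytes is a hypothetical helper, not a standard command.
hex_bytes() {
  printf '%s' "$1" | od -An -tx1 | xargs
}

hex_bytes '£'    # c2 a3

# In the C/POSIX locale Bash counts bytes, so "£" has length 2...
LC_ALL=C bash -c 's="£"; echo "${#s}"'          # 2
# ...while in a UTF-8 locale it is a single character.
LC_ALL=C.UTF-8 bash -c 's="£"; echo "${#s}"'    # 1, if C.UTF-8 is installed
```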

For more information on the meta-key see https://www.gnu.org/software/bash/manual/bashref.html#Introduction-and-Notation.

C2 is 194 in decimal, which is 128 + 66, so Bash interprets it as a capital B (ASCII 66) with the Meta modifier.

A3 is 163 in decimal, which is 128 + 35, so Bash interprets it as a # (ASCII 35) with the Meta modifier.

So Bash saw my single £ keypress as the two key sequences M-B and M-#.
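The arithmetic can be reproduced with a short sketch. It models what Readline does when its convert-meta setting is on (as I understand the Bash manual, this is the default, although Readline switches it off in locales whose encodings use bytes with the eighth bit set): strip the high bit and treat the remaining 7-bit value as a Meta-modified character. The names chr and as_meta_key are hypothetical helpers for illustration.

```bash
#!/usr/bin/env bash
# Print the ASCII character with the given decimal code (classic printf idiom).
chr() {
  printf "\\$(printf '%03o' "$1")"
}

# Describe one byte (two hex digits) the way Readline sees it with convert-meta on.
as_meta_key() {
  local byte=$((16#$1))        # hex -> decimal, e.g. C2 -> 194
  local low=$((byte & 0x7F))   # clear the high bit, i.e. subtract 128
  if (( byte >= 128 )); then
    echo "M-$(chr "$low")"     # high bit set: Meta + the 7-bit character
  else
    chr "$byte"; echo          # plain ASCII byte: printed as-is
  fi
}

for hex in C2 A3; do
  printf '%s = %d = 128 + %d -> %s\n' \
    "$hex" "$((16#$hex))" "$((16#$hex & 0x7F))" "$(as_meta_key "$hex")"
done
# C2 = 194 = 128 + 66 -> M-B
# A3 = 163 = 128 + 35 -> M-#
```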

Looking at https://www.gnu.org/software/bash/manual/bashref.html#Miscellaneous-Commands, you will see that M-B runs do-lowercase-version, i.e. the command bound to M-b, which is backward-word and moves the cursor back to the start of the previous word. M-# runs insert-comment, which inserts the comment character at the start of the line and then accepts the line as if Enter had been pressed.

This explains the behaviour I was seeing, but how could I fix it so that I could type a £ symbol?

This is as simple as making sure the locale is set correctly in the docker container.

No LANG variable or other locale information was set, so Bash was using the default POSIX locale. No full locales were installed; the only UTF-8 locale available was C.UTF-8.
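To make the fallback concrete, here is a minimal sketch of the POSIX precedence for the character-handling category (LC_CTYPE): a non-empty LC_ALL wins, then LC_CTYPE, then LANG, and with none of them set the default POSIX locale applies. This also explains why setting LC_ALL alone is enough to switch the encoding. ctype_locale is a hypothetical helper; on a real system the locale command reports the same information.

```bash
#!/usr/bin/env bash
# Which locale governs character handling (LC_CTYPE)?
# POSIX precedence: non-empty LC_ALL, then LC_CTYPE, then LANG,
# otherwise the default POSIX ("C") locale.
# ctype_locale is a hypothetical helper for illustration only.
ctype_locale() {
  if [ -n "${LC_ALL:-}" ]; then
    echo "$LC_ALL"
  elif [ -n "${LC_CTYPE:-}" ]; then
    echo "$LC_CTYPE"
  elif [ -n "${LANG:-}" ]; then
    echo "$LANG"
  else
    echo "POSIX"
  fi
}

ctype_locale   # with nothing set, as in the container: POSIX

# The real tools for the same questions:
#   locale       # current value of each locale category
#   locale -a    # locales installed on the system
```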

Setting the LC_ALL variable and launching a new bash from within bash, like so:

LC_ALL=C.UTF-8 bash

allowed me to test whether this would work. In this nested shell I could now type the £ symbol freely, without strange side effects.

I just had to make this change permanent.

I did this by appending the following to the file /root/.bashrc:

export LC_ALL=C.UTF-8
export LANG=C.UTF-8
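If the change needs to be applied from a setup script, possibly more than once, a small idempotent helper avoids stacking duplicate exports in the file. This is a minimal sketch under that assumption; append_once is a hypothetical name, not part of any tool.

```bash
#!/usr/bin/env bash
# Append a line to a file only if that exact line is not already present.
# append_once is a hypothetical helper for illustration.
append_once() {
  local file=$1 line=$2
  touch "$file"
  # If the file does not end with a newline, add one so lines don't merge.
  if [ -s "$file" ] && [ -n "$(tail -c1 "$file")" ]; then
    echo >> "$file"
  fi
  # -x: whole-line match, -F: fixed string, --: line may start with "-".
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage inside the container:
#   append_once /root/.bashrc 'export LC_ALL=C.UTF-8'
#   append_once /root/.bashrc 'export LANG=C.UTF-8'
```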