Unicode on Linux

Note: These instructions were tested on Red Hat Enterprise Linux 3 but should work on any distribution with GNOME 2.3 or later.

Install Fonts and Language Support

Your Linux distribution may come with the needed fonts already installed. If not, the easiest way to get Unicode fonts on Linux is probably just to copy them from a Windows machine. Installing new fonts has always been somewhat tricky under Linux, but the following should work for most recent versions.

Copy all the .TTF files from C:\WINDOWS\FONTS to ~/.fonts. Fonts (especially those for East Asian languages) can take up a lot of space, so the ~/.fonts directory can be a symlink to somewhere else if your home directory has limited space.

Then run fc-cache ~/.fonts to rebuild the font cache for that directory. After you do this, you may need to log out and back in for the new fonts to be detected.
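The copy-and-cache steps above can be sketched as a small shell function. The name install_fonts and the example mount point /mnt/windows are assumptions made for illustration; where a Windows partition is mounted depends on your system.

```shell
# install_fonts SRC_DIR [DEST_DIR]
# Hypothetical helper: copy TrueType fonts from SRC_DIR into DEST_DIR
# (default ~/.fonts) and rebuild the fontconfig cache for DEST_DIR.
#
# Example (the mount point is an assumption):
#   install_fonts /mnt/windows/WINDOWS/Fonts
install_fonts() {
    src=$1
    dest=${2:-"$HOME/.fonts"}

    mkdir -p "$dest" || return 1

    # Match .ttf in any letter case; skip the pattern itself if nothing matches.
    for f in "$src"/*.[Tt][Tt][Ff]; do
        [ -f "$f" ] || continue
        cp "$f" "$dest"/ || return 1
    done

    # fc-cache is part of fontconfig; skip the step if it is not installed.
    if command -v fc-cache >/dev/null 2>&1; then
        fc-cache "$dest" || echo "warning: fc-cache reported an error" >&2
    fi
    return 0
}
```

Because mkdir -p and cp follow symlinks, this should also work when ~/.fonts is a symlink to a directory on a larger disk.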

Web Browsers

Once fonts are installed, conforming web pages should automatically display (more or less) correctly in Mozilla, Firefox or Konqueror. If you find a page that does not display correctly, try going to the View -> Encoding menu and selecting the appropriate encoding.

A very rigorous Unicode test page complete with sample output images can be found here. Don't expect everything to display correctly in any of the browsers - in fact none of the browsers I tried handled combining marks quite right. Some other test pages include various alphabets and sample texts. Changing the fonts used for various scripts can also help.

The three browsers may do better or worse than each other on different pages. Try out the test pages above and see what works best for your needs.

Terminal

gnome-terminal has the best support for various encodings of any of the terminals. To display Unicode, simply go to Terminal -> Character Coding. If the particular encoding you want to use is not displayed, choose Add or Remove and add the encoding you want to use. You may also want to change the font. To do this, go to Edit -> Profiles and click Edit when the Edit Profiles dialog is displayed. You can change the font on the General tab.
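To check whether a terminal and its font cope with UTF-8, it can help to print a few lines in different scripts. The following is a minimal sketch; the function name unicode_sample and the choice of sample text are arbitrary, and it assumes the script file itself is saved as UTF-8.

```shell
# unicode_sample: print short samples in several scripts.
# Hypothetical helper; run it in the terminal you want to check.
unicode_sample() {
    printf '%s\n' \
        'Latin:     café naïve' \
        'Greek:     αβγ' \
        'Cyrillic:  абв' \
        'CJK:       日本語'
    # "e" followed by U+0301 COMBINING ACUTE ACCENT (UTF-8 bytes CC 81),
    # written as octal escapes so the combining mark survives copy and paste.
    printf 'Combining: e\314\201 (e + U+0301)\n'
}
```

If the output shows question marks, boxes or pairs of accented Latin letters, the terminal encoding is probably not UTF-8, or the font lacks the glyphs. Running locale charmap shows which character set the current locale uses.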

konsole also supports Unicode. It is not as flexible as gnome-terminal, though. To start a konsole with Unicode support, the LANG environment variable must be set appropriately. Try running LANG=en_US.UTF-8 konsole. See this thread on the debian-kde mailing list for a little more information on Unicode in konsole.
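Setting LANG only helps if the named locale is actually installed. The sketch below wraps the command in a small function that warns when locale -a does not list the locale. The name run_utf8 and the UTF8_LOCALE override variable are assumptions made for this example, not standard tools.

```shell
# run_utf8 COMMAND [ARGS...]
# Hypothetical helper: run COMMAND with LANG set to a UTF-8 locale.
# UTF8_LOCALE is an assumed override variable, not a standard one.
run_utf8() {
    if [ "$#" -eq 0 ]; then
        echo "usage: run_utf8 command [args...]" >&2
        return 2
    fi
    loc=${UTF8_LOCALE:-en_US.UTF-8}

    # Warn, but carry on, if `locale -a` does not list the locale.
    # Spellings vary (UTF-8 vs utf8), so compare lower-cased, without hyphens.
    if command -v locale >/dev/null 2>&1; then
        want=$(printf '%s\n' "$loc" | tr '[:upper:]' '[:lower:]' | sed 's/-//g')
        if ! locale -a 2>/dev/null | tr '[:upper:]' '[:lower:]' \
                | sed 's/-//g' | grep -qxF "$want"; then
            echo "warning: $loc is not listed by 'locale -a'" >&2
        fi
    fi

    # The assignment applies only to the environment of the command being run.
    LANG=$loc "$@"
}
```

With this in place, run_utf8 konsole is equivalent to LANG=en_US.UTF-8 konsole. Note that if LC_ALL is set in your environment it takes precedence over LANG.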

Recent xterms also have rudimentary Unicode support. The -u8 flag enables Unicode support; you must also select an appropriate font with the -fn flag. Try setting an alias:

    alias xtermu='LC_CTYPE='\''en_US.UTF-8'\'' xterm -u8 -ls -fn '\''-misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1'\'' -T xtermu'

You can then start a Unicode-enabled xterm with xtermu.
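The xtermu alias can also be written as a shell function, which avoids the nested quote escaping and lets you pass extra options through to xterm. This is a sketch using the same flags and font as the alias; put it in your shell startup file (for example ~/.bashrc, assuming bash) instead of the alias.

```shell
# xtermu [XTERM_OPTIONS...]
# Start an xterm in UTF-8 mode with an ISO 10646 (Unicode) fixed font.
# Any extra arguments are passed on to xterm unchanged.
xtermu() {
    LC_CTYPE=en_US.UTF-8 xterm -u8 -ls \
        -fn '-misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1' \
        -T xtermu "$@"
}
```

For example, xtermu -geometry 100x40 would open a larger Unicode-enabled window.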

Editors

For simple Unicode text-editing tasks, gedit works well. For a more functional environment, kate is a good choice. It has syntax highlighting, code folding, outlining, autoindent, and more. Copying and pasting between browsers, gnome-terminal, kate and gedit should work quite well. Even text that gnome-terminal displays incorrectly can be copied and pasted into gedit or kate, which may then display it correctly.

mined is a text-mode Unicode editor that works very well with gnome-terminal as well as with PuTTY. It supports using the mouse to position the cursor and select text inside the editor, has full support for right-to-left scripts, and handles various East Asian encodings. mined is probably the most foolproof way to edit Unicode text on Unix.

Currently, emacs has relatively poor Unicode support - it does exist, but it's difficult and finicky to deal with. Better support for Unicode is promised for the next version of emacs after 21.4.

XEmacs's Unicode support works better for me - the simplest way to use it seems to be to run export LC_CTYPE="en_US.UTF-8" in the shell and then start XEmacs from that same shell. XEmacs has problems with copying and pasting Unicode text to or from other programs, though.

Eclipse can open and display files in various encodings (see the Edit -> Encoding menu), but I was unable to copy or paste Unicode text even within Eclipse, let alone to or from other programs.