"Arcane Jill" <[EMAIL PROTECTED]> writes: > Unix makes is possible for /you/ to change /your/ locale - but by > your reasoning, this is an error, unless all other users do so > simultaneously.
Not necessarily: you can change the locale as long as it uses the same
default encoding. By "error" I mean "a bad idea".

The system does not prevent you from changing the locale to one with a
different encoding. But then you are on your own and various things can
break: terminal output will be mangled, you can't enter characters used
in a different encoding from the keyboard, text files will be illegible,
and Unicode programs which process texts may reject your data or even
filenames. If you still need to change encodings, it's safer to use
ASCII-only filenames.

This situation is temporary. Well, it may last 10 more years or so, but
it will probably gradually improve:

First, more protocols and file formats are becoming aware of character
encodings and either label them explicitly or use a known encoding
(generally some Unicode encoding scheme). This applies especially to
protocols for data interchange over the Internet: WWW, email, Usenet,
and modern instant messaging protocols like Jabber. Some old protocols
remain encoding-ignorant, e.g. IRC and finger. GNOME 1 used the locale
encoding, GNOME 2 uses UTF-8. Copying & pasting text in X now has a
separate API which uses UTF-8. While the IRC protocol doesn't specify
the encoding, the irssi client can now recode texts itself to conform
to the customs of particular channels.

Second, UTF-8 is becoming more usable as the default encoding specified
by the locale. I don't use it now because too many things still break,
but it's improving: there are things which didn't work just a few years
ago and work now. Terminal emulators in X widely support UTF-8 mode
now. The curses library now has a working wide character API. Emacs and
vi work in UTF-8 (Emacs still has problems). Readline now works in
UTF-8. Localized messages (gettext) are now recoded automatically.

Other programs still don't work. Bash works, while zsh and ksh don't.
Most full-screen text programs use the narrow character curses API and
don't work in UTF-8.
Brokenness of interactive interpreters of various languages varies.

BTW, in the wide character curses API, which is the only way curses can
work in a UTF-8 terminal, characters are expressed as sequences of
wchar_t (base char + some combining chars, possibly double width). Which
means that you must somehow translate filenames to this representation
in order to display them - same as with a Unicode-based GUI.

It's meaningless to render arbitrary bytes on the terminal, and you
can't force curses to emit the original byte sequences which form
filenames (which would be a bad idea for control characters anyway).
By legitimizing non-UTF-8 filenames in a UTF-8 system you increase the
problems such applications have to overcome: not only do they have to
show control characters somehow, but also invalid UTF-8.

> But it goes beyond that. Copy a file onto a floppy disc and then
> physically take that floppy disc to a different Unix machine and log
> on as "guest" and insert the disc ... Will the filename look the same?

Depends on the filesystem and the way it is mounted. For example, if
it's FAT with long filenames (which I think is the usual format for
floppies even on Unix), filenames can be recoded by the kernel: you
specify the encoding to present filenames in and the encoding of short
names. I don't know what happens with filenames which are not
expressible in the selected encoding.

In this way filenames may be converted automatically between systems
which use different default encodings, preserving the character
semantics rather than the byte representation. Of course file contents
will not be converted.

-- 
   __("<         Marcin Kowalczyk
   \__/       [EMAIL PROTECTED]
    ^^     http://qrnik.knm.org.pl/~qrczak/