On Tuesday 2004.12.14 12:50:43 -0000, Arcane Jill wrote:
> If I have understood this correctly, filenames are not "in" a locale, they
> are absolute. Users, on the other hand, are "in" a locale, and users view
> filenames. The same filename can "look" different to two different users.
> To user A (whose locale is Latin-1), a filename might look valid; to user B
> (whose locale is UTF-8), the same filename might look invalid.

Correct. The problem will, however, be limited to the characters that ISO-8859-1 encodes beyond the ASCII range, such as the accented Latin letters. The basic Latin alphabet, which occupies the same ASCII positions in both ISO-8859-1 and UTF-8, will appear unchanged to both users (a UTF-8 user looking at a Latin-1 user's home directory, or a Latin-1 user looking at a UTF-8 user's home directory). So both users could probably guess the filename they were looking at.

For example, here is a file on my local machine, a Linux box with the locale set to LANG=en_US.UTF-8:

    déclaration_des_droits.utf8

The accented "e" in "déclaration" appears correctly under the UTF-8 locale. I then copied this file (using scp) over to an older Sun Solaris box which I do not administer, so I have to live with the "C" POSIX locale that the machine is set to. Now, when I view the file names in a terminal (where the terminal emulator is set to the same locale), I see:

    d??claration_des_droits.utf8

The terminal, being set to the legacy locale, does not know how to interpret the two bytes that encode the UTF-8 "é", so it shows one "?" per byte. Still, I can guess that the first word should be "déclaration".

The solution, as has been pointed out, is for everyone to move to UTF-8 locales. In the Linux and Unix world, this is already happening for the most part. Solaris 10 now defaults to a UTF-8 locale, at least when set to English. Both SuSE and Red Hat default to UTF-8 locales for most language and script environments. And (open source) tools exist for converting file names from one encoding to another on Linux and Unix systems.

A group of Japanese developers is working on an NLS implementation for the BSDs like OpenBSD, which are currently "stuck" with nothing but the "C" POSIX locale. I think the name of that project is "Citrus".

-- Ed Trager

> > Is that right, Lars?
> >
> > If so, Marcin, what exactly is the error, and whose fault is it?
> >
> > Jill
> >
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> > Behalf Of Marcin 'Qrczak' Kowalczyk
> > Sent: 13 December 2004 14:59
> > To: [EMAIL PROTECTED]
> > Subject: Re: Roundtripping in Unicode
> >
> > Using non-UTF-8 filenames in a UTF-8 locale is IMHO an error.
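
Addendum: the "d??claration" effect described above can be reproduced with a few lines of Python. This is only a rough sketch, not part of the original message. It assumes the "C" locale behaves like plain ASCII and that a terminal shows each undecodable byte as "?". The helper names view_as and recode_name are hypothetical, and recode_name merely illustrates the idea behind filename-conversion tools, not how any particular tool works.

```python
# Illustrative sketch only: view_as and recode_name are hypothetical helpers,
# not part of any tool mentioned in this thread.

def view_as(raw: bytes, encoding: str) -> str:
    """Decode filename bytes as a terminal using `encoding` might show them.

    Bytes that are invalid in that encoding are shown as '?'.
    """
    return raw.decode(encoding, errors="replace").replace("\ufffd", "?")


def recode_name(raw: bytes, src: str, dst: str) -> bytes:
    """Re-encode filename bytes from `src` to `dst`.

    Raises UnicodeError if the bytes are not valid `src` or cannot be
    represented in `dst`.
    """
    return raw.decode(src).encode(dst)


# The filename as stored by the UTF-8 machine: "é" becomes two bytes, C3 A9.
utf8_bytes = "déclaration_des_droits.utf8".encode("utf-8")

# Assumption: the "C" locale is approximated here by the ASCII codec.
ascii_view = view_as(utf8_bytes, "ascii")      # 'd??claration_des_droits.utf8'
latin1_view = view_as(utf8_bytes, "latin-1")   # 'dÃ©claration_des_droits.utf8'

# The reverse case: a Latin-1 name (single byte E9 for "é") seen as UTF-8.
latin1_bytes = "déclaration".encode("latin-1")
utf8_view = view_as(latin1_bytes, "utf-8")     # 'd?claration'

# Converting the Latin-1 name to UTF-8 yields the two-byte form again.
converted = recode_name(latin1_bytes, "latin-1", "utf-8")
```

In every view the ASCII letters survive unchanged, which is why the name stays guessable. Only the bytes above 0x7F are misread.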