On Tuesday 2004.12.14 12:50:43 -0000, Arcane Jill wrote:
> If I have understood this correctly, filenames are not "in" a locale, they 
> are absolute. Users, on the other hand, are "in" a locale, and users view 
> filenames. The same filename can "look" different to two different users. 
> To user A (whose locale is Latin-1), a filename might look valid; to user B 
> (whose locale is UTF-8), the same filename might look invalid.

Correct. The problem will, however, be limited to the accented Latin
characters that ISO-8859-1 adds beyond the ASCII set.  The basic Latin
alphabet sits in the ASCII range at the beginning of both ISO-8859-1 and
UTF-8, so it will appear unchanged to both users (a UTF-8 user looking at
the Latin-1 user's home directory, or a Latin-1 user looking at the UTF-8
user's home directory).  So both users could probably guess the filename
they were looking at.  For example, here is a file on my local machine,
a Linux box with the locale set to LANG=en_US.UTF-8:

      déclaration_des_droits.utf8

The accented "e" in "déclaration" appears correctly under the UTF-8 locale.

I then copied this file (using scp) over to an older Sun Solaris box which
I do not administer, so I have to live with the "C" POSIX locale that the
machine is set to.  Now, when I view the file names in a terminal (where
the terminal emulator is set to the same locale), I see:

      d??claration_des_droits.utf8

The terminal, being set to interpret the legacy locale, does not know 
how to interpret the two bytes that are used for the UTF-8 "é".
Still, I can guess that the first word should be "déclaration".
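
The effect is easy to reproduce without a second machine.  Here is a
minimal Python sketch, assuming the "C"-locale terminal behaves like a
strict ASCII decoder that prints one "?" for every byte it cannot
interpret.  The helper name view_as is made up for illustration:

```python
def view_as(raw: bytes, encoding: str) -> str:
    """Decode raw filename bytes roughly as a terminal using `encoding` would."""
    # Undecodable bytes become U+FFFD; print them as "?" like the terminal above.
    return raw.decode(encoding, errors="replace").replace("\ufffd", "?")

# \u00e9 is the precomposed "é" (LATIN SMALL LETTER E WITH ACUTE).
name = "d\u00e9claration_des_droits.utf8"
utf8_bytes = name.encode("utf-8")      # "é" becomes the two bytes C3 A9
latin1_bytes = name.encode("latin-1")  # "é" becomes the single byte E9

print(view_as(utf8_bytes, "utf-8"))    # déclaration_des_droits.utf8
print(view_as(utf8_bytes, "ascii"))    # d??claration_des_droits.utf8
print(view_as(utf8_bytes, "latin-1"))  # dÃ©claration_des_droits.utf8
print(view_as(latin1_bytes, "utf-8"))  # d?claration_des_droits.utf8
```

The last two lines show the two mixed cases from Jill's question: a
Latin-1 user sees two unrelated characters where the UTF-8 "é" is, and a
UTF-8 user sees one invalid byte where the Latin-1 "é" is.  The ASCII
letters come through untouched in every case.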

The solution, as has been pointed out, is for everyone to move to
UTF-8 locales.  In the Linux and Unix world, this is already happening
for the most part.  Solaris 10 now defaults to a UTF-8 locale, at least
when set to English.  Both SuSE and Red Hat default to UTF-8 locales
for most language and script environments.  And (open source) tools exist
for converting file names from one encoding to another on Linux and Unix
systems.  A group of Japanese developers is working on an NLS
implementation for the BSDs, like OpenBSD, which are currently "stuck"
with nothing but the "C" POSIX locale.  I think the name of that project
is "Citrus".
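
To give an idea of what such a conversion involves, here is a rough Python
sketch.  It is not the method of any particular tool; the function names
and the Latin-1 default are assumptions for illustration.  It works on
raw bytes throughout, since the names on disk are just bytes:

```python
import os

def reencode_name(raw: bytes, src: str = "latin-1", dst: str = "utf-8") -> bytes:
    """Return the filename bytes re-encoded from src to dst."""
    try:
        raw.decode(dst)
        return raw  # already valid in the target encoding (includes pure ASCII)
    except UnicodeDecodeError:
        return raw.decode(src).encode(dst)

def rename_in_dir(directory: bytes, src: str = "latin-1",
                  dst: str = "utf-8", dry_run: bool = True):
    """List (old, new) name pairs in directory; rename only if dry_run is False."""
    changes = []
    for raw in os.listdir(directory):  # bytes path in, bytes names out
        new = reencode_name(raw, src, dst)
        if new != raw:
            changes.append((raw, new))
            if not dry_run:
                os.rename(os.path.join(directory, raw),
                          os.path.join(directory, new))
    return changes
```

Calling rename_in_dir(b".") with the default dry_run=True only reports
what would change.  Note that the "already valid" check is a heuristic: a
Latin-1 name can by coincidence also be a valid UTF-8 byte sequence, in
which case this sketch leaves it alone.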

-- Ed Trager

   

> 
> Is that right, Lars?
> 
> If so, Marcin, what exactly is the error, and whose fault is it?
> 
> Jill
> 
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Behalf Of Marcin 'Qrczak' Kowalczyk
> Sent: 13 December 2004 14:59
> To: [EMAIL PROTECTED]
> Subject: Re: Roundtripping in Unicode
>
> Using non-UTF-8 filenames in a UTF-8 locale is IMHO an error.
