Re: Invalid UTF-8 sequences (was: Re: Nicest UTF)

Antoine Leca Thu, 09 Dec 2004 04:11:21 -0800

On Monday, December 6th, 2004 20:52Z John Cowan wrote:

> Doug Ewell scripsit:
>
>>> Now suppose you have a UNIX filesystem, containing filenames in a
>>> legacy encoding (possibly even more than one). If one wants to
>>> switch to UTF-8 filenames, what is one supposed to do? Convert all
>>> filenames to UTF-8?
>>
>> Well, yes.  Doesn't the file system dictate what encoding it uses for
>> file names?  How would it interpret file names with "unknown"
>> characters from a legacy encoding?  How would they be handled in a
>> directory search?
>
> Windows filesystems do know what encoding they use.


Err, not really. MS-DOS *needs to know* the encoding in use, a bit like a
*nix application that displays filenames needs to know the encoding in order
to pick the correct set of glyphs (but the constraints are much heavier).
Windows NT Unicode applications also know it, because it cannot be changed :-).

But other Windows applications (still the most common ones) that happen to
operate in 'Ansi' mode are subject to the hazards of codepage translation.
Even where Windows 'knows' the encoding used for the filesystem (as with
NTFS or Joliet, or VFAT on NT kernels; in the other cases it does not even
know it, much like *nix kernels), the only usable set is the _intersection_
of the character set used to write and the one used to read. In practice
that is usually restricted to US-ASCII, very much like the usable set in
the *nix case...
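To make the "intersection" point concrete, here is a small Python sketch. It is a simplified model, not a real Windows or POSIX API: the function names are hypothetical, Python's strict codecs stand in for the Windows codepage conversions (which can also apply "best fit" substitutions), and cp1252/cp437 are just example ANSI/OEM codepages. The first function models a filesystem that stores raw bytes (the *nix case); the second models one that stores Unicode but is seen by 'Ansi' applications through their own codepage.

```python
# Hypothetical model of filename round-tripping between a writer and a reader
# that use different codepages. Not a real OS API; Python codecs stand in for
# the OS conversions (strict mode, no "best fit" mapping).

def survives_byte_store(name: str, write_cp: str, read_cp: str) -> bool:
    """*nix-like case: the filesystem stores raw bytes. The writer encodes
    with write_cp; a reader later decodes the same bytes with read_cp."""
    try:
        return name.encode(write_cp).decode(read_cp) == name
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False


def survives_unicode_store(name: str, write_cp: str, read_cp: str) -> bool:
    """NTFS-like case: names are stored as Unicode, but an 'Ansi' application
    only sees them through its own codepage, so every character must be
    representable in both the writer's and the reader's codepage."""
    def representable(cp: str) -> bool:
        try:
            name.encode(cp)
            return True
        except UnicodeEncodeError:
            return False
    return representable(write_cp) and representable(read_cp)


def shared_bytes(cp_a: str, cp_b: str) -> list[int]:
    """Byte values that decode to the same character in both codepages."""
    shared = []
    for b in range(256):
        try:
            if bytes([b]).decode(cp_a) == bytes([b]).decode(cp_b):
                shared.append(b)
        except UnicodeDecodeError:
            pass  # byte is undefined in at least one of the codepages
    return shared


if __name__ == "__main__":
    print(survives_byte_store("readme.txt", "cp1252", "cp437"))  # ASCII only
    print(survives_byte_store("café", "cp1252", "cp437"))  # é becomes another glyph
    print(survives_unicode_store("café", "cp1252", "cp437"))  # é exists in both
    print(survives_unicode_store("καφέ", "cp1253", "cp1252"))  # Greek not in cp1252
```

With raw byte storage, "café" written under cp1252 and read under cp437 comes back with a different last character, while an ASCII name is safe. The shared bytes of those two codepages include the whole ASCII range 0x00 to 0x7F, which is why ASCII is the practical common denominator.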


Antoine
