John Cowan wrote:
> Frankly, your problem is insoluble, because you have set up
> self-contradictory requirements.  Suppose you are dealing with a
> filesystem where some names are to be interpreted as Latin-1 and
> others as Latin-2.  The kernel will give you absolutely no help about
> which charset to use for which names
Oh well, I did not set up the requirements. They come pretty naturally.
Everything works fine if I keep the database in UTF-8 (well, raw bytes for
UNIX) and use UTF-16 => UTF-8 conversion for Windows filenames (sorry, I
didn't mention those so far).
The same thing should work the other way around: store Windows filenames
directly in a UTF-16 database and use UTF-8 => UTF-16 conversion for UNIX
filenames. The hope that some day most of the data will be UTF-8 makes this
even more appealing. As for any data that is not valid UTF-8, the original
byte sequence can be reconstructed and a re-conversion can be done based on
the user's settings (or selection) at display time. All you need is UTF-8B
conversion instead of UTF-8.
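
To make the round trip concrete, here is a minimal sketch in Python. It uses Python's built-in "surrogateescape" error handler, which follows the same idea as UTF-8B: each undecodable byte 0xNN is mapped to the lone surrogate U+DCNN and mapped back on encoding. Whether this matches every detail of the UTF-8B variant meant above is an assumption, and the filename and function names are made up for illustration.

```python
# Sketch only: Python's "surrogateescape" handler stands in for UTF-8B here.
# The sample filename is hypothetical.

def to_internal(raw_name: bytes) -> str:
    """Decode as UTF-8, keeping undecodable bytes as lone surrogates U+DC80..U+DCFF."""
    return raw_name.decode("utf-8", errors="surrogateescape")

def to_raw(text: str) -> bytes:
    """Inverse of to_internal: escaped surrogates become the original bytes again."""
    return text.encode("utf-8", errors="surrogateescape")

# 'cafe.txt' with an e-acute stored as the single Latin-1 byte 0xE9 (not valid UTF-8).
raw_name = b"caf\xe9.txt"

internal = to_internal(raw_name)        # the 0xE9 byte becomes U+DCE9
restored = to_raw(internal)             # byte-for-byte identical to raw_name

# At display time the original bytes can be re-converted using the user's charset.
displayed = restored.decode("latin-1")
```

The point of the sketch is that nothing is lost: the database can hold `internal`, and the decision about Latin-1 versus Latin-2 is deferred until someone actually looks at the name.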


How about another question here:

Most HTTP servers can display a filesystem directory and let you change
directories and open files. Hmmm, marking the generated HTML file as UTF-8
would be a no-no then, unless the server guarantees that there are no
illegal sequences in it (caused by a Latin-1 filename). Too bad, because I
would hope I can enter a directory or open a file even if its name is not
displayed correctly. With characters dropped or replaced, I have no chance.
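
One way a server could keep such entries usable is sketched below, under the assumption that the link target is built from the raw bytes while only the visible label is lossy. This is an illustration, not a description of what any particular HTTP server does; `listing_entry` is a hypothetical helper.

```python
import html
from urllib.parse import quote, unquote_to_bytes

def listing_entry(raw_name: bytes) -> str:
    """Build one directory-listing link for a filename given as raw bytes."""
    # The href percent-encodes the original bytes, so no information is lost.
    href = quote(raw_name)
    # The label is only for display: undecodable bytes become U+FFFD.
    label = html.escape(raw_name.decode("utf-8", errors="replace"))
    return f'<a href="{href}">{label}</a>'

# Hypothetical Latin-1 filename: the 0xE9 byte is not valid UTF-8.
entry = listing_entry(b"caf\xe9.txt")

# When the link is followed, the server can recover the exact bytes to open.
requested = unquote_to_bytes("caf%E9.txt")
```

With this split the page itself is valid UTF-8, the name is displayed imperfectly, and the file can still be opened.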

Suppose the characters are still there and the file was not marked as UTF-8
(which works as long as all other text is in English), and I selected UTF-8
myself in the browser. Would you say there is no way I would want to convert
a portion of the displayed text to UTF-16? Maybe I won't, but maybe the
system will, when I want to copy something to the clipboard...
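
The clipboard case can be sketched the same way. The example assumes the text holds an escaped byte as a lone surrogate (Python's "surrogateescape" convention, used here as a stand-in for UTF-8B) and shows that a strict UTF-16 conversion refuses it, while a permissive one ("surrogatepass") carries it through so the bytes can still be recovered. The helper names are hypothetical.

```python
def strict_utf16(text: str) -> bytes:
    """Strict conversion, as a well-behaved converter would do it."""
    return text.encode("utf-16-le")

def lenient_utf16(text: str) -> bytes:
    """Permissive conversion: lone surrogates are written as single code units."""
    return text.encode("utf-16-le", errors="surrogatepass")

# Displayed text containing one escaped Latin-1 byte (0xE9 -> U+DCE9).
shown = b"caf\xe9".decode("utf-8", errors="surrogateescape")

try:
    strict_utf16(shown)
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True          # strict UTF-16 rejects the lone surrogate

clip = lenient_utf16(shown)       # the escaped byte survives as code unit 0xDCE9

# Pasting back: decode permissively, then undo the escape to get the raw bytes.
pasted = clip.decode("utf-16-le", errors="surrogatepass")
recovered = pasted.encode("utf-8", errors="surrogateescape")
```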



Lars Kristan
