John Cowan wrote:
> Frankly, your problem is insoluble, because you have set up
> self-contradictory requirements. Suppose you are dealing with a
> filesystem where some names are to be interpreted as Latin-1 and
> others as Latin-2. The kernel will give you absolutely no help
> about which charset to use for which names

Oh well, I did not set up the requirements. They come pretty naturally. Everything works fine if I keep the database in UTF-8 (well, raw bytes for UNIX) and use UTF-16 => UTF-8 conversion for Windows filenames (sorry, I didn't mention those so far). The same thing should work the other way around: store Windows filenames directly in a UTF-16 database and use UTF-8 => UTF-16 conversion for UNIX filenames. Hoping that some day most of the data will be UTF-8 makes this even more appealing. As for any data that is not, well, the original byte sequence can be reconstructed and a re-conversion can be done based on the user's settings (or selection) at display time. All you need is UTF-8B conversion instead of UTF-8.
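To make the round trip concrete, here is a minimal sketch in Python. It assumes that Python's `surrogateescape` error handler is an acceptable stand-in for UTF-8B: it maps each undecodable byte 0xNN to the lone surrogate U+DCNN and maps it back on encoding. The function names (`to_text`, `to_raw`, `to_utf16_db`, `redisplay`) and the sample filename are made up for illustration only.

```python
# Sketch only: surrogateescape is used here as a UTF-8B-style conversion.
# All function names are hypothetical helpers, not part of any existing tool.

def to_text(raw: bytes) -> str:
    """Decode a raw UNIX filename; each invalid byte 0xNN becomes U+DCNN."""
    return raw.decode("utf-8", errors="surrogateescape")

def to_raw(text: str) -> bytes:
    """Recover the original byte sequence from the escaped string."""
    return text.encode("utf-8", errors="surrogateescape")

def to_utf16_db(text: str) -> bytes:
    """Store the string as UTF-16, letting the lone surrogates pass through."""
    return text.encode("utf-16-le", errors="surrogatepass")

def from_utf16_db(stored: bytes) -> str:
    """Read the string back from its UTF-16 form."""
    return stored.decode("utf-16-le", errors="surrogatepass")

def redisplay(text: str, charset: str) -> str:
    """Re-convert the original bytes under a charset chosen at display time."""
    return to_raw(text).decode(charset, errors="replace")

# A filename whose byte 0xF5 is not valid UTF-8; its charset is unknown.
raw_name = b"t\xf5ke.txt"

escaped = to_text(raw_name)                   # 't\udcf5ke.txt'
restored = from_utf16_db(to_utf16_db(escaped))

as_latin1 = redisplay(restored, "iso-8859-1")  # 0xF5 read as Latin-1
as_latin2 = redisplay(restored, "iso-8859-2")  # 0xF5 read as Latin-2
```

The bytes => text => bytes direction is the one that is lossless here. Valid UTF-8 names pass through unchanged, so nothing special is needed for data that is already UTF-8.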

How about another question here: most HTTP servers can display a filesystem directory and let you change directories and open files. Hmmm, marking the generated HTML file as UTF-8 would be a no-no then, unless the server guarantees that there are no illegal sequences in it (caused by a Latin-1 filename). Too bad, because I would hope I can enter a directory or open a file even if its name is not displayed correctly. With characters dropped or replaced, I have no chance. Suppose the characters are still there, the file was not marked as UTF-8 (which works as long as all the other text is in English), and I selected UTF-8 myself in the browser. Would you say there is no way I would want to convert a portion of the displayed text to UTF-16? Maybe I won't, maybe the system will, when I want to copy something to the clipboard...

Lars Kristan