Stefan Persson wrote:
Stephane Bortzmeyer wrote:

I do not agree. It would mean *each* application has to normalize
because it cannot rely on the kernel. It has huge security
implications (two file names with the same name in NFC, so visually
impossible to distinguish, but two different strings of code points).

Couldn't this cause problems if copying two files to a floppy on a system NOT normalising the data ...

An even bigger problem, as far as I know, is that Unix/Linux file systems store filenames as plain sequences of bytes (any byte is allowed except 0 and the ASCII code for '/') and do not enforce any particular encoding. You cannot rely on a filename being in UTF-8 unless you know which application generated it and how that application works.
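To make both points concrete, here is a minimal Python sketch. The sample name "café" and the helper names `same_after_nfc` and `is_valid_utf8` are my own illustrations, not anything from the kernel or a library. It shows that two visually identical names can be different byte strings, and that an arbitrary byte string need not be valid UTF-8 at all.

```python
import unicodedata

# Two spellings of the same visible name (hypothetical example).
composed = "caf\u00e9"     # "é" as a single code point (already in NFC)
decomposed = "cafe\u0301"  # "e" followed by COMBINING ACUTE ACCENT (NFD)

# A byte-oriented file system would see these as two different names.
composed_bytes = composed.encode("utf-8")      # b'caf\xc3\xa9'
decomposed_bytes = decomposed.encode("utf-8")  # b'cafe\xcc\x81'


def same_after_nfc(a: str, b: str) -> bool:
    """Return True if both strings are equal once normalized to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A name written by a Latin-1 application: legal as a filename on
# Unix/Linux, but not valid UTF-8.
latin1_name = b"caf\xe9"
```

With these definitions, `composed_bytes` and `decomposed_bytes` differ, `same_after_nfc(composed, decomposed)` is true, and `is_valid_utf8(latin1_name)` is false.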


If you want to be safe with filenames on Unix/Linux, you may need to use your own custom normalization and encoding to map Unicode strings to ASCII. Within your system, you then control the normalization and the encoding yourself. (As an example of an encoding to ASCII, you could use the one that IMAP defines for folder names, a variant of UTF-7, because it is designed with the Unix filesystem in mind.)
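As a rough sketch of that idea, the following Python normalizes a name to NFC and then applies the modified UTF-7 encoding that IMAP uses for mailbox names (RFC 3501, section 5.1.3). The choice of NFC and the function names `to_ascii_name` and `_b64_run` are my assumptions for illustration. Note that this encoding passes printable ASCII through unchanged, including '/', so a real system would still have to deal with that character separately.

```python
import base64
import unicodedata


def _b64_run(run: str) -> str:
    """Encode a run of non-ASCII characters as a modified-BASE64 section.

    The run is serialized as UTF-16BE, base64 encoded, stripped of "="
    padding, and "/" is replaced by ",". "&" opens and "-" closes it.
    """
    raw = base64.b64encode(run.encode("utf-16-be")).decode("ascii")
    return "&" + raw.rstrip("=").replace("/", ",") + "-"


def to_ascii_name(name: str) -> str:
    """Map a Unicode name to ASCII: NFC first, then IMAP modified UTF-7."""
    name = unicodedata.normalize("NFC", name)  # assumed normalization form
    out = []
    run = []  # pending characters outside printable ASCII
    for ch in name:
        if 0x20 <= ord(ch) <= 0x7E:
            if run:
                out.append(_b64_run("".join(run)))
                run = []
            # A literal "&" is written as "&-"; other printable ASCII
            # characters represent themselves.
            out.append("&-" if ch == "&" else ch)
        else:
            run.append(ch)
    if run:
        out.append(_b64_run("".join(run)))
    return "".join(out)
```

Because normalization happens before encoding, `to_ascii_name("caf\u00e9")` and `to_ascii_name("cafe\u0301")` both give `caf&AOk-`, so the two spellings can no longer coexist as distinct names.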

markus



