From: "Stefan Persson" <[EMAIL PROTECTED]> > Stephane Bortzmeyer wrote: > > > I do not agree. It would mean *each* application has to normalize > > because it cannot rely on the kernel. It has huge security > > implications (two file names with the same name in NFC, so visually > > impossible to distinguish, but two different string of code points). > > Couldn't this cause problems if copying two files to a floppy on a > system NOT normalising the data (e.g. a customised kernel) with file > names that would, when normalised, be identical and then accessing the > floppy on a system that DOES normalise the data? Then the second system > might think that the two files have the same file name, and wouldn't > know which one you're referring to. > > Example: > > You make two files on system A: "e-acute" and "e combining-acute". You > move the files to system B, which supports normalising, and request file > "e-acute". System B normalises that to "e combining-acute", and might > point to the wrong file. System B thinks that the name of both files is > "e combining-acute", so even if typing "e comibining-acute" it might > sometimes return "e-acute".
You've got exactly the same problem on Windows filesystems with lettercase distinctions: although lettercase may or may not be preserved, the filesystem internally normalizes case when comparing filenames, so that when there's a file named "a.txt" and you store a new file "A.TXT", you overwrite the first file, even though both "a.txt" and "A.TXT" can be retrieved. What's even worse is that when you overwrite "a.txt" with "A.TXT", the initial lettercase is kept, and if you list the directory contents you'll see "a.txt", as if your "A.TXT" were not there.

I would say that normalization and other transformations of filenames can be (and are) a recurrent feature of filesystems (normalization is even normative in the Mac HFS filesystem, which uses a normalization based on NFD). Filenames are normally intended to be read by humans and to be easy to type, but they are constrained for performance or compatibility reasons so that they remain displayable and enterable.

Linux/Unix filesystems don't have this property: filenames on these systems are intended to be handled by software as exact index keys, even if a user cannot enter them from the user interface because they contain control characters. [Haven't you seen those pesky Linux/Unix filenames containing backspaces or "clear screen" escape sequences that alter your display when you perform a simple "ls" command? In some cases you have to resort to kludgy shell commands just to rename/move the file with the bogus name.]

In some cases, accepting arbitrary strings of bytes in Unix/Linux filenames becomes a security issue, which should be fixed so that no application will create (most often by error) a bogus filename: one that yields a file that can't be removed under its current name, or that breaks some file transfer protocols (like FTP), or hangs a webserver.

So for me, even if filenames can be made more user-friendly by accepting Unicode characters, they are not plain text, and will inherently carry many restrictions. A good filesystem should either be assumed to be always Unicode, or specify its character set and naming rules explicitly to applications (something that was long lacking in FAT filesystems, until FAT32 was created with Unicode LFN support).
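To make the "a.txt"/"A.TXT" behaviour concrete, here is a toy Python model of a case-preserving but case-insensitive directory (only a sketch of the lookup semantics, not any real filesystem API):

    class CaseInsensitiveDir:
        """Toy model: compare names case-folded, display the first spelling seen."""
        def __init__(self):
            self._entries = {}   # folded name -> [displayed name, contents]

        def write(self, name, data):
            key = name.casefold()
            if key in self._entries:
                self._entries[key][1] = data    # overwrite; keep original spelling
            else:
                self._entries[key] = [name, data]

        def read(self, name):
            return self._entries[name.casefold()][1]

        def listdir(self):
            return [shown for shown, _ in self._entries.values()]

    d = CaseInsensitiveDir()
    d.write("a.txt", "first")
    d.write("A.TXT", "second")    # silently replaces "a.txt"
    print(d.listdir())            # ['a.txt'] -- "A.TXT" never appears in listings
    print(d.read("A.TXT"))        # 'second' -- but its content is there

Replacing casefold() with unicodedata.normalize("NFD", name) in the same skeleton approximates the HFS-style normalizing lookup mentioned above.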