From: "John H. Jenkins" <[EMAIL PROTECTED]>
> On Aug 23, 2004, at 3:34 PM, Doug Ewell wrote:
>
> > Deborah Goldsmith <goldsmit at apple dot com> wrote:
> >
> >> FYI, by far the largest source of text in NFD (decomposed) form in
> >> Mac OS X is the file system. File names are stored this way (for
> >> historical reasons), so anything copied from a file name is in (a
> >> slightly altered form of) NFD.
> >
> > "Slightly altered"?
>
> Yes, the specification for the Mac file system was frozen before NFD
> had been developed by the UTC, so it isn't exactly the same. But it's
> close.

Yes, it is very close to NFD. The actual decompositions performed are fully listed in the documentation of the Mac OS filesystems. Note that there are differences between the various Mac filesystems, which were also localized in their drivers (much like the legacy MS-DOS filesystems with their unpredictable codepage, notably when reading removable media, where the codepage of the system that created the media is not stored on the medium).

Actually, it was based on the decompositions in Unicode 2.01. But the list of decompositions is now frozen at a specific Unicode version in the filesystem driver, for compatibility reasons. This is needed because media may later be created with characters from a newer version of Unicode, one that the driver of a legacy system reading the media does not yet support. It is even more important for networked filesystems, for security reasons.

For the same security reasons, Windows filesystems do NOT normalize Unicode filenames. They are stored as a binary vector of UTF-16 code units (some of which are restricted to special uses or forbidden, notably code units/code points in the ASCII range that have predefined functions, or that are excluded, such as most controls), and are optionally mapped to a secondary "short" 8.3 name using a local OS codepage.

However, it is highly recommended to use the NFC form when creating Unicode filenames on Windows (notably because it offers round-trip compatibility with filenames created in a Windows codepage, where characters are precomposed). If you create a filename with decomposed characters in NFD form, you may not be able to open that file using the filename encoded in the Windows or OEM codepage: the filesystem will not find it, because it uses a simple one-to-one mapping from codepage codes to Unicode code points in NFC form.
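
To make the NFC/NFD point concrete, here is a minimal Python sketch. It does not touch any real filesystem. The sample name "café.txt", the helper utf16_code_units, and the choice of cp1252 as a stand-in for "a Windows codepage" are all assumptions made for illustration only. It shows that the composed and decomposed spellings are different sequences of UTF-16 code units (so a filesystem that compares names without normalizing sees two different names), and that only the composed spelling survives a trip through the codepage.

```python
import unicodedata


def utf16_code_units(name):
    """Return the UTF-16 code units of a string (hypothetical helper)."""
    data = name.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


# Sample filename, chosen only for illustration.
composed = unicodedata.normalize("NFC", "caf\u00e9.txt")   # U+00E9, precomposed
decomposed = unicodedata.normalize("NFD", composed)        # "e" followed by U+0301

# The two spellings look the same but are different code-unit sequences,
# so a comparison that does not normalize treats them as distinct names.
same_code_units = utf16_code_units(composed) == utf16_code_units(decomposed)

# cp1252 stands in here for a Windows codepage (an assumption).
# The precomposed name round-trips through it unchanged.
round_trip = composed.encode("cp1252").decode("cp1252")

# The decomposed name cannot be expressed in that codepage at all,
# because the codepage has no combining acute accent.
try:
    decomposed.encode("cp1252")
    decomposed_fits_codepage = True
except UnicodeEncodeError:
    decomposed_fits_codepage = False

# Normalizing to NFC before creating the name avoids the mismatch.
repaired = unicodedata.normalize("NFC", decomposed)
```

In this sketch, normalizing to NFC before the name is handed to the filesystem is the design choice that keeps the Unicode name and the codepage name in agreement. What a particular filesystem driver does with the name afterwards is outside what the sketch covers.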

