It's worth pointing out that there is no such thing as "precomposed Unicode". Normalization form C (NFC) could be called "as precomposed as possible." Some character sequences have no precomposed equivalent and can only be expressed using combining marks.
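To illustrate the point, here is a minimal sketch using Python's standard unicodedata module (the helper name nfc and the two sample strings are chosen for illustration only). It shows one sequence that NFC composes into a single code point and one that stays decomposed because no precomposed character exists for it.

```python
import unicodedata


def nfc(text: str) -> str:
    """Return text in Normalization Form C (canonical composition)."""
    return unicodedata.normalize("NFC", text)


# "e" + COMBINING ACUTE ACCENT: a precomposed form (U+00E9) exists.
composable = "e\u0301"

# "q" + COMBINING TILDE: Unicode defines no precomposed character for
# this pair, so NFC has to leave the combining mark in place.
no_precomposed = "q\u0303"

if __name__ == "__main__":
    for sample in (composable, no_precomposed):
        code_points = [f"U+{ord(ch):04X}" for ch in nfc(sample)]
        print(code_points)
```

The second sample is still two code points after normalization, which is why NFC is only "as precomposed as possible."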
Deborah Goldsmith
Internationalization, Unicode liaison
Apple Computer, Inc.
[EMAIL PROTECTED]
On Nov 8, 2004, at 5:17 PM, Markus Scherer wrote:
Tay, William wrote:
Is there any C library available that converts the decomposed UTF-8 byte
streams into the pre-composed equivalent?
MacOS X does decompose filenames, but it does not use standard Unicode normalization (because it was
designed before Unicode's normalization was finalized). I suggest you search this list's mailing
list archive for more details. You probably need to use a MacOS system function.
ICU has options for normalization (some defined with internal constants only) which may or may not
match, or get close to, MacOS filename normalization: http://oss.software.ibm.com/cgi-bin/icu/nbrowser
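
As a rough sketch of the conversion being asked about, the following uses Python's standard unicodedata module rather than a C library, so it only illustrates the idea. The function name to_precomposed_utf8 is hypothetical, and the sketch assumes that standard NFC composition is acceptable for the input. Whether that matches MacOS filename normalization closely enough for a given use is not something this example establishes.

```python
import unicodedata


def to_precomposed_utf8(data: bytes) -> bytes:
    """Decode UTF-8 bytes, compose with NFC, and re-encode as UTF-8.

    Hypothetical helper: it applies standard NFC only, which is an
    assumption about what "pre-composed equivalent" should mean here.
    """
    text = data.decode("utf-8")
    composed = unicodedata.normalize("NFC", text)
    return composed.encode("utf-8")


if __name__ == "__main__":
    # "re" + U+0301 + "sume" + U+0301, as decomposed UTF-8 bytes.
    decomposed = b"re\xcc\x81sume\xcc\x81"
    print(to_precomposed_utf8(decomposed))
```

Sequences with no precomposed form pass through with their combining marks intact, so the output is not guaranteed to be free of combining characters.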
markus