I think he's saying he wants to convert to NFC *from* Mac OS X data, in which case the fact that Mac OS X's file system normalization is not strict NFD doesn't really matter. Also, he says he's running on Solaris, which would make it a tad difficult to call a Mac OS X API. ICU should do the trick.

It's worth pointing out that there is no such thing as "precomposed Unicode". Normalization form C (NFC) could be called "as precomposed as possible": some character sequences have no precomposed equivalent and can only be expressed using combining marks, so they stay decomposed even in NFC.
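To illustrate the "as precomposed as possible" point, here is a minimal sketch. It uses Python's standard unicodedata module rather than ICU, purely because it is short and self-contained; ICU's C normalization API should give equivalent results for standard NFC. The helper name to_nfc_utf8 is made up for this example.

```python
import unicodedata


def to_nfc_utf8(raw: bytes) -> bytes:
    """Decode UTF-8 bytes, normalize to NFC, and re-encode as UTF-8.

    Hypothetical helper for illustration only; not part of ICU or any
    Mac OS X API.
    """
    text = raw.decode("utf-8")
    return unicodedata.normalize("NFC", text).encode("utf-8")


# "e" + COMBINING ACUTE ACCENT (U+0301) has a precomposed form, U+00E9.
decomposed_e = "e\u0301".encode("utf-8")
composed_e = to_nfc_utf8(decomposed_e)

# "q" + COMBINING ACUTE ACCENT has no precomposed form in Unicode,
# so NFC leaves it as a base letter followed by a combining mark.
decomposed_q = "q\u0301".encode("utf-8")
still_decomposed_q = to_nfc_utf8(decomposed_q)

print(composed_e)          # two bytes: the UTF-8 encoding of U+00E9
print(still_decomposed_q)  # unchanged: "q" followed by the encoded U+0301
```

The first string shrinks from three bytes to two, while the second comes back unchanged, which is why NFC output can still contain combining marks.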

Deborah Goldsmith
Internationalization, Unicode liaison
Apple Computer, Inc.
[EMAIL PROTECTED]

On Nov 8, 2004, at 5:17 PM, Markus Scherer wrote:

Tay, William wrote:
Is there any C library available that converts the decomposed UTF-8 byte
streams into the pre-composed equivalent?

Mac OS X does decompose filenames, but it does not use standard Unicode normalization (because it was
designed before Unicode normalization was finalized). I suggest you search the archive of this
mailing list for more details. You probably need to use a Mac OS system function.


ICU has options for normalization (some defined with internal constants only) which may or may not
match, or come close to, Mac OS filename normalization: http://oss.software.ibm.com/cgi-bin/icu/nbrowser


markus




