I'm in the process of writing several normalization routines and testing them against NormalizationTest.txt. The code that I use to do the composition for NFC and NFKC seems to work for every line in the test file except for 21 of them. An example of where my routine falls down is the line:

1026;1026;1025 102E;1026;1025 102E; # (ဦ; ဦ; ဥ◌ီ; ဦ; ဥ◌ီ; ) MYANMAR LETTER UU

According to the comment at the beginning of the file, and all that I've read elsewhere, toNFC(U+1025 U+102E) should result in U+1026. However, both U+1025 and U+102E have a canonical combining class of zero, so my code does not compose those characters. Nothing that I've been able to find explains this discrepancy. Any help would be greatly appreciated.
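For anyone who wants to reproduce what I'm seeing, here is a minimal sketch. It uses Python's standard unicodedata module as a stand-in reference implementation (that choice is only an assumption for illustration; my own routines are not shown here). It prints the combining class of each character and what the library's NFC and NFD produce for this test line.

```python
import unicodedata

# Code points taken from the failing line of NormalizationTest.txt.
first = "\u1025"     # first character of the decomposed sequence
second = "\u102E"    # second character of the decomposed sequence
composed = "\u1026"  # the precomposed character the test file expects

# Canonical combining class of each character in the decomposed sequence.
ccc_values = [unicodedata.combining(ch) for ch in first + second]

# What the reference implementation does with the two forms.
nfc_result = unicodedata.normalize("NFC", first + second)
nfd_result = unicodedata.normalize("NFD", composed)

def as_code_points(text):
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

print("combining classes:", ccc_values)
print("NFC(U+1025 U+102E) =", as_code_points(nfc_result))
print("NFD(U+1026)        =", as_code_points(nfd_result))
```

Both combining classes come back as zero, yet the library still composes the pair to U+1026, which matches the test file and not my routine.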



--
Clark S. Cox III
[EMAIL PROTECTED]
http://homepage.mac.com/clarkcox3/
http://homepage.mac.com/clarkcox3/blog/B1196589870/index.html
