Does anyone happen to know which Unicode normalization form Xerces uses when reading a document in a non-UCS character set such as ISO-8859-6 or SJIS? The issue is that some characters can be decoded to more than one Unicode character sequence. For example, is e with an acute accent decoded as the single precomposed character é (U+00E9), or as ASCII e followed by a combining acute accent (U+0301)?
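To make the ambiguity concrete, here is a minimal sketch of the two representations and of a check for normalization form C. It assumes a JDK with java.text.Normalizer (Java 6 or later) and StandardCharsets (Java 7 or later), neither of which exists in Java 1.4. The class and helper names are hypothetical, and nothing here says what Xerces itself does.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class NormalizationCheck {
    // Hypothetical helper: convert decoded text to normalization form C.
    public static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Hypothetical helper: report whether decoded text is already in form C.
    public static boolean isNfc(String s) {
        return Normalizer.isNormalized(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String precomposed = "\u00e9";   // single character: e with acute
        String decomposed = "e\u0301";   // ASCII e + combining acute accent

        // The two strings look the same on screen but are not equal as Java strings.
        System.out.println(precomposed.equals(decomposed));        // false

        // Only the precomposed string is in form C.
        System.out.println(isNfc(precomposed));                     // true
        System.out.println(isNfc(decomposed));                      // false

        // Normalizing to form C composes the pair into the single character.
        System.out.println(toNfc(decomposed).equals(precomposed));  // true

        // ISO-8859-1 byte 0xE9 decodes to U+00E9, which is already in form C.
        String decoded = new String(new byte[] {(byte) 0xE9}, StandardCharsets.ISO_8859_1);
        System.out.println(isNfc(decoded));                         // true
    }
}
```

Running isNfc over the character data a parser reports would at least reveal whether a given decoder ever produces output that is not in form C.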
Normally the difference doesn't matter, but canonical XML (and thus XML encryption) requires that Unicode normalization form C be used. The Java 1.4 documentation also does not appear to say exactly which normalization form it uses.

-- 
Elliotte Rusty Harold | [EMAIL PROTECTED] | Writer/Programmer
XML in a Nutshell, 2nd Edition (O'Reilly, 2002)
http://www.cafeconleche.org/books/xian2/
http://www.amazon.com/exec/obidos/ISBN%3D0596002920/cafeaulaitA/
Read Cafe au Lait for Java News: http://www.cafeaulait.org/
Read Cafe con Leche for XML News: http://www.cafeconleche.org/
