Hi, Brian.
Brian Minchau/Toronto/[EMAIL PROTECTED] wrote on 2005-01-13 12:30:52 AM:
> When this class is not available it looks like it exposes a
> configuration error in Xalan in its Encodings.properties file in the
> org.apache.xml.serializer package. It has information for the Turkish
> characters in lines like this:
> ISO8859_9 ISO-8859-9 0x00FF
> ISO8859-9 ISO-8859-9 0x00FF
> The third word on the line, 0x00FF, indicates the code point of the
> highest value used in the character set. In base 10 this value is 255.
> But these Turkish characters are 287, 350, 304, which are bigger than
> 255. When
> writing the characters to the output file, the serializer thinks the
> unicode characters are out of range because they are larger than the
> supposed maximum codepoint value. So the serializer converts them to
> numerical character references, e.g. the six characters &#304; rather
> than the single unicode character with a code point of 304.
>
> At this point I'm not sure what the correct maximal code point value
> is for this character set, but I think that getting the value right
> might fix your problem.
I think there is no single maximal code point value that will solve
this problem. The characters in ISO-8859-9 map to Unicode code points
that are discontiguous, which means that the serializer actually needs to
be able to describe ranges of code points that are representable. For
instance, the Unicode character U+00DD (LATIN CAPITAL LETTER Y WITH ACUTE)
falls into the range currently permitted by the single maximal code point
value, but it's not actually representable in ISO-8859-9 - and as you've
noted, other characters that are representable are outside the maximal
code point value currently specified.
See [1] for another example.
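
To make the mismatch concrete, here is a minimal sketch that compares a single maximal code point check against asking the JDK's own encoder. It is only an illustration, not how the Xalan serializer is implemented. The class name EncodableCheck and its two helper methods are made up for this example, and it assumes the running JRE provides an ISO-8859-9 charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class EncodableCheck {
    // Hypothetical stand-in for the single "highest code point" value
    // (0x00FF) listed for ISO-8859-9 in Encodings.properties.
    static final int MAX_CODE_POINT = 0x00FF;

    // Threshold heuristic: anything at or below the maximum is assumed
    // to be representable in the output encoding.
    static boolean inRangeByMax(char c) {
        return c <= MAX_CODE_POINT;
    }

    // Ask the charset encoder itself whether the character is representable.
    static boolean encodable(CharsetEncoder enc, char c) {
        return enc.canEncode(c);
    }

    public static void main(String[] args) {
        // Assumption: this JRE ships an ISO-8859-9 charset.
        CharsetEncoder enc = Charset.forName("ISO-8859-9").newEncoder();

        // U+00DD is below the maximum; U+011F (287), U+015E (350) and
        // U+0130 (304) are the Turkish characters from the report.
        char[] samples = { '\u00DD', '\u011F', '\u015E', '\u0130' };
        for (char c : samples) {
            System.out.printf("U+%04X  max-check=%b  encoder=%b%n",
                    (int) c, inRangeByMax(c), encodable(enc, c));
        }
    }
}
```

If the JRE's ISO-8859-9 table matches the standard, the two columns should disagree for every sample, which is the point: no choice of MAX_CODE_POINT makes the threshold check agree with the encoder for all four characters at once.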
Thanks,
Henry
[1] http://issues.apache.org/jira/browse/XALANJ-1866
------------------------------------------------------------------
Henry Zongaro Xalan development
IBM SWS Toronto Lab T/L 969-6044; Phone +1 905 413-6044
mailto:[EMAIL PROTECTED]