DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=12105>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=12105

           Summary: UTF Encoding is not preserved
           Product: XalanJ2
           Version: 2.4Dx
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: org.apache.xalan.serialize
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]


I recently began using Xalan for transforming data that includes a few 
UTF-encoded characters. My transforms go from XML to XML, and I would like to 
preserve those characters rather than escape them, as seems to be the 
behavior in Xalan (for example, the character with int value 146 is written 
out as the numeric character reference &#146;). When transforming with XML Spy, 
however, the character is preserved (but I want to use Xalan instead!).
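
A minimal way to check the behavior described above is an identity transform through the standard JAXP API. This is only a sketch: the class name EscapeRepro and the helper identityTransform are made up for illustration, and which XSLT implementation JAXP picks up (Xalan or the JDK's built-in processor) depends on the classpath, so the result may differ between setups.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, not part of Xalan.
class EscapeRepro {

    // Copies the input document to the output unchanged (identity transform),
    // asking the serializer for the given output encoding.
    static String identityTransform(String xml, String encoding) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, encoding);
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    public static void main(String[] args) {
        // The character with int value 146, embedded directly in the input.
        String input = "<doc>a" + (char) 146 + "b</doc>";
        String output = identityTransform(input, "UTF-8");
        System.out.println(output.contains("&#146;")
                ? "character was escaped as &#146;"
                : "character was not escaped as &#146;");
    }
}
```

Running main with the Xalan jars on the classpath should show whether the serializer in use escapes the character when the output encoding is UTF-8.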

I wonder if this could be considered a bug or just an implementation decision? 
It seems, however, that if the output is meant to be encoded as UTF, why escape 
UTF chars coming from the input? 

I was able to "correct" this problem by making the following change to the code 
in org/apache/xalan/serialize/SerializerToXML.java:

In method public boolean canConvert(char ch):

Changed the line:
return bool.booleanValue() ? !Character.isISOControl(ch) : false;

To:
return bool.booleanValue()
    ? !Character.isUnicodeIdentifierStart(ch)
        || !Character.isUnicodeIdentifierPart(ch)
        || !Character.isISOControl(ch)
    : false;
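
To see what the change does in isolation, the two conditions can be compared with plain java.lang.Character calls. This is a standalone sketch, not the SerializerToXML code: the class CanConvertCheck and its methods are hypothetical, and the bool.booleanValue() guard is left out (it is assumed to be true). For the character with int value 146, isISOControl returns true, so the original condition rejects it; that character is not a Unicode identifier start, so the first operand of the changed condition already accepts it.

```java
// Hypothetical standalone check, independent of Xalan.
class CanConvertCheck {

    // Condition from the original line, without the bool.booleanValue() guard.
    static boolean original(char ch) {
        return !Character.isISOControl(ch);
    }

    // Condition from the changed line, without the bool.booleanValue() guard.
    static boolean patched(char ch) {
        return !Character.isUnicodeIdentifierStart(ch)
                || !Character.isUnicodeIdentifierPart(ch)
                || !Character.isISOControl(ch);
    }

    // Counts how many of the 65536 char values a condition rejects.
    static int countRejected(boolean usePatched) {
        int rejected = 0;
        for (int i = 0; i <= 0xFFFF; i++) {
            char ch = (char) i;
            boolean accepted = usePatched ? patched(ch) : original(ch);
            if (!accepted) {
                rejected++;
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        char ch = (char) 146;
        System.out.println("original(146) = " + original(ch));
        System.out.println("patched(146)  = " + patched(ch));
        System.out.println("rejected by original: " + countRejected(false));
        System.out.println("rejected by patched:  " + countRejected(true));
    }
}
```

Comparing the two counts shows how much wider the changed condition is than the original one, which may help when deciding whether the change is too permissive for other control characters.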
