DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=21780>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

Surrogate characters mishandled by SAXPrint and SAX2Print.

           Summary: Surrogate characters mishandled by SAXPrint and
                    SAX2Print.
           Product: Xerces-C++
           Version: Nightly build (please specify the date)
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: Major
          Priority: Other
         Component: Samples/Tests
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]


From a local Cygwin build of CVS head (July 21, 2003):

The SAXPrint and SAX2Print samples write supplementary characters (code points
above #xFFFF) as two character references, one for the high surrogate and one
for the low surrogate, instead of a single reference to the code point. The
problem looks like it is in framework/XMLFormatter, as I don't see any code
there that checks for surrogates. If so, I would guess that DOMWriter exhibits
the same behaviour.

Here's an example...

Input to SAXPrint:

<?xml version="1.0" encoding="UTF-8"?>
<root>&#x10000;&#x10ffff;</root>

Output from SAXPrint:

<?xml version="1.0" encoding="LATIN1"?>
<root>&#xD800;&#xDC00;&#xDBFF;&#xDFFF;</root>

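The surrogate code points (#xD800-#xDFFF) are excluded from the XML Char
production, so those character references are illegal. The expected output
would instead contain &#x10000; and &#x10FFFF;, one reference per
supplementary character.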
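
For illustration, here is a minimal sketch of the kind of surrogate check that
seems to be missing. It is not Xerces-C++ code: escapeUtf16 is a hypothetical
helper, and the U+FFFD substitution for unpaired surrogates is an assumption.
It combines each high/low surrogate pair into one code point before writing
the character reference, and it deliberately skips markup escaping of '<' and
'&'.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper, not part of Xerces-C++. Converts UTF-16 code units to
// ASCII text, writing one hexadecimal character reference per code point
// rather than one per code unit. Markup characters are passed through as-is.
std::string escapeUtf16(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        const bool isHigh = cp >= 0xD800 && cp <= 0xDBFF;
        const bool nextIsLow = i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF;

        if (isHigh && nextIsLow) {
            // Combine the pair into a single supplementary code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i; // the low surrogate has been consumed
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            // Unpaired surrogate. Assumption: substitute U+FFFD; a real
            // formatter might report an error instead.
            cp = 0xFFFD;
        }

        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else {
            char buf[16];
            std::snprintf(buf, sizeof buf, "&#x%lX;",
                          static_cast<unsigned long>(cp));
            out += buf;
        }
    }
    return out;
}
```

Given the two characters from the example input, this sketch produces
&#x10000;&#x10FFFF; rather than four surrogate references.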
