http://nagoya.apache.org/bugzilla/show_bug.cgi?id=7065

Xerces encodes strange characters but can't parse them

           Summary: Xerces encodes strange characters but can't parse them
           Product: Xerces-J
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: Core
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]

This may be a failing of my understanding of XML, but I've always been a
strong believer that if a framework can generate a document, it should be
able to parse it as well. The following code generates an XML document that
cannot be parsed by Xerces. The code and output follow:

Code:

    public static void main(String[] args) throws Exception
    {
        byte[] bytes = { 28 };

        //Create the document
        Document document = new DocumentImpl();
        Element root = document.createElement("TEST");
        Node child = document.createTextNode(new String(bytes));
        root.appendChild(child);
        document.appendChild(root);

        //Serialize document to String
        ByteArrayOutputStream outStream = new ByteArrayOutputStream();
        OutputFormat format = new OutputFormat(document);
        XMLSerializer serial = new XMLSerializer(outStream, format);
        serial.asDOMSerializer();
        serial.serialize(document.getDocumentElement());
        outStream.flush();
        String xml = outStream.toString();

        //Print out text interpretation of xml document
        System.out.println(xml);

        //Reparse text into xml
        ByteArrayInputStream inputStream = new ByteArrayInputStream(xml.getBytes());
        DOMParser parser = new DOMParser();
        InputSource inputSource = new InputSource(inputStream);
        parser.parse(inputSource);
        document = parser.getDocument();
    }

Output:

    <?xml version="1.0" encoding="UTF-8"?>
    <TEST></TEST>
    [Fatal Error] :2:13: Character reference "c" is an invalid XML character.
    org.xml.sax.SAXParseException: Character reference "c" is an invalid XML character.
        at org.apache.xerces.parsers.DOMParser.parse(DOMParser.java:235)
        at testclassloader.TestXerces.main(TestXerces.java:53)
    Exception in thread "main"

This particular test was run with Xerces 2.0.1, but I've had similar results
with 1.4.4, though the escaped character in the output is different.

While I realize that character 28 is not a valid character under the XML
spec, I am curious why Xerces will create a text node containing an invalid
character, or serialize a document that contains one. Also, is there any way
to properly encode this document, or do I need to manually escape my node
text before encoding?

Thanks for your time and for working on a fantastic open-source project.
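
On the last question: XML 1.0 has no way to represent character 28, not even as a character reference, because a reference must also resolve to a character allowed by the Char production. One possible workaround is to filter the text before passing it to createTextNode. The following is only a minimal sketch, assuming that silently dropping the offending characters is acceptable for the application. XmlCharFilter and its method names are hypothetical and are not part of Xerces.

```java
// Hypothetical helper, not part of Xerces: filters out characters that the
// XML 1.0 "Char" production does not allow.
final class XmlCharFilter {

    private XmlCharFilter() {}

    /** Returns true if the code point matches the XML 1.0 Char production. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of the text with every invalid code point dropped. */
    static String stripInvalidXmlChars(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isValidXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            // Advance by one or two chars depending on the code point width.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Character 28 is the one used in the test case above.
        String raw = "before" + (char) 28 + "after";
        System.out.println(stripInvalidXmlChars(raw)); // prints "beforeafter"
    }
}
```

With a helper like this, the text node in the test case would be created as document.createTextNode(XmlCharFilter.stripInvalidXmlChars(new String(bytes))). If the binary value has to survive a round trip, dropping is not enough; an application-level encoding such as Base64 would be needed instead.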
