I came across an XML file that uses UTF-8 encoding and contains a special character. The file is well-formed according to IE and XMLSpy. When I parse and serialize it with Xerces using the program below, I do get an output file.
I am attaching the files with this mail.
When I reopen the serialized file with Xerces and try to serialize it again with my program, Xerces reports errors.
Exception List:
java.io.UTFDataFormatException: invalid byte 2 of 2-byte UTF-8 sequence (0x3c)
    at org.apache.xerces.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:678)
    at org.apache.xerces.impl.io.UTF8Reader.read(UTF8Reader.java:355)
    at org.apache.xerces.impl.XMLEntityManager$EntityScanner.load(XMLEntityManager.java:3257)
    at org.apache.xerces.impl.XMLEntityManager$EntityScanner.scanContent(XMLEntityManager.java:2371)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(XMLDocumentFragmentScannerImpl.java:829)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(XMLDocumentFragmentScannerImpl.java:1387)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:333)
    at org.apache.xerces.parsers.DTDConfiguration.parse(DTDConfiguration.java:524)
    at org.apache.xerces.parsers.DTDConfiguration.parse(DTDConfiguration.java:580)
    at org.apache.xerces.parsers.XMLParser.parse(XMLParser.java:152)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1108)
    at sax2saxtest.SAXWriter.main(SAXWriter.java:34)
Exception in thread "main"
Here is the code that I have written to parse and serialize the file
using the XMLSerializer class in Xerces.
import org.apache.xml.serialize.XMLSerializer;
import org.apache.xml.serialize.OutputFormat;
import java.io.*;
import org.xml.sax.*;
import javax.xml.parsers.*;

public class SAXWriter {
    public SAXWriter() {
    }

    public static void main(String[] args) throws Exception {
        // Request XML output declared as UTF-8, indented by 2 spaces.
        OutputFormat of = new OutputFormat("XML", "UTF-8", true);
        of.setIndent(2);

        // FileWriter encodes characters with the platform default charset.
        FileWriter fout = new FileWriter("d:/Out1.xml");
        XMLSerializer s = new XMLSerializer(new PrintWriter(fout), of);

        // Parse the input with SAX and feed the events to the serializer.
        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser sp = spf.newSAXParser();
        XMLReader rdr = sp.getXMLReader();
        rdr.setContentHandler(s.asContentHandler());
        String uri = "file:///d:/Temp/myfile.xml";
        rdr.parse(uri);
    }
}
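
One possible explanation for the error, offered as an assumption and not as a confirmed diagnosis: the program declares UTF-8 in the OutputFormat but writes through a FileWriter, which uses the platform default charset. If that default is a single-byte charset, a character such as Ñ is written as one byte (0xD1), and a UTF-8 reader then treats it as the start of a 2-byte sequence and rejects the following '<' (0x3c). The sketch below reproduces that effect with the JDK only. The class name EncodingMismatchDemo, the helper names, the sample character and the choice of ISO-8859-1 as the stand-in default charset are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Encode text through a Writer bound to an explicit charset. A FileWriter
    // does the same thing implicitly with the platform default charset.
    public static byte[] encode(String text, Charset cs) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, cs)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Strictly decode the bytes as UTF-8; malformed input yields false.
    public static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample: a UTF-8 declaration plus one non-ASCII character.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a>\u00d1</a>";

        // Assumed single-byte default charset: 0xD1 is followed directly by '<'.
        byte[] singleByte = encode(xml, StandardCharsets.ISO_8859_1);
        // Bytes that actually match the declared encoding.
        byte[] utf8 = encode(xml, StandardCharsets.UTF_8);

        System.out.println(isValidUtf8(singleByte)); // prints "false"
        System.out.println(isValidUtf8(utf8));       // prints "true"
    }
}
```

If this is indeed the cause, handing the serializer a byte stream, for example new XMLSerializer(new FileOutputStream("d:/Out1.xml"), of), or wrapping the stream in new OutputStreamWriter(..., "UTF-8"), should make the bytes on disk agree with the declared encoding.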
Attachments: myfile.xml, Out1.xml
