I came across an XML file that is encoded in UTF-8 and contains a special
character. The file is well-formed according to IE and XMLSpy.
When I parse and serialize it with Xerces using the program below, I do get
an output file.

I am attaching the files with this mail.


But when I reopen the serialized file with Xerces and try to serialize it
again with my program, Xerces reports errors.

Exception List:

java.io.UTFDataFormatException: invalid byte 2 of 2-byte UTF-8 sequence (0x3c)
        at org.apache.xerces.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:678)
        at org.apache.xerces.impl.io.UTF8Reader.read(UTF8Reader.java:355)
        at org.apache.xerces.impl.XMLEntityManager$EntityScanner.load(XMLEntityManager.java:3257)
        at org.apache.xerces.impl.XMLEntityManager$EntityScanner.scanContent(XMLEntityManager.java:2371)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(XMLDocumentFragmentScannerImpl.java:829)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(XMLDocumentFragmentScannerImpl.java:1387)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:333)
        at org.apache.xerces.parsers.DTDConfiguration.parse(DTDConfiguration.java:524)
        at org.apache.xerces.parsers.DTDConfiguration.parse(DTDConfiguration.java:580)
        at org.apache.xerces.parsers.XMLParser.parse(XMLParser.java:152)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1108)
        at sax2saxtest.SAXWriter.main(SAXWriter.java:34)
Exception in thread "main"
  

Here is the code that I have written to parse and serialize the file
using the XMLSerializer class in Xerces.

import org.apache.xml.serialize.XMLSerializer;
import org.apache.xml.serialize.OutputFormat;
import java.io.*;
import org.xml.sax.*;
import javax.xml.parsers.*;

public class SAXWriter {

  public SAXWriter() {
  }

  public static void main(String[] args) throws Exception {
    // Declare the output as indented UTF-8 XML.
    OutputFormat of = new OutputFormat("XML", "UTF-8", true);
    of.setIndent(2);
    // Note: FileWriter encodes characters with the platform default charset.
    FileWriter fout = new FileWriter("d:/Out1.xml");
    XMLSerializer s = new XMLSerializer(new PrintWriter(fout), of);
    SAXParserFactory spf = SAXParserFactory.newInstance();
    SAXParser sp = spf.newSAXParser();
    XMLReader rdr = sp.getXMLReader();
    // Feed the parser's SAX events straight into the serializer.
    rdr.setContentHandler(s.asContentHandler());
    String uri = "file:///d:/Temp/myfile.xml";
    rdr.parse(uri);
  }
}
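
One possible explanation, which I have not confirmed: the program declares
UTF-8 in the OutputFormat but writes through a FileWriter, which encodes with
the platform default charset. If that default is a single-byte charset, the
special character would be written as one byte while the XML declaration still
says UTF-8. The sketch below uses only the JDK to show the effect. The class
name, the sample content and the character U+00D1 are assumptions chosen for
illustration; I do not know which charset or character is actually involved.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchSketch {

    // Encode text with the given charset, standing in for what a Writer
    // bound to that charset would put on disk.
    public static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    // Strict UTF-8 check: malformed input is reported, not silently replaced.
    public static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical element content with one non-ASCII character (U+00D1).
        String content = "<a>\u00D1</a>";

        // Single-byte encoding: U+00D1 becomes the lone byte 0xD1, followed
        // directly by '<' (0x3C) from the closing tag.
        byte[] latin1 = encode(content, StandardCharsets.ISO_8859_1);

        // UTF-8 encoding: U+00D1 becomes the two bytes 0xC3 0x91.
        byte[] utf8 = encode(content, StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 bytes valid as UTF-8: " + isValidUtf8(latin1));
        System.out.println("UTF-8 bytes valid as UTF-8: " + isValidUtf8(utf8));
    }
}
```

In the single-byte case, a UTF-8 reader sees 0xD1 as the start of a 2-byte
sequence and then finds 0x3c where a continuation byte should be, which would
match the message in the exception above. If this is the cause, passing a
FileOutputStream to the XMLSerializer(OutputStream, OutputFormat) constructor,
or wrapping the stream in an OutputStreamWriter created with "UTF-8", should
make the bytes agree with the declared encoding.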
 





     

Attachment: myfile.xml
Description: Binary data

Attachment: Out1.xml
Description: Binary data
