UTF-8 character encoding

Daniel Hägg 26 Feb 2002 07:57:37 -0000

Hi!
I am trying to understand how to create and parse UTF-8 encoded XML documents
using Xerces, but so far I have failed. For example, the following piece of code
throws an exception in the parse method because of an illegal character. To me
it looks like the serializer isn't using UTF-8 even though it says so.
I have tried to tell it to use UTF-8: same result. I have tried to set the
OutputFormat to UTF-8: same result. How is it supposed to work?


package dom;

import org.w3c.dom.*;
import org.apache.xerces.dom.*;
import org.apache.xml.serialize.*;
import java.io.*;
import javax.xml.parsers.*;

public class TstEnc
{
  public static void main( String[] argv ) {
    try
    {
      DocumentBuilder builder =
        DocumentBuilderFactory.newInstance().newDocumentBuilder();

      // Build a document whose root element name contains a non-ASCII character
      Document doc = builder.newDocument();
      Element root = doc.createElement("Åke");
      doc.appendChild( root );

      // Serialize the document into an in-memory string
      OutputFormat    format  = new OutputFormat( doc );
      StringWriter  stringOut = new StringWriter();
      XMLSerializer    serial = new XMLSerializer( stringOut, format );
      serial.asDOMSerializer();
      serial.serialize( doc.getDocumentElement() );

      // Write the serialized string to doc.xml
      FileOutputStream  file1 = new FileOutputStream("doc.xml");
      file1.write(stringOut.toString().getBytes());
      file1.close();

      // Parse the file back in; this is where the exception is thrown
      File file2 = new File("doc.xml");
      doc = builder.parse(file2);
    } catch ( Exception ex ) {
      System.out.println("Error: " + ex.getMessage());
    }
  }
}
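
One likely cause, judging only from the code above: the serializer writes
characters to a StringWriter, and the no-argument getBytes() call then turns
those characters into bytes using the platform default charset. If that default
is not UTF-8, the file content no longer matches an XML declaration that says
UTF-8, and the parser rejects the non-ASCII byte. The following is a minimal
sketch of a round trip that states the charset explicitly. It is not a
confirmed fix for the program above: it swaps in the JAXP Transformer for
XMLSerializer so that it runs on a plain JDK, it keeps everything in memory
instead of writing doc.xml, and the class name Utf8RoundTrip and the method
roundTrip are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical helper class, not part of Xerces or of the original program.
class Utf8RoundTrip {

    // Serializes a one-element document, encodes it explicitly as UTF-8,
    // parses the bytes back and returns the root element name that was read.
    public static String roundTrip(String elementName) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        doc.appendChild(doc.createElement(elementName));

        // Assumption: JAXP Transformer used here in place of XMLSerializer.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        StringWriter stringOut = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(stringOut));

        // The key step: name the charset instead of relying on the platform
        // default, so the bytes agree with the declared encoding.
        byte[] bytes = stringOut.toString().getBytes(StandardCharsets.UTF_8);

        Document parsed = builder.parse(new ByteArrayInputStream(bytes));
        return parsed.getDocumentElement().getTagName();
    }

    public static void main(String[] args) throws Exception {
        // \u00C5 is A with ring above, written as an escape so the source
        // file encoding does not matter.
        String name = "\u00C5ke";
        System.out.println(name.equals(roundTrip(name)));
    }
}
```

In the original program the equivalent change would be to pass a charset name
to getBytes, or to hand the serializer an OutputStream rather than a Writer so
that it performs the encoding itself. Both are suggestions to try, not
verified against that Xerces version.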
