XML encoding

Xiaofan Zhou Sat, 05 Feb 2005 01:38:28 -0800

Hi, All,

I have a question about how encoding is handled in Xerces-C. Say I have an XML document whose declaration is <?xml version="1.0" encoding="UTF-8"?>, and assume the XML body is correctly encoded in UTF-8. I create a SAX2XMLReader and pass it a LocalFileInputSource(myDoc) to do the parsing, so that I receive a series of SAX events.

If I understand correctly, the Xerces parser gets the document encoding from the XML declaration, which is UTF-8 in this case. But different XML documents may have different encodings.

So here are my questions:

(1) For the simple type element or attribute in the SAX events I receive, what encoding should I assume for the value?

(2) And for the tag names?

(3) Is there a way to get the encoding information from the parser (for example, from the SAX2XMLReader I created)?

I need the encoding information because my application, which uses Xerces-C to parse XML files, can be configured to run in different code pages, for example UTF-8 or WINDOWS-1252. So after Xerces parses an input XML file, I first need to convert the attribute and element values from their original code page into my application's code page.
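
To make the conversion step concrete, here is a minimal sketch using only the C++ standard library. It assumes the values arrive from the SAX callbacks as UTF-16 code units (which is how the Xerces-C documentation describes its XMLCh type, independent of the document's declared encoding) and that the application code page is UTF-8. The helper name utf16ToUtf8 is hypothetical and not part of Xerces-C; a different target code page such as WINDOWS-1252 would need its own mapping table or a transcoding library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Xerces-C API): converts UTF-16 code units
// to a UTF-8 byte string. Unpaired surrogates become U+FFFD.
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        const bool isHigh = cp >= 0xD800 && cp <= 0xDBFF;
        const bool isLow = cp >= 0xDC00 && cp <= 0xDFFF;
        if (isHigh && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (isHigh || isLow) {
            cp = 0xFFFD;  // replacement character for a lone surrogate
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

If that assumption about the parser's output holds, the conversion would depend only on the application's code page and not on the encoding declared in each input document.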

Thanks for your help in advance.

Frank
