Hi all,

 

I have a question about how encoding is handled in Xerces-C. Say I have an XML document whose declaration is <?xml version="1.0" encoding="UTF-8"?>, and assume the XML body is correctly encoded in UTF-8. I create a SAX2XMLReader and pass it a LocalFileInputSource(myDoc) to do the parsing, so that I receive a bunch of SAX events.

 

If I understand correctly, the Xerces parser gets the document encoding from the XML declaration, which is UTF-8 in this case. But different XML documents may use different encodings.

 

So here are my questions:

 

(1) For the value of a simple-type element or attribute in the SAX events I receive, what encoding should I assume?

(2) And for the tag names?

(3) Is there a way to get the encoding information from the parser (for example, from the SAX2XMLReader I created)?

 

I need the encoding information because my application, which uses Xerces-C to parse XML files, can be configured to run in different code pages, for example UTF-8 or WINDOWS-1252. After an input XML document is parsed by Xerces, I first need to convert the attribute and element values from their original code page into my application's code page.
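
To make the conversion step concrete, here is a rough sketch of what I mean, assuming purely for illustration that a value arrives as UTF-8 and the application runs in WINDOWS-1252. It uses only the C++ standard library and no Xerces-C calls; decodeUtf8 and utf8ToWindows1252 are hypothetical helper names, not part of any library, and the decoder is simplified (it does not reject overlong sequences).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode one UTF-8 sequence starting at s[i] and advance i.
// Returns U+FFFD for malformed input. Simplified: overlong forms are not rejected.
static char32_t decodeUtf8(const std::string& s, std::size_t& i) {
    unsigned char b0 = static_cast<unsigned char>(s[i++]);
    if (b0 < 0x80) return b0;
    int extra;
    char32_t cp;
    if ((b0 & 0xE0) == 0xC0)      { extra = 1; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; }
    else return 0xFFFD;
    for (; extra > 0; --extra) {
        if (i >= s.size()) return 0xFFFD;
        unsigned char b = static_cast<unsigned char>(s[i]);
        if ((b & 0xC0) != 0x80) return 0xFFFD;
        cp = (cp << 6) | (b & 0x3F);
        ++i;
    }
    return cp;
}

// Windows-1252 bytes 0x80..0x9F mapped to Unicode; 0 marks an unassigned byte.
static const char32_t kCp1252High[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
};

// Hypothetical helper: convert a UTF-8 string to Windows-1252.
// Characters with no Windows-1252 equivalent become `fallback`.
std::string utf8ToWindows1252(const std::string& utf8, char fallback = '?') {
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        char32_t cp = decodeUtf8(utf8, i);
        // ASCII and U+00A0..U+00FF share their values with Windows-1252.
        if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF)) {
            out.push_back(static_cast<char>(static_cast<unsigned char>(cp)));
            continue;
        }
        // Everything else must appear in the 0x80..0x9F table or be replaced.
        char mapped = fallback;
        for (int k = 0; k < 32; ++k) {
            if (kCp1252High[k] != 0 && kCp1252High[k] == cp) {
                mapped = static_cast<char>(static_cast<unsigned char>(0x80 + k));
                break;
            }
        }
        out.push_back(mapped);
    }
    return out;
}
```

With this sketch, an "é" encoded as the two UTF-8 bytes C3 A9 would come out as the single byte E9. What I cannot tell is which encoding the values from the SAX events are actually in, which is why I need the answers to the questions above.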

 

Thanks for your help in advance.

 

Frank