On Sat, Jul 19, 2008 at 03:29:32PM -0700, denverrox denver wrote:
> Greetings list!
> 
> I'm using libxml2 (specifically the HTMLparser/tree modules, and the xpath 
> library) to perform transformation operations on HTML input files, and have 
> run into a character encoding issue:
> 
> Specifically, I have two HTML documents, one in 8859-1 encoding, and the 
> other in UTF-8.
> 
> First I parse both documents into DOM trees.
> 
> Then, I'm performing an XPath on the 8859-1 document, cloning the resultset 
> nodes using "xmlCopyNodeList," then using "xmlAddNextSibling" to add the 
> 8859-1 document content into a document that was originally UTF-8 encoded.
> 
> This results in the 8859-1 content not being correctly serialized if I output 
> the UTF-8 document.  Special characters are garbled, etc.

  I guess that assertion needs a more precise description. Any character
in 8859-1 will be encoded with 1 or 2 bytes in UTF-8 without problem.
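
  To make the 1-or-2-byte point concrete, here is a minimal sketch in plain
C. It is not libxml2's own conversion code; the helper name latin1_to_utf8
and the buffer-size convention are assumptions made up for this
illustration. Bytes below 0x80 are copied unchanged, and every other
8859-1 byte becomes exactly two UTF-8 bytes.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of libxml2): convert ISO-8859-1 bytes
 * to UTF-8. The caller must provide an output buffer of at least
 * 2 * len + 1 bytes. Returns the number of bytes written, not counting
 * the terminating NUL.
 */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];

        if (c < 0x80) {
            /* ASCII range: identical single byte in UTF-8. */
            out[n++] = c;
        } else {
            /* U+0080..U+00FF: two-byte sequence 110xxxxx 10xxxxxx. */
            out[n++] = (unsigned char)(0xC0 | (c >> 6));
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    out[n] = '\0';
    return n;
}
```

  For example, the 8859-1 byte 0xE9 (e with acute accent) comes out as the
two bytes 0xC3 0xA9. If the serialized output shows something other than
this kind of sequence, it would help to check which encoding the parser
and the serializer were each told to use.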

> Based on the libxml2 encodings webpage ( 
> http://xmlsoft.org/encoding.html ), it seems 
> that libxml2 converts all character encodings to UTF-8 internally. Therefore 
> unless I'm misunderstanding something, the 8859-1 document should be in UTF-8 
> after parsing.  

  Yes, it is in UTF-8 internally.

> Is there any reason why this serialization problem should occur, if both the 
> 8859-1 document and UTF-8 document are converted to native UTF-8 by libxml2?  
> Shouldn't it "just work"? My impression is that you can freely copy cloned 
> nodesets between documents, as they're all internally in UTF-8.  Careful 
> review of the libXML2 encodings page seems to agree with this assertion, so 
> I'm quite stumped.

  I don't think there should be any problem

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
[EMAIL PROTECTED]  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/