On Sat, Jul 19, 2008 at 03:29:32PM -0700, denverrox denver wrote:
> Greetings list!
>
> I'm using libxml2 (specifically the HTMLparser/tree modules, and the xpath
> library) to perform transformation operations on HTML input files, and have
> run into a character encoding issue:
>
> Specifically, I have two HTML documents, one in 8859-1 encoding, and the
> other in UTF-8.
>
> First I parse both documents into DOM trees.
>
> Then, I'm performing an XPath on the 8859-1 document, cloning the resultset
> nodes using "xmlCopyNodeList," then using "xmlAddNextSibling" to add the
> 8859-1 document content into a document that was originally UTF-8 encoded.
>
> This results in the 8859-1 content not being correctly serialized if I output
> the UTF-8 document. Special characters are garbled, etc.

  I guess that assertion needs a more precise description. Any character
in 8859-1 will be encoded with 1 or 2 bytes in UTF-8 without problem.

> Based on the libxml2 encodings webpage (
> http://xmlsoft.org/encoding.html ), it seems
> that libxml2 converts all character encodings to UTF-8 internally. Therefore
> unless I'm misunderstanding something, the 8859-1 document should be in UTF-8
> after parsing.

  Yes, it is in UTF-8 internally.

> Is there any reason why this serialization problem should occur, if both the
> 8859-1 document and UTF-8 document are converted to native UTF-8 by libxml2?
> Shouldn't it "just work"? My impression is that you can freely copy cloned
> nodesets between documents, as they're all internally in UTF-8. Careful
> review of the libXML2 encodings page seems to agree with this assertion, so
> I'm quite stumped.

  I don't think there should be any problem.

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
[EMAIL PROTECTED]    | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/
_______________________________________________
xml mailing list, project page http://xmlsoft.org/
http://mail.gnome.org/mailman/listinfo/xml
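
The "1 or 2 bytes" remark above can be checked with a few lines of Python (standard library only, no libxml2 involved). This is only a sketch: the function names are made up for illustration. The second helper shows one common way accented characters end up garbled, namely UTF-8 bytes being read back as 8859-1. That is an assumption about what "garbled" might look like in general, not a diagnosis of the problem reported in this thread.

```python
def utf8_lengths_for_latin1():
    """Map each ISO-8859-1 byte value to the length of its UTF-8 encoding."""
    lengths = {}
    for byte in range(256):
        # ISO-8859-1 maps byte N directly to Unicode code point U+00NN.
        ch = bytes([byte]).decode("iso-8859-1")
        lengths[byte] = len(ch.encode("utf-8"))
    return lengths


def misread_utf8_as_latin1(text):
    """Hypothetical helper: encode as UTF-8, then wrongly decode as 8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


lengths = utf8_lengths_for_latin1()
one_byte = [b for b, n in lengths.items() if n == 1]  # 0x00..0x7F (ASCII range)
two_byte = [b for b, n in lengths.items() if n == 2]  # 0x80..0xFF

print(len(one_byte), len(two_byte))
print(sorted(set(lengths.values())))
print(misread_utf8_as_latin1("\u00e9"))  # e-acute shown as two characters
```

If output from the real program shows each accented character replaced by two characters in this way, the stored bytes are probably fine and it is worth checking which encoding the output is being viewed or declared as.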
