Hi,

I simply used the libxml parser to:

1. parse a file (parse_file() in Perl),
2. convert the resulting tree structure back to a string (toString()),
3. write the string to another file.
The input file was UTF-8 encoded. It contained the character ß (sharp s), which a hex viewer showed as the bytes c3 9f. In the output file the encoding has changed: the character is written as the single byte df, which is the value of ß in the extended ASCII set (ISO-8859-1). I am attaching the files as well.

Is this a bug in libxml2, or is something wrong on my side? How can this be avoided?

Thanks and regards,
Arun.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Arun S K (RBIN/EDM3) *
Sent: Tuesday, 13. December 2005 3:55 PM
To: [email protected]
Subject: [xml] Problem with encoding in libxml.

Sorry for the wrong subject in the earlier mail. I have the files on a Windows machine.

Hi all,

I was trying to parse an XML file with the encoding set to UTF8, using the following header:

<?xml version="1.0" encoding="UTF8"?>

The document contains the character ß (sharp s). The parser aborts with the following message:

--------------------------------------------------------------------
:13: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0x80 0x20 0x3C 0x2F
<NAME>test_1ß</NAME>
--------------------------------------------------------------------

Is ß not a valid UTF-8 character? How can this be corrected? Could anybody please help me?

Thanks and regards,
Arun.

_______________________________________________
xml mailing list, project page http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
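
The byte values described in the message can be reproduced outside libxml. The following is a minimal Python sketch (Python is not used in the original report, and the variable names are purely illustrative). It only shows how ß is represented in UTF-8 versus ISO-8859-1, and that a lone df byte is not valid UTF-8. It does not show what libxml2 does internally.

```python
# Illustrative only: these names do not come from the original post.
sharp_s = "\u00df"  # the character ß (LATIN SMALL LETTER SHARP S)

# The same character encoded in the two encodings discussed above.
utf8_bytes = sharp_s.encode("utf-8")         # two bytes
latin1_bytes = sharp_s.encode("iso-8859-1")  # one byte

print(utf8_bytes.hex())    # c39f
print(latin1_bytes.hex())  # df

# A file that contains the single byte df but is read as UTF-8
# cannot be decoded, because df on its own is an incomplete sequence.
try:
    latin1_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

print(decodes_as_utf8)     # False
```

Checking the actual bytes of both the input and the output file in this way can help tell whether a file is really stored in the encoding its XML declaration claims.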
