Hi, I have a problem parsing UCS-4LE encoded text with libxml2 2.6.24. My iconv supports that, I checked. However, when I do this:
------------------------------------
pctxt = xmlNewParserCtxt();
/* you can also statically use "UCS4" here, no change */
encoding = xmlGetCharEncodingName(xmlDetectCharEncoding(text, buffer_len));
result = xmlCtxtReadMemory(pctxt, text, buffer_len, filename, encoding, options);
------------------------------------

I get a fatal parser error stating "Start tag expected, '<' not found".

I checked that the input really is UCS-4. libxml2 tells me it's UCS-4, iconv converts it perfectly to whatever I like, and "wc -c" tells me that it correctly uses four bytes per character. I'm pretty convinced by now that the problem is not on my side of the screen.

I tried to track down the problem in the libxml2 source, but I'm having a pretty hard time figuring out which of the three different stages where encoding conversion could take place (parser, input, buffer) would make a difference here. So, I don't know, has anyone ever used this part of the libxml2 code and verified that it works?

One of the problems I found was that xmlFindCharEncodingHandler passes the "ISO-..." names of the UCS-4 encoding to iconv, and iconv doesn't know those. But from what I read further on, libxml2 then checks the alias names, which would normally yield the name "UCS-4" or "UCS4", which iconv recognises. So that takes a bit longer but should still work. And as I said, passing "UCS4" directly as the encoding doesn't work either...

Any hints on this one?

Thanks,
Stefan

_______________________________________________
xml mailing list, project page http://xmlsoft.org/
http://mail.gnome.org/mailman/listinfo/xml
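
For anyone who wants to repeat the "input really is UCS-4" check independently of libxml2 and iconv, here is a minimal sketch in plain C. It only looks at the first four bytes, assuming the document starts either with '<' or with a byte order mark, along the lines of the autodetection described in the XML 1.0 specification. The function name guess_ucs4_order is hypothetical and not part of libxml2.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT a libxml2 function: guess the UCS-4 byte order
 * from the first four bytes of a buffer. Assumes the document starts with
 * '<' (U+003C) or with a byte order mark (U+FEFF). */
static const char *guess_ucs4_order(const unsigned char *buf, size_t len)
{
    /* '<' encoded as one 4-byte unit in each byte order */
    static const unsigned char lt_le[4]  = { 0x3C, 0x00, 0x00, 0x00 };
    static const unsigned char lt_be[4]  = { 0x00, 0x00, 0x00, 0x3C };
    /* U+FEFF encoded as one 4-byte unit in each byte order */
    static const unsigned char bom_le[4] = { 0xFF, 0xFE, 0x00, 0x00 };
    static const unsigned char bom_be[4] = { 0x00, 0x00, 0xFE, 0xFF };

    if (buf == NULL || len < 4)
        return "unknown";
    if (memcmp(buf, lt_le, 4) == 0 || memcmp(buf, bom_le, 4) == 0)
        return "UCS-4LE";
    if (memcmp(buf, lt_be, 4) == 0 || memcmp(buf, bom_be, 4) == 0)
        return "UCS-4BE";
    return "unknown";
}
```

Calling guess_ucs4_order((const unsigned char *) text, buffer_len) on the buffer from the snippet above should return "UCS-4LE" if the data is what it is believed to be. A result of "unknown" would suggest leading bytes (whitespace, or a different BOM) that the parser may also trip over.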
