I'm working through some rather thorny experiments with new XML
support within the browser and I ran into this snippet:

static void switchToUTF16(xmlParserCtxtPtr ctxt)
{
    // Hack around libxml2's lack of encoding override support by manually
    // resetting the encoding to UTF-16 before every chunk.  Otherwise libxml
    // will detect <?xml version="1.0" encoding="<encoding name>"?> blocks
    // and switch encodings, causing the parse to fail.
    const UChar BOM = 0xFEFF;
    const unsigned char BOMHighByte =
        *reinterpret_cast<const unsigned char*>(&BOM);
    xmlSwitchEncoding(ctxt, BOMHighByte == 0xFF
        ? XML_CHAR_ENCODING_UTF16LE : XML_CHAR_ENCODING_UTF16BE);
}
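
My reading of the byte-order test in that function, as a standalone
sketch that compiles without libxml2. The enum and the function name are
mine and purely illustrative; they stand in for the
XML_CHAR_ENCODING_UTF16LE / XML_CHAR_ENCODING_UTF16BE constants, and
std::uint16_t stands in for UChar, which I am assuming is a 16-bit code
unit:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for libxml2's XML_CHAR_ENCODING_UTF16LE and
// XML_CHAR_ENCODING_UTF16BE, so the sketch builds without libxml2.
enum class Utf16ByteOrder { LittleEndian, BigEndian };

// Report the byte order in which this machine stores 16-bit code units.
// Same idea as the snippet: store U+FEFF and inspect its first byte in memory.
Utf16ByteOrder hostUtf16ByteOrder()
{
    const std::uint16_t bom = 0xFEFF;
    unsigned char firstByte = 0;
    std::memcpy(&firstByte, &bom, 1);
    // A little-endian host stores the low byte (0xFF) first;
    // a big-endian host stores the high byte (0xFE) first.
    return firstByte == 0xFF ? Utf16ByteOrder::LittleEndian
                             : Utf16ByteOrder::BigEndian;
}
```

So despite its name, BOMHighByte is simply the first byte in memory; on a
little-endian machine that is the low byte, 0xFF, which is why 0xFF
selects the UTF-16LE constant.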

Looking at the libxml2 API, I've been baffled myself about how to
control the character encoding from the outside.  An encoding override
seems like an essential feature, and its absence looks like a serious
gap.  Does anyone know more about the "hack" above and can provide more
detail?

-- 
--Alex Milowski
"The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered."

Bertrand Russell in a footnote of Principles of Mathematics
_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
