On 05/05/2011 09:21 AM, Nikolay Sivov wrote:
On 5/5/2011 18:09, Adam Martinson wrote:
On 05/04/2011 05:56 PM, Nikolay Sivov wrote:
You don't need a separate helper for that. doparse() should do fine with "UTF-8" encoding parameter to force encoding.
This is to avoid calling domdoc_loadXML() from domdoc_load(). When we need almost identical behavior from 2 separate interface functions having a common internal function they both call makes sense to me...
My point is that we already have doparse(This, str, len, "UTF-8"); that does exactly what you need. Side effects like setting creating xmlDocNew(NULL) are handled later in ::load() too. Also ::loadXML() takes BSTR that's always UTF-16, and we don't do it right now - cause it's possible to avoid conversion it seems. Adding load_utf8 will mean we stick to the utf16 -> utf8 conversion in loadXML.

My logic for having this function:
domdoc_load() is basically a more flexible version of domdoc_loadXML() that takes a VARIANT instead of a BSTR, and treats its input as a file of some sort. But VT_ARRAY|VT_UI1 case is just a non-null-terminated string, which is more like what domdoc_loadXML() does. To me it makes sense to merge this case with the internal domdoc_loadXML() implementation.

I vaguely remember trying to avoid the conversion in domdoc_loadXML() and just passing the BSTR to doparse() with encoding="UTF-16", and running into some quirk in libxml... If memory serves it was that when there's a <?xml encoding="foo"?> declaration in the string that doesn't match the actual encoding, libxml had a fit if the actual encoding wasn't UTF-8. eg, passing a UTF-16 string that contained <?xml encoding="UTF-8"?> caused an error, but passing a UTF-8 string that contained <?xml encoding="UTF-16"?> did not. It's been a while though, it may be worth another look.





Reply via email to