Alex Milowski:

> On Tue, Jan 4, 2011 at 7:05 PM, Alexey Proskuryakov <a...@webkit.org> wrote:
>>
>> On 04.01.2011, at 18:40, Alex Milowski wrote:
>>
>>> Looking at the libxml2 API, I've been baffled myself about how to
>>> control the character encoding from the outside. This looks like a
>>> serious lack of an essential feature. Does anyone know about the
>>> "hack" mentioned above and can provide more detail?
>>
>> Here is some history:
>> <http://mail.gnome.org/archives/xml/2006-February/msg00052.html>,
>> <https://bugzilla.gnome.org/show_bug.cgi?id=614333>.
>
> Well, that is some interesting history. *sigh*
>
> I take it the "workaround" is that the data is read and decoded into an
> internal string represented as a sequence of UChar. We then treat it as
> UTF-16-encoded data and feed it to the parser, forcing the parser to use
> UTF-16 every time.
>
> Too bad we can't just tell it the proper encoding--possibly the one
> from the transport--and let it do the decoding on the raw data. Of
> course, that doesn't guarantee a better result.
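For the archives, here is a rough sketch of what that forcing amounts to
with the libxml2 push API. This is hand-written for illustration, not
the actual WebKit code; the helper name and buffer parameters are made
up, and it assumes the input was already decoded to UTF-16LE:

    #include <libxml/parser.h>
    #include <libxml/parserInternals.h>

    /* Sketch: parse a buffer that was already decoded to UTF-16LE.
     * Forcing the encoding up front makes libxml2 skip its own
     * sniffing and ignore the encoding in the XML declaration. */
    static xmlDocPtr parseAsUTF16LE(const char *bytes, int byteLength)
    {
        xmlParserCtxtPtr ctxt =
            xmlCreatePushParserCtxt(NULL, NULL, NULL, 0, NULL);
        if (!ctxt)
            return NULL;

        xmlSwitchEncoding(ctxt, XML_CHAR_ENCODING_UTF16LE);
        xmlParseChunk(ctxt, bytes, byteLength, 1 /* terminate */);

        xmlDocPtr doc = ctxt->wellFormed ? ctxt->myDoc : NULL;
        if (!doc && ctxt->myDoc)
            xmlFreeDoc(ctxt->myDoc);
        xmlFreeParserCtxt(ctxt);
        return doc;
    }

The upshot is that every document pays for one decode on our side plus
libxml2's own UTF-16-to-UTF-8 conversion internally.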
Is there a reason why we can't pass the "raw" data to libxml2? E.g. when
the input file is UTF-8, we convert it into UTF-16 and then libxml2
converts it back into UTF-8 (its internal format). This is a real
performance problem when parsing XML [1].

Is there some (required) magic involved when detecting the encoding in
WebKit? AFAIK XML always defaults to UTF-8 if there's no encoding
declared. Can we make libxml2 do the encoding detection and provide all
of our decoders so it can use them?

[1] https://bugs.webkit.org/show_bug.cgi?id=43085

- Patrick
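Something along these lines is what I have in mind. Just a sketch; the
function name and parameters are made up, but xmlReadMemory() is the
existing libxml2 entry point:

    #include <libxml/parser.h>

    /* Sketch: hand libxml2 the raw bytes plus the transport encoding
     * (if any) and let it do the decoding itself. Passing NULL for
     * the encoding makes libxml2 sniff the BOM/XML declaration and
     * fall back to the XML default of UTF-8. */
    static xmlDocPtr parseRawBytes(const char *bytes, int byteLength,
                                   const char *transportEncoding)
    {
        return xmlReadMemory(bytes, byteLength, "document.xml",
                             transportEncoding, 0);
    }

Plugging in our own decoders could presumably go through
xmlNewCharEncodingHandler()/xmlRegisterCharEncodingHandler(), though I
haven't tried that.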