On Thu, Mar 06, 2008 at 09:32:54AM +0530, Ashwin wrote:
>
> Hi,
>
> The attached files return an encoding error on parsing. The
> xmlfile contains a reference to an entity in UTF-16 format, and
> ideally there should be no problems. On investigating I found that the
> problem occurs due to the changes made for bug fix #440159
> (http://bugzilla.gnome.org/show_bug.cgi?id=440159), SVN revision 3618
> (http://svn.gnome.org/viewvc/libxml2/trunk/encoding.c?r1=3545&r2=3618).
> If I revert the changes the parsing happens properly and there
> is no error.
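Bill did some debugging and pinpointed the source of the problem.
Yes, it's a classic mistake: one char is not one byte. The 45 limit was
meant as a number of characters but was applied to a byte count, which is
too short for a UTF-16 or UCS4 declaration. It's a bit ironic that we made
it in code dedicated to encoding, sigh ... Trivial patch enclosed; I have
also added your test to the regression suite,
thanks !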
Daniel
--
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard | virtualization library http://libvirt.org/
[EMAIL PROTECTED] | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
Index: encoding.c
===================================================================
--- encoding.c (revision 3698)
+++ encoding.c (working copy)
@@ -1768,9 +1768,10 @@
      * echo '<?xml version="1.0" encoding="UCS4"?>' | wc -c => 38
      * 45 chars should be sufficient to reach the end of the encoding
      * declaration without going too far inside the document content.
+     * on UTF-16 this means 90bytes, on UCS4 this means 180
      */
-    if (toconv > 45)
-        toconv = 45;
+    if (toconv > 180)
+        toconv = 180;
     if (toconv * 2 >= written) {
         xmlBufferGrow(out, toconv);
         written = out->size - out->use - 1;
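
The two limits in the patch above follow from simple arithmetic on code-unit
widths. Here is a minimal sketch in C, not libxml2 code: the names
DECL_CHAR_BUDGET, bytes_for_chars and limit_covers_decl are made up for
illustration, and it assumes the declaration is pure ASCII with no BOM, so
each character takes exactly 1, 2 or 4 bytes in UTF-8, UTF-16 or UCS4.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical constant: the character budget named in the source comment. */
enum { DECL_CHAR_BUDGET = 45 };

/*
 * Bytes occupied by nchars characters at a fixed code-unit width.
 * Assumption: ASCII-only text, so 1 (UTF-8), 2 (UTF-16) or 4 (UCS4)
 * bytes per character.
 */
static size_t bytes_for_chars(size_t nchars, size_t unit_width)
{
    assert(unit_width == 1 || unit_width == 2 || unit_width == 4);
    return nchars * unit_width;
}

/*
 * Returns 1 if byte_limit is large enough to hold the whole declaration
 * once it is encoded at unit_width bytes per character, 0 otherwise.
 * decl is given here as a plain ASCII C string.
 */
static int limit_covers_decl(size_t byte_limit, const char *decl,
                             size_t unit_width)
{
    return byte_limit >= bytes_for_chars(strlen(decl), unit_width);
}
```

With the 38-byte declaration from the comment (newline included), a 45-byte
limit covers UTF-8 (38 bytes) but not UTF-16 (76 bytes) or UCS4 (152 bytes),
while bytes_for_chars(DECL_CHAR_BUDGET, 4) gives the 180 used in the patch
and covers all three.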