On Thu, Mar 06, 2008 at 09:32:54AM +0530, Ashwin wrote:
> Hi,
>
> The attached files return an encoding error on parsing. The xmlfile
> contains a reference to an entity in UTF-16 format, and ideally there
> should be no problems. On investigating I found that the problem
> occurs due to the changes made for bug fix #440159
> (http://bugzilla.gnome.org/show_bug.cgi?id=440159), SVN Revision 3618
> (http://svn.gnome.org/viewvc/libxml2/trunk/encoding.c?r1=3545&r2=3618).
> If I revert the changes the parsing happens properly and there is no
> error.

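  Bill did some debugging and pinpointed the source of the problem.
Yes, it's the classic mistake: one char is not one byte. It's a bit
ironic that we made it in code dedicated to encoding, sigh... The limit
of 45 was counted in chars but applied to a byte count, so on UTF-16 or
UCS4 input it cut the conversion short before the end of the encoding
declaration. Trivial patch enclosed; I have also added your test to the
regression suite.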

  thanks !

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
[EMAIL PROTECTED]  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/
Index: encoding.c
===================================================================
--- encoding.c  (revision 3698)
+++ encoding.c  (working copy)
@@ -1768,9 +1768,10 @@
      * echo '<?xml version="1.0" encoding="UCS4"?>' | wc -c => 38
      * 45 chars should be sufficient to reach the end of the encoding
      * declaration without going too far inside the document content.
+     * on UTF-16 this means 90bytes, on UCS4 this means 180
      */
-    if (toconv > 45)
-       toconv  = 45;
+    if (toconv > 180)
+       toconv  = 180;
     if (toconv * 2 >= written) {
         xmlBufferGrow(out, toconv);
        written = out->size - out->use - 1;
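
To make the arithmetic behind the new limit concrete, here is a minimal
sketch. It assumes fixed-width code units (2 bytes for UTF-16, 4 for UCS4)
and ignores any byte order mark. The helper names chars_covered and
bytes_needed are hypothetical and are not part of libxml2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not libxml2 API): number of whole characters
 * that fit in byte_limit bytes when each character takes unit_size
 * bytes. */
static size_t chars_covered(size_t byte_limit, size_t unit_size) {
    assert(unit_size > 0);          /* guard against division by zero */
    return byte_limit / unit_size;
}

/* Hypothetical helper: bytes needed to hold nchars characters of
 * unit_size bytes each. */
static size_t bytes_needed(size_t nchars, size_t unit_size) {
    return nchars * unit_size;
}
```

With the old limit of 45 bytes, chars_covered(45, 2) is 22 and
chars_covered(45, 4) is 11, both short of the 38 characters counted for the
declaration in the comment above. bytes_needed(45, 2) is 90 and
bytes_needed(45, 4) is 180, which is where the new limit comes from.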
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml