Martin (gzlist) wrote:
> On 13/10/2009, Stefan Behnel <[email protected]> wrote:
>> Lydia Patrovic wrote:
>>> Note the "main&20090924_2" attribute value, which can be interpreted
>>> as an unterminated entity.
>>
>> :) Nice little Freudian copy&paste quoting error. Here's the line from the
>> real 'HTML' file:
>>
>> <script type="text/javascript" src="merge.php?f=main&20090924_2"></script>
>>
>> Note the unescaped '&' character in the URL.
>
> I'd have thought the embedded null at byte 532 would be the cause. Try
> bytes.replace("\x00", "") before treating it as a c string. Seems to
> get the document parsed pretty much as expected for me.
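
For reference, the workaround quoted above can be sketched roughly as follows. This is only an illustration, not what the parser does internally: the helper names (strip_nul_bytes, as_c_string) and the sample document are made up here, and as_c_string merely simulates a reader that stops at the first NUL byte. Byte literals (b"...") are used so the snippet also runs on Python 3.

```python
def strip_nul_bytes(data: bytes) -> bytes:
    """Drop embedded NUL bytes so the whole buffer survives C string handling."""
    return data.replace(b"\x00", b"")


def as_c_string(data: bytes) -> bytes:
    """Simulate a NUL-terminated C string reader: keep everything before the first NUL.

    Assumption for illustration only; this is not a libxml2 or lxml function.
    """
    return data.partition(b"\x00")[0]


# Hypothetical sample document with an embedded NUL byte in the middle.
html = b"<html><body><p>before</p>\x00<p>after</p></body></html>"

# Without cleaning, a C string reader loses everything after the NUL byte.
truncated = as_c_string(html)

# After cleaning, the same reader sees the complete document.
cleaned = as_c_string(strip_nul_bytes(html))

print(len(html), len(truncated), len(cleaned))
```

The cleaned bytes could then be handed to the HTML parser as usual. Whether dropping the NUL bytes is acceptable depends on the document, since it silently changes the input.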

Interesting. Sounds totally like the right solution. I wonder why the parser
stops parsing here, though. Is '\0' explicitly considered an invalid character
in (broken) HTML, or is it really just the usual C EOS slip?

Stefan
_______________________________________________
xml mailing list, project page http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
