Re: [xml] html parsing incomplete - bug?

Stefan Behnel Tue, 13 Oct 2009 04:56:31 -0700

Martin (gzlist) wrote:
> On 13/10/2009, Stefan Behnel <[email protected]> wrote:
>> Lydia Patrovic wrote:
>>> Note the "main&amp;20090924_2" attribute value, which can be interpreted
>>> as an unterminated entity.
>> :) Nice little Freudian copy&paste quoting error. Here's the line from the
>> real 'HTML' file:
>>
>> <script type="text/javascript" src="merge.php?f=main&20090924_2"></script>
>>
>> Note the unescaped '&' character in the URL.
> 
> I'd have thought the embedded null at byte 532 would be the cause. Try
> bytes.replace("\x00", "") before treating it as a c string. Seems to
> get the document parsed pretty much as expected for me.


Interesting. Sounds totally like the right solution.

I wonder why the parser stops parsing here, though. Is '\0' explicitly
considered an invalid character in (broken) HTML, or is it really just the
usual C end-of-string (EOS) slip, where the NUL byte is taken as the end of
the input buffer?
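
For reference, a minimal Python sketch of the workaround Martin describes,
assuming the document is held as a bytes object. The helper names and the
sample input are made up for illustration. c_string_view() only imitates how
a NUL-terminated reader would see the buffer; it is not a claim about what
the parser does internally.

```python
def c_string_view(data: bytes) -> bytes:
    """Return the part of `data` a NUL-terminated C string reader would see."""
    # Everything from the first NUL byte onwards is invisible to such a reader.
    return data.split(b"\x00", 1)[0]


def strip_nul_bytes(data: bytes) -> bytes:
    """Remove all embedded NUL bytes, as in bytes.replace("\\x00", "")."""
    return data.replace(b"\x00", b"")


# Hypothetical input with an embedded NUL in the middle of the markup.
sample = b"<html><body><p>before</p>\x00<p>after</p></body></html>"

truncated = c_string_view(sample)   # stops right before the NUL
cleaned = strip_nul_bytes(sample)   # full document, NUL removed

print(len(sample), len(truncated), len(cleaned))
```

After stripping, the C string view and the full buffer are the same bytes, so
nothing after the former NUL position is lost when the data is handed over as
a C string.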

Stefan
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
