On Thursday 16 April 2015 10:32:32 you wrote:
> > There you go; you find the updated patch attached. It now requires
> > HTML_PARSE_RECOVER option to be set for recovering from stand-alone
> > less-than characters.
>
> That sounds fine *except* it doesn't raise an error.
> The parser knows it's a broken construct that must be pointed out.
OK, I'll see what I can do about that. ;)
> It sounds a bit weird to handle that error case as one of the main content
> cases, I would still be tempted to go into htmlParseStartTag, get the
> error reported, but push corrective data instead in recover mode.
My initial idea was to enter htmlParseElement() as before and, if
htmlParseElement() encounters an error, to handle the chunk as text instead
(when the recover option is on). That would probably come closest to what
most browsers seem to do. The problem is that it would require the
public API function's prototype of
void htmlParseElement(htmlParserCtxtPtr)
to be changed to
int htmlParseElement(htmlParserCtxtPtr)
To avoid that API change, one could add another internal (static) version of
htmlParseElement() that provides a return value. However, there is already
htmlParseElementInternal(), so adding yet another variant would get nasty IMO.
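
To make the trade-off concrete, here is a rough, self-contained sketch of that
"static variant with a return value" idea. It is not libxml2 code: DemoCtxt,
demoParseElementStatus(), demoParseElement() and demoParseContent() are
hypothetical names made up for illustration, and the tag parsing is reduced to
"<name>". The only point is that the public void prototype stays untouched
while the content loop still learns about the failure, reports an error, and
falls back to text in recover mode.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the parser context; NOT libxml2's htmlParserCtxt. */
typedef struct {
    const char *cur;   /* current input position */
    int recover;       /* non-zero: recover from broken constructs */
    int errors;        /* number of errors reported so far */
    char text[64];     /* text content collected so far */
    char lastTag[16];  /* name of the last start tag parsed */
} DemoCtxt;

/* Internal variant with a return value: 0 on success, -1 if there is no
 * valid "<name>" at ctx->cur. On failure the error is counted and no input
 * is consumed, so the caller can decide what to do with the '<'. */
static int demoParseElementStatus(DemoCtxt *ctx) {
    assert(ctx != NULL && ctx->cur != NULL && ctx->cur[0] == '<');
    const char *p = ctx->cur + 1;
    size_t n = 0;
    while (((*p >= 'a' && *p <= 'z') || (*p >= 'A' && *p <= 'Z')) &&
           n < sizeof(ctx->lastTag) - 1)
        ctx->lastTag[n++] = *p++;
    ctx->lastTag[n] = '\0';
    if (n == 0 || *p != '>') {
        ctx->errors++;           /* the broken construct is still reported */
        ctx->lastTag[0] = '\0';
        return -1;
    }
    ctx->cur = p + 1;            /* consume "<name>" */
    return 0;
}

/* The public entry point keeps its void prototype and drops the status. */
void demoParseElement(DemoCtxt *ctx) {
    (void) demoParseElementStatus(ctx);
}

/* Content loop: a failed element is pushed as text in recover mode,
 * otherwise parsing stops at the error. */
void demoParseContent(DemoCtxt *ctx) {
    size_t len = strlen(ctx->text);
    while (*ctx->cur != '\0') {
        if (*ctx->cur == '<') {
            if (demoParseElementStatus(ctx) == 0)
                continue;        /* a real element was parsed */
            if (!ctx->recover)
                return;          /* strict mode: give up here */
            /* recover mode: fall through and keep '<' as plain text */
        }
        if (len < sizeof(ctx->text) - 1) {
            ctx->text[len++] = *ctx->cur;
            ctx->text[len] = '\0';
        }
        ctx->cur++;
    }
}
```

With the input "a < b<p>c" and recover set, this sketch collects the text
"a < bc", parses the element "p" and counts exactly one error for the
stand-alone '<'. Without recover it stops at the first '<' after reporting
the error.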
Best regards,
Christian Schoenebeck
_______________________________________________
xml mailing list, project page http://xmlsoft.org/
https://mail.gnome.org/mailman/listinfo/xml