On Wed, Jun 21, 2006 at 04:29:56PM +0200, Cyrill Osterwalder wrote:
> Hi all
>
> After some more research I believe I have found the reason for the
> problem with the CDATA parsing. In case PARSE_HTML_RECOVER is true, the
> following check in htmlParseTryOrFinish() is not enough for calling
> htmlParseScript():
>
>     /*
>      * Handle SCRIPT/STYLE separately
>      */
>     if ((!terminate) &&
>         (htmlParseLookupSequence(ctxt, '<', '/', 0, 0) < 0))
>         goto done;
>     htmlParseScript(ctxt);
>
> This code makes sure that there is an end tag starting somewhere in the
> buffer that is going to be processed by htmlParseScript(). However, in
> recovery mode, htmlParseScript() will consume the "</" characters if the
> real CDATA end tag is not fully inside the current chunk (as described
> in the problem report).

True. I was thinking about something like that. This is all due to script
and style having different parsing constraints. Why do you use
PARSE_HTML_RECOVER? The parser already does recovery to some extent
without it (I mean the HTML parser :-).

> I don't have a patch recommendation for the moment but I see two
> possibilities:
>
> a) htmlParseTryOrFinish() could guarantee that the buffer contains the
> desired close tag (or terminate is true). I guess that this could be
> done using multiple htmlParseLookupSequence() calls and checking for the
> tag name in a loop...?

Hum, well, we could check for the current element and make 2 specific
tests in that case. This would be very hard anyway: people are going to
come with '</ style' or '</foo>' and expect that to close the open tag,
and 'style "</" style' and expect that not to close it...

> b) htmlParseScript would have to be more powerful in order to recognize
> that it is trying to do xmlStrncasecmp() on an incomplete tag string. In
> that case it should break and be called again by htmlParseTryOrFinish().
> That on the other hand would have to be more careful with the switch to
> the end tag processing after the call to htmlParseScript().

Not sure it's much better.

> Possibility a) looks better to me and might try to implement a patch
> example.

You can try, but it's all very messy IMHO. I will take patches if they
are not obviously broken (it could be a good idea to provide examples for
the test suite too).

thanks

Daniel

-- 
Daniel Veillard      | Red Hat http://redhat.com/
[EMAIL PROTECTED]    | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
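
For illustration only, possibility a) could be sketched roughly as below. This is standalone C, not libxml2 code: ci_equal(), has_close_tag() and ready_for_cdata() are hypothetical names, and the buffer is assumed to be a plain char array rather than the parser input. It only recognizes "</" immediately followed by the element name, so it deliberately does not handle the '</ style' or '</foo>' cases mentioned above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a libxml2 function): case-insensitive
 * comparison of the first n bytes of a and b. Returns 1 on match. */
static int ci_equal(const char *a, const char *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    return 1;
}

/* Hypothetical helper: returns 1 if buf[0..len) contains "</" directly
 * followed by the complete element name, 0 otherwise. A "</" whose tag
 * name is cut off by the end of the chunk does not count. */
static int has_close_tag(const char *buf, size_t len, const char *name) {
    assert(buf != NULL && name != NULL);
    size_t nlen = strlen(name);
    /* Only positions where "</" plus the whole name still fits. */
    for (size_t i = 0; i + 2 + nlen <= len; i++) {
        if (buf[i] == '<' && buf[i + 1] == '/' &&
            ci_equal(buf + i + 2, name, nlen))
            return 1;
    }
    return 0;
}

/* Sketch of the stricter gate: hand the chunk to the SCRIPT/STYLE
 * content parser only if this is the final chunk (terminate) or the
 * full close tag for the current element is already buffered. */
static int ready_for_cdata(const char *buf, size_t len,
                           const char *name, int terminate) {
    return terminate || has_close_tag(buf, len, name);
}
```

With a chunk ending in "</scr" and the current element "script", ready_for_cdata() returns 0 unless terminate is set, so the caller would wait for more data instead of letting the "</" be consumed.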