George Prekas wrote:
> I have found a bug in Wget version 1.8.2 concerning comment handling ( <!--
> comment --> ). Take a look at the following illegal HTML code:
> <HTML>
> <BODY>
> <a href="test1.html">test1.html</a>
> <!-->
> <a href="test2.html">test2.html</a>
> <!-->
> </BODY>
> </HTML>
>
> Now, save the above snippet as test.html and try wget -Fi test.html. You
> will notice that it doesn't recognise the second link. I have found a
> solution to the above situation and have properly patched html-parse.c and I
> would like some info on how can I give you the patch.

The HTML code is legitimate, but it only contains one link. The following
three lines constitute a single comment:

  <!-->
  <a href="test2.html">test2.html</a>
  <!-->

A comment begins at "<!--" and ends at "-->". The trailing ">" on the first
of these lines and the leading "<!" on the third of these lines are part of
the comment. That is, the comment text is:

  >
  <a href="test2.html">test2.html</a>
  <!

At any rate, one should not expect predictable behavior for broken HTML.
What should wget do with the following?

  <a href="test1.html">test1.html <!--> </a> <!-->

In one version, it might choose to follow the link to test1.html and in
another version it might not.

Tony
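
For illustration only, the "begins at <!-- and ends at -->" rule described above can be sketched in a few lines of Python. This is not the logic in Wget's html-parse.c; the names split_comments, hrefs and SAMPLE are made up for this example, and treating an unterminated comment as running to the end of the input is an assumption. The search for the closing "-->" starts after the whole "<!--" opener, so the dashes in "<!-->" are not counted twice.

```python
import re

OPEN, CLOSE = "<!--", "-->"

def split_comments(html):
    """Return (text outside comments, list of comment bodies).

    A comment starts at OPEN and ends at the next CLOSE that begins
    after the opener, so "<!-->" on its own only opens a comment.
    """
    outside, comments, pos = [], [], 0
    while True:
        start = html.find(OPEN, pos)
        if start == -1:
            outside.append(html[pos:])
            break
        outside.append(html[pos:start])
        body_start = start + len(OPEN)
        end = html.find(CLOSE, body_start)
        if end == -1:
            # Assumption: an unterminated comment swallows the rest.
            comments.append(html[body_start:])
            break
        comments.append(html[body_start:end])
        pos = end + len(CLOSE)
    return "".join(outside), comments

def hrefs(html):
    """Hypothetical helper: collect double-quoted href values."""
    return re.findall(r'href="([^"]*)"', html)

SAMPLE = """<HTML>
<BODY>
<a href="test1.html">test1.html</a>
<!-->
<a href="test2.html">test2.html</a>
<!-->
</BODY>
</HTML>
"""

if __name__ == "__main__":
    outside, comments = split_comments(SAMPLE)
    print(hrefs(outside))   # links that lie outside any comment
    print(comments)         # raw text of each comment
```

Under this reading, only test1.html lies outside a comment, and the single comment body starts with ">" and ends with "<!".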