[ ... ]
I have downloaded Mozilla's source. It was 30MB! I then searched for where
Mozilla handles comments and found mozilla/htmlparser/src/nsHTMLTokens.cpp.
Inside it there are two functions: ConsumeStrictComment and
ConsumeQuirksComment. The first one follows the rules; the second one tries
to handle even invalid comments, using an algorithm like this:
1. Looks for
arbitrary number of dashes
My thoughts on the subject:
Before looking at Mozilla, I made my own algorithm. It is based on the
following thought: every comment ends at the >, unless this > is inside the
comment. The hard part is to decide when it is a comment. Well, it has to
start with --. But is this enough? Just look at the last comment
above. In my opinion, for something to be a comment it must start with --
and the next nonblank character must not be - or >.
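
To make the rule concrete, here is a rough sketch in Python (not Wget's or
Mozilla's code). The function names are made up for illustration. It also
assumes something I did not state above: once a comment is opened, it runs
until the next --. Treat it as one possible reading of the idea, not a
finished implementation.

```python
def opens_comment(text, pos):
    """True if the text right after a '--' looks like real comment content.

    Rule from the message: the next nonblank character after '--'
    must not be '-' or '>'.
    """
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos < len(text) and text[pos] not in "->"


def find_declaration_end(text, start=0):
    """Return the index just past the '>' closing the '<!' at `start`.

    Returns -1 if no closing '>' is found. A '>' inside a comment
    does not count as the end.
    """
    if not text.startswith("<!", start):
        raise ValueError("expected '<!' at start")
    i = start + 2
    while i < len(text):
        if text.startswith("--", i) and opens_comment(text, i + 2):
            # Assumption: the comment ends at the next '--'.
            close = text.find("--", i + 2)
            if close == -1:
                return -1  # unterminated comment
            i = close + 2
        elif text[i] == ">":
            return i + 1
        else:
            i += 1
    return -1
```

With this sketch, "<!-- a > b -->rest" ends just before "rest", while
degenerate forms such as "<!-->" and "<!---->" end at their first >.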
That's all for now. Please give me some feedback with your thoughts and tell
me if you would like the comment handling mechanism of Wget to change. By
the way, who wrote the current one? Maybe he can help us with his
experience.
Regards,
George Prekas.