> After reading http://www.w3c.org/MarkUp/SGML/sgml-lex/sgml-lex I am
> convinced that <!-----> is a valid SGML (and therefore HTML) comment.
> Therefore, I believe it is a bug if wget does not recognize such a comment.
I don't think so. Actually the rules for SGML "comments" are somewhat
different.

First, a comment need not be part of a comment declaration; it may just as
well appear inside other markup declarations, e.g. in the role of a
parameter separator. Example (from HTML 4 strict):

    <!ATTLIST BR
      %coreattrs;    -- id, class, style, title --
      >

There is at least one comment here, namely between the first "visible"
comment delimiter (the -- before " id") and the second -- at the end of the
second line. (The coreattrs entity itself has some more comments in its
value's text.)

In addition, a declaration may contain only comments and nothing else. This
is what is usually referred to as a "comment" in web pages' HTML text.
Example of a declaration that contains nothing but comments:

    <!-- a tree -- -- on mars? -- >

This comment declaration has two comments and a few separators in it.

The comment declaration rules are numbered 91 and 92 in the SGML standard.

A comment declaration [91] is a markup declaration open (<!), optionally
followed by a comment (see below), which may in turn be followed by any
number of separators or comments; the declaration is terminated by markup
declaration close (>).

    comment declaration = mdo, (comment, (s | comment)*)?, mdc

A comment [92] is a comment delimiter (--), followed by any number of SGML
characters, followed by another comment delimiter (--).

    comment = com, SGML character*, com

(Since the part "followed by..." in [91] is optional (?), an empty comment
declaration is "<!" immediately followed by ">", i.e. "<!>" is a comment
declaration, too.)

So in the example <!-----> there are 5 hyphens. The first two are a comment
delimiter that opens a comment, and the next two are the delimiter that
closes it again (the comment itself is empty). But then there is something
else following, namely a single '-', which is neither a separator nor the
start of another comment. So this piece of text is as invalid as <!----z>.

> Note: I haven't studied the source to confirm how it handles such a string.

Neither have I.

Georg
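
P.S. To make rules [91] and [92] above easier to try out, here is a minimal sketch in Python. It is only an illustration under some assumptions: the function name is_comment_declaration is made up, the delimiters are those of the reference concrete syntax (<!, --, >), separators are taken to be space, tab, CR and LF, and the input is a single isolated declaration. It says nothing about how wget actually parses comments.

```python
# Sketch of SGML productions [91] and [92] for one isolated declaration.
# Assumed separator characters (s): space, tab, record end, record start.
S_CHARS = " \t\r\n"


def is_comment_declaration(text: str) -> bool:
    """Return True if text is exactly one comment declaration:
    mdo, (comment, (s | comment)*)?, mdc
    """
    if not text.startswith("<!"):        # mdo
        return False
    pos = 2
    if text[pos:pos + 1] == ">":         # empty declaration "<!>"
        return pos + 1 == len(text)

    seen_comment = False                 # [91]: a comment must come first
    while pos < len(text):
        if text.startswith("--", pos):   # com opens a comment [92]
            end = text.find("--", pos + 2)   # the next com closes it
            if end == -1:
                return False             # comment never closed
            pos = end + 2
            seen_comment = True
        elif seen_comment and text[pos] in S_CHARS:
            pos += 1                     # separator between comments
        elif seen_comment and text[pos] == ">":
            return pos + 1 == len(text)  # mdc must end the input
        else:
            return False                 # stray character, e.g. '-' or 'z'
    return False                         # ran out of input before mdc


if __name__ == "__main__":
    for sample in ["<!>", "<!---->", "<!----->", "<!----z>",
                   "<!-- a tree -- -- on mars? -- >"]:
        print(repr(sample), is_comment_declaration(sample))
```

With this sketch, <!----> and <!-- a tree -- -- on mars? -- > are accepted, while <!-----> and <!----z> are rejected for the same reason: a character that is neither a separator nor the start of a comment follows the closed comment.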