Re: Comment handling

2003-06-03 Thread Tony Lewis
Georg Bauhaus wrote:


> I don't think so. Actually the rules for SGML "comments" are
> somewhat different.

Georg, I think we're talking about apples and oranges here. I'm talking
about what is legitimate in a comment in an SGML document. I think you're
talking about what is legitimate as a comment in an SGML declaration.

At any rate, I decided to do some more poking around. I wrote a web page
(see http://www.exelana.com/comments.html) with the following variations on
comments:





The browsers I tried (Internet Explorer, Mozilla, and Lynx) ignore all of
them. I also tried the W3C Markup Validation Service at
http://validator.w3.org/

It reported that the last one is not valid:

Line 22 column 8: comment started here


http://validator.w3.org/check?uri=http%3A%2F%2Fwww.exelana.com%2Fcomments.html&doctype=HTML+2.0&charset=us-ascii+%28basic+English%29

The moral of the story: one cannot evaluate an HTML document solely on what
any browser (or even all of them) does with it.

Tony



Re: Comment handling

2003-06-03 Thread Georg Bauhaus
> > So in the example  there are 5 hyphens, the first two
> > of which can be interpreted as a comment delimiter, as can
> > the second two. But then there is something else following the
> > second two, namely a '-'. So this piece of text is as invalid
> > as  (valid, 1)

(It doesn't stop at the first > here. If I understand your outline
of the algorithm correctly, that's fine.)

 (valid, 2)
 (invalid, 2a)

(Looks tricky to me at first sight, but the presence of "a" after "--", which
is neither white space nor a dash, could trigger a resumed search for the
second pair of dashes. With a bit of luck there is one, and neither was "a "
a typo that shouldn't be there at all, nor was ">" forgotten before "a".)

 (valid, 3)
 Paris --> (valid, 3a, but probably surprising)

(I think detecting that would be mind reading magic in the general case?)

 (invalid, 4)

(Just another illustration of 2a)

 (invalid, 5, but maybe important)

(Yet another illustration of 2a. It might be useful to get this right,
for example for extracting URLs from "commented" JavaScript.)

 (valid, 6)

space (separators) before >

> The code does the following:
> 1. Looks for the ! immediately after <, otherwise it aborts.
> 2. Looks for whitespace or dash.
> 3. If it finds a dash, it looks for one more, otherwise it aborts.
> 4. When inside the comment, it looks for a dash.
> 5. If it finds a dash, it looks for one more, otherwise it aborts.
> 6. Finally, it looks for a > to quit gracefully.
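
As a rough illustration, the six quoted steps can be read as a strict scanner
like the one below. This is only a sketch in Python, not the code under
discussion: the function name skip_strict_comment and its return convention
(index just past the comment, or None for "abort") are assumptions, as is the
choice to allow white space before the closing >.

```python
def skip_strict_comment(text, start=0):
    """Hypothetical strict scanner; returns the index past the comment or None."""
    n = len(text)
    # 1. '!' must come immediately after '<', otherwise abort.
    if not text.startswith("<!", start):
        return None
    pos = start + 2
    # 2. Skip white space while looking for the first dash.
    while pos < n and text[pos].isspace():
        pos += 1
    # 3. A dash must be followed by one more, otherwise abort.
    if not text.startswith("--", pos):
        return None
    pos += 2
    # 4. Inside the comment, look for the next dash.
    dash = text.find("-", pos)
    if dash == -1:
        return None
    # 5. That dash must be followed by one more, otherwise abort.
    if not text.startswith("--", dash):
        return None
    pos = dash + 2
    # 6. Look for '>' (white space before it is an assumption of this sketch).
    while pos < n and text[pos].isspace():
        pos += 1
    if pos < n and text[pos] == ">":
        return pos + 1
    return None
```

Under this reading a lone dash inside the comment, as in "a - b", makes the
scanner abort instead of skipping ahead.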

This is what I have tried, leaving out EOF. Basically the algorithm is quite
tolerant and, after "' or for the next
"--[[:space:]]*>". This will include some very invalid comments, but so what? I
thought it might blend well with typical wget use. It doesn't handle .

1'. Looks for the ! immediately after <, otherwise it aborts.
2'. Skips white space.
3'. Looks for >, if it finds one, quits gracefully.
4'. If it finds a dash, it looks for one more. Otherwise, i.e.
if it does not find a first dash, the rest of the input is an incomplete
comment, or else another type of declaration, which was precluded.
4a'. If it doesn't find a second one, the search restarts at 4'
from the position where it looked for the second one.
5'. (there is a second dash) Move ahead.
5a'. Either there is '>', quit gracefully.
5b'. Or there is white space, go to 5'.
5c'. Or goto 4'.
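
The primed steps above can be sketched roughly as follows. This is an
illustration in Python rather than the code actually tried; the name
skip_tolerant_comment is made up, and since EOF handling was left out above,
returning None when the input runs out is an assumption of the sketch.

```python
def skip_tolerant_comment(text, start=0):
    """Hypothetical tolerant scanner; returns the index past the comment or None."""
    n = len(text)
    # 1'. '!' must come immediately after '<', otherwise abort.
    if not text.startswith("<!", start):
        return None
    pos = start + 2
    # 2'. Skip white space.
    while pos < n and text[pos].isspace():
        pos += 1
    # 3'. An immediate '>' ends the declaration gracefully.
    if pos < n and text[pos] == ">":
        return pos + 1
    while True:
        # 4'. Search forward for a dash; none left means an incomplete comment.
        dash = text.find("-", pos)
        if dash == -1 or dash + 1 >= n:
            return None
        # 4a'. No second dash: restart the search where we looked for it.
        if text[dash + 1] != "-":
            pos = dash + 1
            continue
        # 5'. There is a second dash: move ahead.
        pos = dash + 2
        # 5b'. White space after the pair is skipped.
        while pos < n and text[pos].isspace():
            pos += 1
        if pos >= n:
            return None
        # 5a'. '>' ends the comment gracefully.
        if text[pos] == ">":
            return pos + 1
        # 5c'. Anything else: go back to 4' and keep searching.
```

For instance, skip_tolerant_comment("<!-- a - b -->") skips the whole string,
because the lone dash only restarts the search instead of aborting.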


So this just disregards the 4k requirement, because that isn't well enough
known to be useful anywhere outside validating parsers anyway?

If I may suggest code reuse, not intending offence, I think Mozilla
does a fairly good job at handling malformed comments, from what I
see (in the browser); could that be used as a source of inspiration?


-- Georg





Re: Comment handling

2003-06-03 Thread Georg Bauhaus
> 
> This is what I have tried, leaving out EOF. Basically the algorithm is quite
> tolerant and, after "' or for the next
> "--[[:space:]]*>". This will include some very invalid comments, but so what?
> I thought it might blend well with typical wget use. It doesn't handle .

And, darn, I have forgotten to allow any number of dashes in addition
to the white space before >.
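
A terminator that allows extra dashes as well as white space before > could be
sketched with a regular expression along these lines. This is only an
illustration in Python: the names TERMINATOR and find_comment_end are made up,
and \s stands in for [[:space:]], because Python's re module has no POSIX
bracket classes.

```python
import re

# Two dashes, then any mix of further dashes and white space, then '>'.
TERMINATOR = re.compile(r"--[-\s]*>")

def find_comment_end(text, start=0):
    """Return the index just past the first terminator at or after start, or None."""
    match = TERMINATOR.search(text, start)
    return match.end() if match else None
```

With this pattern a run of five dashes followed by > is accepted as a
terminator, since the surplus dashes are absorbed before the >.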



-- Georg




Re: Comment handling

2003-06-03 Thread Georg Bauhaus
> Georg, I think we're talking about apples and oranges here. I'm talking
> about what is legitimate in a comment in an SGML document. I think you're
> talking about what is legitimate as a comment in an SGML declaration.

Ah, yes, OK, I was reacting to "valid SGML comments", where legitimate
is not defined. It should be different for wget, indeed. I hope my other
letter explains.
(And, to be nitpicky, an SGML declaration is another defined term
which refers to the character sets, capacities, markup minimization,
etc., of an SGML parser. :-)

> At any rate, I decided to do some more poking around. I wrote a web page
> (see http://www.exelana.com/comments.html) with the following variations on
> comments:
> 
> 
> 
> 
> 
> The browsers I tried (Internet Explorer, Mozilla, and Lynx) ignore all of
> them. I also tried the W3C Markup Validation Service at
> http://validator.w3.org/
> 
> It reported that the last one is not valid:
> 
> Line 22 column 8: comment started here
> 

Which, incidentally, is a confusing error message, as this comment
is, in itself, correct. (You can see this by removing the middle dashes
two lines above it. It's the 4k issue that George Prekas has written
about.)

http://validator.w3.org/check?uri=http%3A%2F%2Fhome.knuut.de%2Fbauhaus%2Fh2.html&charset=utf-8+%28Unicode%2C+worldwide%29&doctype=%28detect+automatically%29


-- Georg




RE: i tried to run the new wget for windows and this is what i got

2003-06-03 Thread Herold Heiko
If you got that binary from my site you should have read the relevant
description.
Then you would have downloaded and installed the correct SSL libraries.
If you got it from somewhere else, contact whoever provided that binary.
Besides that, sending a screenshot in order to transmit a simple text error
message is usually considered rude.

Heiko 

> -----Original Message-----
> From: Ernst, Yehuda [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, June 03, 2003 11:15 AM
> To: [EMAIL PROTECTED]
> Subject: i tried to run the new wget for windows and this is 
> what i got
> 
> 
>  <> 
> 
> 
> any ideas?