Re: wget does not honour content-length http header [http://bugs.debian.org/143736]

2002-04-25 Thread Tony Lewis

Hrvoje Niksic wrote:

 If your point is that Wget should print a warning when it can *prove*
 that the Content-Length data it received was faulty, as in the case of
 having received more data, I agree.  We're already printing a similar
 warning when Last-Modified is invalid, for example.

I'm afraid you'll have to ask R. Fielding, J. Gettys, J. Mogul, H. Frystyk,
and T. Berners-Lee what they were thinking. <grin> I was just quoting from
RFC 2068: Hypertext Transfer Protocol -- HTTP/1.1

As for printing a warning only when wget can prove that the Content-Length
data was faulty, that sounds like a reasonable implementation to me.
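The rule agreed on here fits in a few lines. A body shorter than Content-Length looks exactly like a dropped connection, so only a body *longer* than the header proves the header wrong. A minimal Python sketch of that decision follows; `length_warning` is a hypothetical helper for illustration, not code from Wget (which is written in C):

```python
from typing import Optional


def length_warning(declared: int, received: int) -> Optional[str]:
    """Return a warning only when Content-Length is provably wrong.

    Hypothetical helper for illustration; not taken from Wget's source.
    """
    if received > declared:
        # More bytes arrived than were announced: the header is broken.
        return (f"Content-Length said {declared} bytes, "
                f"but {received} bytes arrived")
    # received <= declared: a short body may simply be a dropped
    # connection, so nothing is proven and no warning is issued.
    return None
```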

Tony




wget does not honour content-length http header [http://bugs.debian.org/143736]

2002-04-23 Thread Noel Koethe

Hello,

If the http content-length header differs from actual data length,
wget disregards the http specification as follows:
1) if content-length is greater than actual data, wget keeps retrying to
receive the whole file indefinitely. Using the command-line parameter
--ignore-length fixes this but should it not be on by default?
2) If content-length is smaller than actual data sent by server, wget
happily downloads it all instead of stopping at whatever content-length
specified. This is contrary to the spec which strictly states that
content-length must be obeyed and that the user must be notified that
something strange happened. It correctly tells the user that it received
nnn/mmm bytes, where mmm is content-length but should there not be an
error message, too?

http://bugs.debian.org/143736

Thank you.

-- 
Noèl Köthe



Re: wget does not honour content-length http header [http://bugs.debian.org/143736]

2002-04-23 Thread Hrvoje Niksic

Noel Koethe [EMAIL PROTECTED] writes:

 If the http content-length header differs from actual data length,
 wget disregards the http specification as follows:

It doesn't disregard the HTTP specification.  As far as I'm aware,
HTTP simply specifies that the information provided by Content-Length
must be correct.  When it is not correct, the protocol has been broken
by the server and the best Wget can do is try to make sense of the
situation.  In both cases you report, Wget's behavior is by design.

 1) if content-length is greater than actual data, wget keeps
 retrying to receive the whole file indefinitely.

Not indefinitely, but until `--tries' attempts (20 by default) have
been exhausted.
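Retrying in this sense means resuming the transfer where it stopped rather than starting over. Below is a rough Python sketch of that loop. It assumes a hypothetical `fetch_from(offset)` callable that returns whatever bytes the server sends from `offset` onward before the connection drops; in real HTTP that would be a request carrying a `Range` header. The default of 20 tries mirrors the figure given above. This is not Wget's actual retry logic.

```python
from typing import Callable


def download(fetch_from: Callable[[int], bytes], declared: int,
             tries: int = 20) -> bytes:
    """Keep resuming until `declared` bytes are in hand or tries run out.

    `fetch_from` is a hypothetical stand-in for one HTTP request that
    resumes at the given byte offset; it is not a Wget API.
    """
    data = b""
    for _ in range(tries):
        # Trust Content-Length: a short read is treated as a network
        # problem, so ask for the rest starting at the current offset.
        data += fetch_from(len(data))
        if len(data) >= declared:
            return data
    raise RuntimeError(
        f"gave up after {tries} tries with {len(data)}/{declared} bytes")
```

If the header overstates the size, every attempt after the real end returns nothing, and the loop only stops once the tries are used up, which is the behaviour the bug report describes.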

 Using the command-line parameter --ignore-length fixes this but
 should it not be on by default?

No.  When you're downloading files over a slow or unstable network,
you will often get EOF while reading data.  Retrying in spite of that
EOF has been one of Wget's primary features since the very beginning.

So Wget is not disregarding the spec, it is *honoring* it by assuming
that the provided Content-Length is correct, as it should be.  This
feature has made many a download possible.  In the cases where the
content-length header truly is broken, use `--ignore-length'.
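The difference `--ignore-length` makes can be sketched over any file-like stream. In the default mode, a body shorter than Content-Length counts as incomplete and becomes a candidate for a retry. With the header ignored, EOF simply ends the body. `read_body` and its `ignore_length` parameter are hypothetical names chosen to echo the real option; this is an illustration, not Wget's implementation.

```python
import io
from typing import BinaryIO, Tuple


def read_body(stream: BinaryIO, declared: int,
              ignore_length: bool = False) -> Tuple[bytes, bool]:
    """Read a close-delimited body; return (data, complete).

    Hypothetical sketch. With ignore_length=True, Content-Length is not
    consulted and EOF always marks a complete body. Otherwise a body
    shorter than `declared` is reported as incomplete.
    """
    data = stream.read()  # read until the server closes the connection
    if ignore_length:
        return data, True
    return data, len(data) >= declared


# Server announced 10 bytes but closed the connection after 3.
print(read_body(io.BytesIO(b"abc"), 10))                      # → (b'abc', False)
print(read_body(io.BytesIO(b"abc"), 10, ignore_length=True))  # → (b'abc', True)
```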

 2) If content-length is smaller than actual data sent by server,
 wget happily downloads it all instead of stopping at whatever
 content-length specified.

Again, this is a feature.  Broken CGI scripts often report broken
values for `Content-Length'.  When more data arrives, it becomes
apparent that the reported value is *broken* (unlike in the case when
less data arrives).  Wget can either dismiss the rest of the data or
dismiss the header.  I judged the data actually transmitted over the
wire to be more important than one obviously broken header.
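On a connection that the server closes after the response, the "keep the data, distrust the header" choice might look like the following Python sketch. Reading continues until EOF, extra bytes are kept rather than truncated, and a flag records that the header was provably wrong, which is where a warning could be printed. `read_until_close` is a hypothetical name, and the 8192-byte chunk size is an arbitrary assumption.

```python
import io
from typing import BinaryIO, Tuple


def read_until_close(stream: BinaryIO, declared: int) -> Tuple[bytes, bool]:
    """Read a close-delimited body, preferring the bytes on the wire.

    Hypothetical sketch. Returns (data, header_broken): bytes beyond
    `declared` are kept, and header_broken is True when more data
    arrived than Content-Length announced.
    """
    chunks = []
    while True:
        chunk = stream.read(8192)
        if not chunk:  # EOF: the server closed the connection
            break
        chunks.append(chunk)
    data = b"".join(chunks)
    return data, len(data) > declared


# Header claimed 5 bytes; the server actually sent 11.
print(read_until_close(io.BytesIO(b"hello world"), 5))  # → (b'hello world', True)
```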

The exception is when persistent connections are used.  In that case,
Content-Length is honored to the letter, and the remote server had
*better* provide the correct value, or else.
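The "or else" follows from framing. On a kept-alive connection, Content-Length is the only thing that marks where one body ends and the next response begins, so the client must stop at exactly that many bytes. If the server understates the length, the leftover body bytes get parsed as the start of the next response. A minimal Python sketch, with `read_exact` as a hypothetical helper and an in-memory stream standing in for the socket:

```python
import io
from typing import BinaryIO


def read_exact(stream: BinaryIO, declared: int) -> bytes:
    """Read exactly `declared` bytes from a persistent connection.

    Hypothetical sketch. Stops at the Content-Length boundary so that
    whatever follows is left for the next response; an early EOF is
    treated as an error.
    """
    buf = bytearray()
    while len(buf) < declared:
        chunk = stream.read(declared - len(buf))
        if not chunk:
            raise EOFError(
                f"connection closed after {len(buf)}/{declared} bytes")
        buf += chunk
    return bytes(buf)


# A 5-byte body followed immediately by the next response on the same
# connection.
conn = io.BytesIO(b"helloHTTP/1.1 200 OK\r\n")
print(read_exact(conn, 5))  # → b'hello'
```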

 This is contrary to the spec which strictly states that
 content-length must be obeyed and that the user must be notified
 that something strange happened.

Which spec says that?