Noel Koethe [EMAIL PROTECTED] writes:
> If the http content-length header differs from actual data length,
> wget disregards the http specification as follows:

It doesn't disregard the HTTP specification. As far as I'm aware,
HTTP simply specifies that the information provided by Content-Length
must be correct. When it is not correct, the protocol has been broken
by the server and the best Wget can do is try to make sense of the
situation. In both cases you report, Wget's behavior is by design.

> 1) if content-length is greater than actual data, wget keeps
> retrying to receive the whole file indefinitely.

Not indefinitely, but until `--tries' attempts (20 by default) have
been exhausted.

> Using the command-line parameter --ignore-length fixes this but
> should it not be on by default?

No. When you're downloading files over a slow or unstable network,
you will often get EOF while reading data. Retrying in spite of that
EOF has been one of Wget's primary features since the very beginning.
So Wget is not disregarding the spec, it is *honoring* it by assuming
that the provided Content-Length is correct, as it should be. This
feature has made many a download possible. In the cases where the
content-length header truly is broken, use `--ignore-length'.
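
To make the trade-off concrete, here is a minimal sketch of that retry logic. It is written in Python rather than Wget's C and is not Wget's actual code; `download_with_retries` and `fetch` are hypothetical names, and resuming from the current offset on each retry is an assumption made for the illustration.

```python
def download_with_retries(fetch, content_length, tries=20, ignore_length=False):
    """Illustrative sketch only; not taken from Wget's source.

    fetch(offset) is a hypothetical callable returning the bytes the
    server delivered, starting at `offset`, before the connection hit EOF.
    """
    data = b""
    for attempt in range(1, tries + 1):
        data += fetch(len(data))
        # EOF before content_length bytes is treated as a dropped
        # connection unless the caller chose to ignore the header.
        if ignore_length or len(data) >= content_length:
            return data, attempt
    raise IOError(
        f"gave up after {tries} tries with {len(data)} of {content_length} bytes"
    )


# An unstable link that delivers at most 4 bytes per connection.
payload = b"0123456789"
flaky = lambda offset: payload[offset:offset + 4]

body, attempts = download_with_retries(flaky, content_length=len(payload))
print(body, attempts)
```

With a correct header the download completes on the third attempt. If the server claims more bytes than it ever sends, the same loop runs out of tries, and that is the case `ignore_length=True` is meant for.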

> 2) If content-length is smaller than actual data sent by server,
> wget happily downloads it all instead of stopping at whatever
> content-length specified.

Again, this is a feature. Broken CGI scripts often report broken
values for `Content-Length'. When more data arrives, it becomes
apparent that the reported value is *broken* (unlike in the case when
less data arrives). Wget can either dismiss the rest of the data or
dismiss the header. I judged the data actually transmitted over the
wire to be more important than one obviously broken header.
The exception is when persistent connections are used. In that case,
Content-Length is honored to the letter, and the remote server had
*better* provide the correct value, or else.
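
The policy for excess data can be sketched as follows. This is a Python illustration, not Wget's actual code; `accept_body` and its parameters are hypothetical names.

```python
def accept_body(received, content_length, persistent=False):
    """Illustrative sketch only; not taken from Wget's source."""
    if persistent:
        # On a kept-alive connection the header is the only way to tell
        # where this response ends, so exactly content_length bytes count.
        return received[:content_length]
    # The connection closes after this response, so EOF marks the real
    # end; extra bytes prove the header wrong and the data wins.
    return received


print(accept_body(b"abcdef", 4))
print(accept_body(b"abcdef", 4, persistent=True))
```

In the non-persistent case all six bytes are kept despite the header claiming four; in the persistent case only the first four are taken as the body.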

> This is contrary to the spec which strictly states that
> content-length must be obeyed and that the user must be notified
> that something strange happened.

Which spec says that?