Here's some anomalous behaviour in wget that I've been noticing.
Occasionally, when wget has to restart a download, it will make an error in
displaying the total and remaining amounts of the transfer. Here is a
transcript of a transfer I did recently to get the getmail package from the
current branch of the Slackware mirror at Purdue University. The first line is
the wget command exactly as invoked. The rest is its output. I've prefixed
each line with a line number because it's so long.
From line 88, we see that the package is 194641 bytes in size, and the first
attempt to download it retrieves 5472 bytes on line 90. So the next try has
189169 bytes left to download, but on line 112 wget reports that amount as the
total size of the file being downloaded, and then subtracts the 5472 bytes
already downloaded a second time to get an erroneous amount left.
After downloading a total of 112176 bytes at line 114, it has to restart again.
Line 136 shows exactly the same display errors as line 112. This time,
though, since more than half of the file has been retrieved, the double
subtraction of the amount already downloaded sends the displayed amount
remaining into negative territory, which at least doesn't cause wget to crash.
When the progress bar reaches its fullest extent but there's still data left to
download, the equals signs just start growing backwards into the plus signs
proportionally, while the percentage indicator stays stuck at 100%, which shows
the resilience of that code as well.
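To make the arithmetic concrete, here is a small Python sketch of the display error as I understand it from the output. It is only a model of the symptom, not wget's actual code: the function name displayed_sizes is made up for illustration, and the values for the second restart are what the same double subtraction would give, not figures copied from the transcript.

```python
def displayed_sizes(true_total, already_downloaded):
    """Model the suspected display bug (an assumption, not wget's source).

    The amount still to fetch is reported as the total size, and the
    bytes already downloaded are then subtracted from it a second time.
    """
    reported_total = true_total - already_downloaded
    reported_remaining = reported_total - already_downloaded
    return reported_total, reported_remaining


TRUE_TOTAL = 194641  # package size reported on transcript line 88

# Bytes on disk at each restart (transcript lines 90 and 114).
for already in (5472, 112176):
    total, remaining = displayed_sizes(TRUE_TOTAL, already)
    print(f"{already} bytes on disk: total shown {total}, remaining shown {remaining}")
```

Under this model the remaining figure is the true size minus twice the bytes on disk, so it goes negative as soon as more than half of the file has been downloaded.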
In the case of this bug, the file's contents themselves are downloaded
correctly, but I've encountered another bug that occasionally causes the
retry to begin downloading the file from the beginning while appending to the
file contents already downloaded. (See below.) In that case, the final
on-disk file can be repaired by using the split command to remove everything
from the beginning of the file that shouldn't be there and then reassembling
and renaming the rest of the file.
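For anyone who would rather not juggle the pieces that split produces, the same repair can be sketched in Python. This is an alternative to the split approach, not what I actually ran; the function name drop_leading_bytes and the file names in the usage comment are hypothetical, and it assumes you already know how many stale bytes sit at the front of the file (the size of the partial download before the retry started over).

```python
import shutil


def drop_leading_bytes(src_path, dst_path, stale_bytes):
    """Copy src_path to dst_path, skipping the first stale_bytes bytes.

    stale_bytes is the length of the leftover partial download at the
    front of the file; the caller has to work that out from the log.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        src.seek(stale_bytes)          # jump past the unwanted prefix
        shutil.copyfileobj(src, dst)   # copy the rest unchanged


# Hypothetical usage, if the first 5472 bytes were the stale part:
# drop_leading_bytes("getmail-broken.tgz", "getmail-fixed.tgz", 5472)
```

It writes to a new file rather than editing in place, so the damaged download is still there if the byte count turns out to be wrong.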
I'd also like to take this opportunity to lobby for an argument to
--server-response that enables a user to specify which server responses he or
she is actually interested in seeing (or not, in this case). In the case of
this bug report transcript, I'd really like to suppress the 220 response, but
that's impossible without piping the output through grep, which in turn causes
wget to change its progress indicators.
1: wget --no-host-directories --read-timeout=10 --tries=0 --server-response
--passive-ftp --cut-dirs=4
--output-document=slackware/n/getmail-4.6.3-noarch-1.tgz
ftp://ftp.cerias.purdue.edu/pub/os/slackware/slackware-current/slackware/n/getmail-4.6.3-noarch-1.tgz
2: --19:08:25--
ftp://ftp.cerias.purdue.edu/pub/os/slackware/slackware-current/slackware/n/getmail-4.6.3-noarch-1.tgz
3: => `slackware/n/getmail-4.6.3-noarch-1.tgz'
4: Resolving ftp.cerias.purdue.edu... 128.10.252.10
5: Connecting to ftp.cerias.purdue.edu|128.10.252.10|:21... connected.
6: Logging in as anonymous ...
7: 220-
8: 220- Welcome to the CERIAS Security FTP Archive
9: 220-
10: 220-All activity is logged and may be monitored. If you object to this,
11: 220-do not log into this service. Before downloading any tools, tips,
12: 220-tricks or other bits of information from this site, please read and
13: 220-understand all of the implications of the information provided located
14: 220-in the root directory.
15: 220-
16: 220-Limitation of Liability - README.liability
17: 220-Export Restrictions - README.export
18: 220- Copyright Notice - README.copyright
19: 220-
20: 220---
21: 220-
22: 220- .;,
23: 220- iBMMMWt.
24: 220- :iVRY,
25: 220-,tYVWBMMMi.+IVXVYt+.
26: 220- :tYYXRMW+. :WRt;:+YI;
27: 220- .:+i++XItVR: ,tY.
28: 220- .:+RX tR
29: 220- :M ._..__. __.Rt
30: 220- _. _ ._.tR | [__](__ IX
31: 220-(_.(/,[ +M _|_| |.__)XY
32: 220- RY +M.
33: 220- .WV. YM,
34: 220-+VI+,,+VBI --vkoser
35: 220- ,;+tYVVVt:
36: 220-
37: 220-
38: 220- Purdue University
39: 220-
40: 220- CERIAS - Security Archive
41: 220-
42: 220-Center for Education and Research in
43: 220- Information Assurance and Security
44: 220-
45: 220- Local time is: Mon Jul 3 19:08:23 2006. You are user 4 out of 150.
46: 220-
47: 220-