Retrying displays wrong data sizes and sometimes starts over but not really

2006-07-07 Thread Cathy Garrett

Here's some anomalous behaviour in wget that I've been noticing.
Occasionally, when wget has to restart a download, it miscalculates the
total and remaining amounts of the transfer that it displays. Here is a
transcript of a transfer I did recently to get the getmail package from the
current branch of the Slackware mirror at Purdue University. The first line
is the wget command exactly as invoked; the rest is its output. Because the
transcript is so long, I've prefixed each line with a line number.

From line 88, we see that the package is 194641 bytes in size, and line 90
shows that the first attempt retrieves 5472 bytes. So the next try has
194641 - 5472 = 189169 bytes left to download, but on line 112, wget reports
that remainder as the total size of the file being downloaded and then
subtracts the 5472 already downloaded a second time, yielding an erroneous
amount left.
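To make the arithmetic easy to check, here's a quick shell rendition of what
the numbers above imply; this is only my reconstruction of the apparent
double subtraction, not a claim about where in wget's source it happens:

  # Correct accounting after the first restart:
  echo $((194641 - 5472))    # 189169 bytes genuinely remaining
  # What the display suggests: the remainder becomes the new "total",
  # and the already-downloaded bytes get subtracted a second time:
  echo $((189169 - 5472))    # 183697, the errant amount left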

After downloading a total of 112176 bytes at line 114, it has to restart
again. Line 136 shows the exact same display errors as line 112. This time,
though, since more than half of the file has been retrieved, the double
subtraction of the amount already downloaded sends the displayed amount
remaining into negative territory, which at least doesn't cause wget to
crash. When the progress bar reaches its fullest extent but there's still
data left to download, the equals signs simply start growing backwards into
the plus signs proportionally, while the percentage indicator stays stuck at
100%, showing the resilience of that code as well.
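The same reconstruction accounts for the negative figure on the second
restart (again, inferred from the byte counts quoted above):

  echo $((194641 - 112176))  # 82465 bytes genuinely remaining
  echo $((82465 - 112176))   # -29711: once more than half the file is
                             # downloaded, the displayed remainder goes negative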

In the case of this bug, the file's contents themselves are downloaded
correctly, but I've encountered another bug that occasionally causes the
retry to begin downloading the file from the beginning while concatenating
onto the contents already downloaded. (See below.) In that case, the final
on-disk file can be repaired by using the split command to remove everything
from the beginning of the file that shouldn't be there, then reassembling
and renaming the rest.
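For what it's worth, here's a sketch of that repair; the file names are
placeholders, and the offset must match the byte count reported before the
bogus restart (5472 in the transcript above). tail -c does in one step what
I described doing with split:

  PREFIX=5472                      # bytes downloaded before the restart
  # tail -c +N starts output at byte N, so skip the first $PREFIX bytes:
  tail -c +$((PREFIX + 1)) broken.tgz > getmail-4.6.3-noarch-1.tgz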

I'd also like to take this opportunity to lobby for an argument to
--server-response that lets a user specify which server responses he or
she is actually interested in seeing (or not, in this case). In this bug
report's transcript, I'd really like to suppress the 220 response, but
that's impossible without piping the output through grep, which in turn
causes wget to change its progress indicator.
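For reference, the grep workaround would look like this; as I understand it,
wget logs server responses (and the progress display) on stderr, and once
stderr is a pipe rather than a terminal, wget abandons the bar-style
progress display:

  URL=ftp://ftp.cerias.purdue.edu/pub/os/slackware/slackware-current/slackware/n/getmail-4.6.3-noarch-1.tgz
  # 2>&1 sends wget's stderr (where the responses appear) into the pipe;
  # the pattern allows for leading spaces before the 220 banner lines.
  wget --server-response "$URL" 2>&1 | grep -v '^ *220'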

  1: wget --no-host-directories --read-timeout=10 --tries=0 --server-response 
--passive-ftp --cut-dirs=4 
--output-document=slackware/n/getmail-4.6.3-noarch-1.tgz 
ftp://ftp.cerias.purdue.edu/pub/os/slackware/slackware-current/slackware/n/getmail-4.6.3-noarch-1.tgz
  2: --19:08:25--  
ftp://ftp.cerias.purdue.edu/pub/os/slackware/slackware-current/slackware/n/getmail-4.6.3-noarch-1.tgz
  3: => `slackware/n/getmail-4.6.3-noarch-1.tgz'
  4: Resolving ftp.cerias.purdue.edu... 128.10.252.10
  5: Connecting to ftp.cerias.purdue.edu|128.10.252.10|:21... connected.
  6: Logging in as anonymous ...
  7: 220-
  8: 220- Welcome to the CERIAS Security FTP Archive
  9: 220-
 10: 220-All activity is logged and may be monitored. If you object to this,
 11: 220-do not log into this service. Before downloading any tools, tips,
 12: 220-tricks or other bits of information from this site, please read and
 13: 220-understand all of the implications of the information provided located
 14: 220-in the root directory.
 15: 220-
 16: 220-Limitation of Liability - README.liability
 17: 220-Export Restrictions - README.export
 18: 220-   Copyright Notice - README.copyright
 19: 220-
 20: 220---
 21: 220-
 22: 220- [transcript lines 22-35: an ASCII-art CERIAS logo, signed
 --vkoser; its spacing was mangled in transit, so the art is elided here]
 36: 220-
 37: 220-
 38: 220-   Purdue University
 39: 220-
 40: 220-   CERIAS - Security Archive
 41: 220-
 42: 220-Center for Education and Research in
 43: 220- Information Assurance and Security
 44: 220-
 45: 220- Local time is: Mon Jul  3 19:08:23 2006.  You are user 4 out of 150.
 46: 220-
 47: 220-

RE: wget 403 forbidden error when no index.html.

2006-07-07 Thread Post, Mark K
The short answer is that you don't get to do it.  If your browser can't
do it, wget isn't going to be able to do it.


Mark Post 

-Original Message-
From: news [mailto:[EMAIL PROTECTED]] On Behalf Of Aditya Joshi
Sent: Friday, July 07, 2006 12:15 PM
To: wget@sunsite.dk
Subject: wget 403 forbidden error when no index.html.


I am trying to download a specific directory's contents from a site and I
keep getting the 403 forbidden when I run wget. The directory does not have
an index.html, and of course any references to that path result in a 403
page displayed in my browser. Is this why wget is not working? If so, how
do I download the contents of such sites?




RE: wget 403 forbidden error when no index.html.

2006-07-07 Thread Tony Lewis
You seriously expected the server to provide wget with a file when it
returned 403 to the browser?

wget must be provided with a valid URL before it can do anything. If you
want to download something from the server, figure out how to retrieve it in
your browser and then provide that URL to wget.
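For example (the URL here is made up; substitute whatever file link actually
works in your browser):

  # A direct file URL can succeed even when the directory listing is 403:
  wget http://www.example.com/somedir/somefile.tgz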

Tony
-Original Message-
From: news [mailto:[EMAIL PROTECTED]] On Behalf Of Aditya Joshi
Sent: Friday, July 07, 2006 9:15 AM
To: wget@sunsite.dk
Subject: wget 403 forbidden error when no index.html.


I am trying to download a specific directory's contents from a site and I
keep getting the 403 forbidden when I run wget. The directory does not have
an index.html, and of course any references to that path result in a 403
page displayed in my browser. Is this why wget is not working? If so, how
do I download the contents of such sites?



wget 403 forbidden error when no index.html.

2006-07-07 Thread Aditya Joshi

I am trying to download a specific directory's contents from a site and I
keep getting the 403 forbidden when I run wget. The directory does not have
an index.html, and of course any references to that path result in a 403
page displayed in my browser. Is this why wget is not working? If so, how
do I download the contents of such sites?