I was running wget to test mirroring an internal development site, using large binary database dumps as part of the content so that the site held a large number of binary files. For the test I wanted to see whether wget could download 500K files totalling 100GB of transferred data.

The test was going fine: wget ran flawlessly for 3 days, downloaded almost the entire contents of the test site, and was at 85GB, well on track to finish all 100GB of the test files.

Then a power outage occurred. My local test box was not on battery backup, so I had to restart wget and the test. On the restart wget did not refetch the binary backup files, and for each file that had already been retrieved it gave the following message:

-
               => `<domain>/database/dbdump_107899.gz'
Connecting to <domain>|<ip>|:80... connected.
HTTP request sent, awaiting response... 416 Requested Range Not Satisfiable

   The file is already fully retrieved; nothing to do.
-

---
wget continued to run for about eight hours, gave the above message for several thousand files, then crashed with:
   wget: realloc: Failed to allocate 536870912 bytes; memory exhausted.


This was surprising, because wget ran flawlessly for several days on the initial download, yet on a "refresh" (incremental) pass over the same data it crashed after eight hours. I believe it has something to do with the code path taken when wget finds an existing local file with the same name and sends a "Range" request. Perhaps some data structure on that path keeps growing until it exhausts memory; the failed allocation was 536870912 bytes (512MB) on a test box with 2GB of RAM and no other programs running.
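If I rerun the refresh, I plan to confirm the growth by sampling wget's resident set size while it churns through the 416 responses. A rough sketch (assumes a Unix-like ps, which cygwin provides; watch_rss is just a throwaway helper of mine):

```shell
# Throwaway helper: print a process's resident set size (in KB) every
# INTERVAL seconds until the process exits.  Steady growth during the
# "already fully retrieved; nothing to do" phase would support the
# leak theory.
watch_rss() {
    pid="$1"
    interval="${2:-60}"
    while kill -0 "$pid" 2>/dev/null; do
        ps -o rss= -p "$pid"   # one RSS sample per line
        sleep "$interval"
    done
}

# usage: start the mirror in the background, then watch it:
#   wget -m -l inf --convert-links --page-requisites http://<domain> &
#   watch_rss $!
```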

This may be a bug. To get around it for the purposes of my test, I would like to know whether there is any switch that tells wget, when a local file with the same name already exists, to skip sending any request at all (range or otherwise). I do not want it to check whether the file is newer or whether it is complete; it should just skip the file and go on to the next one.
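The closest thing I have found in the manual is --no-clobber, which, if I read it correctly, makes a recursive wget keep the existing local copy and skip re-downloading it. I have not verified how it interacts with my continue=on setting, so this is only a sketch of the invocation I would try:

```shell
# Untested: -nc (--no-clobber) should make wget skip files that already
# exist locally rather than send a Range request for them.  continue=on
# would presumably have to come out of .wgetrc first, since continuing a
# partial download and never clobbering look mutually exclusive; the
# manual also suggests -nc may not mix cleanly with --convert-links.
wget -m -l inf -nc --page-requisites http://<domain>
```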

----
I was running wget under cygwin on a Windows XP box.

The wget command that I ran was the following:
   wget -m -l inf --convert-links --page-requisites http://<domain>

I had the following $HOME/.wgetrc file:
#backup_converted=on
page_requisites=on
continue=on
dirstruct=on
#mirror=on
#noclobber=on
#recursive=on
wait=3
http_user=<username>
http_passwd=<passwd>
#convert_links=on
verbose=on
user_agent=firefox
dot_style=binary
