Hi! I have recently noticed strange behavior in wget: it downloads the file twice, appending one copy after the other. I realized that it only happens when the -c (continue) and -O (set output document filename) options are used together. In that case, the output of the program is, e.g.:
*****
> wget -c -O xxx http://localhost/
--19:55:46--  http://localhost/
Resolving localhost... 127.0.0.1
Connecting to localhost|127.0.0.1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 44 [text/html]
Saving to: `xxx'

100%[=======================================>] 44 --.-K/s in 0s

--19:55:46--  http://localhost/
Reusing existing connection to localhost:80.
HTTP request sent, awaiting response... 200 OK
Length: 44 [text/html]
Saving to: `xxx'

100%[=======================================>] 44 --.-K/s in 0s

19:55:46 (1.65 MB/s) - `xxx' saved [44/44]
*****

or, with --debug:

*****
>wget --debug -c -O xxx http://localhost/
Setting --continue (continue) to 1
Setting --output-document (outputdocument) to xxx
DEBUG output created by Wget 1.10+devel on Windows-MSVC.

--19:56:34--  http://localhost/
Resolving localhost... seconds 0.00, 127.0.0.1
Caching localhost => 127.0.0.1
Connecting to localhost|127.0.0.1|:80... seconds 0.00, connected.
Created socket 284.
Releasing 0x023908c8 (new refcount 1).

---request begin---
GET / HTTP/1.0
User-Agent: Wget/1.10+devel
Accept: */*
Host: localhost
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
Date: Thu, 08 Feb 2007 18:56:34 GMT
Server: Apache/2.2.4 (Win32)
Last-Modified: Sat, 20 Nov 2004 13:16:24 GMT
ETag: "3b448-2c-6e1a3a00"
Accept-Ranges: bytes
Content-Length: 44
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html

---response end---
200 OK
Registered socket 284 for persistent reuse.
Length: 44 [text/html]
Saving to: `xxx'

100%[=======================================>] 44 --.-K/s in 0s

--19:56:34--  http://localhost/
Reusing existing connection to localhost:80.
Reusing fd 284.

---request begin---
GET / HTTP/1.0
User-Agent: Wget/1.10+devel
Accept: */*
Host: localhost
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
Date: Thu, 08 Feb 2007 18:56:34 GMT
Server: Apache/2.2.4 (Win32)
Last-Modified: Sat, 20 Nov 2004 13:16:24 GMT
ETag: "3b448-2c-6e1a3a00"
Accept-Ranges: bytes
Content-Length: 44
Keep-Alive: timeout=5, max=99
Connection: Keep-Alive
Content-Type: text/html

---response end---
200 OK
Length: 44 [text/html]
Saving to: `xxx'

100%[=======================================>] 44 --.-K/s in 0s

19:56:34 (1.08 MB/s) - `xxx' saved [44/44]
*****

After some Googling, I found a message titled "wget -c Appears to be Broken in Latest SVN Revision (Debug Log Included)" by Emily Jackson, dated 01 Dec 2005, which describes a problem that seems similar to mine:
http://www.mail-archive.com/wget@sunsite.dk/msg08481.html

Mauro Toronesi replied that bugs were expected and that "fixing these bugs should not take more than a very few days.":
http://www.mail-archive.com/wget@sunsite.dk/msg08483.html

Since I cannot find any decent web browser for the wget repository (ViewSVN or similar), and I do not want to dig through the commit logs by hand with just an SVN client, I cannot check how the problem has been fixed. Anyway, I tried to at least hotfix the bug and came up with the following:

Index: src/http.c
===================================================================
--- src/http.c	(revision 2206)
+++ src/http.c	(working copy)
@@ -2469,7 +2469,8 @@
     }
 
   /* Did we get the time-stamp? */
-  if (!got_head)
+  if (((opt.spider || opt.timestamping) && !got_head)
+      || (opt.always_rest && !got_name))
     {
       bool restart_loop = false;
 

This seems to work fine, at least for my basic use case, but I do not use the more advanced features such as timestamping and spidering, so this might be just a wild guess.

Regards
--
Petr Kadlec