Hi!

I have recently noticed some strange behavior in wget: it downloads the file
twice, appending the second copy to the first. I found that it only happens
when the -c (continue) and -O (set output document filename) options are used
together. The program's output then looks like this, for example:

*****

> wget -c -O xxx http://localhost/
--19:55:46--  http://localhost/
Resolving localhost... 127.0.0.1
Connecting to localhost|127.0.0.1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 44 [text/html]
Saving to: `xxx'

100%[=======================================>] 44          --.-K/s   in 0s

--19:55:46--  http://localhost/
Reusing existing connection to localhost:80.
HTTP request sent, awaiting response... 200 OK
Length: 44 [text/html]
Saving to: `xxx'

100%[=======================================>] 44          --.-K/s   in 0s

19:55:46 (1.65 MB/s) - `xxx' saved [44/44]

*****

or, with --debug:

*****

>wget --debug -c -O xxx http://localhost/
Setting --continue (continue) to 1
Setting --output-document (outputdocument) to xxx
DEBUG output created by Wget 1.10+devel on Windows-MSVC.

--19:56:34--  http://localhost/
Resolving localhost... seconds 0.00, 127.0.0.1
Caching localhost => 127.0.0.1
Connecting to localhost|127.0.0.1|:80... seconds 0.00, connected.
Created socket 284.
Releasing 0x023908c8 (new refcount 1).

---request begin---
GET / HTTP/1.0
User-Agent: Wget/1.10+devel
Accept: */*
Host: localhost
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
Date: Thu, 08 Feb 2007 18:56:34 GMT
Server: Apache/2.2.4 (Win32)
Last-Modified: Sat, 20 Nov 2004 13:16:24 GMT
ETag: "3b448-2c-6e1a3a00"
Accept-Ranges: bytes
Content-Length: 44
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html

---response end---
200 OK
Registered socket 284 for persistent reuse.
Length: 44 [text/html]
Saving to: `xxx'

100%[=======================================>] 44          --.-K/s   in 0s

--19:56:34--  http://localhost/
Reusing existing connection to localhost:80.
Reusing fd 284.

---request begin---
GET / HTTP/1.0
User-Agent: Wget/1.10+devel
Accept: */*
Host: localhost
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
Date: Thu, 08 Feb 2007 18:56:34 GMT
Server: Apache/2.2.4 (Win32)
Last-Modified: Sat, 20 Nov 2004 13:16:24 GMT
ETag: "3b448-2c-6e1a3a00"
Accept-Ranges: bytes
Content-Length: 44
Keep-Alive: timeout=5, max=99
Connection: Keep-Alive
Content-Type: text/html

---response end---
200 OK
Length: 44 [text/html]
Saving to: `xxx'

100%[=======================================>] 44          --.-K/s   in 0s

19:56:34 (1.08 MB/s) - `xxx' saved [44/44]

*****

After some Googling, I found a message titled "wget -c Appears to be Broken in
Latest SVN Revision (Debug Log Included)" by Emily Jackson, dated 1 Dec 2005,
which describes a problem that seems similar to mine
(http://www.mail-archive.com/wget@sunsite.dk/msg08481.html). Mauro Tortonese
replied that bugs were expected and that "fixing these bugs should not take
more than a very few days"
(http://www.mail-archive.com/wget@sunsite.dk/msg08483.html). Since I cannot
find a decent web browser for the wget repository (ViewSVN or similar), and I
do not want to dig through the commit logs by hand with just an SVN client, I
cannot check how the problem was fixed.

Anyway, I tried to at least hotfix the bug and came up with the following:

Index: src/http.c
===================================================================
--- src/http.c  (revision 2206)
+++ src/http.c  (working copy)
@@ -2469,7 +2469,8 @@
         }
 
       /* Did we get the time-stamp? */
-      if (!got_head)
+      if (((opt.spider || opt.timestamping) && !got_head)
+         || (opt.always_rest && !got_name))
         {
           bool restart_loop = false;
  
which seems to work fine, at least for my basic use case. However, I do not use
the more advanced features like timestamping and spidering, so this might be
just a wild guess.

Regards
-- Petr Kadlec
