Hello!

I'm writing because I've found a bug in the current version of wget (1.10.1). I've tried to fix it, but it has proven to be too much for me!

The bug is this: if you use -N and -O together, wget does not behave properly. It will always decide to download the remote URL, even when the file given with -O already has a timestamp and size that match the remote content.

-------

I think this bug lives almost entirely within http.c in the http_loop function.

This is for at least two reasons. The first is that even though -O is used to change the output document, wget's -N logic compares the remote timestamp against the timestamp of the file name that would have been used if -O had not been specified. I tracked that down to a simple assignment not being made, and I made the following correction:

--- wget-1.10.1/src/http.c      Mon Aug  8 17:54:16 2005
+++ wget-1.10.1/src/http.c.mine Sun Sep  4 13:29:58 2005
@@ -2037,7 +2037,10 @@
       hstat.local_file = &dummy;
       /* be honest about where we will save the file */
       if (local_file && opt.output_document)
+      {
         *local_file = HYPHENP (opt.output_document) ? NULL : xstrdup (opt.output_document);
+        hstat.local_file = local_file;
+      }
     }

   if (!opt.output_document)

This change worked, except that now wget always decided to download the remote content because the local file was zero bytes. That seemed odd, because the file I specified with -O was an exact copy of the remote content, not a zero-byte file.

Well, that problem is a little harder to fix. The file given with -O is truncated in main.c at line 900 by this bit of code:

         output_stream = fopen (opt.output_document,
                                 opt.always_rest ? "ab" : "wb");

It's easy enough to prefix this with an "if (!opt.timestamping)", but then a correction needs to be made within http_loop so that it has an open file. That's pretty tough to do, since line 2190 grabs the remote file headers (into our currently open file), but the comparison of the remote timestamp to the local one doesn't start until line 2301.

It is especially tough to fix this without breaking the --save-headers option. Perhaps logic needs to be added so that wget does its initial connection and header download to a temporary file, or caches it in memory?? Ugh and double ugh..

It looks like, if you grabbed file.txt from the remote site and want to grab it again, http_loop is built around the assumption that the unique name file.txt.1 will be used; the line 2301 comparison can then suppress the creation of file.txt.1, leaving you with an intact file.txt. But this assumption doesn't hold when -O forces the new file name to be the same as the original file name.

So, I wanted to bring this problem to your attention.
I'm sorry I can't give a complete patch for the fix, but I am having some trouble understanding the http_loop function in http.c. I'll take a closer look at it and see if the fix is easier than I am making it out to be, but I am afraid it might be over my head.

Thanks again for the awesome program!

- John Frear
[EMAIL PROTECTED]
