Re: wget-cvs-ifmodsince.patch
It's been a while since I have been able to look into this, but here is a method I think I am going to try out:

1. Send a HEAD-only request with an If-Unmodified-Since header (local file date).
2.1. If we receive the header, check the file size and download if it is different.
2.2. If we receive 412 (Precondition Failed), send the GET request to receive the file.

I think this is the most efficient way to perform timestamping as described in the documentation (and still make use of the HTTP headers made for this purpose). This way we make one request for most files, and two for files that need to be downloaded.

Also, I saw an email requesting that we create an option to set all downloaded files to the current time. I will also work on implementing this with my patch if you would like.

Craig Sowadski

From: Hrvoje Niksic [EMAIL PROTECTED]
To: Craig Sowadski [EMAIL PROTECTED]
CC: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: wget-cvs-ifmodsince.patch
Date: Sat, 28 Feb 2004 02:52:07 +0100

Craig Sowadski [EMAIL PROTECTED] writes:

> My only concern about checking only the modification date is that, after an incomplete download, the local modification date is set to the current time. So when the mirror is next attempted, the file is marked newer than the server file and is not replaced.

Hmm, you're right -- if-modified-since utterly fails in that (not at all uncommon) scenario.

> Is there some way to set the local modification date to zero (time_t = 0) until the download is finished? This way the incomplete file will always be older than the server file, and will be replaced.

I don't think there's a reliable way to do that.

So, I guess the only correct approach is in fact the one you used in your patch:

1. Send a HEAD request and get the response.
2.1. If the response contains the Last-Modified header and it indicates that the remote file is old, tell the user that there is no need to get the file.
2.2. Otherwise, send a new GET request with `If-Modified-Since'.
2.2.1. If the response is 304 Not Modified, tell the user that there is no need to get the file.
2.2.2. If the response is something other than 304, start downloading the file immediately, without firing up a new request.

The remaining question is: does Wget need the added complexity only to support servers that don't bother sending Last-Modified, but that do support If-Modified-Since? How frequent are such servers anyway?
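For illustration, the decision step of the HEAD plus If-Unmodified-Since procedure proposed at the top of this message might look roughly like the sketch below. The enum, the function name and its parameters are hypothetical and are not taken from Wget's sources; treating a missing Content-Length or an unexpected status as "download" is likewise an assumption.

```c
/* Hypothetical names for illustration; not Wget's actual API.  */
enum ts_action { TS_SKIP, TS_GET };

/* Decide what to do after a HEAD request that carried
   "If-Unmodified-Since: <local file date>".

   status      HTTP status code of the HEAD response
   remote_len  Content-Length from the response, or -1 if absent
   local_len   size of the local file in bytes  */
enum ts_action
after_conditional_head (int status, long remote_len, long local_len)
{
  /* 412 Precondition Failed: the remote file was modified after the
     local file date, so a GET is needed.  */
  if (status == 412)
    return TS_GET;

  /* Headers received: the remote file is not newer, so only the size
     can still reveal a stale or truncated local copy.  */
  if (status == 200)
    {
      if (remote_len < 0)
        return TS_GET;          /* assumption: no size, be conservative */
      return remote_len == local_len ? TS_SKIP : TS_GET;
    }

  /* Assumption: any other status falls back to a plain download
     attempt, which will surface the error to the user.  */
  return TS_GET;
}
```

With this shape, an up-to-date file costs one request (the HEAD) and a file that must be fetched costs two, which matches the request counts given in the message.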
Re: wget-cvs-ifmodsince.patch
My only concern about checking only the modification date is that, after an incomplete download, the local modification date is set to the current time. So when the mirror is next attempted, the file is marked newer than the server file and is not replaced.

Is there some way to set the local modification date to zero (time_t = 0) until the download is finished? This way the incomplete file will always be older than the server file, and will be replaced.

Craig

From: Hrvoje Niksic [EMAIL PROTECTED]
To: Craig Sowadski [EMAIL PROTECTED]
CC: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: wget-cvs-ifmodsince.patch
Date: Thu, 26 Feb 2004 00:44:48 +0100

Craig Sowadski [EMAIL PROTECTED] writes:

> 1. I can implement a patch like my original that only uses If-Modified-Since when the Last-Modified field is missing from the response to the HEAD-only request.

Isn't that what your most recent patch implements?

> 2. Send the If-Modified-Since request, then get the header to check the size only if the If-Modified-Since request returns 304 Not Modified.

That means introducing an additional hop for unchanged files. This is in a way even worse because I'd expect them to be in the majority.

> 3. Implement separate time-stamping and file-size checking options.

This might make the most sense. Or, we could completely ignore the file size issue and simply trust what If-Modified-Since tells us.

I guess I'd like for time-stamping to use If-Modified-Since, period. That's what it's for, and the Last-Modified thing has always been a weird hack that somehow survived the times.

What do the others think about this?
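A minimal sketch of the "time_t = 0 until finished" idea raised at the top of this message, using the POSIX utime() and stat() calls. All function names are hypothetical and none of this is from the patch. One caveat that may bear on reliability: every write to the file updates its modification time again, so the zero stamp would have to be applied after writing stops, which is not possible if the process is killed outright.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <utime.h>

/* Set both the access and modification time of PATH to MTIME.
   Returns 0 on success, -1 on error.  */
int
set_file_mtime (const char *path, time_t mtime)
{
  struct utimbuf times;
  times.actime = mtime;
  times.modtime = mtime;
  if (utime (path, &times) != 0)
    {
      perror (path);
      return -1;
    }
  return 0;
}

/* Stamp an interrupted download with the epoch so that it compares
   older than any plausible server-side Last-Modified value.  */
int
mark_incomplete (const char *path)
{
  return set_file_mtime (path, (time_t) 0);
}

/* After a complete download, stamp the file with the server's time.  */
int
mark_complete (const char *path, time_t remote_mtime)
{
  return set_file_mtime (path, remote_mtime);
}

/* Return 1 if PATH carries the "incomplete" stamp, 0 if it does not,
   and -1 if the file cannot be examined.  */
int
is_marked_incomplete (const char *path)
{
  struct stat st;
  if (stat (path, &st) != 0)
    return -1;
  return st.st_mtime == (time_t) 0;
}
```

A caller would invoke mark_incomplete() from its error or interrupt path and mark_complete() once the whole body has been written.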
Re: wget-cvs-ifmodsince.patch
Craig Sowadski [EMAIL PROTECTED] writes:

> 1. I can implement a patch like my original that only uses If-Modified-Since when the Last-Modified field is missing from the response to the HEAD-only request.

Isn't that what your most recent patch implements?

> 2. Send the If-Modified-Since request, then get the header to check the size only if the If-Modified-Since request returns 304 Not Modified.

That means introducing an additional hop for unchanged files. This is in a way even worse because I'd expect them to be in the majority.

> 3. Implement separate time-stamping and file-size checking options.

This might make the most sense. Or, we could completely ignore the file size issue and simply trust what If-Modified-Since tells us.

I guess I'd like for time-stamping to use If-Modified-Since, period. That's what it's for, and the Last-Modified thing has always been a weird hack that somehow survived the times.

What do the others think about this?
Re: wget-cvs-ifmodsince.patch
I have a few suggestions about this:

1. I can implement a patch like my original that only uses If-Modified-Since when the Last-Modified field is missing from the response to the HEAD-only request.
2. Send the If-Modified-Since request, then get the header to check the size only if the If-Modified-Since request returns 304 Not Modified.
3. Implement separate time-stamping and file-size checking options.

I believe that number one would be the most efficient implementation. Number two would only save a request for files that have been modified, where the first request already fetches the file, and I think three is just a bad idea.

Let me know if you would like me to work out one of these other implementations. I like the first method and only made the change because you suggested that we always use If-Modified-Since, and I also felt it would be a good idea. But after implementation, it seems my original idea was better.

Craig Sowadski

From: Craig Sowadski [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
CC: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: wget-cvs-ifmodsince.patch
Date: Wed, 18 Feb 2004 15:15:48 -0600

I did this to allow a check on the file size as well as the modification date. During the HEAD request, only the file size is checked. If the sizes don't match, the file is downloaded without If-Modified-Since.

From the testing I have done, if we receive a 304 Not Modified, the server sends back a content length of zero, so there is no way to compare the file sizes unless we get the header. I guess we could do the request and only check the file size if we receive the 304; this would save one request on files that are already going to be updated because of modification time.

Any other suggestions?

Craig

From: Hrvoje Niksic [EMAIL PROTECTED]
To: Craig Sowadski [EMAIL PROTECTED]
CC: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: wget-cvs-ifmodsince.patch
Date: Wed, 18 Feb 2004 16:01:51 +0100

Thanks for the modification, I've now applied the patch to my workspace and given it some testing.

There's one thing I don't quite understand. Before the patch, Wget's timestamping was based on analyzing the Last-Modified header, working like this:

1. Send a HEAD request and get the response.
2.1. If the response contains Last-Modified and it indicates that the remote file is older, tell the user that there is no need to get the file.
2.2. Otherwise, send a new GET request and download the file.

The problem is that we're sending *two* requests for each new file -- a HEAD request to get the last modification time, and a GET request to actually download the file. If-Modified-Since gives us a way to get rid of the HEAD request, and of the need to parse Last-Modified.

I assumed that, after your patch is installed, Wget would do this:

1. Send a GET request with the If-Modified-Since header.
2.1. If the response is 304 Not Modified, tell the user that there is no need to get the file.
2.2. If the response is something other than 304, start downloading the file immediately, without firing up a new request.

But your patch does not seem to do that. It sort of implements both strategies:

1. Send a HEAD request and get the response.
2.1. If the response contains the Last-Modified header and it indicates that the remote file is old, tell the user that there is no need to get the file.
2.2. Otherwise, send a new GET request with `If-Modified-Since'.
2.2.1. If the response is 304 Not Modified, tell the user that there is no need to get the file.
2.2.2. If the response is something other than 304, start downloading the file immediately, without firing up a new request.

Did you do it this way intentionally? I mean, it doesn't *break* anything, but it causes HTTP timestamping to be implemented in two different ways and it doesn't implement the improvement expected from using If-Modified-Since.

Do you agree that it would be a good idea to only use If-Modified-Since?
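The efficiency argument in this message can be made concrete with a small request-count model. The sketch below is only a back-of-the-envelope aid: the enum and function names are made up, "changed" means the server copy is newer by timestamp, and the rarer case of an unchanged timestamp with a different size (which would cost option 2 a third request) is ignored.

```c
/* Hypothetical labels for the strategies discussed in the thread.  */
enum strategy
{
  HEAD_THEN_GET,   /* option 1: HEAD first, GET only when needed     */
  IMS_THEN_HEAD,   /* option 2: conditional GET, HEAD after a 304    */
  IMS_ONLY         /* conditional GET alone, no size check           */
};

/* Number of HTTP requests spent on one file.  */
int
requests_for_file (enum strategy s, int changed)
{
  switch (s)
    {
    case HEAD_THEN_GET:
      return changed ? 2 : 1;
    case IMS_THEN_HEAD:
      return changed ? 1 : 2;
    case IMS_ONLY:
      return 1;
    }
  return 0;
}

/* Total requests for a mirror run over TOTAL files, of which CHANGED
   are newer on the server.  */
long
requests_for_mirror (enum strategy s, long total, long changed)
{
  return changed * requests_for_file (s, 1)
         + (total - changed) * requests_for_file (s, 0);
}
```

For a mirror of 1000 files with 50 changed, this model gives 1050 requests for option 1, 1950 for option 2 and 1000 for a conditional GET alone, so option 2 only comes out ahead when more than half of the files have changed.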
Re: wget-cvs-ifmodsince.patch
Thanks for the modification, I've now applied the patch to my workspace and given it some testing.

There's one thing I don't quite understand. Before the patch, Wget's timestamping was based on analyzing the Last-Modified header, working like this:

1. Send a HEAD request and get the response.
2.1. If the response contains Last-Modified and it indicates that the remote file is older, tell the user that there is no need to get the file.
2.2. Otherwise, send a new GET request and download the file.

The problem is that we're sending *two* requests for each new file -- a HEAD request to get the last modification time, and a GET request to actually download the file. If-Modified-Since gives us a way to get rid of the HEAD request, and of the need to parse Last-Modified.

I assumed that, after your patch is installed, Wget would do this:

1. Send a GET request with the If-Modified-Since header.
2.1. If the response is 304 Not Modified, tell the user that there is no need to get the file.
2.2. If the response is something other than 304, start downloading the file immediately, without firing up a new request.

But your patch does not seem to do that. It sort of implements both strategies:

1. Send a HEAD request and get the response.
2.1. If the response contains the Last-Modified header and it indicates that the remote file is old, tell the user that there is no need to get the file.
2.2. Otherwise, send a new GET request with `If-Modified-Since'.
2.2.1. If the response is 304 Not Modified, tell the user that there is no need to get the file.
2.2.2. If the response is something other than 304, start downloading the file immediately, without firing up a new request.

Did you do it this way intentionally? I mean, it doesn't *break* anything, but it causes HTTP timestamping to be implemented in two different ways and it doesn't implement the improvement expected from using If-Modified-Since.

Do you agree that it would be a good idea to only use If-Modified-Since?
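As a rough illustration of the single-request flow described above (GET with If-Modified-Since, then branch on 304), the sketch below formats the header from a local modification time and classifies the response status. The function and enum names are hypothetical and are not Wget's; the date formatting relies on the default "C" locale for English day and month names, and treating only 200 as "download" is an assumption that is stricter than the "anything other than 304" wording in the message.

```c
#include <stdio.h>
#include <time.h>

/* Write "If-Modified-Since: <HTTP-date>\r\n" for LOCAL_MTIME into BUF.
   Returns the number of characters written, or -1 on failure.  */
int
format_if_modified_since (time_t local_mtime, char *buf, size_t bufsize)
{
  char date[64];
  struct tm *tm = gmtime (&local_mtime);
  int n;

  if (tm == NULL)
    return -1;
  /* HTTP-date in the fixed-length format, always expressed in GMT.  */
  if (strftime (date, sizeof date, "%a, %d %b %Y %H:%M:%S GMT", tm) == 0)
    return -1;

  n = snprintf (buf, bufsize, "If-Modified-Since: %s\r\n", date);
  if (n < 0 || (size_t) n >= bufsize)
    return -1;
  return n;
}

/* Hypothetical outcome labels for a conditional GET.  */
enum cond_get_result { CG_UP_TO_DATE, CG_DOWNLOAD, CG_ERROR };

enum cond_get_result
classify_conditional_get (int status)
{
  if (status == 304)
    return CG_UP_TO_DATE;   /* no body follows; keep the local file */
  if (status == 200)
    return CG_DOWNLOAD;     /* body follows on the same connection  */
  return CG_ERROR;          /* assumption: everything else is an error */
}
```

Because the body arrives in the same response when the file has changed, no second request is needed in either branch.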
Re: wget-cvs-ifmodsince.patch
I did this to allow a check on the file size as well as the modification date. During the HEAD request, only the file size is checked. If the sizes don't match, the file is downloaded without If-Modified-Since.

From the testing I have done, if we receive a 304 Not Modified, the server sends back a content length of zero, so there is no way to compare the file sizes unless we get the header. I guess we could do the request and only check the file size if we receive the 304; this would save one request on files that are already going to be updated because of modification time.

Any other suggestions?

Craig

From: Hrvoje Niksic [EMAIL PROTECTED]
To: Craig Sowadski [EMAIL PROTECTED]
CC: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: wget-cvs-ifmodsince.patch
Date: Wed, 18 Feb 2004 16:01:51 +0100

Thanks for the modification, I've now applied the patch to my workspace and given it some testing.

There's one thing I don't quite understand. Before the patch, Wget's timestamping was based on analyzing the Last-Modified header, working like this:

1. Send a HEAD request and get the response.
2.1. If the response contains Last-Modified and it indicates that the remote file is older, tell the user that there is no need to get the file.
2.2. Otherwise, send a new GET request and download the file.

The problem is that we're sending *two* requests for each new file -- a HEAD request to get the last modification time, and a GET request to actually download the file. If-Modified-Since gives us a way to get rid of the HEAD request, and of the need to parse Last-Modified.

I assumed that, after your patch is installed, Wget would do this:

1. Send a GET request with the If-Modified-Since header.
2.1. If the response is 304 Not Modified, tell the user that there is no need to get the file.
2.2. If the response is something other than 304, start downloading the file immediately, without firing up a new request.

But your patch does not seem to do that. It sort of implements both strategies:

1. Send a HEAD request and get the response.
2.1. If the response contains the Last-Modified header and it indicates that the remote file is old, tell the user that there is no need to get the file.
2.2. Otherwise, send a new GET request with `If-Modified-Since'.
2.2.1. If the response is 304 Not Modified, tell the user that there is no need to get the file.
2.2.2. If the response is something other than 304, start downloading the file immediately, without firing up a new request.

Did you do it this way intentionally? I mean, it doesn't *break* anything, but it causes HTTP timestamping to be implemented in two different ways and it doesn't implement the improvement expected from using If-Modified-Since.

Do you agree that it would be a good idea to only use If-Modified-Since?
Re: wget-cvs-ifmodsince.patch
Craig Sowadski [EMAIL PROTECTED] writes:

> Ok, I have attached a new patch that moves the local time into http_stat. I am also sending this to [EMAIL PROTECTED] for others to try out. It seems to work great for me.

I was about to apply this patch, but noted a small problem:

    + if (opt.timestamping && request_method (req) == "GET" && hs->tml != -1)

You shouldn't compare C strings with `=='. This kind of code will work for the compilers that compact string literals, but that is not guaranteed to happen. The change is trivial, just use strcmp instead.
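A self-contained sketch of the suggested fix follows. The struct and function here are simplified stand-ins invented for the example (the real condition lives inside Wget's http.c and uses its own types); only the strcmp comparison itself is the point.

```c
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for the one field of `struct http_stat'
   that the condition uses.  */
struct http_stat_sketch
{
  time_t tml;   /* local modification time, (time_t) -1 if unknown */
};

/* Return nonzero when an If-Modified-Since header should be sent.
   METHOD is compared by content with strcmp; comparing the pointers
   with `==' would only succeed if both happened to point at the same
   copy of the literal.  */
int
wants_if_modified_since (int timestamping, const char *method,
                         const struct http_stat_sketch *hs)
{
  return timestamping
         && strcmp (method, "GET") == 0
         && hs->tml != (time_t) -1;
}
```

The comparison now works even when the method string was built at run time, for example copied into a buffer, rather than taken from a literal.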
Re: wget-cvs-ifmodsince.patch
OK, here is the updated patch.

Craig

From: Hrvoje Niksic [EMAIL PROTECTED]
To: Craig Sowadski [EMAIL PROTECTED]
CC: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: wget-cvs-ifmodsince.patch
Date: Tue, 17 Feb 2004 16:34:32 +0100

Craig Sowadski [EMAIL PROTECTED] writes:

> Ok, I have attached a new patch that moves the local time into http_stat. I am also sending this to [EMAIL PROTECTED] for others to try out. It seems to work great for me.

I was about to apply this patch, but noted a small problem:

    + if (opt.timestamping && request_method (req) == "GET" && hs->tml != -1)

You shouldn't compare C strings with `=='. This kind of code will work for the compilers that compact string literals, but that is not guaranteed to happen. The change is trivial, just use strcmp instead.

wget-cvs-ifmodsince.patch
Description: Binary data
Re: wget-cvs-ifmodsince.patch
I just noticed one other problem: tml is never initialized after I moved it into the http_stat struct. This update takes care of it.

Craig

From: Craig Sowadski [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
CC: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: wget-cvs-ifmodsince.patch
Date: Tue, 17 Feb 2004 13:28:04 -0600

OK, here is the updated patch.

Craig

From: Hrvoje Niksic [EMAIL PROTECTED]
To: Craig Sowadski [EMAIL PROTECTED]
CC: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: wget-cvs-ifmodsince.patch
Date: Tue, 17 Feb 2004 16:34:32 +0100

Craig Sowadski [EMAIL PROTECTED] writes:

> Ok, I have attached a new patch that moves the local time into http_stat. I am also sending this to [EMAIL PROTECTED] for others to try out. It seems to work great for me.

I was about to apply this patch, but noted a small problem:

    + if (opt.timestamping && request_method (req) == "GET" && hs->tml != -1)

You shouldn't compare C strings with `=='. This kind of code will work for the compilers that compact string literals, but that is not guaranteed to happen. The change is trivial, just use strcmp instead.

wget-cvs-ifmodsince.patch
Description: Binary data
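The kind of initialization problem mentioned at the top of this message can be sketched as follows. This is not the code from the patch: the struct and function names are hypothetical, and the use of -1 as the "unknown" sentinel is inferred from the `hs->tml != -1` test quoted in the thread.

```c
#include <sys/stat.h>
#include <time.h>

/* Hypothetical miniature of the struct that carries the local time.  */
struct ts_stat
{
  time_t tml;   /* local file mtime, (time_t) -1 when unknown */
};

/* Give every field a defined value before any code reads it.  */
void
ts_stat_init (struct ts_stat *hs)
{
  hs->tml = (time_t) -1;
}

/* Record the modification time of PATH, leaving the sentinel in
   place when the file does not exist or cannot be examined.  */
void
ts_stat_load_local (struct ts_stat *hs, const char *path)
{
  struct stat st;

  ts_stat_init (hs);
  if (stat (path, &st) == 0)
    hs->tml = st.st_mtime;
}
```

Without the explicit initialization, a check against the sentinel would read an indeterminate value whenever no local file exists.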
Re: wget-cvs-ifmodsince.patch
Ok, I have attached a new patch that moves the local time into http_stat. I am also sending this to [EMAIL PROTECTED] for others to try out. It seems to work great for me.

wget-cvs-ifmodsince.patch

ChangeLog:

Craig Sowadski [EMAIL PROTECTED]

* http.c (If-Modified-Since): Implemented use of 'If-Modified-Since' header instead of checking 'Last-Modified' during the head-only request.

Description:

This patch modifies the time-stamping method by only comparing local and remote file sizes, and then using the 'If-Modified-Since' header during the request.

Craig Sowadski [EMAIL PROTECTED]

From: Hrvoje Niksic [EMAIL PROTECTED]
To: Craig Sowadski [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]
Subject: Re: wget-cvs-ifmodsince.patch
Date: Thu, 12 Feb 2004 19:01:06 +0100

The patch looks good, thanks. You might want to put the local time to `struct http_stat' (where other details lie), so that the number of arguments to gethttp doesn't multiply.

Would you agree to post the patch to the list at [EMAIL PROTECTED], so that other people can try it out?

wget-cvs-ifmodsince.patch
Description: Binary data