1. I can implement a patch like my original that only uses If-Modified-Since when the Last-Modified field is excluded from the HEAD-only request.
2. Send the If-Modified-Since request, then fetch the headers to check the size only if that request returns "304 Not Modified".
3. Implement separate time-stamping and file-size checking options.
I believe that number one would be the most efficient implementation. Number two would only save a request when the first request shows that the file was modified, and I think three is just a bad idea.
Let me know if you would like me to work out one of these other implementations. I like the first method and only made the change because you suggested that we always use if-modified-since, and I also felt it would be a good idea. But after implementation, it seems my original idea was better.
Craig Sowadski
From: "Craig Sowadski" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
CC: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: wget-cvs-ifmodsince.patch
Date: Wed, 18 Feb 2004 15:15:48 -0600
I did this to allow a check on the file size as well as the modification date. During the HEAD request, only the file size is checked. If the sizes don't match, the file is downloaded without If-Modified-Since. From the testing I have done, if we receive a "304 Not Modified", the server sends back a content length of zero, so there is no way to compare the file sizes unless we get the header. I guess we could do the request and only check the file size if we receive the 304; this would save one request on files that are already going to be updated because of modification time.
Any other suggestions?
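The alternative raised at the end of the paragraph above (conditional request first, size check only after a 304) can be sketched roughly as follows. This is an illustrative Python model, not code from the patch: `send` is a hypothetical transport callable returning `(status, headers, body)`, `make_stub` is a stand-in server, and skipping the size check when `Content-Length` is absent is an assumption.

```python
def check_file(send, if_modified_since, local_size):
    """Return (action, request_count) for one file.

    send -- hypothetical callable: send(method, headers) -> (status, headers, body)
    """
    status, _, _ = send("GET", {"If-Modified-Since": if_modified_since})
    if status != 304:
        # The file is newer; its body arrives with this same response.
        return "download", 1
    # The 304 reply gives no usable length, so ask for the headers separately.
    _, headers, _ = send("HEAD", {})
    length = headers.get("Content-Length")
    if length is not None and int(length) != local_size:
        # Same timestamp but different size: fetch unconditionally.
        send("GET", {})
        return "download", 3
    # Assumption: with no Content-Length to compare, trust the 304.
    return "up-to-date", 2


def make_stub(modified, size):
    """Hypothetical server: answers 304 to conditional requests unless modified."""
    def send(method, headers):
        if "If-Modified-Since" in headers and not modified:
            return 304, {"Content-Length": "0"}, b""
        return 200, {"Content-Length": str(size)}, b""
    return send


stamp = "Wed, 18 Feb 2004 15:15:48 GMT"
print(check_file(make_stub(True, 10), stamp, 10))
print(check_file(make_stub(False, 10), stamp, 10))
print(check_file(make_stub(False, 10), stamp, 7))
```

Under these assumptions, a modified file costs one request, an unchanged file costs two, and a file with a matching timestamp but a different size costs three.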
Craig
From: Hrvoje Niksic <[EMAIL PROTECTED]>
To: "Craig Sowadski" <[EMAIL PROTECTED]>
CC: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: wget-cvs-ifmodsince.patch
Date: Wed, 18 Feb 2004 16:01:51 +0100
Thanks for the modification, I've now applied the patch to my workspace and given it some testing. There's one thing I don't quite understand. Before the patch, Wget's timestamping was based on analyzing the Last-Modified header, working like this:
1. Send a HEAD request and get the response.
2.1. If the response contains Last-Modified and it indicates that the remote file is older, tell the user that there is no need to get the file.
2.2. Otherwise, send a new GET request and download the file.
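As a rough illustration of the comparison in step 2.1, the decision can be modeled in a few lines of Python. This is only a sketch, not Wget's C code: `needs_download` is a hypothetical name, and treating a remote file with an equal timestamp as up to date is an assumption.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def needs_download(local_mtime, last_modified):
    """Decide from a HEAD response whether the follow-up GET is needed.

    local_mtime   -- modification time of the local file, seconds since the epoch
    last_modified -- value of the Last-Modified header, or None if it is absent
    """
    if last_modified is None:
        # Nothing to compare against, so fall through to the GET (step 2.2).
        return True
    remote_mtime = parsedate_to_datetime(last_modified).timestamp()
    # Assumption: a remote file that is not newer than the local copy is skipped.
    return remote_mtime > local_mtime


local = datetime(2004, 2, 18, 12, 0, 0, tzinfo=timezone.utc).timestamp()
print(needs_download(local, "Wed, 18 Feb 2004 15:15:48 GMT"))
print(needs_download(local, "Tue, 17 Feb 2004 09:00:00 GMT"))
print(needs_download(local, None))
```

Whenever this returns true, a second request (the GET) follows the HEAD.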
The problem is that we're sending *two* requests for each new file -- a HEAD request to get the last modification time, and a GET request to actually download the file. If-Modified-Since gives us a way to get rid of the HEAD request, and of the need to parse Last-Modified. I assumed that, after your patch is installed, Wget would do this:
1. Send a GET request with the If-Modified-Since header.
2.1. If the response is "304 Not Modified", tell the user that there is no need to get the file.
2.2. If the response is something other than 304, start downloading the file immediately, without firing up a new request.
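The single-request flow in the three steps above might look like this in a small Python model. It is a sketch under stated assumptions: `fake_server` and `timestamped_get` are hypothetical names, the server is simulated in memory, and timestamps are whole seconds since the epoch.

```python
from email.utils import formatdate, parsedate_to_datetime


def fake_server(remote_mtime, body):
    """Return a hypothetical send(method, headers) callable standing in for a server."""
    def send(method, headers):
        since = headers.get("If-Modified-Since")
        if since is not None:
            if remote_mtime <= parsedate_to_datetime(since).timestamp():
                return 304, {}, b""
        reply = {"Last-Modified": formatdate(remote_mtime, usegmt=True)}
        return 200, reply, (body if method == "GET" else b"")
    return send


def timestamped_get(send, local_mtime):
    """Issue one conditional GET; return the new body, or None if nothing changed."""
    headers = {"If-Modified-Since": formatdate(local_mtime, usegmt=True)}
    status, _, body = send("GET", headers)
    if status == 304:
        return None  # step 2.1: the local copy is current
    return body      # step 2.2: this response already carries the file


send = fake_server(remote_mtime=1_000_000_100, body=b"new contents")
print(timestamped_get(send, 1_000_000_000))
print(timestamped_get(send, 1_000_000_100))
```

Either way, exactly one request is sent per file, and no Last-Modified parsing is needed on the client side.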
But your patch does not seem to do that. It sort of implements both strategies:
1. Send a HEAD request and get the response.
2.1. If the response contains the Last-Modified header and it indicates that the remote file is old, tell the user that there is no need to get the file.
2.2. Otherwise, send a new GET request with `If-Modified-Since'.
2.2.1. If the response is "304 Not Modified", tell the user that there is no need to get the file.
2.2.2. If the response is something other than 304, start downloading the file immediately, without firing up a new request.
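To make the cost of each flow concrete, the request counts can be tabulated with a short Python sketch. The strategy names are labels invented for this illustration, and the counts assume the server always sends Last-Modified and honors If-Modified-Since.

```python
def requests_needed(strategy, remote_is_newer):
    """Count the HTTP requests one file costs under each flow described here."""
    if strategy == "head-then-get":
        # Pre-patch behaviour: HEAD, then a plain GET only for newer files.
        return 2 if remote_is_newer else 1
    if strategy == "conditional-get-only":
        # A single GET with If-Modified-Since covers both outcomes.
        return 1
    if strategy == "head-then-conditional-get":
        # The patch as submitted: HEAD, then a conditional GET for newer files.
        return 2 if remote_is_newer else 1
    raise ValueError(f"unknown strategy: {strategy}")


for name in ("head-then-get", "conditional-get-only", "head-then-conditional-get"):
    print(name, requests_needed(name, False), requests_needed(name, True))
```

In this model the submitted patch costs the same as the pre-patch behaviour; only the conditional-GET-only flow removes the second request for new files.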
Did you do it this way intentionally? I mean, it doesn't *break* anything, but it causes HTTP timestamping to be implemented in two different ways and it doesn't implement the improvement expected from using If-Modified-Since.
Do you agree that it would be a good idea to only use If-Modified-Since?