Re: wget-cvs-ifmodsince.patch

2004-03-18 Thread Craig Sowadski
It's been a while since I have been able to look into this, but here is a 
method I think I am going to try out:

1. Send a HEAD-only request with If-Unmodified-Since (local file date).

2.1. If we receive the header, check the file size and download if different.
2.2. If we receive 412 (Precondition Failed), send the GET request to 
receive the file.



I think this is the most efficient way to perform timestamping as described 
in the documentation (and still make use of the HTTP headers made for this 
purpose). This way we make one request for most files, and two for files 
that need to be downloaded.
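
As a rough illustration of the two-step method above, the decision taken after the HEAD response could look like the sketch below. All names here (`ts_action`, `decide_after_head`) are hypothetical and not taken from Wget's source, and the handling of a missing Content-Length or an unexpected status is an assumption rather than something settled in this thread.

```c
/* Hypothetical names throughout; none of this is taken from Wget's source. */
enum ts_action { TS_SKIP, TS_DOWNLOAD };

/* Decide what to do after a HEAD request that carried
   "If-Unmodified-Since: <local file date>".
     status      - HTTP status code of the HEAD response
     remote_size - Content-Length from the response, or -1 if absent
     local_size  - size of the local file in bytes  */
enum ts_action
decide_after_head (int status, long remote_size, long local_size)
{
  if (status == 412)
    return TS_DOWNLOAD;   /* Precondition Failed: remote changed after local date */

  if (status >= 200 && status < 300)
    {
      if (remote_size < 0)
        return TS_DOWNLOAD;   /* assumption: no size to compare, so re-fetch */
      return remote_size != local_size ? TS_DOWNLOAD : TS_SKIP;
    }

  return TS_DOWNLOAD;   /* assumption: let the follow-up GET report any error */
}
```

With this shape, an unchanged file of matching size costs one request, and anything else costs the HEAD plus one GET.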

Also, I saw an email requesting that we create an option to set all downloaded 
files to the current time. I will also work on implementing this with my 
patch if you would like.

 Craig Sowadski

From: Hrvoje Niksic [EMAIL PROTECTED]
To: Craig Sowadski [EMAIL PROTECTED]
CC: [EMAIL PROTECTED],  [EMAIL PROTECTED]
Subject: Re: wget-cvs-ifmodsince.patch
Date: Sat, 28 Feb 2004 02:52:07 +0100
Craig Sowadski [EMAIL PROTECTED] writes:

 My only concern about only checking modification date is when there
 is an incomplete download, the local modification date is set to the
 current time. So when the mirror is next attempted, the file is
 marked newer than the server file and is not replaced.

Hmm, you're right -- if-modified-since utterly fails in that (not at
all uncommon) scenario.

 Is there some way to make the local modification date (time_t = 0 )
 until it is finished?? This way the incomplete file will always be
 older than the server file, and will be replaced.

I don't think there's a reliable way to do that.

So, I guess the only correct approach is in fact the one you used in
your patch:

1. Send a HEAD request and get the response.

2.1. If the response contains the Last-Modified header and it
 indicates that the remote file is old, tell the user that there
 is no need to get the file.

2.2. Otherwise, send a new GET request with `If-Modified-Since'.

2.2.1. If the response is 304 Not Modified, tell the user that there
   is no need to get the file.

2.2.2. If the response is something other than 304, start downloading
   the file immediately, without firing up a new request.

The remaining question is: does Wget need the added complexity only to
support servers that don't bother sending Last-Modified, but that do
support If-Modified-Since?  How frequent are such servers anyway?



Re: wget-cvs-ifmodsince.patch

2004-02-26 Thread Craig Sowadski
My only concern about only checking modification date is when there is an 
incomplete download, the local modification date is set to the current time. 
So when the mirror is next attempted, the file is marked newer than the 
server file and is not replaced. Is there some way to make the local 
modification date (time_t = 0 ) until it is finished?? This way the 
incomplete file will always be older than the server file, and will be 
replaced.

Craig





Re: wget-cvs-ifmodsince.patch

2004-02-25 Thread Hrvoje Niksic
Craig Sowadski [EMAIL PROTECTED] writes:

 1. I can implement a patch like my original that only uses the
 if-modified-since when the last-modified field is excluded from the
 head-only request.

Isn't that what your most recent patch implements?

 2. Send the if-modified-since request, then get header to check for
 size only if the if-modified-since request sends back a 304 - not
 modified

That means introducing an additional hop for unchanged files.  This is
in a way even worse because I'd expect them to be in the majority.

 3. implement a separate time-stamping and file-size checking
 options.

This might make the most sense.  Or, we could completely ignore the
file size issue and simply trust what If-Modified-Since tells us.

I guess I'd like for time-stamping to use If-Modified-Since, period.
That's what it's for, and the Last-Modified thing has always been a
weird hack that somehow survived the times.  What do the others think
about this?
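
For readers unfamiliar with the header under discussion, here is a minimal sketch of how a client might build an `If-Modified-Since` line from a local file's modification time. The function name `format_if_modified_since` is hypothetical and this is not code from the patch; it only assumes the standard C library and the fixed-format HTTP date (for example `Sun, 06 Nov 1994 08:49:37 GMT`).

```c
#include <stdio.h>
#include <time.h>

/* Write "If-Modified-Since: <HTTP-date>\r\n" for MTIME into BUF.
   Returns the number of characters written, or 0 on failure.
   Day and month names are spelled out by hand because strftime's
   %a and %b depend on the current locale.  Hypothetical helper,
   not taken from Wget.  */
size_t
format_if_modified_since (time_t mtime, char *buf, size_t bufsize)
{
  static const char *const days[] =
    { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" };
  static const char *const months[] =
    { "Jan", "Feb", "Mar", "Apr", "May", "Jun",
      "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" };
  struct tm *tm = gmtime (&mtime);   /* HTTP dates are always in GMT */
  int n;

  if (tm == NULL)
    return 0;

  n = snprintf (buf, bufsize,
                "If-Modified-Since: %s, %02d %s %04d %02d:%02d:%02d GMT\r\n",
                days[tm->tm_wday], tm->tm_mday, months[tm->tm_mon],
                tm->tm_year + 1900, tm->tm_hour, tm->tm_min, tm->tm_sec);
  if (n < 0 || (size_t) n >= bufsize)
    return 0;   /* encoding error or buffer too small */
  return (size_t) n;
}
```

The `mtime` argument would typically come from `stat()` on the local file; that step is left out to keep the sketch portable.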



Re: wget-cvs-ifmodsince.patch

2004-02-24 Thread Craig Sowadski
I have a few suggestions about this:

1. I can implement a patch like my original that only uses the 
if-modified-since when the last-modified field is excluded from the 
head-only request.

2. Send the if-modified-since request, then get header to check for size 
only if the if-modified-since request sends back a 304 - not modified

3. Implement separate time-stamping and file-size checking options.

I believe that number one would be the most efficient implementation. Number 
two would only save requests if the file was modified from the first 
request, and I think three is just a bad idea.

Let me know if you would like me to work out one of these other 
implementations. I like the first method and only made the change because 
you suggested that we always use if-modified-since, and I also felt it would 
be a good idea. But after implementation, it seems my original idea was 
better.

Craig Sowadski






Re: wget-cvs-ifmodsince.patch

2004-02-18 Thread Hrvoje Niksic
Thanks for the modification, I've now applied the patch to my
workspace and given it some testing.  There's one thing I don't quite
understand.  Before the patch, Wget's timestamping was based on
analyzing the Last-Modified header, working like this:

1. Send a HEAD request and get the response.

2.1. If the response contains Last-Modified and it indicates that the
 remote file is older, tell the user that there is no need to get
 the file.

2.2. Otherwise, send a new GET request and download the file.

The problem is that we're sending *two* requests for each new file --
a HEAD request to get the last modification time, and a GET request to
actually download the file.  If-Modified-Since gives us a way to get
rid of the HEAD request, and of the need to parse Last-Modified.  I
assumed that, after your patch is installed, Wget would do this:

1. Send a GET request with the If-Modified-Since header.

2.1. If the response is 304 Not Modified, tell the user that there
 is no need to get the file.

2.2. If the response is something other than 304, start downloading
 the file immediately, without firing up a new request.
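
The single-request flow listed above reduces to a dispatch on the status code of the conditional GET. The sketch below is only an illustration: the names are hypothetical, and treating non-2xx responses as a separate failure case is an assumption added here, since the list only distinguishes 304 from everything else.

```c
/* Hypothetical names; not taken from Wget's source. */
enum get_action { GET_UP_TO_DATE, GET_SAVE_BODY, GET_FAIL };

/* Interpret the response to "GET ... If-Modified-Since: <local date>".  */
enum get_action
handle_conditional_get (int status)
{
  if (status == 304)
    return GET_UP_TO_DATE;   /* Not Modified: nothing to download */
  if (status >= 200 && status < 300)
    return GET_SAVE_BODY;    /* body follows on the same connection */
  return GET_FAIL;           /* assumption: errors are reported, not saved */
}
```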

But your patch does not seem to do that.  It sort of implements both
strategies:

1. Send a HEAD request and get the response.

2.1. If the response contains the Last-Modified header and it
 indicates that the remote file is old, tell the user that there
 is no need to get the file.

2.2. Otherwise, send a new GET request with `If-Modified-Since'.

2.2.1. If the response is 304 Not Modified, tell the user that there
   is no need to get the file.

2.2.2. If the response is something other than 304, start downloading
   the file immediately, without firing up a new request.

Did you do it this way intentionally?  I mean, it doesn't *break*
anything, but it causes HTTP timestamping to be implemented in two
different ways and it doesn't implement the improvement expected from
using If-Modified-Since.

Do you agree that it would be a good idea to only use
If-Modified-Since?
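
To make the cost comparison between the three strategies concrete, the sketch below counts requests per file. It is a simplification under stated assumptions: the enum and function names are hypothetical, and the server is assumed to send Last-Modified in its HEAD response (if it does not, the hybrid strategy pays two requests even for an unchanged file).

```c
/* Hypothetical names; a back-of-the-envelope model, not Wget code. */
enum strategy
{
  HEAD_THEN_GET,               /* behaviour before the patch */
  CONDITIONAL_GET_ONLY,        /* GET with If-Modified-Since only */
  HEAD_THEN_CONDITIONAL_GET    /* what the patch does */
};

/* Number of HTTP requests needed for one file, assuming the server
   sends Last-Modified in the HEAD response.  */
int
requests_needed (enum strategy s, int remote_is_newer)
{
  switch (s)
    {
    case HEAD_THEN_GET:
    case HEAD_THEN_CONDITIONAL_GET:
      return remote_is_newer ? 2 : 1;   /* HEAD, plus GET when newer */
    case CONDITIONAL_GET_ONLY:
      return 1;                         /* 304 or the body, same request */
    }
  return -1;   /* not reached for valid strategies */
}
```

Under these assumptions the hybrid approach saves nothing over the old behaviour for new files, which is the point being raised above.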



Re: wget-cvs-ifmodsince.patch

2004-02-18 Thread Craig Sowadski
I did this to allow a check on the file size as well as the modification 
date. During the HEAD request, only the file size is checked. If the sizes 
don't match, the file is downloaded without the If-Modified-Since. From the 
testing I have done, if we receive a 304 Not Modified the server sends back 
a content length of zero. So there is no way to compare the file sizes 
unless we get the header. I guess we could do the request and only check the 
file size if we receive the 304; this would save one request on files that 
are already going to be updated because of modification time.
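
The size check described above can be sketched as a choice between two kinds of GET once the HEAD response is in. The names are hypothetical, and falling back to an unconditional GET when Content-Length is missing is an assumption, not something stated in the patch description.

```c
/* Hypothetical names; not taken from the patch. */
enum get_kind { GET_UNCONDITIONAL, GET_IF_MODIFIED_SINCE };

/* Choose the follow-up request after a HEAD response.
     head_content_length - Content-Length from HEAD, or -1 if absent
     local_size          - size of the local file in bytes  */
enum get_kind
choose_get (long head_content_length, long local_size)
{
  /* A size mismatch means the local copy is wrong or incomplete,
     whatever its timestamp says, so skip the condition.  */
  if (head_content_length < 0 || head_content_length != local_size)
    return GET_UNCONDITIONAL;

  /* Sizes agree: let the server decide based on the date.  */
  return GET_IF_MODIFIED_SINCE;
}
```

This is what lets a truncated local file be replaced even when its modification time is newer than the server's.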

Any other suggestions?

  Craig





Re: wget-cvs-ifmodsince.patch

2004-02-17 Thread Hrvoje Niksic
Craig Sowadski [EMAIL PROTECTED] writes:

 Ok, I have attached a new patch that moves the local time into
 http_stat. I am also sending this to [EMAIL PROTECTED] for others to
 try out. It seems to work great for me.

I was about to apply this patch, but noted a small problem:

 +  if (opt.timestamping && request_method (req) == "GET" && hs->tml != -1)

You shouldn't compare C strings with `=='.  This kind of code will
work for the compilers that compact string literals, but that is not
guaranteed to happen.

The change is trivial, just use strcmp instead.
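
A minimal sketch of the suggested fix, assuming `request_method (req)` yields a NUL-terminated C string. The helper name `is_get_method` is hypothetical and only serves to isolate the comparison.

```c
#include <string.h>

/* Return nonzero if METHOD names the GET method.
   Comparing with `==' would compare pointer values, which only
   happens to work when the compiler merges identical string
   literals; strcmp compares the characters themselves.  */
int
is_get_method (const char *method)
{
  return method != NULL && strcmp (method, "GET") == 0;
}
```

The condition in the patch would then test `strcmp (request_method (req), "GET") == 0` (or a helper like the one above) in place of the `==` comparison.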


Re: wget-cvs-ifmodsince.patch

2004-02-17 Thread Craig Sowadski
OK, here is the updated patch.

 Craig




wget-cvs-ifmodsince.patch
Description: Binary data


Re: wget-cvs-ifmodsince.patch

2004-02-17 Thread Craig Sowadski
I just noticed one other problem: tml is never initialized after I moved it to 
http_struct. This update takes care of it.
  Craig
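
As an illustration of the initialization issue mentioned above, the sketch below uses a stand-in structure. Only the field name `tml` and the `-1` sentinel come from the thread (the patch tests `hs->tml != -1`); the struct and function names are hypothetical and do not reproduce Wget's real `struct http_stat`.

```c
#include <time.h>

/* Stand-in for the real structure; hypothetical layout. */
struct http_stat_sketch
{
  time_t tml;   /* local file's modification time, or -1 if unknown */
};

/* Give every field a defined value before the struct is used, so a
   later `hs->tml != -1' test never reads an indeterminate value.  */
void
http_stat_sketch_init (struct http_stat_sketch *hs)
{
  hs->tml = (time_t) -1;
}
```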






wget-cvs-ifmodsince.patch
Description: Binary data


Re: wget-cvs-ifmodsince.patch

2004-02-16 Thread Craig Sowadski
Ok, I have attached a new patch that moves the local time into http_stat. I 
am also sending this to [EMAIL PROTECTED] for others to try out. It seems to 
work great for me.

wget-cvs-ifmodsince.patch

ChangeLog:  Craig Sowadski [EMAIL PROTECTED]

* http.c (If-Modified-Since): Implemented use of
'If-Modified-Since' header instead of checking
'Last-Modified' during the head-only request.

Description:
   This patch modifies the time-stamping method by only
   comparing local and remote file sizes, and then using
   the 'If-Modified-Since' header during the request.

Craig Sowadski [EMAIL PROTECTED]
From: Hrvoje Niksic [EMAIL PROTECTED]
To: Craig Sowadski [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]
Subject: Re: wget-cvs-ifmodsince.patch
Date: Thu, 12 Feb 2004 19:01:06 +0100
The patch looks good, thanks.  You might want to put the local time to
`struct http_stat' (where other details lie), so that the number of
arguments to gethttp doesn't multiply.
Would you agree to post the patch to the list at [EMAIL PROTECTED], so
that other people can try it out?


wget-cvs-ifmodsince.patch
Description: Binary data