On 9/15/06, Mauro Tortonesi <[EMAIL PROTECTED]> wrote:
reliable detection of changes in the resource to be downloaded would be
a very interesting feature. but do you really think that checking the
last X (< 100) bytes would be enough to be reasonably sure the resource
was (not) modified? what about resources which are updated by appending
information, such as log files?

In terms of corruption prevention, wget -c is safe if the resources
are updated only by appending.

One weakness I can think of is logs with fixed-width, repetitive
messages, e.g.

 12:05 Disks not mirrored
 12:10 Disks not mirrored

Then if we did a wget -c on the new log file

 11:40 Disks not mirrored
 11:45 Disks not mirrored
 11:50 Disks not mirrored

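we would get an invalid log file, because the bytes at the end of our
partial copy match the bytes at the same offset in the new file.
However, I imagine most log files have at least a few variable-length
messages, so this technique would work on a majority of log files
(well over 50%).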
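
To make the fixed-width weakness concrete, here is a minimal sketch in Python. It is not how wget is implemented; the functions tail_matches and resume are hypothetical names that model the proposed check on in-memory byte strings, assuming a 4-byte tail comparison and newline-terminated log lines.

```python
def tail_matches(local: bytes, remote: bytes, n: int = 4) -> bool:
    """Return True if the last n bytes of local equal the remote bytes
    at the same offsets (hypothetical model of the proposed check)."""
    if len(local) > len(remote):
        return False
    k = min(n, len(local))
    start = len(local) - k
    return local[start:] == remote[start:len(local)]


def resume(local: bytes, remote: bytes, n: int = 4) -> bytes:
    """Model of 'wget -c' with the tail check: append the rest of remote."""
    if not tail_matches(local, remote, n):
        raise ValueError("tail mismatch: remote resource has changed")
    return local + remote[len(local):]


# Partial copy of the old log, and the new (different) log on the server.
old_partial = (b"12:05 Disks not mirrored\n"
               b"12:10 Disks not mirrored\n")
new_log = (b"11:40 Disks not mirrored\n"
           b"11:45 Disks not mirrored\n"
           b"11:50 Disks not mirrored\n")

# Every line has the same width, so the tail check passes even though
# the resource changed, and the merged result is not a valid copy.
merged = resume(old_partial, new_log)
print(merged != new_log)
```

With variable-length messages the line boundaries would rarely fall at the same offsets, so the same check would usually fail and the resume would be refused.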

Another weakness would be uncompressed database files...

However, I suspect that comparing the last 4 bytes would catch 90% of
the real-world snafus. I can't verify this without doing a survey of
wget users, but I can say that this would have caught 100% of my own
snafus.

There are two problems common enough to be mentioned in the man page:
proxies that append "transfer interrupted" to the end of failed
downloads, and inappropriate use of "wget -c -r". Checking the last 4
bytes would catch ~100% of cases of "transfer interrupted" being
appended. If wget acts recursively on a directory (wget -c -r), there
are many more opportunities for corruption to be detected.
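
The "transfer interrupted" case can be sketched the same way. This is only an illustration under stated assumptions: safe_to_resume is a hypothetical function, and fetch_range is a hypothetical stand-in for an HTTP Range request that returns the remote bytes in [start, end). No real wget option is implied.

```python
from typing import Callable


def safe_to_resume(local: bytes,
                   fetch_range: Callable[[int, int], bytes],
                   n: int = 4) -> bool:
    """Compare the last n bytes of the local file with the same byte
    range on the server before continuing the download."""
    k = min(n, len(local))
    if k == 0:
        return True  # nothing downloaded yet, nothing to compare
    start = len(local) - k
    return fetch_range(start, len(local)) == local[start:]


# Simulated remote resource; fetch_range models a Range request.
remote = b"0123456789" * 10


def fetch_range(start: int, end: int) -> bytes:
    return remote[start:end]


clean_partial = remote[:40]
# A proxy appended an error message to the failed download.
damaged_partial = remote[:40] + b"transfer interrupted"

print(safe_to_resume(clean_partial, fetch_range))    # True
print(safe_to_resume(damaged_partial, fetch_range))  # False
```

The damaged file ends in "pted", which does not match the remote bytes at those offsets, so the resume is refused instead of appending to a corrupted file.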

--
John C. McCabe-Dansted
PhD Student
University of Western Australia
