On 9/15/06, Mauro Tortonesi <[EMAIL PROTECTED]> wrote:
Reliable detection of changes in the resource to be downloaded would be a very interesting feature. But do you really think that checking the last X (< 100) bytes would be enough to be reasonably sure the resource was (not) modified? What about resources which are updated by appending information, such as log files?
In terms of corruption prevention, wget -c is safe if the resources are updated only by appending.

I can think of two weaknesses. The first is logs with fixed-width, repetitive messages, e.g. a partial local copy containing

12:05 Disks not mirrored
12:10 Disks not mirrored

If we then did a wget -c on the new log file

11:40 Disks not mirrored
11:45 Disks not mirrored
11:50 Disks not mirrored

every line ends in the same bytes, so the check would pass and we would get an invalid log file. However, I imagine most log files have at least a few variable-length messages, so this technique would work on the majority of log files (well over 50%). The second weakness would be uncompressed database files...

Still, I suspect that comparing the last 4 bytes would catch 90% of real-world snafus. I can't verify this without doing a survey of wget users, but I can say that it would have caught 100% of my own snafus.

There are two problems common enough to be mentioned in the man page: proxies that append "transfer interrupted" to the end of failed downloads, and inappropriate use of "wget -c -r". Checking the last 4 bytes would catch ~100% of cases of "transfer interrupted" being appended. If wget acts recursively on a directory (wget -c -r), every file gets checked, so there are many more opportunities for corruption to be detected.

--
John C. McCabe-Dansted
PhD Student
University of Western Australia
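The last-4-bytes check discussed in this thread can be sketched as a small simulation. This is a hypothetical illustration in Python, not wget's code: the function name resume_with_tail_check and the use of in-memory byte strings for the local partial file and the server resource are assumptions. A real implementation would instead send an HTTP request with a Range header starting n bytes before the end of the local file, compare the first n bytes of the response with the local tail, and append the rest only if they match.

```python
from typing import Optional


def resume_with_tail_check(local: bytes, remote: bytes, n: int = 4) -> Optional[bytes]:
    """Hypothetical simulation of ``wget -c`` guarded by a last-n-bytes check.

    local  -- contents of the partially downloaded local file
    remote -- current contents of the resource on the server
    n      -- how many trailing local bytes to re-fetch and compare

    Returns the resumed file contents, or None if resuming is refused.
    """
    if len(remote) <= len(local):
        # Nothing new on the server (or the file shrank): refuse to resume.
        return None
    k = min(n, len(local))
    start = len(local) - k
    # Stand-in for the body returned by a request with "Range: bytes=<start>-".
    fetched = remote[start:]
    if fetched[:k] != local[start:]:
        # Local tail differs from the server copy: likely corruption, so refuse.
        return None
    # Tail matches: append only the bytes beyond the local file's length.
    return local + fetched[k:]


# Append-only log: the check passes and the result equals the server copy.
old_log = b"12:05 Disks not mirrored\n"
appended = old_log + b"12:10 Disks not mirrored\n"
print(resume_with_tail_check(old_log, appended) == appended)  # True

# A proxy appended "transfer interrupted" to a failed download: mismatch caught.
server_file = b"0123456789" * 10
bad_local = server_file[:40] + b"transfer interrupted"
print(resume_with_tail_check(bad_local, server_file))  # None

# Fixed-width repetitive log replaced by a new one: the tails match, so the
# check is fooled and the "resumed" file mixes old and new entries.
partial = b"12:05 Disks not mirrored\n12:10 Disks not mirrored\n"
new_log = (b"11:40 Disks not mirrored\n11:45 Disks not mirrored\n"
           b"11:50 Disks not mirrored\n")
print(resume_with_tail_check(partial, new_log))
```

The three calls mirror the cases above: plain appending resumes correctly, the appended "transfer interrupted" string is rejected, and the fixed-width log slips through and yields an invalid file.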