Re: Feature suggestion: change detection for "wget -c"
On 9/15/06, Mauro Tortonesi <[EMAIL PROTECTED]> wrote:
> reliable detection of changes in the resource to be downloaded would be a
> very interesting feature. but do you really think that checking the last
> X (< 100) bytes would be enough to be reasonably sure the resource was
> (not) modified? what about resources which are updated by appending
> information, such as log files?

In terms of corruption prevention, wget -c is safe if the resources are updated only by appending. Two weaknesses I can think of:

One is logs with fixed-width, repetitive messages, e.g.

  12:05 Disks not mirrored
  12:10 Disks not mirrored

Then if we did a wget -c on the new log file

  11:40 Disks not mirrored
  11:45 Disks not mirrored
  11:50 Disks not mirrored

we would get an invalid log file. However, I imagine most log files have at least a few variable-length messages, so this technique would work on a majority of log files (well over 50%).

Another weakness would be uncompressed database files. However, I suspect that comparing the last 4 bytes would catch 90% of the real-world snafus. I can't verify this without doing a survey of wget users, but I can say that this would have caught 100% of my own snafus.

There are two problems common enough to be mentioned in the man page: proxies that append "transfer interrupted" to the end of failed downloads, and inappropriate use of "wget -c -r". Checking the last 4 bytes would catch ~100% of cases of "transfer interrupted" being appended. If wget acts recursively on a directory (wget -c -r), there are many more opportunities for corruption to be detected.

--
John C. McCabe-Dansted
PhD Student
University of Western Australia
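The tail check proposed above is easy to sketch: compare the last few bytes of the local file against the same byte range fetched from the server with a suffix Range request. This is a minimal illustration only, not part of wget; the URL and file names are placeholders.

```python
import urllib.request

def local_tail(path, n=4):
    """Return the last n bytes of a local file."""
    with open(path, "rb") as f:
        f.seek(0, 2)               # seek to end to learn the size
        size = f.tell()
        f.seek(max(0, size - n))
        return f.read()

def remote_tail(url, n=4):
    """Fetch the last n bytes of a resource with a suffix Range request.
    A server that ignores Range returns the whole body, so keep only the tail."""
    req = urllib.request.Request(url, headers={"Range": "bytes=-%d" % n})
    with urllib.request.urlopen(req) as resp:
        return resp.read()[-n:]

def safe_to_resume(path, url, n=4):
    """True if the local file still looks like a prefix of the remote one."""
    return local_tail(path, n) == remote_tail(url, n)
```

As the discussion notes, this heuristic can be fooled by files whose tails repeat (fixed-width log messages), but it catches appended "transfer interrupted" markers and most changes to compressed files.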
Re: wget "how do I do..."
> From: Craig A. Finseth

It might help to know which version of Wget you're using ("wget -V"), and on which system type you're running it. Adding "-d" to the wget command line might give you more clues as to what it's trying to do. Seeing the debug output might save considerable code tracing, as I, for example, don't have access (so far as I know) to an FTP server which acts that way.

Probably useless guesswork: Does it help to add a trailing "/" to the URL ("ftp://...:...@//")? Same behavior with "-r"?

Steven M. Schweda                [EMAIL PROTECTED]
382 South Warwick Street         (+1) 651-699-9818
Saint Paul MN 55105-2547
error running tests: Can't locate object method "new" via package "HTTPTest"
hi all. is anyone successfully running the perl unit tests? i have perl 5.8.0 and libwww-perl 5.65 happily installed, but i'm getting this error:

  heaven:~/wget/tests> ./Test1.px
  Can't locate object method "new" via package "HTTPTest" at ./Test1.px line 38.

the "new" method is defined in Test, which is HTTPTest's base class. evidently it finds and loads the Test and HTTPTest packages ok, it just can't locate the "new" method in either package. from "Programming Perl":

  Can't locate object method "%s" via package "%s"
  (F) You called a method correctly, and it correctly indicated a package
  functioning as a class, but the package doesn't define that method name,
  nor do any of its base classes (which is why the message says "via"
  rather than "in").

thoughts?

-Ryan

--
http://snarfed.org/
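One common cause of this Perl error is the base-class link (Perl's @ISA) not being set up at the point the method is called, so the inheritance chain is never walked even though both packages loaded fine. That is only a guess about this particular failure; the lookup behaviour itself can be illustrated with a Python analogue:

```python
# Illustrative analogue of method lookup "via" a class: the method is found
# only by walking the inheritance chain, so if the base-class link is
# missing (like an unset @ISA in Perl), lookup fails even though both
# classes are defined and loaded.
class Test:
    @classmethod
    def new(cls):              # plays the role of Test's "new" constructor
        return cls()

class HTTPTest(Test):          # inherits, so "new" is found via the base
    pass

class BrokenHTTPTest:          # no base-class link: lookup fails
    pass

print(type(HTTPTest.new()).__name__)   # found via the base class
try:
    BrokenHTTPTest.new()
except AttributeError as exc:
    print("lookup failed:", exc)
```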
wget "how do I do..."
I am trying to mirror an FTP site which has access control in that it doesn't let you do a "dir" on the root (it returns an empty list). In other words, if you manually do:

  ftp
  username: ...
  password: ...
  dir

you get an empty list. But if you do:

  cd dir

you get your files.

Note that I've tried doing:

  wget ftp://...:...@/

and it still fails. It appears that wget is getting the empty listing and (not unreasonably) deciding that there is nothing there.

In essence, what I want to do is insert a "cd " after login but before wget tries to do anything else. Is there an existing way to do this, or do I need to modify the code? FWIW, it's a Windows server.

Craig A. Finseth                 [EMAIL PROTECTED]
Systems Architect                +1 651 201 1011 desk
State of Minnesota, Office of Enterprise Technology
658 Cedar Ave                    +1 651 297 5368 fax
St Paul MN 55155                 +1 651 297 NOC, for reporting problems
Re: Bug
Reece ha scritto:
> Found a bug (sort of). When trying to get all the images in the
> directory below:
>
>   http://www.netstate.com/states/maps/images/
>
> It gives 403 Forbidden errors for most of the images even after setting
> the agent string to firefox's, and setting -e robots=off. After a packet
> capture, it appears that the site will give the forbidden error if the
> Referer is not exactly correct. However, since wget actually uses the
> domain www.netstate.com:80 instead of the one without the port, it
> screws it all up. I've been unable to find any way to tell wget not to
> insert the port in the requesting url and referer url. Here is the full
> command I was using:
>
>   wget -r -l 1 -H -U "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" -e robots=off -d -nh http://www.netstate.com/states/maps/images/

hi reece, that's an interesting bug. i've just added it to my "THINGS TO FIX" list.

--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi                           http://www.tortonesi.com
University of Ferrara - Dept. of Eng.     http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool   http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linux             http://www.deepspace6.net
Ferrara Linux User Group                  http://www.ferrara.linux.it
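The fix the bug calls for amounts to a URL normalization rule: drop an explicit port from the Referer when it is the scheme's default, so "www.netstate.com:80" becomes "www.netstate.com". Wget's real implementation is in C; the sketch below is just an illustration of the rule, not wget code.

```python
# Sketch: strip an explicit default port from a URL, the normalization a
# Referer header would need so "host:80" is sent as plain "host".
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

def strip_default_port(url):
    parts = urlsplit(url)
    if parts.port is not None and parts.port == DEFAULT_PORTS.get(parts.scheme):
        host = parts.hostname
        if "@" in parts.netloc:
            # preserve any user:password@ prefix
            host = parts.netloc.rsplit("@", 1)[0] + "@" + host
        parts = parts._replace(netloc=host)
    return urlunsplit(parts)
```

A non-default port (say :8080) is left alone, since it is genuinely part of the origin.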
Re: --html-extension and --convert-links don't work together
Ryan Barrett ha scritto:
> hi wget developers! nicolas mizel reported a bug with --html-extension
> and --convert-links about a year and a half ago. in a nutshell,
> --html-extension appends .html to non-html filenames, but --convert-links
> doesn't use the .html filenames when it converts links.
>
>   http://www.mail-archive.com/wget@sunsite.dk/msg07688.html
>
> he reported it against 1.9.1, but it's still broken in 1.10.2. any chance
> it could be fixed in the next release?

in my opinion, this is a serious bug. we should fix it ASAP.

> i have a lot on my plate right now, but if it'd help, i could probably
> whip up a patch in a few weeks or so...

that would be great. thanks.

--
Mauro Tortonesi
Re: wget: ignores Content-Disposition header
Jochen Roderburg ha scritto:
> Noèl Köthe schrieb:
>> Hello,
>>
>> I can reproduce the following with 1.10.2 and 1.11.beta1: Wget ignores
>> the Content-Disposition header described in RFC 2616, 19.5.1
>> Content-Disposition. an example URL is:
>>
>> http://bugs.debian.org/cgi-bin/bugreport.cgi/%252Ftmp%252Fupdate-grub.patch?bug=168715;msg=5;att=1
>
> Sorry, I don't see any Content-Disposition header in this example URL ;-)
>
> Result of a HEAD request:
>
>   200 OK
>   Connection: close
>   Date: Fri, 15 Sep 2006 12:58:14 GMT
>   Server: Apache/1.3.33 (Debian GNU/Linux)
>   Content-Type: text/html; charset=utf-8
>   Last-Modified: Mon, 04 Aug 2003 21:18:10 GMT
>   Client-Date: Fri, 15 Sep 2006 12:58:14 GMT
>   Client-Response-Num: 1
>
> My own experience is that the 1.11 alpha/beta versions (where this
> feature was introduced) worked fine with the examples I encountered.

Jochen is right:

  [EMAIL PROTECTED]:~/tmp$ LANG=C ~/code/svn/wget/src/wget -S -d http://bugs.debian.org/cgi-bin/bugreport.cgi/%252Ftmp%252Fupdate-grub.patch?bug=168715;msg=5;att=1
  DEBUG output created by Wget 1.10+devel on linux-gnu.
  --16:58:52-- http://bugs.debian.org/cgi-bin/bugreport.cgi/%252Ftmp%252Fupdate-grub.patch?bug=168715
  Resolving bugs.debian.org... 140.211.166.43
  Caching bugs.debian.org => 140.211.166.43
  Connecting to bugs.debian.org|140.211.166.43|:80... connected.
  Created socket 3.
  Releasing 0x00556550 (new refcount 1).
  ---request begin---
  GET /cgi-bin/bugreport.cgi/%252Ftmp%252Fupdate-grub.patch?bug=168715 HTTP/1.0
  User-Agent: Wget/1.10+devel
  Accept: */*
  Host: bugs.debian.org
  Connection: Keep-Alive
  ---request end---
  HTTP request sent, awaiting response...
  ---response begin---
  HTTP/1.0 200 OK
  Date: Fri, 15 Sep 2006 14:54:55 GMT
  Content-Type: text/html; charset=utf-8
  Server: Apache/1.3.33 (Debian GNU/Linux)
  Via: 1.1 proxy (NetCache NetApp/5.6.2R1)
  ---response end---
  HTTP/1.0 200 OK
  Date: Fri, 15 Sep 2006 14:54:55 GMT
  Content-Type: text/html; charset=utf-8
  Server: Apache/1.3.33 (Debian GNU/Linux)
  Via: 1.1 proxy (NetCache NetApp/5.6.2R1)
  Length: unspecified [text/html]
  Saving to: `%2Ftmp%2Fupdate-grub.patch?bug=168715'

  [<=>                                 ] 20,018      32.6K/s   in 0.6s

  Closed fd 3
  16:58:54 (32.6 KB/s) - `%2Ftmp%2Fupdate-grub.patch?bug=168715' saved [20018]

--
Mauro Tortonesi
Re: help downloading site
Tate Mitchell ha scritto:
> Would it be possible to download each lesson individually, so that as
> lessons are added, or finished, I can download them w/out re-downloading
> the whole site? Could someone tell me how please? Or would it be possible
> to download the whole thing and just re-download parts that have been
> added since the previous download?

why don't you try something like:

  wget -m -k -np http://www.ncsu.edu/project/hindi_lessons/Hindi.Less.01/index.html

--
Mauro Tortonesi
Re: one more thing.
Tate Mitchell ha scritto:
> If anyone could show me how to do this on the wget gui, that would be
> appreciated, too.
>
>   http://www.jensroesner.de/wgetgui/

wget and wgetgui are related programs, but they are developed by two different teams. you should ask this question to the wgetgui authors.

--
Mauro Tortonesi
Re: REST - error for files bigger than 4GB
Steven M. Schweda ha scritto:
> Are you certain that the FTP _server_ can handle file offsets greater
> than 4GB in the REST command?

i agree with steven here. it's very likely to be a server-side problem.

--
Mauro Tortonesi
Re: wget 1.11 beta 1 released
Oliver Schulze L. ha scritto:
> Does this version have the connection cache code?

no, not yet. i have some preliminary code for connection caching, but i am not going to finish it and merge it into the trunk before wget 1.11 is released.

--
Mauro Tortonesi
Re: timestamp and backup
Olav Mørkrid ha scritto:
> hi. let's say i fetch 10 files from a server with wget. then i want to
> download any modifications to these files. HOWEVER, if a new version of
> a file is downloaded, i want a backup of the old file (eg. write to
> .bak, or possibly .001 and .002 to keep a record of all versions of a
> file). can wget do this?

yes. if file X is already present in your filesystem, by default wget downloads the new file and saves it as "X.1".

> i tried to combine -N with -nc, which would seem logical (do timestamp
> checking, and prevent overwriting), but wget protests that they are
> mutually exclusive. and if i use no options, then wget fetches a new
> file even though it's not updated.

you should not use -nc, just -N.

--
Mauro Tortonesi
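For the numbered-backup behaviour the poster asks for (.001, .002, ... for every version), one option is a small wrapper around "wget -N" that rotates the old copy aside before fetching. This is a sketch under stated assumptions, not a wget feature; file names and the helper names are placeholders.

```python
import os
import shutil
import subprocess

def rotate_backup(path):
    """Copy path to the next free path.NNN; return the backup name, or
    None if there is nothing to back up. Note this rotates unconditionally,
    even if wget -N later decides the file is unchanged."""
    if not os.path.exists(path):
        return None
    n = 1
    while os.path.exists("%s.%03d" % (path, n)):
        n += 1
    backup = "%s.%03d" % (path, n)
    shutil.copy2(path, backup)
    return backup

def fetch_with_backup(url, path):
    """Back up the current copy, then let wget -N update it if newer."""
    rotate_backup(path)
    subprocess.run(["wget", "-N", url], check=True)
```

A smarter version would compare timestamps first (or delete the backup afterwards if the file was not replaced), so unchanged files don't accumulate identical backups.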
Re: Feature suggestion: change detection for "wget -c"
John McCabe-Dansted ha scritto:
> "Wget has no way of verifying that the local file is really a valid
> prefix of the remote file"
>
> Couldn't wget redownload the last 4 bytes (or so) of the file? For a few
> bytes per file we could detect changes to almost all compressed files
> and the majority of uncompressed files.

reliable detection of changes in the resource to be downloaded would be a very interesting feature. but do you really think that checking the last X (< 100) bytes would be enough to be reasonably sure the resource was (not) modified? what about resources which are updated by appending information, such as log files?

--
Mauro Tortonesi
Re: -P ignored by parse_content_disposition
Ashley Bone ha scritto:
> When wget determines the local filename from Content-Disposition, the
> -P (--directory-prefix) option is ignored. The file is always downloaded
> to the current directory. Looking at parse_content_disposition(), I
> think this may be by design. Does anyone know for sure?

no, it's clearly a bug.

> If not, I can submit a patch.

yes, please do it if you can.

--
Mauro Tortonesi
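The shape of the fix is straightforward: a filename taken from Content-Disposition should be joined with the -P prefix exactly like a URL-derived name. Wget is written in C; this is only a hypothetical sketch of that joining step, with apply_prefix as a made-up stand-in name.

```python
import os

def apply_prefix(prefix, filename):
    """Join the -P/--directory-prefix directory with a server-supplied
    filename. Strips any path components the server tries to smuggle in,
    so a hostile Content-Disposition can't escape the prefix directory."""
    filename = os.path.basename(filename)
    return os.path.join(prefix, filename)
```

Dropping the server-supplied directory components matters here: trusting a path like "../../etc/passwd" from a Content-Disposition header would otherwise let the server write outside the chosen prefix.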
Re: wget: ignores Content-Disposition header
Noèl Köthe schrieb:
> Hello,
>
> I can reproduce the following with 1.10.2 and 1.11.beta1: Wget ignores
> the Content-Disposition header described in RFC 2616, 19.5.1
> Content-Disposition. an example URL is:
>
> http://bugs.debian.org/cgi-bin/bugreport.cgi/%252Ftmp%252Fupdate-grub.patch?bug=168715;msg=5;att=1

Sorry, I don't see any Content-Disposition header in this example URL ;-)

Result of a HEAD request:

  200 OK
  Connection: close
  Date: Fri, 15 Sep 2006 12:58:14 GMT
  Server: Apache/1.3.33 (Debian GNU/Linux)
  Content-Type: text/html; charset=utf-8
  Last-Modified: Mon, 04 Aug 2003 21:18:10 GMT
  Client-Date: Fri, 15 Sep 2006 12:58:14 GMT
  Client-Response-Num: 1

My own experience is that the 1.11 alpha/beta versions (where this feature was introduced) worked fine with the examples I encountered.

Best regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10          Tel.:   +49-221/478-7024
D-50931 Koeln                E-Mail: [EMAIL PROTECTED]
Germany
wget: ignores Content-Disposition header
Hello,

I can reproduce the following with 1.10.2 and 1.11.beta1: Wget ignores the Content-Disposition header described in RFC 2616, 19.5.1 Content-Disposition. an example URL is:

  http://bugs.debian.org/cgi-bin/bugreport.cgi/%252Ftmp%252Fupdate-grub.patch?bug=168715;msg=5;att=1

--
Noèl Köthe
Debian GNU/Linux, www.debian.org

[signature.asc: this is a digitally signed message part]