Re: wget bug with ftp/passive
On Wed, 21 Jan 2004 23:07:30 -0800, you wrote:

>Hello,
>I think I've come across a little bug in wget when using it to get a file
>via ftp.
>
>I did not specify the "passive" option, yet it appears to have been used
>anyway. Here's a short transcript:

Passive FTP can be specified in /etc/wgetrc or /usr/local/etc/wgetrc, and
then it's impossible to turn it off: there is no --active-mode flag as far
as I can tell. I submitted a patch to wget-patches under the title "Patch
to add --active-ftp and make --passive-ftp default", which does what it
says. Your configuration is setting passive mode as the default, but stock
wget defaults to active mode (active mode doesn't work too well behind
some firewalls). --active-ftp is a very useful option in these cases.

Last I checked, the patch hasn't been committed, and I can't find the
wget-patches mail archives anywhere either, so I'll paste it here in the
hope that it helps.

-Jeff Connelly

=cut here=
Common subdirectories: doc.orig/ChangeLog-branches and doc/ChangeLog-branches
diff -u doc.orig/wget.pod doc/wget.pod
--- doc.orig/wget.pod   Wed Jul 21 20:17:29 2004
+++ doc/wget.pod        Wed Jul 21 20:18:56 2004
@@ -888,12 +888,17 @@
 system-specific.  This is why it currently works only with Unix FTP
 servers (and the ones emulating Unix C<ls> output).
 
+=item B<--active-ftp>
+
+Use the I<active> FTP retrieval scheme, in which the server
+initiates the data connection.  This is sometimes required to connect
+to FTP servers that are behind firewalls.
+
 =item B<--passive-ftp>
 
 Use the I<passive> FTP retrieval scheme, in which the client
 initiates the data connection.  This is sometimes required for FTP
-to work behind firewalls.
+to work behind firewalls, and as such is enabled by default.
 
 =item B<--retr-symlinks>
 
Common subdirectories: src.orig/.libs and src/.libs
Common subdirectories: src.orig/ChangeLog-branches and src/ChangeLog-branches
diff -u src.orig/init.c src/init.c
--- src.orig/init.c     Wed Jul 21 20:17:33 2004
+++ src/init.c  Wed Jul 21 20:17:59 2004
@@ -255,6 +255,7 @@
   opt.ftp_glob = 1;
   opt.htmlify = 1;
   opt.http_keep_alive = 1;
+  opt.ftp_pasv = 1;
   opt.use_proxy = 1;
   tmp = getenv ("no_proxy");
   if (tmp)
diff -u src.orig/main.c src/main.c
--- src.orig/main.c     Wed Jul 21 20:17:33 2004
+++ src/main.c  Wed Jul 21 20:17:59 2004
@@ -217,7 +217,8 @@
 FTP options:\n\
   -nr, --dont-remove-listing   don\'t remove `.listing\' files.\n\
   -g,  --glob=on/off           turn file name globbing on or off.\n\
-       --passive-ftp           use the \"passive\" transfer mode.\n\
+       --passive-ftp           use the \"passive\" transfer mode (default).\n\
+       --active-ftp            use the \"active\" transfer mode.\n\
        --retr-symlinks         when recursing, get linked-to files (not dirs).\n\
\n"), stdout);
   fputs (_("\
@@ -285,6 +286,7 @@
     { "no-parent", no_argument, NULL, 133 },
     { "non-verbose", no_argument, NULL, 146 },
     { "passive-ftp", no_argument, NULL, 139 },
+    { "active-ftp", no_argument, NULL, 167 },
     { "page-requisites", no_argument, NULL, 'p' },
     { "quiet", no_argument, NULL, 'q' },
     { "random-wait", no_argument, NULL, 165 },
@@ -397,6 +399,9 @@
     case 139:
       setval ("passiveftp", "on");
       break;
+    case 167:
+      setval ("passiveftp", "off");
+      break;
     case 141:
       setval ("noclobber", "on");
       break;
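
For anyone hitting the same thing: the forced-passive behaviour comes from a
wgetrc setting, not from wget's built-in default. A minimal illustration of
what to look for (the exact file path varies by installation):

  # /etc/wgetrc, /usr/local/etc/wgetrc or ~/.wgetrc
  # With this line present, every FTP retrieval uses passive mode, and the
  # stock command line offers no option to switch back to active mode.
  passive_ftp = on

With the patch above applied, passing --active-ftp on the command line sets
passiveftp back to "off", and because command-line options are processed
after the wgetrc files, it overrides the configuration entry.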
trying to wget all of a remote page nearly works
Hi,

I'm trying to do something which seems really simple, but I can't get it to
work (I've tried two approaches), and I'd appreciate a word via email reply
on whether what I'm doing is sane and/or possible with wget.

Approach #1: ideally, I'd like to pull all of a web site, say
http://www.cnn.com, saving its root document to some filename like
cnn_com.html with all the child images in a directory such as
cnn_com_images. In other words, I'd like to save a copy of cnn.com, with
images, so that it can be browsed later without a network connection. I
prefer a flat directory structure, but I understand the need for wget's
directory-per-host organization; I just haven't been able to see how to
extract the root document.

Approach #2: instead, I'm running wget inside one subdirectory for the
whole document:

  # mkdir cnn_com
  # cd cnn_com
  # wget -p -E -H -k -nd -nH -d -o wget.run http://www.cnn.com

The images are pulled and placed in the current directory, and nearly all
the links in index.html are fixed up to point to the current subdirectory,
except for references like:

  http://i.cnn.net/cnn/images/1.gif    [1.gif _is_ pulled]

The debug output is:

  index.html: merge("http://www.cnn.com/", "#ContentArea") -> http://www.cnn.com/#ContentArea
  appending "http://www.cnn.com/" to urlpos.
  index.html: merge("http://www.cnn.com/", "http://i.cnn.net/cnn/images/1.gif") -> http://i.cnn.net/cnn/images/1.gif
  appending "http://i.cnn.net/cnn/images/1.gif" to urlpos.
  index.html: merge("http://www.cnn.com/", "http://www.cnn.com/") -> http://www.cnn.com/
  appending "http://www.cnn.com/" to urlpos.

If I omit the options -nd -nH, then all references in the root document are
fixed up to point to the local copy, but I lose my flat directory
structure.

I would appreciate a sanity check on what I'm trying to do -- essentially
creating static test cases for browser development. Can I do what I want
with wget?

All replies greatly appreciated - please reply by email to
[EMAIL PROTECTED] as I'm not subscribed to the wget mailing list.

thanks, Mark
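
A hedged sketch of the closest working setup, based only on what is reported
above (directory names are illustrative, and this has not been verified
against the exact wget version in question): keep wget's host directories so
that -k can rewrite every reference, and browse the saved root page from its
per-host path instead of a flat directory.

  # mkdir cnn_com
  # cd cnn_com
  # wget -p -E -H -k -d -o wget.run http://www.cnn.com
  # the converted root document then lives at www.cnn.com/index.html,
  # with requisites from other hosts under their own directories
  # (e.g. i.cnn.net/...)

The trade-off is exactly the one described in the message: with -nd and -nH
omitted the cross-host references are converted correctly, at the cost of
the flat layout.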
Re: get link Internal Server Error
For me this link does NOT work in:

  IE 6.0
  latest Mozilla
  latest Opera

So I tested a bit further. If you go to the site, reach
http://www.interwetten.com/webclient/start.html and then use the URL you
provided, it works. A quick check for stored cookies revealed that two
cookies are stored, so you have to use wget with cookie support. For
information on how to do that, see the manual.

CU, Jens

> hi all:
> Some links open fine in IE, but downloading them with wget goes wrong,
> and I can't find a way to solve it, so I think it may be a bug.
> Example link:
>
> http://www.interwetten.com/webclient/betting/offer.aspx?type=1&kindofsportid=10&L=EN
>
> This link opens fine in IE, but with wget I get:
>
> Connecting to www.interwetten.com[213.185.178.21]:80... connected.
> HTTP request sent, awaiting response... 500 Internal Server Error
> 01:02:27 ERROR 500: Internal Server Error.
>
> henryluo
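
A hedged sketch of the cookie handling with wget (option availability
depends on the installed wget version; the cookie file name is illustrative):

  # 1. Fetch the start page so the server's cookies get written to a local jar.
  wget --save-cookies=cookies.txt --keep-session-cookies \
       http://www.interwetten.com/webclient/start.html

  # 2. Reuse the jar for the deep link that previously returned 500.
  wget --load-cookies=cookies.txt \
       "http://www.interwetten.com/webclient/betting/offer.aspx?type=1&kindofsportid=10&L=EN"

Quoting the second URL matters in a shell, since the & characters would
otherwise be interpreted by the shell instead of being sent to the server.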
ftp download "ignoring" length
Hi all,

I have a shell script that every day uses wget (the installed version is
1.5.3) to fetch a rather big (around 100 MB) .zip file via FTP. The command
line is:

  # wget -o log.txt ftp://...

The problem is that sometimes the log file reads

  Length: 106,670,952 (unauthoritative)

which is correct, but the end of the log says

  22650K .. .. .. .. .. 21%  429.91 KB/s  02:51:22 (384.46 KB/s) - `pdf20040812.zip' saved [23240704]

Why might this happen? Is it a bug in the (old) version installed, or a
misbehaviour of the FTP server? Is it possible to force wget to behave as
in HTTP downloads, where the content length is checked to validate the
downloaded file? The man pages seem to say no.

Using the "-c" option ('wget -c -o log.txt ftp://...' repeated enough times
would eventually download the complete file) is not a good solution here,
because I would have to ask the author to modify the shell script: right
now it waits for wget to finish and then starts processing the file, so it
would work on the corrupted file unless he adds a check that reads the log
file looking for "100%".

-- Alessandro Tinivelli
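
A hedged sketch of the workaround already described above (URL, file name
and retry limit are illustrative assumptions, and the `100%' check is only
as reliable as the progress log format of the installed wget):

  #!/bin/sh
  URL="ftp://ftp.example.com/pub/pdf20040812.zip"
  tries=0
  while [ $tries -lt 10 ]; do
      # -c resumes a partial file instead of restarting from byte 0
      wget -c -o log.txt "$URL"
      # the progress log only reaches 100% once the whole file has arrived
      grep -q '100%' log.txt && break
      tries=`expr $tries + 1`
  done
  grep -q '100%' log.txt || { echo "download still incomplete" >&2; exit 1; }

This keeps the calling script's contract intact: it only proceeds once the
log confirms the full length was transferred.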