Re: --mirror and --cut-dirs=2 bug?
Micah,

Many thanks for all your very timely help. I have had no issues since following your instructions to upgrade to 1.11.4 and installing it in the /opt directory. I used:

  $ ./configure --prefix=/opt/wget

And I point to it specifically:

  /opt/wget/bin/wget --tries=10 -r -N -l inf --wait=1 \
    -nH --cut-dirs=2 ftp://oceans.gsfc.nasa.gov/MODISA/ATTEPH/ \
    -o /home1/software/modis/atteph/mirror_a.log \
    --directory-prefix=/home1/software/modis/atteph

Thanks again.

Brock

On Monday 27 October 2008 3:06 pm, Micah Cowan wrote:
> Brock Murch wrote:
> > Sorry, one quick question: do you know of anyone providing RPMs of 1.11.4 for CentOS?
>
> Not offhand. It may not yet be available; it was only packaged for Fedora Core a couple of months ago, I think. RPMfind.net just lists 1.11.4 sources for fc9 and fc10.
>
> > If not, would you recommend uninstalling the current one before installing from your src? Many thanks.
>
> I'd advise against that: I believe various important components of Red Hat/CentOS rely on wget to fetch things. Sometimes minor changes in the output/interface of wget cause problems for automated scripts that form an integral part of an operating system. Though really, I think most of the changes that would pose such a danger are actually already in the Red Hat-modified 1.10.2 sources (taken from the development sources for what was later released as 1.11).
>
> What I tend to do on my systems is to configure the sources like:
>
>   $ ./configure --prefix=$HOME/opt/wget
>
> and then either add $HOME/opt/wget/bin to my $PATH, or invoke it directly as $HOME/opt/wget/bin/wget.
>
> Note that if you want to build wget with support for HTTPS, you'll need to have the development package for OpenSSL installed.
Re: --mirror and --cut-dirs=2 bug?
Brock Murch wrote:
> I try to keep a mirror of NASA atteph ancillary data for MODIS processing. I know that means little, but I have a cron script that runs twice a day. Sometimes it works, and others, not so much. The sh script is listed at the end of this email, as are the contents of the remote ftp server's root and portions of the log. I don't need all the data on the remote server, only some, thus I use --cut-dirs.
>
> To make matters stranger, the software (also from NASA) that uses these files looks for them in a single place on the client machine where the software runs, but needs data from 2 different directories on the remote ftp server. If the data is not on the client machine, the software kindly ftp's the files to the local directory. However, I don't allow write access to that directory, as many people use the software and when a file is downloaded it has the wrong perms for others to use it; thus I mirror the data I need from the ftp site locally. In the script below, there are 2 wget commands, but they are to slightly different directories (MODISA and MODIST).

I wouldn't recommend that. Using the same output directory for two different source directories seems likely to lead to problems. You'd most likely be better off pulling to two locations, and then combining them afterwards. I don't know for sure that it _will_ cause problems (except if they happen to have same-named files), as long as .listing files are being properly removed (there were some recently-fixed bugs related to that, I think? ...just appending new listings on top of existing files).

> It appears to me that the problem occurs if there is an ftp server error and wget starts a retry: wget goes to the server root, gets the .listing from there for some reason (as opposed to the directory it should go to on the server), then goes to the dir it needs to mirror, can't find the files (which are listed in the root dir), creates dirs, and then I get "No such file" errors and recursively created directories. Any advice would be appreciated.

This snippet seems to be the source of the problem:

  Error in server response, closing control connection. Retrying.
  --14:53:53-- ftp://oceans.gsfc.nasa.gov/MODIST/ATTEPH/2002/110/ (try: 2)
    => `/home1/software/modis/atteph/2002/110/.listing'
  Connecting to oceans.gsfc.nasa.gov|169.154.128.45|:21... connected.
  Logging in as anonymous ... Logged in!
  ==> SYST ... done.  ==> PWD ... done.
  ==> TYPE I ... done.  ==> CWD not required.
  ==> PASV ... done.  ==> LIST ... done.

That "CWD not required" bit is erroneous. I'm 90% sure we fixed this issue recently (though I'm not 100% sure that it went to release: I believe so). I believe we made some related fixes more recently.

You provided a great amount of useful information, but one thing that seems to be missing (or I missed it) is the Wget version number. Judging from the log, I'd say it's 1.10.2 or older; the most recent version of Wget is 1.11.4. Could you please try to verify whether Wget continues to exhibit this problem in the latest release version? I'll also try to look into this as I have time (but it might be a while before I can give it some serious attention; it'd be very helpful if you could do a little more legwork).

--
Thanks very much,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
Re: --mirror and --cut-dirs=2 bug?
Micah Cowan wrote:
> I believe we made some related fixes more recently. You provided a great amount of useful information, but one thing that seems to be missing (or I missed it) is the Wget version number. Judging from the log, I'd say it's 1.10.2 or older; the most recent version of Wget is 1.11.4. Could you please try to verify whether Wget continues to exhibit this problem in the latest release version?

This problem looks like the one that Mike Grant fixed in October of 2006 (http://hg.addictivecode.org/wget/1.11/rev/161aa64e7e8f), so it should definitely be fixed in 1.11.4. Please let me know if it isn't.

--
Regards,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
Re: --mirror and --cut-dirs=2 bug?
Micah,

Thanks for your quick attention to this. Yes, I probably forgot to include the version number:

  [EMAIL PROTECTED] atteph]# wget --version
  GNU Wget 1.10.2 (Red Hat modified)
  Copyright (C) 2005 Free Software Foundation, Inc.
  This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
  Originally written by Hrvoje Niksic [EMAIL PROTECTED].

I will see if I can get the newest version for:

  [EMAIL PROTECTED] atteph]# cat /etc/redhat-release
  CentOS release 4.2 (Final)

I'll let you know how that goes.

Brock

On Monday 27 October 2008 2:19 pm, Micah Cowan wrote:
> Micah Cowan wrote:
> > I believe we made some related fixes more recently. You provided a great amount of useful information, but one thing that seems to be missing (or I missed it) is the Wget version number. Judging from the log, I'd say it's 1.10.2 or older; the most recent version of Wget is 1.11.4. Could you please try to verify whether Wget continues to exhibit this problem in the latest release version?
>
> This problem looks like the one that Mike Grant fixed in October of 2006 (http://hg.addictivecode.org/wget/1.11/rev/161aa64e7e8f), so it should definitely be fixed in 1.11.4. Please let me know if it isn't.
[bug] wrong speed calculation in (--output-file) logfile
Hello.

During a download with wget, I redirected output into a file with the following command:

  $ LC_ALL=C wget -o output 'ftp://mirror.yandex.ru/gentoo-distfiles/distfiles/OOo_3.0.0rc4_20080930_LinuxIntel_langpack_en-GB.tar.gz'

I set LC_ALL and LANG explicitly to be sure that this is not a locale-related problem. The output I saw in the output file was:

  --2008-10-25 14:51:17-- ftp://mirror.yandex.ru/gentoo-distfiles/distfiles/OOo_3.0.0rc4_20080930_LinuxIntel_langpack_en-GB.tar.gz
    => `OOo_3.0.0rc4_20080930_LinuxIntel_langpack_en-GB.tar.gz.13'
  Resolving mirror.yandex.ru... 77.88.19.68
  Connecting to mirror.yandex.ru|77.88.19.68|:21... connected.
  Logging in as anonymous ... Logged in!
  ==> SYST ... done.  ==> PWD ... done.
  ==> TYPE I ... done.  ==> CWD /gentoo-distfiles/distfiles ... done.
  ==> SIZE OOo_3.0.0rc4_20080930_LinuxIntel_langpack_en-GB.tar.gz ... 13633213
  ==> PASV ... done.  ==> RETR OOo_3.0.0rc4_20080930_LinuxIntel_langpack_en-GB.tar.gz ... done.
  Length: 13633213 (13M)

      0K .. .. .. .. ..  0%  131K 1m41s
     50K .. .. .. .. ..  0%  132K 1m40s
    100K .. .. .. .. ..  1%  135K   99s
    150K .. .. .. .. ..  1%  132K   99s
    200K .. .. .. .. ..  1%  130K   99s
    250K .. .. .. .. ..  2% 45.9K  2m9s
    300K .. .. .. .. ..  2% 64.3M 1m50s
  [snip]
  13250K .. .. .. .. .. 99%  131K    0s
  13300K .. ...        100%  134K=1m41s

  2008-10-25 14:52:58 (132 KB/s) - `OOo_3.0.0rc4_20080930_LinuxIntel_langpack_en-GB.tar.gz.13' saved [13633213]

Note the line above the snip:

  300K .. 2% 64.3M 1m50s

It is impossible to have downloaded that many megabytes, since the whole file is much smaller. I don't know why this number sometimes jumps, but in some cases it causes the following output at the end of the download:

  13300K .. ... 100% 26101G=1m45s

Obviously I have no way of downloading at such a high speed (26101G). This is reproducible with wget 1.11.4.

-- Peter.
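For what it's worth, the spikes look like what you get when a single rate sample is computed over a nearly-zero measured interval. A sketch of that arithmetic (illustrative only; speed_kb is a made-up helper, not wget's actual progress code):

```shell
# Illustration, not wget source: a per-chunk rate is bytes transferred
# divided by elapsed seconds, so a mis-measured, near-zero interval
# produces an absurdly large reading for an ordinary 50K chunk.
speed_kb() {
  awk -v b="$1" -v t="$2" 'BEGIN { printf "%.1f KB/s\n", (b / 1024) / t }'
}

speed_kb 51200 0.38     # a sane 50K chunk: 131.6 KB/s
speed_kb 51200 0.0008   # same chunk, tiny interval: 62500.0 KB/s
```

The same 50K of data yields 131K or 62M "per second" depending only on the measured interval, which matches the 64.3M spike in the log above.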
--mirror and --cut-dirs=2 bug?
I try to keep a mirror of NASA atteph ancillary data for MODIS processing. I know that means little, but I have a cron script that runs twice a day. Sometimes it works, and others, not so much. The sh script is listed at the end of this email, as are the contents of the remote ftp server's root and portions of the log. I don't need all the data on the remote server, only some, thus I use --cut-dirs.

To make matters stranger, the software (also from NASA) that uses these files looks for them in a single place on the client machine where the software runs, but needs data from 2 different directories on the remote ftp server. If the data is not on the client machine, the software kindly ftp's the files to the local directory. However, I don't allow write access to that directory, as many people use the software and when a file is downloaded it has the wrong perms for others to use it; thus I mirror the data I need from the ftp site locally. In the script below, there are 2 wget commands, but they are to slightly different directories (MODISA and MODIST).

It appears to me that the problem occurs if there is an ftp server error and wget starts a retry: wget goes to the server root, gets the .listing from there for some reason (as opposed to the directory it should go to on the server), then goes to the dir it needs to mirror, can't find the files (which are listed in the root dir), creates dirs, and then I get "No such file" errors and recursively created directories. Any advice would be appreciated.

Brock Murch

Here is an example of the bad type of dir structure I end up with (there should be no EO1 and below):

  [EMAIL PROTECTED] atteph]# find . -type d -name '*' | grep EO1
  ./2002/110/EO1
  ./2002/110/EO1/CZCS
  ./2002/110/EO1/CZCS/CZCS
  ./2002/110/EO1/CZCS/CZCS/CZCS
  ./2002/110/EO1/CZCS/CZCS/CZCS/CZCS
  ./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON
  ./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS
  ./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS
  ./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS
  ./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS
  ./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS/CZCS
  ./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS
  ./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS
  ./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS
  ./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS
  ./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS

Or:

  [EMAIL PROTECTED] atteph]# ls /home1/software/modis/atteph/2002/110/EO1/
  CZCS  README
  [EMAIL PROTECTED] atteph]# ls /home1/software/modis/atteph/2002/110/EO1/CZCS/
  CZCS  README
  [EMAIL PROTECTED] atteph]# ls /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/
  CZCS  README
  [EMAIL PROTECTED] atteph]# ls /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/
  CZCS  README
  [EMAIL PROTECTED] atteph]# ls /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/
  COMMON
  [EMAIL PROTECTED] atteph]# ls /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/
  CZCS  README
  [EMAIL PROTECTED] atteph]# ls /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/
  CZCS  README
  [EMAIL PROTECTED] atteph]# ls /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/
  CZCS  README
  [EMAIL PROTECTED] atteph]# ls /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/
  CZCS  README
  [EMAIL PROTECTED] atteph]# ll /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/

And:

  [EMAIL PROTECTED] atteph]# ll /home1/software/modis/atteph/2002/110/EO1/README
  -rw-r--r-- 1 root root 9499 Aug 20 10:12 /home1/software/modis/atteph/2002/110/EO1/README
  [EMAIL PROTECTED] atteph]# ll /home1/software/modis/atteph/2002/110/EO1/CZCS/README
  -rw-r--r-- 1 root root 9499 Aug 20 10:12 /home1/software/modis/atteph/2002/110/EO1/CZCS/README
  [EMAIL PROTECTED] atteph]# ll /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/README
  -rw-r--r-- 1 root root 9499 Aug 20 10:12 /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/README
  [EMAIL PROTECTED] atteph]# ll /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/README
  -rw-r--r-- 1 root root 9499 Aug 20 10:12 /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/README
  [EMAIL PROTECTED] atteph]# ll /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/README
  ls: /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/README: No such file or directory
  [EMAIL PROTECTED] atteph]# ll /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/README
  -rw-r--r-- 1 root root 9499 Aug 20 10:12 /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/README

All the README files are the same, and the same as the one on the ftp server.
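As an aside for readers unfamiliar with the option: the intended effect of -nH --cut-dirs=2 in the script above can be sketched in plain shell (strip_dirs is a made-up helper and file.att a made-up filename; this just mimics the documented path mapping):

```shell
# Illustration of what -nH --cut-dirs=2 should do to a remote path:
# drop the hostname (-nH) and the first two directory components
# (--cut-dirs=2). strip_dirs is a hypothetical helper, not part of wget.
strip_dirs() {
  # $1 = remote path, $2 = number of leading components to drop
  echo "$1" | cut -d/ -f"$(($2 + 1))-"
}

# ftp://oceans.gsfc.nasa.gov/MODISA/ATTEPH/2002/110/file.att should be
# saved (under --directory-prefix) as:
strip_dirs "MODISA/ATTEPH/2002/110/file.att" 2   # → 2002/110/file.att
```

The runaway EO1/CZCS/... trees above are what happens when wget walks the wrong .listing, so the paths it cuts no longer line up with the directories it creates.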
Hello, All and bug #21793
Hello everyone,

I thought I'd introduce myself to you all, as I intend to start helping out with wget. This will be my first time contributing to any kind of free or open source software, so I may have some basic questions down the line about best practices and such, though I'll try to keep that to a minimum. Anyway, I've been researching Unicode and UTF-8 recently, so I'm gonna try to tackle bug #21793: https://savannah.gnu.org/bugs/?21793.

-David A Coon
Re: Hello, All and bug #21793
David Coon wrote:
> Hello everyone,
>
> I thought I'd introduce myself to you all, as I intend to start helping out with wget. This will be my first time contributing to any kind of free or open source software, so I may have some basic questions down the line about best practices and such, though I'll try to keep that to a minimum. Anyway, I've been researching Unicode and UTF-8 recently, so I'm gonna try to tackle bug #21793: https://savannah.gnu.org/bugs/?21793.

Hi David, and welcome!

If you haven't already, please see http://wget.addictivecode.org/HelpingWithWget

I'd encourage you to get a Savannah account, so I can assign that bug to you. Also, I tend to hang out quite a bit on IRC (#wget @ irc.freenode.net), so you might want to sign on there.

Since you mentioned an interest in Unicode and UTF-8, you might want to check out Saint Xavier's recent work on IRI and iDNS support in Wget, which is available at http://hg.addictivecode.org/wget/sxav/. Among other things, sxav's additions make Wget more aware of the user's locale, so it might be useful for providing a feature to automatically transcode filenames to the user's locale, rather than just supporting UTF-8 only (which should still probably remain an explicit option). If that sounds like the direction you'd like to take it, you should probably base your work on sxav's repository, rather than mainline.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
Re: [BUG:#20329] If-Modified-Since support
vinothkumar raman wrote:
> We need to include the timestamp of the local file in the request header; for that, we need to pass the local file's timestamp from http_loop() to get_http(). The only way to pass this on without altering the signature of the function is to add a field to struct url in url.h. Could we go for it?

That is acceptable.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
Re: [bug #20329] Make HTTP timestamping use If-Modified-Since
Yes, that's what it means.

I'm not yet committed to doing this. I'd like to see first how many mainstream servers will respect If-Modified-Since when given as part of an HTTP/1.0 request (in comparison to how they respond when it's part of an HTTP/1.1 request). If common servers ignore it in HTTP/1.0, but not in HTTP/1.1, that'd be an excellent case for holding off until we're doing HTTP/1.1 requests.

Also, I don't think "removing the previous HEAD request code" is entirely accurate: we probably would want to detect when a server is feeding us non-new content in response to If-Modified-Since, and adjust to use the current HEAD method instead as a fallback.

-Micah

vinothkumar raman wrote:
> This means we should remove the previous HEAD request code, use If-Modified-Since by default, have it handle all requests, and store pages when the response is not a 304. Is that so?
>
> On Fri, Aug 29, 2008 at 11:06 PM, Micah Cowan [EMAIL PROTECTED] wrote:
> > Follow-up Comment #4, bug #20329 (project wget):
> >
> > verbatim-mode's not all that readable. The gist is, we should go ahead and use If-Modified-Since, perhaps even now before there's true HTTP/1.1 support (provided it works in a reasonable percentage of cases); and just ensure that any Last-Modified header is sane.
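For context, the conditional-request exchange being discussed looks roughly like this (hypothetical host and resource; the date shown is the example timestamp from the HTTP specification). A server that honors the header answers 304 with no body, so the unchanged file is never re-transferred:

```http
GET /index.html HTTP/1.1
Host: example.com
If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT

HTTP/1.1 304 Not Modified
Date: Sun, 30 Oct 1994 12:00:00 GMT
```

This replaces the current two-round-trip approach (a HEAD request to compare Last-Modified, then a GET) with a single request, which is why the fallback-detection Micah mentions matters for servers that ignore the header.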
BUG: #20329 If-Modified-Since
Hi all,

We need to include the timestamp of the local file in the request header; for that, we need to pass the local file's timestamp from http_loop() to get_http(). The only way to pass this on without altering the signature of the function is to add a field to struct url in url.h. Could we go for it?

Thanks,
VinothKumar.R
[BUG:#20329] If-Modified-Since support
Hi all,

We need to include the timestamp of the local file in the request header; for that, we need to pass the local file's timestamp from http_loop() to get_http(). The only way to pass this on without altering the signature of the function is to add a field to struct url in url.h. Could we go for it?

Thanks,
VinothKumar.R
Re: [bug #20329] Make HTTP timestamping use If-Modified-Since
This means we should remove the previous HEAD request code, use If-Modified-Since by default, have it handle all requests, and store pages when the response is not a 304. Is that so?

On Fri, Aug 29, 2008 at 11:06 PM, Micah Cowan [EMAIL PROTECTED] wrote:
> Follow-up Comment #4, bug #20329 (project wget):
>
> verbatim-mode's not all that readable. The gist is, we should go ahead and use If-Modified-Since, perhaps even now before there's true HTTP/1.1 support (provided it works in a reasonable percentage of cases); and just ensure that any Last-Modified header is sane.
>
> Reply to this item at: http://savannah.gnu.org/bugs/?20329
>
> Message sent via/by Savannah: http://savannah.gnu.org/
RE: wget-1.11.4 bug
Micah Cowan wrote:
> The thing is, though, those two threads should be running wgets under separate processes

Yes, the two threads are running wgets under separate processes via system().

> What operating system are you running? Vista?

mipsel-linux with kernel v2.4, built with gcc v3.3.5.

Best regards,
K.C. Chao
Re: wget-1.11.4 bug
kuang-cheng chao wrote:
> Dear Micah:
>
> Thanks for your work on wget. There is a question about two wgets run simultaneously. In the method resolve_bind_address, wget assumes that it is called once. However, this can cause two domain names to resolve to the same IP if two wgets run the same method concurrently.

Have you reproduced this, or is this in theory? If the latter, what has led you to this conclusion? I don't see anything in the code that would cause this behavior.

Also, please use the mailing list for discussions about Wget. I've added it to the recipients list.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer, and GNU Wget Project Maintainer.
http://micah.cowan.name/
RE: wget-1.11.4 bug
Micah Cowan wrote:
> Have you reproduced this, or is this in theory? If the latter, what has led you to this conclusion? I don't see anything in the code that would cause this behavior.

I reproduced this, but I can't be sure the real problem is in resolve_bind_address. In the attached message, both api.yougotphoto.com and farm1.static.flickr.com get the same IP (74.124.203.218). The two wgets are called from two threads of a program.

Best regards,
k.c. chao

P.S. The log follows (the output of the two processes is interleaved):

  wget -4 -t 6 "http://api.yougotphoto.com/device/?action=get_device_new_photo&api=2.2&api_key=f10df554a958fd10050e2d305241c7a3&device_class=2&serial_no=000E2EE5676F&url_no=24616&cksn=44fe191d6cb4e7807f75938b5d72f07c" -O /tmp/webii/ygp_new_photo_list.txt
  --1999-11-30 00:04:21-- http://api.yougotphoto.com/device/?action=get_device_new_photo&api=2.2&api_key=f10df554a958fd10050e2d305241c7a3&device_class=2&serial_no=000E2EE5676F&url_no=24616&cksn=44fe191d6cb4e7807f75938b5d72f07c
  Resolving api.yougotphoto.com... 74.124.203.218
  Connecting to api.yougotphoto.com|74.124.203.218|:80...

  wget -4 -t 6 "http://farm1.static.flickr.com/33/49038824_e4b04b7d9f_b.jpg" -O /tmp/webii/24616
  --1999-11-30 00:04:22-- http://farm1.static.flickr.com/33/49038824_e4b04b7d9f_b.jpg
  Resolving farm1.static.flickr.com... 74.124.203.218
  Connecting to farm1.static.flickr.com|74.124.203.218|:80... connected.
Re: wget-1.11.4 bug
k.c. chao wrote:
> Micah Cowan wrote:
> > Have you reproduced this, or is this in theory? If the latter, what has led you to this conclusion? I don't see anything in the code that would cause this behavior.
>
> I reproduced this, but I can't be sure the real problem is in resolve_bind_address. In the attached message, both api.yougotphoto.com and farm1.static.flickr.com get the same IP (74.124.203.218). The two wgets are called from two threads of a program.

Yeah, I get 68.142.213.135 for the flickr.com address, currently.

The thing is, though, those two threads should be running wgets under separate processes (I'm not sure how they couldn't be, but if they somehow weren't, that would be using Wget other than how it was designed to be used). This problem sounds much more like an issue with the OS's API than an issue with Wget, to me. But we'd still want to work around it if it were feasible.

What operating system are you running? Vista?

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer, and GNU Wget Project Maintainer.
http://micah.cowan.name/
Re: WGET bug...
HARPREET SAWHNEY wrote:
> Hi,
>
> I am getting a strange bug when I use wget to download a binary file from a URL versus when I manually download. The attached ZIP file contains two files:
>
>   05.upc  --- manually downloaded
>   dum.upc --- downloaded through wget
>
> wget adds a number of ASCII characters to the head of the file and seems to delete a similar number from the tail. So the file sizes are the same, but the addition and deletion render the file useless. Could you please direct me to any specific option I should be using to avoid this problem?

In the future, it's useful to mention which version of Wget you're using.

The problem you're having is that the server is adding the extra HTML at the front of your session, and then giving you the file contents anyway. It's a bug in the PHP code that serves the file. You're getting this extra content because you are not logged in when you're fetching it. You need to have Wget send a cookie with the login-session information, and then the server will probably stop sending the corrupting information at the head of the file.

The site does not appear to use HTTP's authentication mechanisms, so the [EMAIL PROTECTED] bit in the URL doesn't do you any good. It uses forms-and-cookies authentication.

Hopefully, you're using a browser that stores its cookies in a text format, or that is capable of exporting to a text format. In that case, you can just ensure that you're logged in in your browser, and use the --load-cookies=cookies.txt option to Wget to use the same session information. Otherwise, you'll need to use --save-cookies with Wget to simulate the login form post, which is tricky and requires some understanding of HTML forms.

--
HTH,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer, and GNU Wget Project Maintainer.
http://micah.cowan.name/
Re: WGET bug...
HARPREET SAWHNEY wrote:
> Hi,
>
> Thanks for the prompt response. I am using GNU Wget 1.10.2.
>
> I tried a few things on your suggestion, but the problem remains:
>
> 1. I exported the cookies file in Internet Explorer and specified that on the Wget command line, but the same error occurs.
> 2. I have an open session on the site with my username and password.
> 3. I also tried running wget while I am downloading a file from the IE session on the site, but the same error.

Sounds like you'll need to get the appropriate cookie by using Wget to log in to the website. This requires site-specific information from the user-login form page, though, so I can't help you without that. If you know how to read some HTML, then you can find the HTML form used for posting the username/password stuff, and use

  wget --keep-session-cookies --save-cookies=cookies.txt \
    --post-data='USERNAME=foo&PASSWORD=bar' ACTION

where ACTION is the value of the form's action field, USERNAME and PASSWORD (and possibly further required values) are field names from the HTML form, and foo and bar are the username and password.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer, and GNU Wget Project Maintainer.
http://micah.cowan.name/
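To find those field names, one can inspect a saved copy of the site's login page. A sketch with a purely hypothetical form (real sites' action URLs and input names will differ, and some forms include hidden fields that must be posted too):

```shell
# Hypothetical login page; every name below is made up for illustration.
cat > login.html <<'EOF'
<form action="/do_login.php" method="post">
  <input name="username">
  <input name="password" type="password">
</form>
EOF

# Pull out the form's action URL and the input field names, which become
# the ACTION argument and the keys in --post-data respectively.
grep -o 'action="[^"]*"' login.html   # → action="/do_login.php"
grep -o 'name="[^"]*"' login.html
```

Here the extracted values would translate to --post-data='username=...&password=...' posted to /do_login.php on the same host.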
bug in wget
Hello,

entering the following command results in an error:

--- command start ---
c:\Downloads\wget_v1.11.3b>wget "ftp://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-mozilla1.8-l10n/" -P c:\Downloads\
--- command end ---

wget can't convert the .listing file into an html file.

regards
Re: bug in wget
Sir Vision wrote:
> Hello,
>
> entering the following command results in an error:
>
> --- command start ---
> c:\Downloads\wget_v1.11.3b>wget "ftp://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-mozilla1.8-l10n/" -P c:\Downloads\
> --- command end ---
>
> wget can't convert the .listing file into an html file.

As this seems to work fine on Unix, for me, I'll have to leave it to the Windows porting guy (hi Chris!) to find out what might be going wrong.

...however, it would really help if you would supply the full output you got from wget that leads you to believe Wget couldn't do this conversion. In fact, it wouldn't hurt to supply the -d flag as well, for maximum debugging messages.

--
Cheers,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer, and GNU Wget Project Maintainer.
http://micah.cowan.name/
.listing bug when using -c
wget-1.11.1 (and 1.10/1.10.1) don't handle the .listing file properly when -c is used. It just appends to that file instead of replacing it, which means that wget tries to download each file twice when you run the same command twice. Have a look at this log:

$ wget -m -nd -c ftp://ftp.redhat.com/pub/redhat/linux/rawhide/
--2008-04-03 15:30:17--  ftp://ftp.redhat.com/pub/redhat/linux/rawhide/
           => `.listing'
Resolving ftp.redhat.com... 209.132.176.30
Connecting to ftp.redhat.com|209.132.176.30|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD /pub/redhat/linux/rawhide ... done.
==> PASV ... done.    ==> LIST ... done.

    [ <=>                                 ] 259         --.-K/s   in 0s

2008-04-03 15:30:19 (1.66 MB/s) - `.listing' saved [259]

Already have correct symlink .message -> README
--2008-04-03 15:30:19--  ftp://ftp.redhat.com/pub/redhat/linux/rawhide/README
           => `README'
==> CWD not required.
==> PASV ... done.    ==> RETR README ... done.
Length: 404

100%[====================================>] 404         --.-K/s   in 0.007s

2008-04-03 15:30:21 (59.4 KB/s) - `README' saved [404]

FINISHED --2008-04-03 15:30:21--
Downloaded: 2 files, 663 in 0.007s (95.3 KB/s)

$ cat .listing
drwxr-xr-x  2 ftp ftp 4096 Nov 10  2003 .
drwxr-xr-x  8 ftp ftp 4096 May 15  2006 ..
lrwxrwxrwx  1 ftp ftp    6 Nov 10  2003 .message -> README
-rw-r--r--  1 ftp ftp  404 Nov 10  2003 README

$ wget -m -nd -c ftp://ftp.redhat.com/pub/redhat/linux/rawhide/
--2008-04-03 15:30:26--  ftp://ftp.redhat.com/pub/redhat/linux/rawhide/
           => `.listing'
Resolving ftp.redhat.com... 209.132.176.30
Connecting to ftp.redhat.com|209.132.176.30|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD /pub/redhat/linux/rawhide ... done.
==> PASV ... done.    ==> LIST ... done.

100%[++++++++++++++++++=>] 518         --.-K/s   in 0s

2008-04-03 15:30:28 (2.36 MB/s) - `.listing' saved [518]

Already have correct symlink .message -> README
Remote file no newer than local file `README' -- not retrieving.
FINISHED --2008-04-03 15:30:28--
Downloaded: 1 files, 518 in 0s (4.73 MB/s)

$ cat .listing
drwxr-xr-x  2 ftp ftp 4096 Nov 10  2003 .
drwxr-xr-x  8 ftp ftp 4096 May 15  2006 ..
lrwxrwxrwx  1 ftp ftp    6 Nov 10  2003 .message -> README
-rw-r--r--  1 ftp ftp  404 Nov 10  2003 README
drwxr-xr-x  2 ftp ftp 4096 Nov 10  2003 .
drwxr-xr-x  8 ftp ftp 4096 May 15  2006 ..
lrwxrwxrwx  1 ftp ftp    6 Nov 10  2003 .message -> README
-rw-r--r--  1 ftp ftp  404 Nov 10  2003 README

This happens only when -c is used. Karsten
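The numbers in Karsten's log line up with append behavior: the second run saves a 518-byte .listing, exactly twice the 259 bytes of the first. A minimal sketch of the kind of fix involved; the function name is hypothetical and this is not wget's actual code (the real logic lives in its FTP retrieval path):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative sketch: the directory listing is regenerated on every
   retrieval, so the listing file should be opened in a truncating mode
   ("wb") rather than an appending one ("ab"), even when -c is given. */
static FILE *open_listing(const char *path)
{
    return fopen(path, "wb");   /* truncate: a second run replaces the old listing */
}

int run_demo(void)
{
    const char *path = "demo.listing";
    const char *entry = "-rw-r--r-- 1 ftp ftp 404 Nov 10 2003 README\n";

    /* Simulate two runs of the same command. */
    for (int run = 0; run < 2; run++) {
        FILE *f = open_listing(path);
        if (!f)
            return 1;
        fputs(entry, f);
        fclose(f);
    }

    /* With "ab" the file would now hold two copies of the entry (the
       518-byte symptom in the report); with "wb" it holds exactly one. */
    FILE *f = fopen(path, "rb");
    if (!f)
        return 1;
    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    fclose(f);
    remove(path);
    assert(size == (long) strlen(entry));
    return 0;
}
```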
Re: Bug
ok, thanks for your reply. We have a work-around in place now, but it doesn't scale very well. Anyway, I'll start looking for another solution. Thanks! Mark On Sat, Mar 1, 2008 at 10:15 PM, Micah Cowan [EMAIL PROTECTED] wrote: Mark Pors wrote: Hi, I posted this bug over two years ago: http://marc.info/?l=wget&m=113252747105716&w=4 From the release notes I see that this is still not resolved. Are there any plans to fix this any time soon? I'm not sure that's a bug. It's more of an architectural choice. Wget currently works by downloading a file, then, if it needs to look for links in that file, it will open it and scan through it. Obviously, it can't do that when you use -O -. There are plans to move Wget to a more stream-like process, where it scans links during download. At such time, it's very possible that -p will work the way you want it to. In the meantime, though, it doesn't. -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/
Bug
Hi, I posted this bug over two years ago: http://marc.info/?l=wget&m=113252747105716&w=4 From the release notes I see that this is still not resolved. Are there any plans to fix this any time soon? Thanks Mark
Re: bug on wget
Micah Cowan [EMAIL PROTECTED] writes: The new Wget flags empty Set-Cookie as a syntax error (but only displays it in -d mode; possibly a bug). I'm not clear on exactly what's possibly a bug: do you mean the fact that Wget only calls attention to it in -d mode? That's what I meant. I probably agree with that behavior... most people probably aren't interested in being informed that a server breaks RFC 2616 mildly; Generally, if Wget considers a header to be in error (and hence ignores it), the user probably needs to know about that. After all, it could be the symptom of a Wget bug, or of an unimplemented extension the server generates. In both cases I as a user would want to know. Of course, Wget should continue to be lenient towards syntax violations widely recognized by popular browsers. Note that I'm not arguing that Wget should warn in this particular case. It is perfectly fine to not consider an empty `Set-Cookie' to be a syntax error and to simply ignore it (and maybe only print a warning in debug mode).
Re: bug on wget
Hrvoje Niksic wrote: Generally, if Wget considers a header to be in error (and hence ignores it), the user probably needs to know about that. After all, it could be the symptom of a Wget bug, or of an unimplemented extension the server generates. In both cases I as a user would want to know. Of course, Wget should continue to be lenient towards syntax violations widely recognized by popular browsers. Note that I'm not arguing that Wget should warn in this particular case. It is perfectly fine to not consider an empty `Set-Cookie' to be a syntax error and to simply ignore it (and maybe only print a warning in debug mode). That was my thought. I agree with both of your points above: if Wget's not handling something properly, I want to know about it; but at the same time, silently ignoring (erroneous) empty headers doesn't seem like a problem. -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/
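The behavior being agreed on here could be sketched like this; parse_set_cookie is a hypothetical stand-in for wget's real parser (which, per the thread, uses extract_param), shown only to illustrate "ignore silently, warn in debug mode":

```c
#include <assert.h>
#include <stdio.h>

/* Lenient handling of an empty Set-Cookie header: treat it as absent
   rather than as a hard syntax error, and mention it only in debug mode.
   Returns 1 if a cookie was parsed, 0 if the header was empty/ignored. */
static int parse_set_cookie(const char *value, int debug)
{
    /* Skip leading whitespace. */
    while (*value == ' ' || *value == '\t')
        value++;
    if (*value == '\0') {
        if (debug)
            fprintf(stderr, "warning: empty Set-Cookie header; ignoring\n");
        return 0;               /* ignored, but not treated as an error */
    }
    /* ...real parsing of "name=value; attributes" would go here... */
    return 1;
}

int run_demo(void)
{
    assert(parse_set_cookie("", 0) == 0);          /* empty: silently ignored */
    assert(parse_set_cookie("   ", 0) == 0);       /* whitespace-only: same */
    assert(parse_set_cookie("sid=abc123", 0) == 1);/* normal cookie parses */
    return 0;
}
```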
bug on wget
Hi, I got a bug on wget when executing: wget -a log -x -O search/search-1.html --verbose --wait 3 --limit-rate=20K --tries=3 http://www.nepremicnine.net/nepremicninske_agencije.html?id_regije=1 Segmentation fault (core dumped) I created the directory search. The above creates a zero-sized file search/search-1.html. Logfile log (labels translated from Spanish: resolving, connecting, HTTP request sent, awaiting response):

--18:18:28-- http://www.nepremicnine.net/nepremicninske_agencije.html?id_regije=1
           => `search/search-1.html'
Resolving www.nepremicnine.net... 212.103.144.204
Connecting to www.nepremicnine.net|212.103.144.204|:80... connected.
HTTP request sent, awaiting response... 200 OK

The same happens when varying the id_regije parameter in the URL, just in case it helps. I'm using an Intel Core Duo E6300 with plenty of disk/memory space, Ubuntu 7.10. Should you need any further information, don't hesitate to contact me. Regards Diego
Re: bug on wget
Diego Campo wrote: Hi, I got a bug on wget when executing: wget -a log -x -O search/search-1.html --verbose --wait 3 --limit-rate=20K --tries=3 http://www.nepremicnine.net/nepremicninske_agencije.html?id_regije=1 Segmentation fault (core dumped) Hi Diego, I was able to reproduce the problem above in the release version of Wget; however, it appears to be working fine in the current development version of Wget, which is expected to release soon as version 1.11.* * Unfortunately, it has been expected to release soon for a few months now; we got hung up with some legal/licensing issues that are yet to be resolved. It will almost certainly be released in the next few weeks, though. -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/
Re: bug on wget
Micah Cowan [EMAIL PROTECTED] writes: I was able to reproduce the problem above in the release version of Wget; however, it appears to be working fine in the current development version of Wget, which is expected to release soon as version 1.11.* I think the old Wget crashed on empty Set-Cookie headers. That got fixed when I converted the Set-Cookie parser to use extract_param. The new Wget flags empty Set-Cookie as a syntax error (but only displays it in -d mode; possibly a bug).
Re: bug on wget
Hrvoje Niksic wrote: Micah Cowan [EMAIL PROTECTED] writes: I was able to reproduce the problem above in the release version of Wget; however, it appears to be working fine in the current development version of Wget, which is expected to release soon as version 1.11.* I think the old Wget crashed on empty Set-Cookie headers. That got fixed when I converted the Set-Cookie parser to use extract_param. The new Wget flags empty Set-Cookie as a syntax error (but only displays it in -d mode; possibly a bug). I'm not clear on exactly what's possibly a bug: do you mean the fact that Wget only calls attention to it in -d mode? I probably agree with that behavior... most people probably aren't interested in being informed that a server breaks RFC 2616 mildly; especially if it's not apt to affect the results. Unless of course the user was expecting the server to send a real cookie, but I'm guessing that this only happens when the server doesn't have one to send (or something). But a user in that situation should be using -d (or at least -S) to find out what the server is sending. -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/
bug in escaped filename calculation?
Hello, I'm wondering if I've found a bug in the excellent wget. I'm not asking for help, because it turned out not to be the reason one of my scripts was failing. The possible bug is in the derivation of the filename from a URL which contains UTF-8. The case is: wget http://en.wikipedia.org/wiki/%C3%87atalh%C3%B6y%C3%BCk Of course these are all ASCII characters, but underlying them are 3 non-ASCII characters, whose UTF-8 encodings are:

hex    octal      name
C387   303 207    C-cedilla
C3B6   303 266    o-umlaut
C3BC   303 274    u-umlaut

The file created has a name that's almost, but not quite, a valid UTF-8 bytestring:

$ ls *y*k | od -tc
0000000 303   %   8   7   a   t   a   l   h 303 266   y 303 274   k  \n

I.e. the o-umlaut and u-umlaut UTF-8 encodings occur in the bytestring, but the UTF-8 encoding of C-cedilla has its 2nd byte replaced by the 3-byte string %87. I'm guessing this is not intended. I would have sent a fix too, but after finding my way through http.c and retr.c I got lost in url.c. Brian Keck
Re: bug in escaped filename calculation?
On 10/4/07, Brian Keck [EMAIL PROTECTED] wrote: I would have sent a fix too, but after finding my way through http.c and retr.c I got lost in url.c. You and me both. A lot of the code needs to be rewritten... there's a lot of spaghetti code in there. I hope Micah chooses to do a complete rewrite for version 2 so I can get my hands dirty and understand the code better.
Re: bug in escaped filename calculation?
Josh Williams wrote: On 10/4/07, Brian Keck [EMAIL PROTECTED] wrote: I would have sent a fix too, but after finding my way through http.c and retr.c I got lost in url.c. You and me both. A lot of the code needs to be rewritten... there's a lot of spaghetti code in there. I hope Micah chooses to do a complete rewrite for version 2 so I can get my hands dirty and understand the code better. Currently, I'm planning on refactoring what exists, as needed, rather than going for a complete rewrite. This will be driven by unit tests, to try to ensure that we do not lose functionality along the way. This involves more work overall, but IMO has these key advantages:

* as mentioned, it's easier to prevent functionality loss,
* we will be able to use the work as it's written, instead of waiting many months for everything to be finished (especially with the current number of developers), and
* AIUI, the wording of employer copyright assignment releases may not apply to new works that are not _preexisting_ as GPL works. This means that, if a rewrite ended up using no code whatsoever from the original work (not likely, but...), there could be legal issues.

After 1.11 is released (or possibly before), one of my top priorities is to clean up the gethttp and http_loop functions to a degree where they can be much more readily read and understood (and modified!). This is important to me because so far (in my probably-not-statistically-significant 3 months as maintainer) a majority of the trickier fixes have been in those two functions. Some of these fixes seem to frequently introduce bugs of their own, and I spend more time than seems right in trying to understand the code there, which is why these particular functions are prime targets for refactoring. :) -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
Re: bug in escaped filename calculation?
Brian Keck wrote: Hello, I'm wondering if I've found a bug in the excellent wget. I'm not asking for help, because it turned out not to be the reason one of my scripts was failing. The possible bug is in the derivation of the filename from a URL which contains UTF-8. The case is: wget http://en.wikipedia.org/wiki/%C3%87atalh%C3%B6y%C3%BCk Of course these are all ASCII characters, but underlying them are 3 non-ASCII characters, whose UTF-8 encodings are:

hex    octal      name
C387   303 207    C-cedilla
C3B6   303 266    o-umlaut
C3BC   303 274    u-umlaut

The file created has a name that's almost, but not quite, a valid UTF-8 bytestring:

$ ls *y*k | od -tc
0000000 303   %   8   7   a   t   a   l   h 303 266   y 303 274   k  \n

I.e. the o-umlaut and u-umlaut UTF-8 encodings occur in the bytestring, but the UTF-8 encoding of C-cedilla has its 2nd byte replaced by the 3-byte string %87. Using --restrict=nocontrol will do what you want it to, in this instance. I'm guessing this is not intended. Actually, it is (more-or-less). Realize that Wget really has no idea how to tell whether you're trying to give it UTF-8, or one of the ISO latin charsets. It tends to assume the latter. It also, by default, will not create filenames with control characters in them. In ISO latin, characters in the range 0x80-0x9f are control characters, which is why Wget left %87 (which falls into that range) escaped, but not the others, which don't. It is actually illegal to specify byte values outside the range of ASCII characters in a URL, but it has long been historical practice to do so anyway. In most cases, the intended meaning was one of the latin character sets (usually latin1), so Wget was right to do as it does, at that time. There is now a standard for representing Unicode values in URLs, whose result is then called IRIs (Internationalized Resource Identifiers).
Conforming correctly to this standard would require that Wget be sensitive to the context and encoding of documents in which it finds URLs; in the case of filenames and command arguments, it would probably also require sensitivity to the current locale as determined by environment variables. Wget is simply not equipped to handle IRIs or encoding issues at the moment, so until it is, a proper fix will not be in place. Addressing these is considered a Wget 2.0 (next-generation Wget functionality) priority, and probably won't be done for a year or two, given that the number of developers involved with Wget, if you add up all the part-time helpers (including me), is probably still less than one full-time dev. :) -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/
Re: bug in escaped filename calculation?
Micah Cowan [EMAIL PROTECTED] writes: It is actually illegal to specify byte values outside the range of ASCII characters in a URL, but it has long been historical practice to do so anyway. In most cases, the intended meaning was one of the latin character sets (usually latin1), so Wget was right to do as it does, at that time. Your explanation is spot-on. I would only add that Wget's interpretation of what is a control character is not so much geared toward Latin 1 as it is geared toward maximum safety. Originally I planned to simply encode *all* file name characters outside the 32-127 range, but in practice it was very annoying (not to mention US-centric) to encode perfectly valid Latin 1/2/3/... characters as %xx. Since the codes 128-159 *are* control characters (in those charsets) that can mess up your screen and that you wouldn't want seen by default, I decided to encode them by default, but allow for a way to turn it off, in case someone used a different charset. In the long run, supporting something like IRIs is surely the right thing to go for, but I have a feeling that we'll be stuck with the current messy URLs for quite some time to come. So Wget simply needs to adapt to the current circumstances. If the locale includes UTF-8 in any shape or form, it is perfectly safe to assume that it's valid to create UTF-8 file names. Of course, we don't know if a particular URL path sequence is really meant to be UTF-8, but there should be no harm in allowing valid UTF-8 sequences to pass through. In other words, the default quote control policy could simply be smarter about what control means. One consequence would be that Wget creates differently-named files in different locales, but it's probably a reasonable price to pay for not breaking an important expectation.
Another consequence would be making users open to IDN homograph attacks, but I don't know if that's a problem in the context of creating file names (IDN is normally defined as a misrepresentation of who you communicate with). For those who want to hack on this, the place to look at is url.c:append_uri_pathel; that strangely-named function takes a path element (a directory name or file name component of the URL) and appends it to the file name. It takes care not to ever use .. as a path component and to respect the --restrict-file-names setting as specified by the user. It could be made to recognize UTF-8 character sequences in UTF-8 locales and exempt valid UTF-8 chars from being treated as control characters. Invalid UTF-8 chars would still pass all the checks, and non-canonical UTF-8 sequences would be rejected (by condemning their byte values to being escaped as %..). This is not much work for someone who understands the basics of UTF-8.
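The check Hrvoje describes could be sketched as follows. utf8_seq_len is a hypothetical helper, not a patch against url.c:append_uri_pathel; it decides whether a byte in the 0x80-0x9f "control" range begins or continues a valid UTF-8 sequence (and so may pass through) or is a stray byte (and so still gets escaped as %..). The sketch rejects overlong forms but, for brevity, does not check the surrogate or above-U+10FFFF ranges:

```c
#include <assert.h>

/* Returns the length of the valid UTF-8 sequence starting at s (examining
   at most n bytes), or 0 if s does not start a valid sequence. */
static int utf8_seq_len(const unsigned char *s, int n)
{
    if (n >= 1 && s[0] < 0x80)
        return 1;
    if (n >= 2 && (s[0] & 0xe0) == 0xc0 && (s[1] & 0xc0) == 0x80
        && s[0] >= 0xc2)                    /* reject overlong 2-byte forms */
        return 2;
    if (n >= 3 && (s[0] & 0xf0) == 0xe0 && (s[1] & 0xc0) == 0x80
        && (s[2] & 0xc0) == 0x80
        && !(s[0] == 0xe0 && s[1] < 0xa0))  /* reject overlong 3-byte forms */
        return 3;
    if (n >= 4 && (s[0] & 0xf8) == 0xf0 && (s[1] & 0xc0) == 0x80
        && (s[2] & 0xc0) == 0x80 && (s[3] & 0xc0) == 0x80
        && !(s[0] == 0xf0 && s[1] < 0x90))  /* reject overlong 4-byte forms */
        return 4;
    return 0;
}

int run_demo(void)
{
    /* C-cedilla is 0xC3 0x87: the second byte falls in 0x80-0x9f, but the
       pair is valid UTF-8, so neither byte would be %-escaped. */
    const unsigned char ccedilla[] = { 0xc3, 0x87 };
    assert(utf8_seq_len(ccedilla, 2) == 2);

    /* A lone 0x87 is not valid UTF-8 and would still be escaped as %87. */
    const unsigned char stray[] = { 0x87 };
    assert(utf8_seq_len(stray, 1) == 0);
    return 0;
}
```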
[fwd] Wget Bug: recursive get from ftp with a port in the url fails
Hi, I am using wget 1.10.2 in Windows 2003, and have the same problem as Cantara. The file system is NTFS. Well, I found my problem: I wrote the command in Scheduled Tasks like this: wget -N -i D:\virus.update\scripts\kavurl.txt -r -nH -P d:\virus.update\kaspersky Well, after wget and before -N, I typed TWO spaces. After deleting one space, wget works well again. Hope this can help. :) -- from: baalchina
Re: [fwd] Wget Bug: recursive get from ftp with a port in the url fails
Hrvoje Niksic wrote: Subject: Re: Wget Bug: recursive get from ftp with a port in the url fails From: baalchina [EMAIL PROTECTED] Date: Mon, 17 Sep 2007 19:56:20 +0800 To: [EMAIL PROTECTED] Hi, I am using wget 1.10.2 in Windows 2003, and have the same problem as Cantara. The file system is NTFS. Well, I found my problem: I wrote the command in Scheduled Tasks like this: wget -N -i D:\virus.update\scripts\kavurl.txt -r -nH -P d:\virus.update\kaspersky Well, after wget and before -N, I typed TWO spaces. After deleting one space, wget works well again. Hope this can help. :) Hi baalchina, Hrvoje forwarded your message to the Wget discussion mailing list, where such questions are really more appropriate, especially since Hrvoje is not maintaining Wget any longer, but has left that responsibility to others. What you're describing does not appear to be a bug in Wget; it's the shell's (or task scheduler's, or whatever) responsibility to split space-separated elements properly; the words are supposed to already be split apart (properly) by the time Wget sees them. Also, you didn't really describe what was going wrong with Wget, or what message about its failure you were seeing (perhaps you'd need to specify a log file with -o log, or via redirection if the command interpreter supports it). However, if the problem is that Wget was somehow seeing the space, as a separate argument or as part of another one, then the bug lies with your task scheduler (or whatever is interpreting the command line). -- HTH, Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/
ftp-ls.c - filesize parsing bug
Hello, What the heck was this code supposed to do in ftp-ls.c? If there is only a single space between the previous token and the filesize, then t points at the NUL character, and the filesize is thought to be 0, resulting in a mismatch every time. ptok is already pointing at the start of the token; I don't understand the need to try to decrement the pointer. I commented out the two lines to fix the issue. Thanks! (ps Where is the ftp chdir bugfix?! No wget releases...) Jason

/* Back up to the beginning of the previous token
   and parse it with str_to_wgint. */
char *t = ptok;
while (t > line && ISDIGIT (*t))  // useless and buggy
  --t;                            // useless and buggy
if (t == line)
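For what it's worth, the size field can be converted straight from ptok with no back-up loop at all. This is a standalone sketch of that idea, not a patch against ftp-ls.c; strtol stands in for wget's str_to_wgint, and the field-splitting mimics how a token pointer like ptok would be obtained from an "ls -l" line:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ptok already points at the start of the size token, so it can be
   converted directly. */
static long parse_size_token(const char *ptok)
{
    return strtol(ptok, NULL, 10);
}

int run_demo(void)
{
    const char *line = "-rw-r--r-- 1 ftp ftp 404 Nov 10 2003 README";
    char buf[128];
    strcpy(buf, line);

    /* The size is the 5th whitespace-separated field on this line. */
    char *tok = strtok(buf, " ");
    for (int i = 0; i < 4; i++)
        tok = strtok(NULL, " ");

    assert(tok != NULL);
    assert(parse_size_token(tok) == 404);
    return 0;
}
```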
Re: bug and patch: blank spaces in filenames causes looping
On Jul 13, 2007, at 12:29 PM, Micah Cowan wrote: sprintf(filecopy, "\"%.2047s\"", file); This fix breaks the FTP protocol, making wget instantly stop working with many conforming servers, but apparently start working with yours; the RFCs are very clear that the file name argument starts right after the string RETR ; the very next character is part of the file name, including if the next character is a space (or a quote). The file name is terminated by the CR LF sequence (which implies that the sequence CR LF may not occur in the filename). Therefore, if you ask for a file "file.txt", a conforming server will attempt to find and deliver a file whose name begins and ends with double-quotes. Therefore, this seems like a server problem. I think you may well be correct. I am now unable to reproduce the problem where the server does not recognize a filename unless I give it quotes. In fact, as you say, the server ONLY recognizes filenames WITHOUT quotes, and quoting breaks it. I had to revert to the non-quoted code to get proper behavior. I am very confused now. I apologize profusely for wasting your time. How embarrassing! I'll save this email, and if I see the behavior again, I will provide you with the details you requested below. Could you please provide the following: 1. The version of wget you are running (wget --version) 2. The exact command line you are using to invoke wget 3. The output of that same command line, run with --debug -- Rich wealthychef Cook 925-784-3077 -- it takes many small steps to climb a mountain, but the view gets better all the time.
Re: bug and patch: blank spaces in filenames causes looping
On 7/15/07, Rich Cook [EMAIL PROTECTED] wrote: I think you may well be correct. I am now unable to reproduce the problem where the server does not recognize a filename unless I give it quotes. In fact, as you say, the server ONLY recognizes filenames WITHOUT quotes and quoting breaks it. I had to revert to the non- quoted code to get proper behavior. I am very confused now. I apologize profusely for wasting your time. How embarrassing! I'll save this email, and if I see the behavior again, I will provide you with the details you requested below. I wouldn't say it was a waste of time. Actually, I think it's good for us to know that this problem exists on some servers. We're considering writing a patch to recognise servers that do not support spaces. If the standard method fails, then it will retry as an escaped character. Nothing has been written for this yet, but it has been discussed, and may be implemented in the future.
Re: bug and patch: blank spaces in filenames causes looping
Rich Cook wrote: On Jul 13, 2007, at 12:29 PM, Micah Cowan wrote: sprintf(filecopy, "\"%.2047s\"", file); This fix breaks the FTP protocol, making wget instantly stop working with many conforming servers, but apparently start working with yours; the RFCs are very clear that the file name argument starts right after the string RETR ; the very next character is part of the file name, including if the next character is a space (or a quote). The file name is terminated by the CR LF sequence (which implies that the sequence CR LF may not occur in the filename). Therefore, if you ask for a file "file.txt", a conforming server will attempt to find and deliver a file whose name begins and ends with double-quotes. Therefore, this seems like a server problem. I think you may well be correct. I am now unable to reproduce the problem where the server does not recognize a filename unless I give it quotes. In fact, as you say, the server ONLY recognizes filenames WITHOUT quotes, and quoting breaks it. I had to revert to the non-quoted code to get proper behavior. I am very confused now. I apologize profusely for wasting your time. How embarrassing! No worries, it happens! Sometimes the tests we run go other than we think they did. :) I'll save this email, and if I see the behavior again, I will provide you with the details you requested below. That would be terrific, thanks. -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/
Re: bug and patch: blank spaces in filenames causes looping
Rich Cook wrote: On OS X, if a filename on the FTP server contains spaces, and the remote copy of the file is newer than the local, then wget gets thrown into a loop of No such file or directory endlessly. I have changed the following in ftp-simple.c, and this fixes the error. Sorry, I don't know how to use the proper patch formatting, but it should be clear. I and another developer could not reproduce this problem, either in the current trunk or in wget 1.10.2. sprintf(filecopy, "\"%.2047s\"", file); This fix breaks the FTP protocol, making wget instantly stop working with many conforming servers, but apparently start working with yours; the RFCs are very clear that the file name argument starts right after the string RETR ; the very next character is part of the file name, including if the next character is a space (or a quote). The file name is terminated by the CR LF sequence (which implies that the sequence CR LF may not occur in the filename). Therefore, if you ask for a file "file.txt", a conforming server will attempt to find and deliver a file whose name begins and ends with double-quotes. Therefore, this seems like a server problem. Could you please provide the following: 1. The version of wget you are running (wget --version) 2. The exact command line you are using to invoke wget 3. The output of that same command line, run with --debug Thank you very much. -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/
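To make the RFC point concrete: per RFC 959 the argument runs from the byte after "RETR " up to the terminating CRLF, so spaces pass through unquoted and only CR/LF are impossible in a filename. A sketch of composing the command the conforming way; build_retr is a hypothetical helper, not wget's actual code:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Compose "RETR <file>\r\n". The file name may contain spaces, but must
   not contain CR or LF (they would terminate the command early). Returns
   the command length, or -1 if the name is unsendable. */
static int build_retr(char *out, size_t outlen, const char *file)
{
    if (strpbrk(file, "\r\n"))
        return -1;
    return snprintf(out, outlen, "RETR %s\r\n", file);
}

int run_demo(void)
{
    char cmd[256];

    /* Spaces go over the wire as-is; adding quotes would ask the server
       for a file whose name literally begins and ends with quotes. */
    int n = build_retr(cmd, sizeof cmd, "my file.txt");
    assert(n == (int) strlen("RETR my file.txt\r\n"));
    assert(strcmp(cmd, "RETR my file.txt\r\n") == 0);

    /* CR/LF in a name cannot be represented in the command. */
    assert(build_retr(cmd, sizeof cmd, "bad\r\nname") == -1);
    return 0;
}
```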
Re: [bug #20323] Wget issues HEAD before GET, even when the file doesn't exist locally.
Mauro Tortonesi wrote: Micah Cowan wrote: Update of bug #20323 (project wget): Status: Ready For Test => In Progress ___ Follow-up Comment #3: Moving back to In Progress until some questions about the logic are answered: http://addictivecode.org/pipermail/wget-notify/2007-July/75.html http://addictivecode.org/pipermail/wget-notify/2007-July/77.html Thanks micah. I have partly misunderstood the logic behind the preliminary HEAD request. In my code, HEAD is skipped if -O or --no-content-disposition are given, but if -N is given HEAD is always sent. This is wrong, as HEAD should be skipped even if -N and --no-content-disposition are given (no need to care about the deprecated -N -O combination). I can't think of any other case in which HEAD should be skipped, though. Cc'ing wget ML, as it's probably important to open up discussion of the current logic. What about the case when nothing is given on the command line except --no-content-disposition? What do we need HEAD for then? Also: I don't believe HEAD should be sent if no options are given on the command line. What purpose would that serve? If it's to find a possible Content-Disposition header, we can get that (and more reliably) at GET time (though, I believe we may currently be requiring the file name before we fetch, which if true, should definitely be changed, but not for 1.11, in which case the HEAD will be allowed for the time being); and since we're not matching against potential accept/reject lists, we don't really need it. I think it really makes much more sense to enumerate those few cases where we need to issue a HEAD, rather than try to determine all the cases where we don't: if I have to choose a side to err on, I'd rather not send HEAD in a case or two where we needed it, rather than send it in a few where we didn't, as any request-response cycle eats up time.
I also believe that the cases where we want a HEAD are/should be fewer than the cases where we don't want them. -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/
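The whitelist approach Micah argues for could be sketched like this. The option fields and the particular cases listed (timestamping, spider mode, content-disposition) are illustrative guesses for the sake of the sketch, not a statement of what wget's logic actually enumerates:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for wget's real option fields. */
struct opts {
    bool timestamping;          /* -N: need the remote mtime before deciding */
    bool spider;                /* --spider: never fetch the body at all */
    bool content_disposition;   /* need the filename before local checks */
};

/* Enumerate the few cases that need a preliminary HEAD, rather than
   listing all the cases that don't: the default is to skip it. */
static bool should_send_head(const struct opts *o)
{
    return o->timestamping || o->spider || o->content_disposition;
}

int run_demo(void)
{
    struct opts plain = { false, false, false };
    assert(!should_send_head(&plain));   /* no options: go straight to GET */

    struct opts stamp = { true, false, false };
    assert(should_send_head(&stamp));    /* -N needs headers before the GET */
    return 0;
}
```

Erring this way matches the stated preference: an unlisted case costs at most one skipped HEAD, not an extra request-response cycle on every download.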
Re: [wget-notify] [bug #20466] --delete-after and --spider should not create (and leave) directories
Joshua David Williams wrote: URL: http://savannah.gnu.org/bugs/?20466 ... Details: This patch forces the --no-directories option if we're not actually keeping the files we're downloading (as in the --delete-after and --spider options). This way, we don't leave a mess of empty directories. This seems like a reasonable idea, but I'd like to get some discussion on it first. The downside, of course, is that there's no short option to reverse the implied -nd; they'll have to use --directories (at the time I was discussing it with Josh, I'd been thinking -e would be needed, but this seems to be untrue). It seems to me that by far the most common intention would be not to leave any files around; this behavior seems fairly reasonable to me. Thoughts? -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/
[Fwd: Bug#281201: wget prints it's progress even when background]
The following bug was submitted to Debian's bug tracker. I'm curious what people think about this suggestion. Don't we already check for something like redirected output (and force the progress indicator to dots)? It seems to me that if that is appropriate, then a case could be made for this as well. Perhaps instead of shutting up, though, wget should attempt to direct to a file? Perhaps with one last message to the terminal (assuming the terminal doesn't have TOSTOP set; it should ignore SIGTTOU and handle EIO to handle that case), to indicate that it's doing this. -Micah - Original Message - Subject: Bug#281201: wget prints it's progress even when background Date: Tue, 10 Jul 2007 17:54:51 +0400 From: Ilya Anfimov [EMAIL PROTECTED] Reply-To: Ilya Anfimov [EMAIL PROTECTED], [EMAIL PROTECTED] To: Peter Eisentraut [EMAIL PROTECTED] CC: [EMAIL PROTECTED] My suggestion is to stop printing verbose progress messages when the job is resumed in the background. It could be checked by (successful) getpgrp() not being equal to (successful) tcgetpgrp(1) in a SIGCONT signal handler. Something like this is used in some console applications, for example, in lftp.
Re: wget bug?
On Mon, 9 Jul 2007 15:06:52 +1200 [EMAIL PROTECTED] wrote: wget under win2000/win XP I get No such file or directory error messages when using the following command line. wget -s --save-headers http://www.nndc.bnl.gov/ensdf/browseds.jsp?nuc=%1&class=Arc %1 = 212BI Any ideas? hi nikolaus, in windows, you're supposed to use %VARIABLE_NAME% for variable substitution. try using %1% instead of %1. -- Mauro Tortonesi [EMAIL PROTECTED]
Re: wget bug?
Mauro Tortonesi schrieb: On Mon, 9 Jul 2007 15:06:52 +1200 [EMAIL PROTECTED] wrote: wget under win2000/win XP I get No such file or directory error messages when using the following command line. wget -s --save-headers http://www.nndc.bnl.gov/ensdf/browseds.jsp?nuc=%1&class=Arc %1 = 212BI Any ideas? hi nikolaus, in windows, you're supposed to use %VARIABLE_NAME% for variable substitution. try using %1% instead of %1. AFAIK it's ok to use %1, because it is a special case. Also, the error would be a 404 or some wget error in case the variable gets substituted in a wrong way, or not? (actually even then you get a 200 response with that url) I just tried using the command inside a batch file and came across another problem: You used a lowercase -s which is not recognized by my wget version, but an uppercase -S is. I guess you should change that. I would guess wget is not in your PATH. Try using c:\path\to\the directory\wget.exe instead of just wget. If this too does not help, add an explicit --restrict-file-names=windows to your options, so wget does not try to use the ? inside a filename. (normally not needed) So a should-work-for-all-means version is c:\path\wget.exe -S --save-headers --restrict-file-names=windows http://www.nndc.bnl.gov/ensdf/browseds.jsp?nuc=%1&class=Arc Of course just one line, but my dumb mail editor wrapped it. Greetings Matthias
Re: Bug update notifications
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Matthew Woehlke wrote: Micah Cowan wrote: The wget-notify mailing list (http://addictivecode.org/mailman/listinfo/wget-notify) will now also be receiving notifications of bug updates from GNU Savannah, in addition to subversion commits. ...any reason to not CC bug updates here also/instead? That's how e.g. kwrite does things (also several other lists AFAIK), and seems to make sense. This is 'bug-wget' after all :-). It is; but it's also 'wget'. While I agree that it probably makes sense to send it to a bugs discussion list, this list is a combination bugs/development/support/general discussion list, and I'm not certain it's appropriate to bump up the traffic level for this. Still, if there are enough folks that would like to get these updates (without also seeing commit notifications), perhaps we could craft a second list for this (or, alternatively, split off the main discussion/support list from the bugs list)? - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGkrpK7M8hyUobTrERCIMaAKCDG8JN7DmUK7oIuE0fYmgYnZIrlgCghK7n iV8rIDYe1+cxzrQATM43CEM= =PKqt -END PGP SIGNATURE-
Re: Bug update notifications
Micah Cowan wrote: Matthew Woehlke wrote: Micah Cowan wrote: ...any reason to not CC bug updates here also/instead? That's how e.g. kwrite does things (also several other lists AFAIK), and seems to make sense. This is 'bug-wget' after all :-). It is; but it's also 'wget'. Hmm, so it is; my bad :-). While I agree that it probably makes sense to send it to a bugs discussion list, this list is a combination bugs/development/support/general discussion list, and I'm not certain it's appropriate to bump up the traffic level for this. Still, if there are enough folks that would like to get these updates (without also seeing commit notifications), perhaps we could craft a second list for this (or, alternatively, split off the main discussion/support list from the bugs list)? I guess a common pattern is: foo-help foo-devel foo-commits ...but of course you're the maintainer, it's your call :-). (The above aren't necessarily actual names of course, just the categories it seems like I'm most used to seeing. e.g. the GNU convention is of course bug-foo, not foo-devel.) -- Matthew This .sig is false
wget bug?
wget under win2000/win XP I get No such file or directory error messages when using the following command line. wget -s --save-headers http://www.nndc.bnl.gov/ensdf/browseds.jsp?nuc=%1&class=Arc %1 = 212BI Any ideas? thank you Dr Nikolaus Hermanspahn Advisor (Science) National Radiation Laboratory Ministry of Health DDI: +64 3 366 5059 Fax: +64 3 366 1156 http://www.nrl.moh.govt.nz mailto:[EMAIL PROTECTED] Statement of confidentiality: This e-mail message and any accompanying attachments may contain information that is IN-CONFIDENCE and subject to legal privilege. If you are not the intended recipient, do not read, use, disseminate, distribute or copy this message or attachments. If you have received this message in error, please notify the sender immediately and delete this message. * This e-mail message has been scanned for Viruses and Content and cleared by the Ministry of Health's Content and Virus Filtering Gateway *
Re: wget on gnu.org: Report a Bug
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Tony Lewis wrote: The “Report a Bug” section of http://www.gnu.org/software/wget/ should encourage submitters to send as much relevant information as possible including wget version, operating system, and command line. The submitter should also either send or at least save a copy of the --debug output. This information is currently in the bug submitting form at Savannah: https://savannah.gnu.org/bugs/?func=additem&group=wget But should probably be duplicated at the website as well... let me know if the current text could use improvement. Perhaps we need a --bug option for the command line that runs the command and saves important information in a file that can be submitted along with the bug report. The saved information would have to be sanitized to remove things like user IDs and passwords but could include things like the wget version, command line options, and what the command tried to do. I think perhaps such things as the wget version and operating system ought to be emitted by default anyway (except when -q is given). Other than that, what kinds of things would --bug provide above and beyond --debug? - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGj+hk7M8hyUobTrERCHqtAJ9HTIFd3hOJ2R9aQBUqCtsvW2xJ1wCePOfo 67Olfti9HtI+1pYkNiCj7rc= =/Rhd -END PGP SIGNATURE-
RE: wget on gnu.org: Report a Bug
Micah Cowan wrote: This information is currently in the bug submitting form at Savannah: That looks good. I think perhaps such things as the wget version and operating system ought to be emitted by default anyway (except when -q is given). I'm not convinced that wget should ordinarily emit the operating system. It's really only useful to someone other than the person running the command. Other than that, what kinds of things would --bug provide above and beyond --debug? It should echo the command line and the contents of .wgetrc to the bug output, which even the --debug option does not do. Perhaps we will think of other things to include in the output if this option gets added. However, the big difference would be where the output was directed. When invoked as: wget ... --bug bug_report all interesting (but sanitized) information would be written to the file bug_report whether or not the command included --debug, which would also direct the debugging output to STDOUT. The main reason I had for suggesting this option is that it would be easy to tell newbies with problems to run the exact same command with --bug bug_report and send the file bug_report to the list (or to whomever is working on the problem). The user wouldn't see the command behave any differently, but we'd have the information we need to investigate the report. It might even be that most of us would choose to run with --bug most of the time relying on the normal wget output except when something appears to have gone wrong and then checking the file when it does. Tony
Re: wget on gnu.org: Report a Bug
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Micah Cowan wrote: Tony Lewis wrote: The “Report a Bug” section of http://www.gnu.org/software/wget/ should encourage submitters to send as much relevant information as possible including wget version, operating system, and command line. The submitter should also either send or at least save a copy of the --debug output. This information is currently in the bug submitting form at Savannah: https://savannah.gnu.org/bugs/?func=additem&group=wget But should probably be duplicated at the website as well... let me know if the current text could use improvement. I've copied the text to the website, along with a link to Simon Tatham's essay on reporting bugs. I also added a small section regarding the IRC #wget channel on FreeNode. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGkDhh7M8hyUobTrERCDBQAJ4ln3eWsbdbsa5ahfB7kv5tHIc1wACeLSIj uXkezPuzt7GMoiXvUemMT9U= =2dVK -END PGP SIGNATURE-
Bug update notifications
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 The wget-notify mailing list (http://addictivecode.org/mailman/listinfo/wget-notify) will now also be receiving notifications of bug updates from GNU Savannah, in addition to subversion commits. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGkG0Q7M8hyUobTrERCLVXAJwP7ru9v88PFF6PgREWTn0XF7XRnwCfY1hd 4W1KLuYYRvZ0pSXOLk6YY/Y= =TOP4 -END PGP SIGNATURE-
Re: bug and patch: blank spaces in filenames causes looping
From various: [...] char filecopy[2048]; if (file[0] != '"') { sprintf(filecopy, "\"%.2047s\"", file); } else { strncpy(filecopy, file, 2047); } [...] It should be: sprintf(filecopy, "\"%.2045s\"", file); [...] I'll admit to being old and grumpy, but am I the only one who shudders when one small code segment contains 2048, 2047, and 2045 as separate, independent literal constants, instead of using a macro, or sizeof, or something which would let the next fellow change one buffer size in one place, instead of hunting all over the code looking for every 20xx which might be related? Just a thought. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street (+1) 651-699-9818 Saint Paul MN 55105-2547
Re: bug and patch: blank spaces in filenames causes looping
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Steven M. Schweda wrote: From various: [...] char filecopy[2048]; if (file[0] != '"') { sprintf(filecopy, "\"%.2047s\"", file); } else { strncpy(filecopy, file, 2047); } [...] It should be: sprintf(filecopy, "\"%.2045s\"", file); [...] I'll admit to being old and grumpy, but am I the only one who shudders when one small code segment contains 2048, 2047, and 2045 as separate, independent literal constants, instead of using a macro, or sizeof, or something which would let the next fellow change one buffer size in one place, instead of hunting all over the code looking for every 20xx which might be related? Well, as already mentioned, aprintf() would be much more appropriate, as it eliminates the need for constants like these. And yes, magic numbers drive me crazy, too. Of course, when used with printf's 's' specifier, it needs special handling (crafting a STR() macro or somesuch). - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGjxcX7M8hyUobTrERCHSAAJ9VkQdfhK4/LwByseYH2ZYVzoPqPwCePU1k 2Llybpq/oceXWMyZpBO4bPY= =Vj/R -END PGP SIGNATURE-
RE: bug and patch: blank spaces in filenames causes looping
There is a buffer overflow in the following line of the proposed code: sprintf(filecopy, "\"%.2047s\"", file); It should be: sprintf(filecopy, "\"%.2045s\"", file); in order to leave room for the two quotes. Tony -Original Message- From: Rich Cook [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 04, 2007 10:18 AM To: [EMAIL PROTECTED] Subject: bug and patch: blank spaces in filenames causes looping On OS X, if a filename on the FTP server contains spaces, and the remote copy of the file is newer than the local, then wget gets thrown into a loop of No such file or directory endlessly. I have changed the following in ftp-simple.c, and this fixes the error. Sorry, I don't know how to use the proper patch formatting, but it should be clear. == the beginning of ftp_retr: == /* Sends RETR command to the FTP server. */ uerr_t ftp_retr (int csock, const char *file) { char *request, *respline; int nwritten; uerr_t err; /* Send RETR request. */ request = ftp_request ("RETR", file); == becomes: == /* Sends RETR command to the FTP server. */ uerr_t ftp_retr (int csock, const char *file) { char *request, *respline; int nwritten; uerr_t err; char filecopy[2048]; if (file[0] != '"') { sprintf(filecopy, "\"%.2047s\"", file); } else { strncpy(filecopy, file, 2047); } /* Send RETR request. */ request = ftp_request ("RETR", filecopy); -- Rich wealthychef Cook 925-784-3077 -- it takes many small steps to climb a mountain, but the view gets better all the time.
Re: bug and patch: blank spaces in filenames causes looping
Good point, although it's only a POTENTIAL buffer overflow, and it's limited to 2 bytes, so at least it's not exploitable. :-) On Jul 5, 2007, at 9:05 AM, Tony Lewis wrote: There is a buffer overflow in the following line of the proposed code: sprintf(filecopy, "\"%.2047s\"", file); It should be: sprintf(filecopy, "\"%.2045s\"", file); in order to leave room for the two quotes. Tony -Original Message- From: Rich Cook [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 04, 2007 10:18 AM To: [EMAIL PROTECTED] Subject: bug and patch: blank spaces in filenames causes looping On OS X, if a filename on the FTP server contains spaces, and the remote copy of the file is newer than the local, then wget gets thrown into a loop of No such file or directory endlessly. I have changed the following in ftp-simple.c, and this fixes the error. Sorry, I don't know how to use the proper patch formatting, but it should be clear. == the beginning of ftp_retr: == /* Sends RETR command to the FTP server. */ uerr_t ftp_retr (int csock, const char *file) { char *request, *respline; int nwritten; uerr_t err; /* Send RETR request. */ request = ftp_request ("RETR", file); == becomes: == /* Sends RETR command to the FTP server. */ uerr_t ftp_retr (int csock, const char *file) { char *request, *respline; int nwritten; uerr_t err; char filecopy[2048]; if (file[0] != '"') { sprintf(filecopy, "\"%.2047s\"", file); } else { strncpy(filecopy, file, 2047); } /* Send RETR request. */ request = ftp_request ("RETR", filecopy); -- Rich wealthychef Cook 925-784-3077 -- it takes many small steps to climb a mountain, but the view gets better all the time.
RE: bug and patch: blank spaces in filenames causes looping
-Original Message- From: Hrvoje Niksic [mailto:[EMAIL PROTECTED] Tony Lewis [EMAIL PROTECTED] writes: Wget has an `aprintf' utility function that allocates the result on the heap. Avoids both buffer overruns and arbitrary limits on file name length. If it uses the heap, then doesn't that open a hole where a particularly long file name would overflow the heap? -- URL: http://wiki.tcl.tk/ Even if explicitly stated to the contrary, nothing in this posting should be construed as representing my employer's opinions. URL: mailto:[EMAIL PROTECTED] URL: http://www.purl.org/NET/lvirden/
Re: bug and patch: blank spaces in filenames causes looping
Tony Lewis [EMAIL PROTECTED] writes: There is a buffer overflow in the following line of the proposed code: sprintf(filecopy, "\"%.2047s\"", file); Wget has an `aprintf' utility function that allocates the result on the heap. Avoids both buffer overruns and arbitrary limits on file name length.
Re: bug and patch: blank spaces in filenames causes looping
Rich Cook [EMAIL PROTECTED] writes: Trouble is, it's undocumented as to how to free the resulting string. Do I call free on it? Yes. Freshly allocated with malloc in the function documentation was supposed to indicate how to free the string.
Re: bug and patch: blank spaces in filenames causes looping
Virden, Larry W. [EMAIL PROTECTED] writes: Tony Lewis [EMAIL PROTECTED] writes: Wget has an `aprintf' utility function that allocates the result on the heap. Avoids both buffer overruns and arbitrary limits on file name length. If it uses the heap, then doesn't that open a hole where a particularly long file name would overflow the heap? No, aprintf tries to allocate as much memory as necessary. If the memory is unavailable, malloc returns NULL and Wget exits.
Re: bug and patch: blank spaces in filenames causes looping
Trouble is, it's undocumented as to how to free the resulting string. Do I call free on it? I'd use asprintf, but I'm afraid to suggest that here as it may not be portable. On Jul 5, 2007, at 10:45 AM, Hrvoje Niksic wrote: Tony Lewis [EMAIL PROTECTED] writes: There is a buffer overflow in the following line of the proposed code: sprintf(filecopy, "\"%.2047s\"", file); Wget has an `aprintf' utility function that allocates the result on the heap. Avoids both buffer overruns and arbitrary limits on file name length. -- Rich wealthychef Cook 925-784-3077 -- it takes many small steps to climb a mountain, but the view gets better all the time.
Re: bug and patch: blank spaces in filenames causes looping
On Jul 5, 2007, at 11:08 AM, Hrvoje Niksic wrote: Rich Cook [EMAIL PROTECTED] writes: Trouble is, it's undocumented as to how to free the resulting string. Do I call free on it? Yes. Freshly allocated with malloc in the function documentation was supposed to indicate how to free the string. Oh, I looked in the source and there was this xmalloc thing that didn't show up in my man pages, so I punted. Sorry. -- ✐There's no time to stop for gas, we're already late-- Karin Donker -- Rich wealthychef Cook http://5pmharmony.com 925-784-3077 -- ✐
RE: bug and patch: blank spaces in filenames causes looping
Please remove me from this list. thanks, John Bruso From: Rich Cook [mailto:[EMAIL PROTECTED] Sent: Thu 7/5/2007 12:30 PM To: Hrvoje Niksic Cc: Tony Lewis; [EMAIL PROTECTED] Subject: Re: bug and patch: blank spaces in filenames causes looping On Jul 5, 2007, at 11:08 AM, Hrvoje Niksic wrote: Rich Cook [EMAIL PROTECTED] writes: Trouble is, it's undocumented as to how to free the resulting string. Do I call free on it? Yes. Freshly allocated with malloc in the function documentation was supposed to indicate how to free the string. Oh, I looked in the source and there was this xmalloc thing that didn't show up in my man pages, so I punted. Sorry. -- ✐There's no time to stop for gas, we're already late-- Karin Donker -- Rich wealthychef Cook http://5pmharmony.com 925-784-3077 -- ✐
Re: bug and patch: blank spaces in filenames causes looping
Rich Cook [EMAIL PROTECTED] writes: On Jul 5, 2007, at 11:08 AM, Hrvoje Niksic wrote: Rich Cook [EMAIL PROTECTED] writes: Trouble is, it's undocumented as to how to free the resulting string. Do I call free on it? Yes. Freshly allocated with malloc in the function documentation was supposed to indicate how to free the string. Oh, I looked in the source and there was this xmalloc thing that didn't show up in my man pages, so I punted. Sorry. No problem. Note that xmalloc isn't entirely specific to Wget, it's a fairly standard GNU name for a malloc-or-die function. Now I remembered that Wget also has xfree, so the above advice is not entirely correct -- you should call xfree instead. However, in the normal case xfree is a simple wrapper around free, so even if you used free, it would have worked just as well. (The point of xfree is that if you compile with DEBUG_MALLOC, you get a version that checks for leaks, although it should be removed now that there is valgrind, which does the same job much better. There is also the business of barfing on NULL pointers, which should also be removed.) I'd have implemented a portable asprintf, but I liked the aprintf interface better (I first saw it in libcurl).
Re: bug and patch: blank spaces in filenames causes looping
So forgive me for a newbie-never-even-lurked kind of question: will this fix make it into wget for other users (and for me in the future)? Or do I need to do more to make that happen, or...? Thanks! On Jul 5, 2007, at 12:52 PM, Hrvoje Niksic wrote: Rich Cook [EMAIL PROTECTED] writes: On Jul 5, 2007, at 11:08 AM, Hrvoje Niksic wrote: Rich Cook [EMAIL PROTECTED] writes: Trouble is, it's undocumented as to how to free the resulting string. Do I call free on it? Yes. Freshly allocated with malloc in the function documentation was supposed to indicate how to free the string. Oh, I looked in the source and there was this xmalloc thing that didn't show up in my man pages, so I punted. Sorry. No problem. Note that xmalloc isn't entirely specific to Wget, it's a fairly standard GNU name for a malloc-or-die function. Now I remembered that Wget also has xfree, so the above advice is not entirely correct -- you should call xfree instead. However, in the normal case xfree is a simple wrapper around free, so even if you used free, it would have worked just as well. (The point of xfree is that if you compile with DEBUG_MALLOC, you get a version that check for leaks, although it should be removed now that there is valgrind, which does the same job much better. There is also the business of barfing on NULL pointers, which should also be removed.) I'd have implemented a portable asprintf, but I liked the aprintf interface better (I first saw it in libcurl). -- ✐There's no time to stop for gas, we're already late-- Karin Donker -- Rich wealthychef Cook http://5pmharmony.com 925-784-3077 -- ✐
Re: bug and patch: blank spaces in filenames causes looping
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Rich Cook wrote: So forgive me for a newbie-never-even-lurked kind of question: will this fix make it into wget for other users (and for me in the future)? Or do I need to do more to make that happen, or...? Thanks! Well, I need a chance to look over the patch, run some tests, etc, to see if it really covers everything it should (what about other, non-space characters?). The fix (or one like it) will probably make it into Wget at some point, but I wouldn't expect it to come out in the next release (which, itself, will not be arriving for a couple months); it will probably go into wget 1.12. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGjXYj7M8hyUobTrERCI5JAJ0UIDGzQsC8xCI3lK26pzzQ+BkS6ACgj16o oWDlelFyfvvTlhtlDpLYLXM= =DZ8v -END PGP SIGNATURE-
Re: bug and patch: blank spaces in filenames causes looping
Thanks for the follow up. :-) On Jul 5, 2007, at 3:52 PM, Micah Cowan wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Rich Cook wrote: So forgive me for a newbie-never-even-lurked kind of question: will this fix make it into wget for other users (and for me in the future)? Or do I need to do more to make that happen, or...? Thanks! Well, I need a chance to look over the patch, run some tests, etc, to see if it really covers everything it should (what about other, non-space characters?). The fix (or one like it) will probably make it into Wget at some point, but I wouldn't expect it to come out in the next release (which, itself, will not be arriving for a couple months); it will probably go into wget 1.12. - -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGjXYj7M8hyUobTrERCI5JAJ0UIDGzQsC8xCI3lK26pzzQ+BkS6ACgj16o oWDlelFyfvvTlhtlDpLYLXM= =DZ8v -END PGP SIGNATURE- -- ✐There's no time to stop for gas, we're already late-- Karin Donker -- Rich wealthychef Cook http://5pmharmony.com 925-784-3077 -- ✐
Bug in the generated manpage
Hello, using Wget 1.10.2 I noticed that the man page description for --no-proxy says: For more information about the use of proxies with Wget, ... and that's all. The original contains an @xref, which gets swallowed by texi2pod. I don't know how/if it should be repaired, but I thought it's worth reporting. Have a nice day, Stepan
Re: bug storing cookies with wget
Mario Ander schrieb: Hi everybody, I think there is a bug storing cookies with wget. See this command line: C:\Programme\wget\wget --user-agent=Opera/8.5 (X11; U; en) --no-check-certificate --keep-session-cookies --save-cookies=cookie.txt --output-document=- --debug --output-file=debug.txt --post-data=name=xxxpassword=dummy=Internetkennwortlogin.x=0login.y=0 https://www.vodafone.de/proxy42/portal/login.po [..] Set-Cookie: JSESSIONID=GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE; path=/jsp Set-Cookie: VODAFONELOGIN=1; domain=.vodafone.de; expires=Friday, 01-Jun-2007 15:05:16 GMT; path=/ Set-Cookie: JSESSIONID=GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE!1180705316338; path=/proxy42 [..] ---response end--- 200 OK Attempt to fake the path: /jsp, /proxy42/portal/login.po So the problem seems to be that wget rejects cookies for paths which don't fit the request url. The script you call is in /proxy42/portal/, which is a subdir of /proxy42 and /, so wget accepts those cookies, but it is not related to /jsp. So it seems to be wget sticking to the strict RFC and the script doing it wrong. To get this working you would need to patch wget to accept non-RFC-compliant cookies, maybe along with an --accept-malformed-cookies directive. Hope this helps you Matthias
Re: bug storing cookies with wget
Matthias Vill schrieb: Mario Ander schrieb: Hi everybody, I think there is a bug storing cookies with wget. See this command line: C:\Programme\wget\wget --user-agent=Opera/8.5 (X11; U; en) --no-check-certificate --keep-session-cookies --save-cookies=cookie.txt --output-document=- --debug --output-file=debug.txt --post-data=name=xxxpassword=dummy=Internetkennwortlogin.x=0login.y=0 https://www.vodafone.de/proxy42/portal/login.po [..] Set-Cookie: JSESSIONID=GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE; path=/jsp Set-Cookie: VODAFONELOGIN=1; domain=.vodafone.de; expires=Friday, 01-Jun-2007 15:05:16 GMT; path=/ Set-Cookie: JSESSIONID=GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE!1180705316338; path=/proxy42 [..] ---response end--- 200 OK Attempt to fake the path: /jsp, /proxy42/portal/login.po So the problem seems to be that wget rejects cookies for paths which don't fit to the request url. Like the script you call is in /proxy42/portal/, which is a subdir of /proxy42 an / so wget accepts those cookies, but wich is not related to /jsp So it seems to be wget sticking to the strict RFC and the script doing wrong. To get this working you would need to patch wget for not RFC-compliant cookies maybe along with an --accept-malformed-cookies directiv. Hope this helps you Matthias So I thought of a second solution: If you have cygwin (or at least bash+grep) you can run this small script to duplicate and truncate the cookie. --- CUT here --- #!/bin/bash #Author: Matthias Vill; feel free to change and use #get the line for the proxy42 path into $temp temp=$(grep proxy42 cookies.txt) #remove everything after the last ! temp=${temp%!*} #replace proxy42 by jsp temp=${temp/proxy42/jsp} #append a newline to the file #echo "" >> cookies.txt #add the new cookie to cookies.txt echo "$temp" >> cookies.txt --- CUT here --- Maybe you need to remove the # in front of echo "" >> cookies.txt to compensate for a missing trailing newline; otherwise you may end up changing the value of the previous cookie. Maybe this helps even more Matthias
bug storing cookies with wget
Hi everybody, I think there is a bug storing cookies with wget. See this command line: C:\Programme\wget\wget --user-agent=Opera/8.5 (X11; U; en) --no-check-certificate --keep-session-cookies --save-cookies=cookie.txt --output-document=- --debug --output-file=debug.txt --post-data=name=xxxpassword=dummy=Internetkennwortlogin.x=0login.y=0 https://www.vodafone.de/proxy42/portal/login.po; wget answer this way: DEBUG output created by Wget 1.10.2 on Windows. --15:41:58-- https://www.vodafone.de/proxy42/portal/login.po = `-' Resolving www.vodafone.de... seconds 0.00, 139.7.147.41 Caching www.vodafone.de = 139.7.147.41 Connecting to www.vodafone.de|139.7.147.41|:443... seconds 0.00, connected. Created socket 1844. Releasing 0x003a5a90 (new refcount 1). Initiating SSL handshake. Handshake successful; connected socket 1844 to SSL handle 0x00931758 certificate: subject: /C=DE/ST=NRW/L=Duesseldorf/O=Vodafone D2 GmbH/OU=TOP-A/OU=Terms of use at www.verisign.com/rpa (c)00/CN=www.vodafone.de issuer: /O=VeriSign Trust Network/OU=VeriSign, Inc./OU=VeriSign International Server CA - Class 3/OU=www.verisign.com/CPS Incorp.by Ref. LIABILITY LTD.(c)97 VeriSign WARNING: Certificate verification error for www.vodafone.de: unable to get local issuer certificate ---request begin--- POST /proxy42/portal/login.po HTTP/1.0 User-Agent: Opera/8.5 (X11; U; en) Accept: */* Host: www.vodafone.de Connection: Keep-Alive Content-Type: application/x-www-form-urlencoded Content-Length: 77 ---request end--- [POST data: name=xxxpassword=dummy=Internetkennwortlogin.x=0login.y=0] HTTP request sent, awaiting response... 
---response begin--- HTTP/1.1 200 OK Date: Fri, 01 Jun 2007 13:41:56 GMT Server: Apache Pragma: No-cache Expires: Thu, 01 Jan 1970 00:00:00 GMT Set-Cookie: JSESSIONID=GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE; path=/jsp Set-Cookie: VODAFONELOGIN=1; domain=.vodafone.de; expires=Friday, 01-Jun-2007 15:05:16 GMT; path=/ Set-Cookie: JSESSIONID=GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE!1180705316338; path=/proxy42 Cache-Control: no-cache,no-store,max-age=0 P3P: CP=NOI ADM DEV PSAi COM NAV OUR OTR STP IND DEM Connection: close Content-Type: text/html; charset=ISO-8859-1 Via: 1.1 www.vodafone.de (Alteon iSD-SSL/6.0.5) ---response end--- 200 OK Attempt to fake the path: /jsp, /proxy42/portal/login.po cdm: 1 2 3 4 5 6 7 8 Stored cookie vodafone.de -1 (ANY) / permanent insecure [expiry 2007-06-01 17:05:16] VODAFONELOGIN 1 Stored cookie www.vodafone.de -1 (ANY) /proxy42 session insecure [expiry none] JSESSIONID GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE!1180705316338 Length: unspecified [text/html] 0K .. .. .. ... 338.67 KB/s Closed 1844/SSL 0x931758 15:41:58 (338.67 KB/s) - `-' saved [34644] Saving cookies to cookie.txt. Done saving cookies. The cookie.txt looks this way: # HTTP cookie file. # Generated by Wget on 2007-06-01 15:33:23. # Edit at your own risk. www.vodafone.de FALSE /proxy42FALSE 0 JSESSIONID GggBMfxV9vGqGwtyQGJFXsyCr6vQvGSh9KGgDt7xgLycdc5MTQps!1467361027!NONE!1180704801023 .vodafone.deTRUE/ FALSE 1180709801 VODAFONELOGIN 1 and should look like this (but does not): # HTTP cookie file. # Generated by Wget on 2007-06-01 15:47:31. # Edit at your own risk. www.vodafone.de FALSE /proxy42FALSE 0 JSESSIONID GgjRT1NTfspwH1cJCVPlGv37c4JKgkTDPYJNsTM2l1RJG0CJQ8Rp!-249032648!NONE!1180705649205 www.vodafone.de FALSE /jspFALSE 0 JSESSIONID GgjRT1NTfspwH1cJCVPlGv37c4JKgkTDPYJNsTM2l1RJG0CJQ8Rp!-249032648!NONE .vodafone.deTRUE/ FALSE 1180710649 VODAFONELOGIN 1 Thats all. Bye. Boardwalk for $500? 
possible bug in wget-1.10.2 and earlier
Hi, wget appears to be confused by FTP servers whose listings leave only one space between the group name and the file size. We only came across this problem today so I don't know how common it is.

pjjH

From: Harrington, Paul
Sent: Thursday, May 31, 2007 12:06 AM
To: recipient-removed
Subject: RE: File issue using WGET

Your FTP server must have changed the output of the listing format or, more precisely, the string representation of some of the components has changed such that only one space separates the group name from the file size. The bug is, of course, with wget, but it is one that hitherto had not been observed when interacting with your FTP server.

pjjH

[EMAIL PROTECTED]

diff -u ftp-ls.c ~/tmp
--- ftp-ls.c	2005-08-04 17:52:33.000000000 -0400
+++ /u/harringp/tmp/ftp-ls.c	2007-05-31 00:02:07.209955000 -0400
@@ -229,6 +229,18 @@
 		  break;
 		}
 	      errno = 0;
+	      /* After the while loop terminates, t may not always
+		 point to a space character.  In the case when
+		 there is only one space between the user/group
+		 information and the file size, the space will
+		 have been overwritten by a \0 via strtok().  So,
+		 if you have been through the loop at least once,
+		 advance forward one character.  */
+	      if (t > ptok)
+		t++;
+
 	      size = str_to_wgint (t, NULL, 10);
 	      if (size == WGINT_MAX && errno == ERANGE)
 		/* Out of range -- ignore the size.  Should
RE: wget bug
Highlord Ares wrote:

  it tries to download web pages named similar to
  http://site.com?variable=yes&mode=awesome

Since & is a reserved character in many command shells, you need to quote the URL on the command line:

  wget "http://site.com?variable=yes&mode=awesome"

Tony
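To see why the quoting matters: an unquoted `&` tells the shell to background everything before it and run what follows as a separate command, so wget never receives the full query string. A quick sketch with a made-up URL:

```shell
# Unquoted, `wget http://site.com?variable=yes&mode=awesome` backgrounds
# wget after "...variable=yes" and tries to run `mode=awesome` as a command.
# Quoted (single or double quotes), the whole URL arrives as one argument.
url='http://site.com/?variable=yes&mode=awesome'
printf '%s\n' "$url"    # this is the single argument wget would receive
```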
wget bug
when I run wget on certain sites, it tries to download web pages named similar to http://site.com?variable=yes&mode=awesome. However, wget isn't saving any of these files, no doubt because of some file-naming issue? This problem exists in both the Windows and Unix versions. hope this helps
RE: wget bug
This does not look like a valid URL to me - shouldn't there be a slash at the end of the domain name? Also, when talking about a bug (or anything else), it is always helpful if you specify the wget version (number).

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Highlord Ares
Sent: Thursday, May 24, 2007 11:41
To: [EMAIL PROTECTED]
Subject: wget bug

when I run wget on certain sites, it tries to download web pages named similar to http://site.com?variable=yes&mode=awesome. However, wget isn't saving any of these files, no doubt because of some file-naming issue? This problem exists in both the Windows and Unix versions. hope this helps
Bug using recursive get and stdout
Greetings, Stumbled across a bug yesterday, reproduced in both v1.8.2 and 1.10.2. Apparently, recursive get tries to open the file for reading after downloading, in order to download subsequent files. Problem is, when used with -O - to deliver to stdout, it cannot open that file, so you get the output below (note the No such file or directory error). In 1.10, it appears that they removed this error message, but wget still fails to recursively fetch. I realize it seems like there wouldn't be much reason to send more than one page to stdout, but I'm feeding it all into a statistical filter to classify website data, so it doesn't really matter to the filter. Do you know of any workaround for this, other than opening the files after reading (won't scale with thousands per minute)? Thanks!

$ wget -O - -r http://www.zdziarski.com > out
--15:40:06--  http://www.zdziarski.com/
           => `-'
Resolving www.zdziarski.com... done.
Connecting to www.zdziarski.com[209.51.159.242]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 24,275 [text/html]

100%[====================>] 24,275  163.49K/s  ETA 00:00

15:40:06 (163.49 KB/s) - `-' saved [24275/24275]

www.zdziarski.com/index.html: No such file or directory

FINISHED --15:40:06--
Downloaded: 24,275 bytes in 1 files

Jonathan
Re: Bug using recursive get and stdout
A quick search at http://www.mail-archive.com/wget@sunsite.dk/ for "-O" found:

   http://www.mail-archive.com/wget@sunsite.dk/msg08746.html
   http://www.mail-archive.com/wget@sunsite.dk/msg08748.html

The way -O is implemented, there are all kinds of things which are incompatible with it, -r among them.

   Steven M. Schweda               [EMAIL PROTECTED]
   382 South Warwick Street        (+1) 651-699-9818
   Saint Paul  MN  55105-2547
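One workaround sketch, since -r needs real files on disk: let the recursive fetch write into a scratch directory, then stream everything to stdout for the filter and delete the tree. The real fetch is commented out here and replaced with a stub file, so the sketch runs without network access:

```shell
# Mirror into a throwaway directory, then cat every page to stdout.
tmpdir=$(mktemp -d)
# wget -q -r -P "$tmpdir" http://www.zdziarski.com/    # the real fetch
printf '<html>stub page</html>\n' > "$tmpdir/index.html"  # stand-in download
find "$tmpdir" -type f -exec cat {} +   # pipe this into the classifier
rm -rf "$tmpdir"
```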
Re: FW: think you have a bug in CSS processing
Neil wrote: When giving it some thought I think a valid argument could be made that the string in the CSS document is not exactly an URL but it is certainly URL-like. The URL-like strings in CSS are actually standard URLs, either absolute or relative, so they shouldn't be a big deal to handle. A caveat for the parser: they can be quoted or unquoted and still work. See http://www.w3.org/TR/CSS21/syndata.html#uri Amazingly I found this feature request in a 2003 message to this very mailing list. Are there only a few lunatics like me who think this should be included? Cheers, JFG
RE: FW: think you have a bug in CSS processing
J.F.Groff wrote: Amazingly I found this feature request in a 2003 message to this very mailing list. Are there only a few lunatics like me who think this should be included? Wget is written and maintained by volunteers. What you need to find is a lunatic willing to volunteer to write the code to support this feature request. Tony
Re: FW: think you have a bug in CSS processing
Hi Tony, Amazingly I found this feature request in a 2003 message to this very mailing list. Are there only a few lunatics like me who think this should be included? Wget is written and maintained by volunteers. What you need to find is a lunatic willing to volunteer to write the code to support this feature request. Heh, sure ! I'm lunatic enough to try... Fetching the code from svn as I write this. But the docs page says: At the moment the GNU Wget development tree has been split in two branches in order to allow bugfixing releases of the feature-frozen 1.10.x tree while continuing the development for Wget 2.0 on the main branch. Anywhere I can look at planned features for the 2.0 branch? There's an awful lot of items in the project's TODO list but no mention of CSS. Shall I just add the feature request to the TODO first, or is there a community process involved in picking candidate features? Cheers, JFG
Re: FW: think you have a bug in CSS processing
Oh wait. Somebody already did the patch! http://www.mail-archive.com/[EMAIL PROTECTED]/msg09502.html http://article.gmane.org/gmane.comp.web.wget.patches/1867 I guess it's up to maintainers to decide whether to include this in the standard wget distribution. In the meantime, hearty thanks to Ted Mielczarek, you made my day! JFG On 4/13/07, J.F. Groff [EMAIL PROTECTED] wrote: Hi Tony, Amazingly I found this feature request in a 2003 message to this very mailing list. Are there only a few lunatics like me who think this should be included? Wget is written and maintained by volunteers. What you need to find is a lunatic willing to volunteer to write the code to support this feature request. Heh, sure ! I'm lunatic enough to try... Fetching the code from svn as I write this. But the docs page says: At the moment the GNU Wget development tree has been split in two branches in order to allow bugfixing releases of the feature-frozen 1.10.x tree while continuing the development for Wget 2.0 on the main branch. Anywhere I can look at planned features for the 2.0 branch? There's an awful lot of items in the project's TODO list but no mention of CSS. Shall I just add the feature request to the TODO first, or is there a community process involved in picking candidate features? Cheers, JFG
Bug-report: wget with multiple cnames in ssl certificate
Hi

If I connect with wget 1.10.2 (Debian Etch & Ubuntu Feisty Fawn) to a secure host that uses multiple cnames in the certificate, I get the following error:

[EMAIL PROTECTED]:~$ wget https://host.domain.tld
--10:18:55--  https://host.domain.tld/
           => `index.html'
Resolving host.domain.tld... xxx.xxx.xxx.xxx
Connecting to host.domain.tld|xxx.xxx.xxx.xxx|:443... connected.
ERROR: certificate common name `host0.domain.tld' doesn't match requested host name `host.domain.tld'.
To connect to host.domain.tld insecurely, use `--no-check-certificate'.
Unable to establish SSL connection.

If I do the same with wget 1.9.1 (Debian Sarge) I do not get that error.

Kind regards, Alex Antener

--
Alex Antener
Dipl. Medienkuenstler FH
[EMAIL PROTECTED] // http://lix.cc // +41 (0)44 586 97 63
GPG Key: 1024D/14D3C7A1 https://lix.cc/gpg_key.php
Fingerprint: BAB6 E61B 17D7 A9C9 6313 5141 3A3C DAA3 14D3 C7A1
think you have a bug in CSS processing
I think I found a bug in CSS processing. The CSS was auto-generated and I'm far from a CSS expert (quite the opposite). But, as far as I can tell (see snippet below), the GIF is supposed to be loaded from a directory named "-" that is off of the main URL. For example, if the origination site is http://www.foo.com, the GIF will be at http://www.foo.com/-/includes/styles/swirl/skin_swirl_grey_top.gif. The text below came from the converted HTML file on the destination site. You'll notice that the URL was not converted to an absolute URL pointing to www.foo.com, but neither was the GIF copied to the destination site. I've done a find and it is nowhere to be found. This really isn't a big deal for me as it is only one file and I've just manually copied it over, but it does seem to be a bug worthy of fixing. If you need more data, you can look at www.smithline.net. The snippet comes from that page, which was created using google page creator (don't ask me why - it is definitely far from being ready for prime time) and then wget'ed over to smithline.net. Feel free to ping me should you need more info.

- Neil

PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin:/home/neils/bin
wget --mirror --force-html --convert-links --no-parent --directory-prefix=/home/neils/smithline.net/data --quiet --recursive --no-host-directories http://www.smithline.net-a.googlepages.com

#container {
  padding: 0px;
  background:URL(/-/includes/style/swirl/skin_swirl_grey_top.gif) no-repeat top left;
  background-color:#dfdfdf;
  margin:0px auto;
}
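For anyone wanting to experiment before a proper patch lands, the url(...) tokens are easy to pull out of a stylesheet with standard tools. A rough sketch (per CSS 2.1, url() is case-insensitive and the argument may be bare, single- or double-quoted, so the quotes are stripped):

```shell
# Extract URL tokens from a CSS rule; the sample string is the
# snippet from the report above, reduced to one line.
css='#container { background:URL(/-/includes/style/swirl/skin_swirl_grey_top.gif) no-repeat top left; }'
printf '%s\n' "$css" \
  | grep -oiE 'url\([^)]*\)' \
  | sed -E "s/^[Uu][Rr][Ll]\(['\"]?//; s/['\"]?\)$//"
# prints /-/includes/style/swirl/skin_swirl_grey_top.gif
```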
Re: wget-1.10.2 pwd/cd bug
Hrvoje Niksic [EMAIL PROTECTED] writes: [EMAIL PROTECTED] (Steven M. Schweda) writes: It's starting to look like a consensus. A Google search for: wget DONE_CWD finds: http://www.mail-archive.com/wget@sunsite.dk/msg08741.html That bug is fixed in subversion, revision 2194. I forgot to add that this means that the patch can be retrieved with `svn diff -r2193:2194' in Wget's source tree. If you don't have a checkout handy, Subversion still allows you to generate a diff using `svn diff -r2193:2194 http://svn.dotsrc.org/repo/wget/trunk/'. Also note that the fix is also available on the stable branch, and I urge the distributors to apply it to their versions until 1.10.3 or 1.11 is released.
Re: wget-1.10.2 pwd/cd bug
[EMAIL PROTECTED] (Steven M. Schweda) writes: It's starting to look like a consensus. A Google search for: wget DONE_CWD finds: http://www.mail-archive.com/wget@sunsite.dk/msg08741.html That bug is fixed in subversion, revision 2194.
wget-1.10.2-5mdv2007.1 pwd/cd bug
Hello,

If wget cannot connect to the FTP server the first time, it fails to CD properly after checking the path with PWD. Here is a -d listing when connecting after failing. Thanks!

Jason

$cmd = "wget -d --limit-rate=999k --tries=0 --no-remove-listing -N $ftp/*.rpm";

--11:06:12--  ftp://ftp:[EMAIL PROTECTED]/pub/linux/distributions/mandrivalinux/devel/cooker/i586/media/main/release/*.rpm (try: 2)
           => `.listing'
Found carroll.aset.psu.edu in host_name_addresses_map (0x808bf98)
Connecting to carroll.aset.psu.edu|128.118.2.96|:21... connected.
Created socket 3.
Releasing 0x0808bf98 (new refcount 1).
Logging in as ftp ...
220- [snip big login message]
--> USER ftp
331 Please specify the password.
--> PASS [EMAIL PROTECTED]
230 Login successful.
Logged in!
==> SYST ... --> SYST
215 UNIX Type: L8
done.   ==> PWD ... --> PWD
257 "/"
done.   ==> TYPE I ... --> TYPE I
200 Switching to Binary mode.
done.   ==> CWD not required.
conaddr is: 128.118.2.96
==> PASV ... --> PASV
227 Entering Passive Mode (128,118,2,96,184,134)
trying to connect to 128.118.2.96 port 47238
Created socket 4.
done.   ==> LIST ... --> LIST
150 Here comes the directory listing.
done.

    [ <=>                                 ] 331           --.--K/s

Closed fd 4
226 Directory send OK.
11:11:23 (412.30 KB/s) - `.listing' saved [331]

DIRECTORY; perms 700; month: Sep; day: 8; year: 2005 (no tm);
DIRECTORY; perms 700; month: Sep; day: 23; year: 2005 (no tm);
DIRECTORY; perms 755; month: May; day: 24; year: 2006 (no tm);
PLAINFILE; perms 644; month: Sep; day: 9; year: 2005 (no tm);
PLAINFILE; perms 644; month: Sep; day: 9; year: 2005 (no tm);
No matches on pattern `*.rpm'.
Closed fd 3
wget-1.10.2 pwd/cd bug
I downloaded the 1.10.2 source code. u->cmd goes from 0x1B to 0x19, dropping DO_CWD on the second call to ftp.c:getftp() after connection failure. I'm trying to debug THE loop.

Jason
wget-1.10.2 pwd/cd bug
This is inverted in ftp.c:

      if (con->csock != -1)
        con->st &= ~DONE_CWD;
      else
        con->st |= DONE_CWD;

If not error, request cwd? If error, cwd done? It's backwards. Changing != to == solves the bug. Thanks!

Jason
wget-1.10.2 pwd/cd bug
It's starting to look like a consensus. A Google search for: wget DONE_CWD finds: http://www.mail-archive.com/wget@sunsite.dk/msg08741.html

   Steven M. Schweda               [EMAIL PROTECTED]
   382 South Warwick Street        (+1) 651-699-9818
   Saint Paul  MN  55105-2547
Re: file numbering bug
From: Robert Dick

  When serializing successive copies of a page, the serial number appears at the end of the extension, i.e., what should be file1.html is called file.html.1. I'm using wget ver. 1.10.2. with the default options on Windows ME ...

I can see how that might annoy a Windows user, but it would probably be a terrible idea to change the file name as you suggest, because it would break any HTML links to file.html which might appear in any other file. If you don't like the .nnn suffix, then you'll need to clean it up later, or else don't download the same file twice into the same directory. (Or you could use VMS, where file version numbers are a natural part of the file system, so the .nnn suffix is not needed, and this problem does not arise.)

   Steven M. Schweda               [EMAIL PROTECTED]
   382 South Warwick Street        (+1) 651-699-9818
   Saint Paul  MN  55105-2547
RE: file numbering bug
It wouldn't break on windoze because file.html still exists. He just wants a different naming schema for the newer copies. There would be no links to file.html.1 or file1.html for that matter, so it really doesn't matter which way you rename it. Although if there is a file called file1.html and you downloaded it again, using your NEW schema, it would become file11.html, which would be somewhat confusing :)

Ranjit Sandhu
703.803.1755
SRA

-----Original Message-----
From: Steven M. Schweda [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 08, 2007 11:50 AM
To: WGET@sunsite.dk
Cc: [EMAIL PROTECTED]
Subject: Re: file numbering bug

From: Robert Dick

  When serializing successive copies of a page, the serial number appears at the end of the extension, i.e., what should be file1.html is called file.html.1. I'm using wget ver. 1.10.2. with the default options on Windows ME ...

I can see how that might annoy a Windows user, but it would probably be a terrible idea to change the file name as you suggest, because it would break any HTML links to file.html which might appear in any other file. If you don't like the .nnn suffix, then you'll need to clean it up later, or else don't download the same file twice into the same directory. (Or you could use VMS, where file version numbers are a natural part of the file system, so the .nnn suffix is not needed, and this problem does not arise.)

   Steven M. Schweda               [EMAIL PROTECTED]
   382 South Warwick Street        (+1) 651-699-9818
   Saint Paul  MN  55105-2547
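If someone really wants the nameN.ext layout, it is easy to post-process wget's output rather than change wget. A sketch for a single hypothetical base name, using stand-in files instead of real downloads:

```shell
# Rewrite wget's file.html.N duplicate suffixes as fileN.html after the fact.
touch file.html file.html.1 file.html.2   # stand-ins for downloaded copies
for f in file.html.[0-9]*; do
  n=${f##*.}                    # trailing serial number
  mv -- "$f" "file${n}.html"
done
ls file*.html                   # now: file.html file1.html file2.html
```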
ntlm already authenticated bug and fix.
Hi Mauro (I'm guessing here - got this from the web page) Here is a patch against 1.10.2 which fixes an issue I found when using NTLM with Microsoft's Intermittent Information Server (IIS). The issue is not with wget, but rather a bug in IIS. Nevertheless, here is the fix and a description of the problem. Essentially IIS has the ability to create domains for want of a better description (I'm not an IIS expert by any means) within a single instance of the IIS server. Each of these domains (I understand) is more or less independent. The bug manifests itself when a page within one domain links to a page within another domain on the same IIS instance. The web address of the server remains the same except the URI points to some other directory under the server's root. In this case, when the connection is first setup by wget, NTLM authenticates correctly. Subsequent recursive gets also work fine *until* a reference is made to another domain. When the cross domain reference occurs IIS issues another NTLM challenge, as if the connection is not authenticated. Now, as you and I know, NTLM is a connection authentication protocol, meaning you cannot be connected unless you are authenticated. So IIS's other domains already know the connection is authenticated because it *is* a connection, nevertheless, they insist on re-authentication. This patch addresses the issue by forcing a disconnect and retry when this circumstance is detected (Actually, this always disconnects in this rev. The detection bit needs more work). That is to say, if an NTLM challenge occurs when the connection is already active *and* NTLM authenticated, the connection is terminated and restarted (thus invoking the challenge-response code) and ultimately re-authenticating. This work is the result of many hours of work and extensive network debugging with the help of an Australian law enforcement agency. 
--- wget-1.10.2.orig/src/http.c	2005-08-09 08:54:16.000000000 +1000
+++ wget-1.10.2/src/http.c	2006-11-21 12:25:22.000000000 +1100
@@ -1960,10 +1960,12 @@
 			  hs->restval, hs->rd_size, hs->len, hs->dltime, flags);
 
+/*
   if (hs->res >= 0)
     CLOSE_FINISH (sock);
   else
-    CLOSE_INVALIDATE (sock);
+*/
+  CLOSE_INVALIDATE (sock);
 
   {
     /* Close or flush the file.  We have to be careful to check for

Cheers
Phill.

P.S. the work was done last year and I'm finally cleaning up the loose ends. Hope this helps.

Phill Bertolus
Technical Director
Web Wombat Pty. Ltd.
Ph: +61-3-9675-0900 (Switch)
Ph: +61-3-9675-0901 (Direct)
Mb: +61-4-1632-6853
Fx: +61-3-9675-0999
Re: wget-1.10.2 cookie expiry bug
Thanks for the report and the (correct) analysis. This patch fixes the problem in the trunk.

2007-01-23  Hrvoje Niksic  [EMAIL PROTECTED]

	* cookies.c (parse_set_cookie): Would erroneously discard cookies
	with unparsable expiry time.

Index: src/cookies.c
===================================================================
--- src/cookies.c	(revision 2202)
+++ src/cookies.c	(working copy)
@@ -390,17 +390,16 @@
 	{
 	  cookie->permanent = 1;
 	  cookie->expiry_time = expires;
+	  /* According to netscape's specification, expiry time in
+	     the past means that discarding of a matching cookie
+	     is requested.  */
+	  if (cookie->expiry_time < cookies_now)
+	    cookie->discard_requested = 1;
 	}
       else
 	/* Error in expiration spec.  Assume default (cookie doesn't
 	   expire, but valid only for this session.)  */
 	;
-
-      /* According to netscape's specification, expiry time in the
-	 past means that discarding of a matching cookie is
-	 requested.  */
-      if (cookie->expiry_time < cookies_now)
-	cookie->discard_requested = 1;
     }
   else if (TOKEN_IS (name, "max-age"))
     {
wget-1.10.2 cookie expiry bug
(Resend as I've received no reply to the original message.)

Kind wget maintainers,

I believe I found a bug in the wget cookie expiry handling. Recently I was using wget and received back a cookie with an expiration of Sun, 20-Sep-2043 19:37:28 GMT. This fits inside a 32-bit unsigned long but unfortunately overflows a 32-bit signed long by about 4 years. It would appear that timegm (called from http_atotm) returns -1 when it overflows. At least that was the behavior I observed with my system's timegm (OS X 10.4.8/i386) and the timegm that ships with wget (I recompiled using the wget timegm function to test).

Looking at cookies.c, the intent seems to be to treat a (time_t) -1 as a session cookie. If this is the case, there is a bug in the logic which instead causes wget to discard the cookie entirely:

  expires = http_atotm (value_copy);
  if (expires != (time_t) -1)
    {
      cookie->permanent = 1;
      cookie->expiry_time = expires;
    }
  else
    /* Error in expiration spec.  Assume default (cookie doesn't
       expire, but valid only for this session.)  */
    ;

  /* According to netscape's specification, expiry time in the
     past means that discarding of a matching cookie is
     requested.  */
  if (cookie->expiry_time < cookies_now)
    cookie->discard_requested = 1;

The problem is that when http_atotm returns -1, cookie->expiry_time does not get set, defaulting to 0 (I think). That then causes the cookie to be discarded. I've attached the world's smallest patch which corrects this behavior to what I believe the comments intended.

Thanks, j.

wget-1.10.2.cookie_expiry.patch
Description: Binary data
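The arithmetic behind the report is easy to check from a shell: the cookie's expiry lands well past the signed 32-bit time_t ceiling of 2147483647 (2038-01-19). A sketch using GNU date (the -d option is not portable to BSD date):

```shell
# Convert the cookie's expiry to epoch seconds and compare it with
# the largest value a signed 32-bit time_t can hold.
expiry=$(date -u -d '2043-09-20 19:37:28' +%s)   # GNU date only
max32=2147483647                                 # 2^31 - 1
if [ "$expiry" -gt "$max32" ]; then
  echo "expiry $expiry overflows 32-bit time_t"
fi
```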
Possibly bug
Hi,

I have been downloading slackware-11.0-install-dvd.iso, but it seems wget downloaded more than the file size, and I found:

-445900K .. .. .. .. .. 119% 18.53 KB/s

in wget-log.

Regards, Yuriy Padlyak
Re: Possibly bug
The file was probably being uploaded when you started downloading it, so the HTTP server continued sending data even over the initially reported filesize. Just stop wget, and start it again with option -c to resume the download.

MT

On Wednesday 17 January 2007 at 18:16 +0200, Yuriy Padlyak wrote:

  Hi, I have been downloading slackware-11.0-install-dvd.iso, but it seems wget downloaded more than the file size, and I found: -445900K .. .. .. .. .. 119% 18.53 KB/s in wget-log.

  Regards, Yuriy Padlyak
Re: Possibly bug
From: Yuriy Padlyak

  Have been downloading slackware-11.0-install-dvd.iso, but it seems wget downloaded more than the file size, and I found: -445900K .. .. .. .. .. 119% 18.53 KB/s in wget-log.

As usual, it would help if you provided some basic information. Which wget version (wget -V)? On which system type? OS and version?

Guesswork follows. Wget versions before 1.10 did not support large files, and a DVD image could easily exceed 2GB. Negative file sizes are a common symptom when using a small-file program with large files.

   Steven M. Schweda               [EMAIL PROTECTED]
   382 South Warwick Street        (+1) 651-699-9818
   Saint Paul  MN  55105-2547
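The -445900K figure is consistent with that guess: a byte count past 2 GiB stored in a signed 32-bit integer wraps negative. A sketch of the wraparound (the true size is a hypothetical value chosen to reproduce the logged number; modern shells do 64-bit arithmetic, so the 32-bit wrap is simulated explicitly):

```shell
# Simulate storing a ~3.57 GiB byte count in a signed 32-bit integer.
size_kib=3748404                       # hypothetical true size in KiB
bytes=$((size_kib * 1024))             # 3838365696 bytes, > 2^31 - 1
wrapped=$(( bytes >= 2147483648 ? bytes - 4294967296 : bytes ))
echo "$((wrapped / 1024))K"            # prints -445900K, as in the log
```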
Re: Bug in 1.10.2 vs 1.9.1
Juhana Sadeharju wrote: Hello. Wget 1.10.2 has the following bug compared to version 1.9.1. First, the bin/wgetdir is defined as wget -p -E -k --proxy=off -e robots=off --passive-ftp -o zlogwget`date +%Y%m%d%H%M%S` -r -l 0 -np -U Mozilla --tries=50 --waitretry=10 $@ The download command is wgetdir http://udn.epicgames.com Version 1.9.1 result: download ok Version 1.10.2 result: only udn.epicgames.com/Main/WebHome downloaded and other converted urls are of the form http://udn.epicgames.com/../Two/WebHome hi juhana, could you please try the current version of wget from our subversion repository: http://www.gnu.org/software/wget/wgetdev.html#development ? this bug should be fixed in the new code. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it