Re: --mirror and --cut-dirs=2 bug?

2008-11-03 Thread Brock Murch

Micah,

Many thanks for all your very timely help. I have had no issues since 
following your instructions to upgrade to 1.11.4 and install it in the /opt 
directory. I used:

$ ./configure --prefix=/opt/wget

And point to it specifically:

/opt/wget/bin/wget --tries=10 -r -N -l inf --wait=1 \
    -nH --cut-dirs=2 ftp://oceans.gsfc.nasa.gov/MODISA/ATTEPH/ \
    -o /home1/software/modis/atteph/mirror_a.log \
    --directory-prefix=/home1/software/modis/atteph

Thanks again.

Brock


On Monday 27 October 2008 3:06 pm, Micah Cowan wrote:
 Brock Murch wrote:
  Sorry, one quick question: do you know of anyone providing RPMs of 1.11.4
  for CentOS?

 Not offhand. It may not yet be available; it was only packaged for
 Fedora Core a couple months ago, I think. RPMfind.net just lists 1.11.4
 sources for fc9 and fc10.

  If not, would you recommend uninstalling the current one? Before
  installing from your src? Many thanks.

 I'd advise against that: I believe various important components of Red
 Hat/CentOS rely on wget to fetch things. Sometimes minor changes in the
 output/interface of wget cause problems for automated scripts that form
 an integral part of an operating system. Though really, I think most of
 the changes that would pose such a danger are actually already in the
 Red Hat modified 1.10.2 sources (taken from the development sources
 for what was later released as 1.11).

 What I tend to do on my systems is to configure the sources like:

   $ ./configure --prefix=$HOME/opt/wget

 and then either add $HOME/opt/wget/bin to my $PATH, or invoke it directly as
 $HOME/opt/wget/bin/wget.
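 
 For example, a minimal sketch (assuming the install prefix above):
 
   $ export PATH="$HOME/opt/wget/bin:$PATH"   # e.g. in ~/.bashrc
   $ wget --version | head -n1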

 Note that if you want to build wget with support for HTTPS, you'll need
 to have the development package for openssl installed.
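 
 For instance, on CentOS, something along these lines should do it (a
 sketch only; the package name is the stock one, and --with-ssl asks
 configure for the OpenSSL-based HTTPS support):
 
   # yum install openssl-devel
   $ ./configure --prefix=$HOME/opt/wget --with-ssl && make && make install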



Re: --mirror and --cut-dirs=2 bug?

2008-10-27 Thread Micah Cowan

Brock Murch wrote:
 I try to keep a mirror of NASA atteph ancillary data for modis processing. I 
 know that means little, but I have a cron script that runs 2 times a day. 
 Sometimes it works, and others, not so much. The sh script is listed at the 
 end of this email below, as are the contents of the remote ftp server's root 
 and portions of the log.
 
 I don't need all the data on the remote server, only some, thus I use 
 --cut-dirs. To make matters stranger, the software (also from NASA) that uses 
 these files looks for them in a single place on the client machine where the 
 software runs, but needs data from 2 different directories on the remote ftp 
 server. If the data is not on the client machine, the software kindly ftp's 
 the files to the local directory. However, I don't allow write access to that 
 directory as many people use the software, and when it is d/l'ed it has the 
 wrong perms for others to use it, thus I mirror the data I need from the ftp 
 site locally. In the script below, there are 2 wget commands, but they are to 
 slightly different directories (MODISA & MODIST).

I wouldn't recommend that. Using the same output directory for two
different source directories seems likely to lead to problems. You'd
most likely be better off by pulling to two locations, and then
combining them afterwards.

I don't know for sure that it _will_ cause problems (except if they
happen to have same-named files), as long as .listing files are being
properly removed (there were some recently-fixed bugs related to that, I
think? ...just appending new listings on top of existing files).
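
Something along these lines, say (a sketch only; the split paths and the
hard-link merge via GNU cp's -l option are my assumptions, adjust to your
layout):

  $ wget -r -N -nH --cut-dirs=2 ftp://oceans.gsfc.nasa.gov/MODISA/ATTEPH/ \
        --directory-prefix=/home1/software/modis/atteph-a
  $ wget -r -N -nH --cut-dirs=2 ftp://oceans.gsfc.nasa.gov/MODIST/ATTEPH/ \
        --directory-prefix=/home1/software/modis/atteph-t
  $ cp -al /home1/software/modis/atteph-a/. /home1/software/modis/atteph/
  $ cp -al /home1/software/modis/atteph-t/. /home1/software/modis/atteph/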

 It appears to me that the problem occurs if there is an ftp server error, and 
 wget starts a retry. wget goes to the server root, gets the .listing from 
 there for some reason (as opposed to the directory it should go to on the 
 server), then goes to the dir it needs to mirror, can't find the files 
 (that are listed in the root dir), and creates dirs; I then get `No such 
 file' errors and recursive directories created. Any advice would be 
 appreciated.

This snippet seems to be the source of the problem:

 Error in server response, closing control connection.
 Retrying.
 
 --14:53:53--  ftp://oceans.gsfc.nasa.gov/MODIST/ATTEPH/2002/110/
   (try: 2) => `/home1/software/modis/atteph/2002/110/.listing'
 Connecting to oceans.gsfc.nasa.gov|169.154.128.45|:21... connected.
 Logging in as anonymous ... Logged in!
 ==> SYST ... done.   ==> PWD ... done.
 ==> TYPE I ... done.  ==> CWD not required.
 ==> PASV ... done.   ==> LIST ... done.

That "CWD not required" bit is erroneous. I'm 90% sure we fixed this
issue recently (though I'm not 100% sure that it went to release: I
believe so).

I believe we made some related fixes more recently. You provided a great
amount of useful information, but one thing that seems to be missing (or
I missed it) is the Wget version number. Judging from the log, I'd say
it's 1.10.2 or older; the most recent version of Wget is 1.11.4; could
you please try to verify whether Wget continues to exhibit this problem
in the latest release version?

I'll also try to look into this as I have time (but it might be a while
before I can give it some serious attention; it'd be very helpful if you
could do a little more legwork).

--
Thanks very much,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/


Re: --mirror and --cut-dirs=2 bug?

2008-10-27 Thread Micah Cowan

Micah Cowan wrote:
 I believe we made some related fixes more recently. You provided a great
 amount of useful information, but one thing that seems to be missing (or
 I missed it) is the Wget version number. Judging from the log, I'd say
 it's 1.10.2 or older; the most recent version of Wget is 1.11.4; could
 you please try to verify whether Wget continues to exhibit this problem
 in the latest release version?

This problem looks like the one that Mike Grant fixed in October of
2006: http://hg.addictivecode.org/wget/1.11/rev/161aa64e7e8f, so it
should definitely be fixed in 1.11.4. Please let me know if it isn't.

--
Regards,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/


Re: --mirror and --cut-dirs=2 bug?

2008-10-27 Thread Brock Murch
Micah,

Thanks for your quick attention to this. Yes, I probably forgot to include 
the version #:

[EMAIL PROTECTED] atteph]# wget --version
GNU Wget 1.10.2 (Red Hat modified)

Copyright (C) 2005 Free Software Foundation, Inc.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

Originally written by Hrvoje Niksic [EMAIL PROTECTED].

I will see if I can get the newest version for:
[EMAIL PROTECTED] atteph]# cat /etc/redhat-release
CentOS release 4.2 (Final)

I'll let you know how that goes.

Brock

On Monday 27 October 2008 2:19 pm, Micah Cowan wrote:
 Micah Cowan wrote:
  I believe we made some related fixes more recently. You provided a great
  amount of useful information, but one thing that seems to be missing (or
  I missed it) is the Wget version number. Judging from the log, I'd say
  it's 1.10.2 or older; the most recent version of Wget is 1.11.4; could
  you please try to verify whether Wget continues to exhibit this problem
  in the latest release version?

 This problem looks like the one that Mike Grant fixed in October of
 2006: http://hg.addictivecode.org/wget/1.11/rev/161aa64e7e8f, so it
 should definitely be fixed in 1.11.4. Please let me know if it isn't.



[bug] wrong speed calculation in (--output-file) logfile

2008-10-25 Thread Peter Volkov
Hello.

While downloading with wget, I redirected the output into a file with the
following command: 

$ LC_ALL=C wget -o output 
'ftp://mirror.yandex.ru/gentoo-distfiles/distfiles/OOo_3.0.0rc4_20080930_LinuxIntel_langpack_en-GB.tar.gz'

I've set LC_ALL and LANG explicitly to be sure that this is not a
locale-related problem. The output I saw in the output file was:


--2008-10-25 14:51:17--  
ftp://mirror.yandex.ru/gentoo-distfiles/distfiles/OOo_3.0.0rc4_20080930_LinuxIntel_langpack_en-GB.tar.gz
   => `OOo_3.0.0rc4_20080930_LinuxIntel_langpack_en-GB.tar.gz.13'
Resolving mirror.yandex.ru... 77.88.19.68
Connecting to mirror.yandex.ru|77.88.19.68|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.   ==> PWD ... done.
==> TYPE I ... done.  ==> CWD /gentoo-distfiles/distfiles ... done.
==> SIZE OOo_3.0.0rc4_20080930_LinuxIntel_langpack_en-GB.tar.gz ... 13633213
==> PASV ... done.   ==> RETR OOo_3.0.0rc4_20080930_LinuxIntel_langpack_en-GB.tar.gz ... done.
Length: 13633213 (13M)

    0K .......... .......... .......... .......... ..........  0%  131K 1m41s
   50K .......... .......... .......... .......... ..........  0%  132K 1m40s
  100K .......... .......... .......... .......... ..........  1%  135K 99s
  150K .......... .......... .......... .......... ..........  1%  132K 99s
  200K .......... .......... .......... .......... ..........  1%  130K 99s
  250K .......... .......... .......... .......... ..........  2% 45.9K 2m9s
  300K .......... .......... .......... .......... ..........  2% 64.3M 1m50s
[snip]
13250K .......... .......... .......... .......... .......... 99%  131K 0s
13300K .......... ......                                     100%  134K=1m41s

2008-10-25 14:52:58 (132 KB/s) - 
`OOo_3.0.0rc4_20080930_LinuxIntel_langpack_en-GB.tar.gz.13' saved [13633213]


Note the line above the [snip]:
  300K .......... .......... .......... .......... ..........  2% 64.3M 1m50s

It is impossible to have downloaded that many megabytes, as the whole file
is much smaller. I don't know why this number sometimes jumps, but in some
cases it causes the following output at the end of the download:

 13300K .......... ......                                     100% 26101G=1m45s

Obviously I have no possibility of downloading at such a high speed
(26101G). This is reproducible with wget 1.11.4.
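
(For reference, the overall average can be cross-checked from the log
itself -- 13633213 bytes over the 101 seconds between 14:51:17 and 14:52:58:

  $ echo $((13633213 / 101 / 1024))
  131

which is in line with the reported 132 KB/s, so only the occasional
per-chunk figures are off.)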

-- 
Peter.



--mirror and --cut-dirs=2 bug?

2008-10-24 Thread Brock Murch

I try to keep a mirror of NASA atteph ancillary data for modis processing. I 
know that means little, but I have a cron script that runs 2 times a day. 
Sometimes it works, and others, not so much. The sh script is listed at the 
end of this email below, as are the contents of the remote ftp server's root 
and portions of the log.

I don't need all the data on the remote server, only some, thus I use 
--cut-dirs. To make matters stranger, the software (also from NASA) that uses 
these files looks for them in a single place on the client machine where the 
software runs, but needs data from 2 different directories on the remote ftp 
server. If the data is not on the client machine, the software kindly ftp's 
the files to the local directory. However, I don't allow write access to that 
directory as many people use the software, and when it is d/l'ed it has the 
wrong perms for others to use it, thus I mirror the data I need from the ftp 
site locally. In the script below, there are 2 wget commands, but they are to 
slightly different directories (MODISA & MODIST).

It appears to me that the problem occurs if there is an ftp server error, and 
wget starts a retry. wget goes to the server root, gets the .listing from 
there for some reason (as opposed to the directory it should go to on the 
server), then goes to the dir it needs to mirror, can't find the files 
(that are listed in the root dir), and creates dirs; I then get `No such 
file' errors and recursive directories created. Any advice would be 
appreciated.

Brock Murch

Here is an example of the bad type of dir structure I end up with (there 
should be no EO1 and below):

[EMAIL PROTECTED] atteph]# find . -type d -name '*' | grep EO1
./2002/110/EO1
./2002/110/EO1/CZCS
./2002/110/EO1/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS
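
A cleanup along these lines should remove the bogus trees before a fresh
mirror run (a sketch only; assumes the layout above, so double-check the
paths before running):

[EMAIL PROTECTED] atteph]# find . -maxdepth 3 -type d -name EO1 -exec rm -rf {} +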

Or:
[EMAIL PROTECTED] atteph]# ls /home1/software/modis/atteph/2002/110/EO1/
CZCS  README
[EMAIL PROTECTED] atteph]# ls /home1/software/modis/atteph/2002/110/EO1/CZCS/
CZCS  README
[EMAIL PROTECTED] atteph]# ls 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/
CZCS  README
[EMAIL PROTECTED] atteph]# ls 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/
CZCS  README
[EMAIL PROTECTED] atteph]# ls 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/
COMMON
[EMAIL PROTECTED] atteph]# ls 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/
CZCS  README
[EMAIL PROTECTED] atteph]# ls 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/
CZCS  README
[EMAIL PROTECTED] atteph]# ls 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/
CZCS  README
[EMAIL PROTECTED] atteph]# ls 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/
CZCS  README
[EMAIL PROTECTED] atteph]# ll 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/

And

[EMAIL PROTECTED] atteph]# ll /home1/software/modis/atteph/2002/110/EO1/README 
-rw-r--r--  1 root root 9499 Aug 20 10:12 
/home1/software/modis/atteph/2002/110/EO1/README
[EMAIL PROTECTED] atteph]# ll 
/home1/software/modis/atteph/2002/110/EO1/CZCS/README 
-rw-r--r--  1 root root 9499 Aug 20 10:12 
/home1/software/modis/atteph/2002/110/EO1/CZCS/README
[EMAIL PROTECTED] atteph]# ll 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/README 
-rw-r--r--  1 root root 9499 Aug 20 10:12 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/README
[EMAIL PROTECTED] atteph]# ll 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/README 
-rw-r--r--  1 root root 9499 Aug 20 10:12 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/README
[EMAIL PROTECTED] atteph]# ll 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/README 
ls: /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/README: No 
such file or directory
[EMAIL PROTECTED] atteph]# ll 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/README 
-rw-r--r--  1 root root 9499 Aug 20 10:12 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/README


All the README files are the same, and the same as the one on the ftp 
server.

Hello, All and bug #21793

2008-09-08 Thread David Coon
Hello everyone,

I thought I'd introduce myself to you all, as I intend to start helping out
with wget.  This will be my first time contributing to any kind of free or
open source software, so I may have some basic questions down the line about
best practices and such, though I'll try to keep that to a minimum.

Anyway, I've been researching unicode and utf-8 recently, so I'm gonna try
to tackle bug #21793 https://savannah.gnu.org/bugs/?21793.

-David A Coon


Re: Hello, All and bug #21793

2008-09-08 Thread Micah Cowan

David Coon wrote:
 Hello everyone,
 
 I thought I'd introduce myself to you all, as I intend to start helping
 out with wget.  This will be my first time contributing to any kind of
 free or open source software, so I may have some basic questions down
 the line about best practices and such, though I'll try to keep that to
 a minimum.
 
 Anyway, I've been researching unicode and utf-8 recently, so I'm gonna
 try to tackle bug #21793 https://savannah.gnu.org/bugs/?21793. 

Hi David, and welcome!

If you haven't already, please see
http://wget.addictivecode.org/HelpingWithWget

I'd encourage you to get a Savannah account, so I can assign that bug to
you. Also, I tend to hang out quite a bit on IRC (#wget @
irc.freenode.net), so you might want to sign on there.

Since you mentioned an interest in Unicode and UTF-8, you might want to
check out Saint Xavier's recent work on IRI and IDN support in Wget,
which is available at http://hg.addictivecode.org/wget/sxav/.

Among other things, sxav's additions make Wget more aware of the user's
locale, so it might be useful for providing a feature to automatically
transcode filenames to the user's locale, rather than just supporting
UTF-8 only (which should still probably remain an explicit option). If
that sounds like the direction you'd like to take it, you should
probably base your work on sxav's repository, rather than mainline.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/


Re: [BUG:#20329] If-Modified-Since support

2008-09-02 Thread Micah Cowan

vinothkumar raman wrote:
 We need to send the timestamp of the local file in the request
 header; for that we need to pass the local file's timestamp from
 http_loop() to gethttp(). The only way to pass this on without
 altering the signature of the function is to add a field to struct url
 in url.h.
 
 Could we go for it?

That is acceptable.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/


Re: [bug #20329] Make HTTP timestamping use If-Modified-Since

2008-09-02 Thread Micah Cowan

Yes, that's what it means.

I'm not yet committed to doing this. I'd like to see first how many
mainstream servers will respect If-Modified-Since when given as part of
an HTTP/1.0 request (in comparison to how they respond when it's part of
an HTTP/1.1 request). If common servers ignore it in HTTP/1.0, but not
in HTTP/1.1, that'd be an excellent case for holding off until we're
doing HTTP/1.1 requests.
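
A quick way to probe a given server by hand (a sketch; the URL is a
placeholder, and since Wget currently speaks HTTP/1.0 this shows the
HTTP/1.0 behavior -- a cooperating server answers 304 Not Modified):

  $ wget -S --header='If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT' \
        -O /dev/null http://www.example.com/index.html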

Also, I don't think "removing the previous HEAD request code" is
entirely accurate: we probably would want to detect when a server is
feeding us non-new content in response to If-Modified-Since, and adjust
to use the current HEAD method instead as a fallback.

-Micah

vinothkumar raman wrote:
 This means we should remove the previous HEAD request code, use
 If-Modified-Since by default, have it handle all the requests, and
 store pages if it does not return a 304 response.
 
 Is it so?
 
 
 On Fri, Aug 29, 2008 at 11:06 PM, Micah Cowan [EMAIL PROTECTED] wrote:
 Follow-up Comment #4, bug #20329 (project wget):

 verbatim-mode's not all that readable.

 The gist is, we should go ahead and use If-Modified-Since, perhaps even now
 before there's true HTTP/1.1 support (provided it works in a reasonable
 percentage of cases); and just ensure that any Last-Modified header is sane.


BUG : 20329 IF-MODIFIED-SINCE

2008-09-01 Thread vinothkumar raman
Hi all,

We need to send the timestamp of the local file in the request
header; for that we need to pass the local file's timestamp from
http_loop() to gethttp(). The only way to pass this on without
altering the signature of the function is to add a field to struct url
in url.h.

Could we go for it?

Thanks,
VinothKumar.R


[BUG:#20329] If-Modified-Since support

2008-09-01 Thread vinothkumar raman
Hi all,

We need to send the timestamp of the local file in the request
header; for that we need to pass the local file's timestamp from
http_loop() to gethttp(). The only way to pass this on without
altering the signature of the function is to add a field to struct url
in url.h.

Could we go for it?

Thanks,
VinothKumar.R


Re: [bug #20329] Make HTTP timestamping use If-Modified-Since

2008-09-01 Thread vinothkumar raman
This means we should remove the previous HEAD request code, use
If-Modified-Since by default, have it handle all the requests, and
store pages if it does not return a 304 response.

Is it so?


On Fri, Aug 29, 2008 at 11:06 PM, Micah Cowan [EMAIL PROTECTED] wrote:

 Follow-up Comment #4, bug #20329 (project wget):

 verbatim-mode's not all that readable.

 The gist is, we should go ahead and use If-Modified-Since, perhaps even now
 before there's true HTTP/1.1 support (provided it works in a reasonable
 percentage of cases); and just ensure that any Last-Modified header is sane.

___

 Reply to this item at:

  http://savannah.gnu.org/bugs/?20329

 ___
  Message sent via/by Savannah
  http://savannah.gnu.org/




RE: wget-1.11.4 bug

2008-07-26 Thread kuang-cheng chao

Micah Cowan wrote:
 The thing is, though, those two threads should be running wgets under
 separate processes

Yes, the two threads are running wgets under separate processes, via system().

 What operating system are you running? Vista?

mipsel-linux, with kernel v2.4, built with gcc v3.3.5.
 
Best regards,
K.C. Chao

Re: wget-1.11.4 bug

2008-07-25 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

kuang-cheng chao wrote:
 Dear Micah:
  
 Thanks for your work on wget.
  
 There is a question about two wgets run simultaneously.
 In method resolve_bind_address, wget assumes that it is called once.
 However, this can result in two domain names getting the same IP if two
 wgets run the same method concurrently.

Have you reproduced this, or is this in theory? If the latter, what has
led you to this conclusion? I don't see anything in the code that would
cause this behavior.

Also, please use the mailing list for discussions about Wget. I've added
it to the recipients list.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/


RE: wget-1.11.4 bug

2008-07-25 Thread kuang-cheng chao

Micah Cowan wrote:
 Have you reproduced this, or is this in theory? If the latter, what has led 
 you to this conclusion? I don't see anything in the code that would cause 
 this behavior.

I reproduced this, but I can't be sure the real problem is in 
resolve_bind_address.
In the attached message, both api.yougotphoto.com and farm1.static.flickr.com 
get the same IP (74.124.203.218).
The two wgets are called from two threads of a program.
 
Best regards,
k.c. chao
 
P.S. The log follows:
 
wget -4 -t 6 "http://api.yougotphoto.com/device/?action=get_device_new_photo&api=2.2&api_key=f10df554a958fd10050e2d305241c7a3&device_class=2&serial_no=000E2EE5676F&url_no=24616&cksn=44fe191d6cb4e7807f75938b5d72f07c" -O /tmp/webii/ygp_new_photo_list.txt
--1999-11-30 00:04:21--  http://api.yougotphoto.com/device/?action=get_device_new_photo&api=2.2&api_key=f10df554a958fd10050e2d305241c7a3&device_class=2&serial_no=000E2EE5676F&url_no=24616&cksn=44fe191d6cb4e7807f75938b5d72f07c
Resolving api.yougotphoto.com... wget -4 -t 6 "http://farm1.static.flickr.com/33/49038824_e4b04b7d9f_b.jpg" -O /tmp/webii/24616
74.124.203.218
Connecting to api.yougotphoto.com|74.124.203.218|:80... --1999-11-30 00:04:22--  http://farm1.static.flickr.com/33/49038824_e4b04b7d9f_b.jpg
Resolving farm1.static.flickr.com... 74.124.203.218
Connecting to farm1.static.flickr.com|74.124.203.218|:80... connected.

Re: wget-1.11.4 bug

2008-07-25 Thread Micah Cowan

k.c. chao wrote:
 Micah Cowan wrote:
  Have you reproduced this, or is this in theory? If the latter, what has
  led you to this conclusion? I don't see anything in the code that would
  cause this behavior.

 I reproduced this, but I can't be sure the real problem is in
 resolve_bind_address. In the attached message, both
 api.yougotphoto.com and farm1.static.flickr.com get the same
 IP (74.124.203.218).  The two wgets are called from two threads of a
 program.

Yeah, I get 68.142.213.135 for the flickr.com address, currently.

The thing is, though, those two threads should be running wgets under
separate processes (I'm not sure how they couldn't be, but if they
somehow weren't that would be using Wget other than how it was designed
to be used).

This problem sounds much more like an issue with the OS's API than an
issue with Wget, to me. But we'd still want to work around it if it were
feasible.

What operating system are you running? Vista?

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/


Re: WGET bug...

2008-07-11 Thread Micah Cowan

HARPREET SAWHNEY wrote:
 Hi,
 
 I am getting a strange bug when I use wget to download a binary file
 from a URL versus when I manually download.
 
 The attached ZIP file contains two files:
 
 05.upc --- manually downloaded
 dum.upc --- downloaded through wget
 
 wget adds a number of ASCII characters to the head of the file and seems
 to delete a similar number from the tail.
 
 So the file sizes are the same, but the addition and deletion render
 the file useless.
 
 Could you please direct me on whether I should be using some specific
 option to avoid this problem?

In the future, it's useful to mention which version of Wget you're using.

The problem you're having is that the server is adding the extra HTML at
the front of your session, and then giving you the file contents anyway.
It's a bug in the PHP code that serves the file.

You're getting this extra content because you are not logged in when
you're fetching it. You need to have Wget send a cookie with your
login-session information, and then the server will probably stop
sending the corrupting information at the head of the file. The site
does not appear to use HTTP's authentication mechanisms, so the
[EMAIL PROTECTED] bit in the URL doesn't do you any good. It uses
forms-and-cookies authentication.

Hopefully, you're using a browser that stores its cookies in a text
format, or that is capable of exporting to a text format. In that case,
you can just ensure that you're logged in in your browser, and use the
--load-cookies=cookies.txt option to Wget to use the same session
information.
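
Roughly like this (a sketch; the cookie file name and the URL are
placeholders):

  $ wget --load-cookies=cookies.txt http://www.example.com/path/to/05.upc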

Otherwise, you'll need to use --save-cookies with Wget to simulate the
login form post, which is tricky and requires some understanding of HTML
Forms.

--
HTH,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/


Re: WGET bug...

2008-07-11 Thread Micah Cowan

HARPREET SAWHNEY wrote:
 Hi,
 
 Thanks for the prompt response.
 
 I am using
 
 GNU Wget 1.10.2
 
 I tried a few things on your suggestion but the problem remains.
 
 1. I exported the cookies file in Internet Explorer and specified
 that in the Wget command line. But same error occurs.
 
 2. I have an open session on the site with my username and password.
 
 3. I also tried running wget while I am downloading a file from the
 IE session on the site, but the same error.

Sounds like you'll need to get the appropriate cookie by using Wget to
log in to the website. This requires site-specific information from the
user-login form page, though, so I can't help you without that.

If you know how to read some HTML, then you can find the HTML form used
for posting username/password stuff, and use

wget --keep-session-cookies --save-cookies=cookies.txt \
    --post-data='username=foo&password=bar' ACTION

where ACTION is the value of the form's action field, username and
password (and possibly further required values) are field names from the
HTML form, and foo and bar are the username/password.
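
For instance, with a hypothetical login form (all of the names and the URL
here are made up):

  $ wget --keep-session-cookies --save-cookies=cookies.txt \
        --post-data='username=alice&password=secret' \
        http://www.example.com/login.php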

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/


bug in wget

2008-06-14 Thread Sir Vision

Hello,

entering the following command results in an error:

--- command start ---
c:\Downloads\wget_v1.11.3b>wget 
"ftp://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-mozilla1.8-l10n/" 
-P c:\Downloads\
--- command end ---

wget can't convert the .listing file into an HTML file

regards



Re: bug in wget

2008-06-14 Thread Micah Cowan

Sir Vision wrote:
 Hello,
 
 entering the following command results in an error:
 
 --- command start ---
 c:\Downloads\wget_v1.11.3b>wget
 "ftp://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-mozilla1.8-l10n/"
 -P c:\Downloads\
 --- command end ---
 
 wget can't convert the .listing file into an HTML file

As this seems to work fine on Unix, for me, I'll have to leave it to the
Windows porting guy (hi Chris!) to find out what might be going wrong.

...however, it would really help if you would supply the full output you
got from wget that leads you to believe Wget couldn't do this
conversion. In fact, it wouldn't hurt to supply the -d flag as well, for
maximum debugging messages.
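
Something like this, say (a sketch of the same command with -d enabled and
the output captured to a file via -o):

  c:\Downloads\wget_v1.11.3b>wget -d -o wget-debug.log "ftp://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-mozilla1.8-l10n/" -P c:\Downloads\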

--
Cheers,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/


.listing bug when using -c

2008-04-03 Thread Karsten Hopp

wget-1.11.1 (and 1.10/1.10.1) doesn't handle the .listing file properly when 
-c is used: it just appends to that file instead of replacing it, which means 
that wget tries to download each file twice when you run the same command 
twice. Have a look at this log:

wget -m -nd -c ftp://ftp.redhat.com/pub/redhat/linux/rawhide/
--2008-04-03 15:30:17--  ftp://ftp.redhat.com/pub/redhat/linux/rawhide/
   = `.listing'
Resolving ftp.redhat.com... 209.132.176.30
Connecting to ftp.redhat.com|209.132.176.30|:21... connected.
Logging in as anonymous ... Logged in!
== SYST ... done.== PWD ... done.
== TYPE I ... done.  == CWD /pub/redhat/linux/rawhide ... done.
== PASV ... done.== LIST ... done.

[ =] 259 --.-K/s   in 0s

2008-04-03 15:30:19 (1.66 MB/s) - `.listing' saved [259]

Already have correct symlink .message -> README

--2008-04-03 15:30:19--  ftp://ftp.redhat.com/pub/redhat/linux/rawhide/README
   => `README'
==> CWD not required.
==> PASV ... done.   ==> RETR README ... done.
Length: 404

100%[====================================>] 404         --.-K/s   in 0.007s

2008-04-03 15:30:21 (59.4 KB/s) - `README' saved [404]

FINISHED --2008-04-03 15:30:21--
Downloaded: 2 files, 663 in 0.007s (95.3 KB/s)

cat .listing
drwxr-xr-x    2 ftp      ftp          4096 Nov 10  2003 .
drwxr-xr-x    8 ftp      ftp          4096 May 15  2006 ..
lrwxrwxrwx    1 ftp      ftp             6 Nov 10  2003 .message -> README
-rw-r--r--    1 ftp      ftp           404 Nov 10  2003 README

wget -m -nd -c ftp://ftp.redhat.com/pub/redhat/linux/rawhide/
--2008-04-03 15:30:26--  ftp://ftp.redhat.com/pub/redhat/linux/rawhide/
   => `.listing'
Resolving ftp.redhat.com... 209.132.176.30
Connecting to ftp.redhat.com|209.132.176.30|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.   ==> PWD ... done.
==> TYPE I ... done.  ==> CWD /pub/redhat/linux/rawhide ... done.
==> PASV ... done.   ==> LIST ... done.

100%[+++++++++++++++++++=================>] 518         --.-K/s   in 0s

2008-04-03 15:30:28 (2.36 MB/s) - `.listing' saved [518]

Already have correct symlink .message -> README

Remote file no newer than local file `README' -- not retrieving.
Already have correct symlink .message -> README

Remote file no newer than local file `README' -- not retrieving.
FINISHED --2008-04-03 15:30:28--
Downloaded: 1 files, 518 in 0s (4.73 MB/s)

cat .listing
drwxr-xr-x    2 ftp      ftp          4096 Nov 10  2003 .
drwxr-xr-x    8 ftp      ftp          4096 May 15  2006 ..
lrwxrwxrwx    1 ftp      ftp             6 Nov 10  2003 .message -> README
-rw-r--r--    1 ftp      ftp           404 Nov 10  2003 README
drwxr-xr-x    2 ftp      ftp          4096 Nov 10  2003 .
drwxr-xr-x    8 ftp      ftp          4096 May 15  2006 ..
lrwxrwxrwx    1 ftp      ftp             6 Nov 10  2003 .message -> README
-rw-r--r--    1 ftp      ftp           404 Nov 10  2003 README



This happens only when -c is used.
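
A possible workaround until this is fixed (an untested sketch: remove the
stale .listing before each run, so there is nothing to append to):

rm -f .listing && wget -m -nd -c ftp://ftp.redhat.com/pub/redhat/linux/rawhide/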




   Karsten


Re: Bug

2008-03-03 Thread Mark Pors
ok, thanks for your reply.
We have a work-around in place now, but it doesn't scale very well.
Anyway, I'll start looking for another solution.

Thanks!
Mark


On Sat, Mar 1, 2008 at 10:15 PM, Micah Cowan [EMAIL PROTECTED] wrote:

  Mark Pors wrote:
   Hi,
  
   I posted this bug over two years ago:
   http://marc.info/?l=wget&m=113252747105716&w=4
  From the release notes I see that this is still not resolved. Are
   there any plans to fix this any time soon?

  I'm not sure that's a bug. It's more of an architectural choice.

  Wget currently works by downloading a file, then, if it needs to look
  for links in that file, it will open it and scan through it. Obviously,
  it can't do that when you use -O -.

  There are plans to move Wget to a more stream-like process, where it
  scans links during download. At such time, it's very possible that -p
  will work the way you want it to. In the meantime, though, it doesn't.

  --
  Micah J. Cowan
  Programmer, musician, typesetting enthusiast, gamer...
  http://micah.cowan.name/



Bug

2008-03-01 Thread Mark Pors
Hi,

I posted this bug over two years ago:
http://marc.info/?l=wget&m=113252747105716&w=4
From the release notes I see that this is still not resolved. Are
there any plans to fix this any time soon?

Thanks
Mark


Re: bug on wget

2007-11-21 Thread Hrvoje Niksic
Micah Cowan [EMAIL PROTECTED] writes:

 The new Wget flags empty Set-Cookie as a syntax error (but only
 displays it in -d mode; possibly a bug).

 I'm not clear on exactly what's "possibly a bug": do you mean the fact
 that Wget only calls attention to it in -d mode?

That's what I meant.

 I probably agree with that behavior... most people probably aren't
 interested in being informed that a server breaks RFC 2616 mildly;

Generally, if Wget considers a header to be in error (and hence
ignores it), the user probably needs to know about that.  After all,
it could be the symptom of a Wget bug, or of an unimplemented
extension the server generates.  In both cases I as a user would want
to know.  Of course, Wget should continue to be lenient towards syntax
violations widely recognized by popular browsers.

Note that I'm not arguing that Wget should warn in this particular
case.  It is perfectly fine to not consider an empty `Set-Cookie' to
be a syntax error and to simply ignore it (and maybe only print a
warning in debug mode).


Re: bug on wget

2007-11-21 Thread Micah Cowan

Hrvoje Niksic wrote:
 Generally, if Wget considers a header to be in error (and hence
 ignores it), the user probably needs to know about that.  After all,
 it could be the symptom of a Wget bug, or of an unimplemented
 extension the server generates.  In both cases I as a user would want
 to know.  Of course, Wget should continue to be lenient towards syntax
 violations widely recognized by popular browsers.
 
 Note that I'm not arguing that Wget should warn in this particular
 case.  It is perfectly fine to not consider an empty `Set-Cookie' to
 be a syntax error and to simply ignore it (and maybe only print a
 warning in debug mode).

That was my thought. I agree with both of your points above: if Wget's
not handling something properly, I want to know about it; but at the
same time, silently ignoring (erroneous) empty headers doesn't seem like
a problem.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



bug on wget

2007-11-20 Thread Diego Campo
Hi,
I got a bug on wget when executing:

wget -a log -x -O search/search-1.html --verbose --wait 3
--limit-rate=20K --tries=3
http://www.nepremicnine.net/nepremicninske_agencije.html?id_regije=1

Segmentation fault (core dumped)


I created the directory search.
The above creates a zero-sized file search/search-1.html.
Logfile log:

Resolviendo www.nepremicnine.net... 212.103.144.204
Conectando a www.nepremicnine.net|212.103.144.204|:80... conectado.
Petición HTTP enviada, esperando respuesta... 200 OK
--18:18:28--
http://www.nepremicnine.net/nepremicninske_agencije.html?id_regije=1
   => `search/search-1.html'

(I hope you understand the Spanish above. If not, the labels are the usual:
resolving, connecting, HTTP request sent, awaiting response)

The same thing happens when varying the id_regije parameter in the URL,
just in case it helps.

I'm using an Intel Core Duo E6300, with plenty of disk/mem space, on
Ubuntu 7.10.

Should you need any further information, don't hesitate to contact me.
Regards
 Diego



Re: bug on wget

2007-11-20 Thread Micah Cowan

Diego Campo wrote:
 Hi,
 I got a bug on wget when executing:
 
 wget -a log -x -O search/search-1.html --verbose --wait 3
 --limit-rate=20K --tries=3
 http://www.nepremicnine.net/nepremicninske_agencije.html?id_regije=1
 
 Segmentation fault (core dumped)

Hi Diego,

I was able to reproduce the problem above in the release version of
Wget; however, it appears to be working fine in the current development
version of Wget, which is expected to release soon as version 1.11.*

* Unfortunately, it has been expected to release soon for a few months
now; we got hung up with some legal/licensing issues that are yet to be
resolved. It will almost certainly be released in the next few weeks,
though.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: bug on wget

2007-11-20 Thread Hrvoje Niksic
Micah Cowan [EMAIL PROTECTED] writes:

 I was able to reproduce the problem above in the release version of
 Wget; however, it appears to be working fine in the current
 development version of Wget, which is expected to release soon as
 version 1.11.*

I think the old Wget crashed on empty Set-Cookie headers.  That got
fixed when I converted the Set-Cookie parser to use extract_param.
The new Wget flags empty Set-Cookie as a syntax error (but only
displays it in -d mode; possibly a bug).


Re: bug on wget

2007-11-20 Thread Micah Cowan

Hrvoje Niksic wrote:
 Micah Cowan [EMAIL PROTECTED] writes:
 
 I was able to reproduce the problem above in the release version of
 Wget; however, it appears to be working fine in the current
 development version of Wget, which is expected to release soon as
 version 1.11.*
 
 I think the old Wget crashed on empty Set-Cookie headers.  That got
 fixed when I converted the Set-Cookie parser to use extract_param.
 The new Wget flags empty Set-Cookie as a syntax error (but only
 displays it in -d mode; possibly a bug).

I'm not clear on exactly what's "possibly a bug": do you mean the fact
that Wget only calls attention to it in -d mode?

I probably agree with that behavior... most people probably aren't
interested in being informed that a server breaks RFC 2616 mildly;
especially if it's not apt to affect the results. Unless of course the
user was expecting that the server send a real cookie, but I'm guessing
that this only happens when the server doesn't have one to send (or
something). But a user in that situation should be using -d (or at least
-S) to find out what the server is sending.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



bug in escaped filename calculation?

2007-10-04 Thread Brian Keck

Hello,

I'm wondering if I've found a bug in the excellent wget.
I'm not asking for help, because it turned out not to be the reason
one of my scripts was failing.

The possible bug is in the derivation of the filename from a URL which
contains UTF-8.

The case is:

  wget http://en.wikipedia.org/wiki/%C3%87atalh%C3%B6y%C3%BCk

Of course these are all ascii characters, but underlying it are
3 nonascii characters, whose UTF-8 encoding is:

  hex    octal    name
  -----  -------  ---------
  C387   303 207  C-cedilla
  C3B6   303 266  o-umlaut
  C3BC   303 274  u-umlaut

The file created has a name that's almost, but not quite, a valid UTF-8
bytestring ... 

  ls *y*k | od -tc
  000 303   %   8   7   a   t   a   l   h 303 266   y 303 274   k  \n

I.e. the o-umlaut & u-umlaut UTF-8 encodings occur in the bytestring,
but the UTF-8 encoding of C-cedilla has its 2nd byte replaced by the
3-byte string %87.

I'm guessing this is not intended.  

I would have sent a fix too, but after finding my way through http.c &
retr.c I got lost in url.c.

Brian Keck


Re: bug in escaped filename calculation?

2007-10-04 Thread Josh Williams
On 10/4/07, Brian Keck [EMAIL PROTECTED] wrote:
 I would have sent a fix too, but after finding my way through http.c &
 retr.c I got lost in url.c.

You and me both. A lot of the code needs rewriting; there's a lot of
spaghetti code in there. I hope Micah chooses to do a complete
re-write for version 2 so I can get my hands dirty and understand the
code better.


Re: bug in escaped filename calculation?

2007-10-04 Thread Micah Cowan

Josh Williams wrote:
 On 10/4/07, Brian Keck [EMAIL PROTECTED] wrote:
  I would have sent a fix too, but after finding my way through http.c &
  retr.c I got lost in url.c.
 
 You and me both. A lot of the code needs rewriting; there's a lot of
 spaghetti code in there. I hope Micah chooses to do a complete
 re-write for version 2 so I can get my hands dirty and understand the
 code better.

Currently, I'm planning on refactoring what exists, as needed, rather
than going for a complete rewrite. This will be driven by unit-tests, to
try to ensure that we do not lose functionality along the way. This
involves more work overall, but IMO has these key advantages:

 * as mentioned, it's easier to prevent functionality loss,
 * we will be able to use the work as it's written, instead of waiting
many months for everything to be finished (especially with the current
number of developers), and
 * AIUI, the wording of employer copyright assignment releases may not
apply to new works that are not _preexisting_ as GPL works. This means
that, if a rewrite ended up using no code whatsoever from the original
work (not likely, but...), there could be legal issues.

After 1.11 is released (or possibly before), one of my top priorities is
to clean up the gethttp and http_loop functions to a degree where they
can be much more readily read and understood (and modified!). This is
important to me because so far (in my
probably-not-statistically-significant 3 months as maintainer) a
majority of the trickier fixes have been in those two functions. Some of
these fixes seem to frequently introduce bugs of their own, and I spend
more time than seems right in trying to understand the code there, which
is why these particular functions are prime targets for refactoring. :)

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: bug in escaped filename calculation?

2007-10-04 Thread Micah Cowan

Brian Keck wrote:
 Hello,
 
 I'm wondering if I've found a bug in the excellent wget.
 I'm not asking for help, because it turned out not to be the reason
 one of my scripts was failing.
 
 The possible bug is in the derivation of the filename from a URL which
 contains UTF-8.
 
 The case is:
 
   wget http://en.wikipedia.org/wiki/%C3%87atalh%C3%B6y%C3%BCk
 
 Of course these are all ascii characters, but underlying it are
 3 nonascii characters, whose UTF-8 encoding is:
 
   hex    octal    name
   -----  -------  ---------
   C387   303 207  C-cedilla
   C3B6   303 266  o-umlaut
   C3BC   303 274  u-umlaut
 
 The file created has a name that's almost, but not quite, a valid UTF-8
 bytestring ... 
 
   ls *y*k | od -tc
   000 303   %   8   7   a   t   a   l   h 303 266   y 303 274   k  \n
 
 I.e. the o-umlaut & u-umlaut UTF-8 encodings occur in the bytestring,
 but the UTF-8 encoding of C-cedilla has its 2nd byte replaced by the
 3-byte string %87.

Using --restrict-file-names=nocontrol will do what you want it to, in this
instance.
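
That is, with the URL from your report:

  $ wget --restrict-file-names=nocontrol http://en.wikipedia.org/wiki/%C3%87atalh%C3%B6y%C3%BCk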

 I'm guessing this is not intended.  

Actually, it is (more-or-less).

Realize that Wget really has no idea how to tell whether you're trying
to give it UTF-8, or one of the ISO latin charsets. It tends to assume
the latter. It also, by default, will not create filenames with control
characters in them. In ISO latin, characters in the range 0x80-0x9f are
control characters, which is why Wget left %87 (which falls into that
range) escaped, but not the others (which don't).

It is actually illegal to specify byte values outside the range of ASCII
characters in a URL, but it has long been historical practice to do so
anyway. In most cases, the intended meaning was one of the latin
character sets (usually latin1), so Wget was right to do as it does, at
that time.

There is now a standard for representing Unicode values in URLs, whose
result is then called IRIs (Internationalized Resource Identifiers).
Conforming correctly to this standard would require that Wget be
sensitive to the context and encoding of documents in which it finds
URLs; in the case of filenames and command arguments, it would probably
also require sensitivity to the current locale as determined by
environment variables. Wget is simply not equipped to handle IRLs or
encoding issues at the moment, so until it is, a proper fix will not be
in place. Addressing these are considered a Wget 2.0 (next-generation
Wget functionality) priority, and probably won't be done for a year or
two, given that the number of developers involved with Wget, if you add
up all the part-time helpers (including me), is probably still less than
one full-time dev. :)

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: bug in escaped filename calculation?

2007-10-04 Thread Hrvoje Niksic
Micah Cowan [EMAIL PROTECTED] writes:

 It is actually illegal to specify byte values outside the range of
 ASCII characters in a URL, but it has long been historical practice
 to do so anyway. In most cases, the intended meaning was one of the
 latin character sets (usually latin1), so Wget was right to do as it
 does, at that time.

Your explanation is spot-on.  I would only add that Wget's
interpretation of what is a control character is not so much geared
toward Latin 1 as it is geared toward maximum safety.  Originally I
planned to simply encode *all* file name characters outside the 32-127
range, but in practice it was very annoying (not to mention
US-centric) to encode perfectly valid Latin 1/2/3/... as %xx.  Since
the codes 128-159 *are* control characters (in those charsets) that
can mess up your screen and that you wouldn't want seen by default, I
decided to encode them by default, but allow for a way to turn it off,
in case someone used a different charset.

In the long run, supporting something like IRIs is surely the right
thing to go for, but I have a feeling that we'll be stuck with the
current messy URLs for quite some time to come.  So Wget simply needs
to adapt to the current circumstances.  If the locale includes UTF-8
in any shape or form, it is perfectly safe to assume that it's valid
to create UTF-8 file names.  Of course, we don't know if a particular
URL path sequence is really meant to be UTF-8, but there should be no
harm in allowing valid UTF-8 sequences to pass through.  In other
words, the default "quote control characters" policy could simply be
smarter about what "control" means.

One consequence would be that Wget creates differently-named files in
different locales, but it's probably a reasonable price to pay for not
breaking an important expectation.  Another consequence would be
making users open to IDN homograph attacks, but I don't know if that's
a problem in the context of creating file names (the homograph attack is
normally defined as a misrepresentation of who you communicate with).

For those who want to hack on this, the place to look at is
url.c:append_uri_pathel; that strangely-named function takes a path
element (a directory name or file name component of the URL) and
appends it to the file name.  It takes care not to ever use .. as a
path component and to respect the --restrict-file-names setting as
specified by the user.  It could be made to recognize UTF-8 character
sequences in UTF-8 locales and exempt valid UTF-8 chars from being
treated as control characters.  Invalid UTF-8 chars would still pass
all the checks, and non-canonical UTF-8 sequences would be rejected
(by condemning their byte values to being escaped as %..).  This is
not much work for someone who understands the basics of UTF-8.


[fwd] Wget Bug: recursive get from ftp with a port in the url fails

2007-09-17 Thread Hrvoje Niksic
---BeginMessage---
Hi, I am using wget 1.10.2 on Windows 2003, and have the same problem as
Cantara. The file system is NTFS.
Well, I found my problem: I wrote the command in scheduled tasks like this:

wget  -N -i D:\virus.update\scripts\kavurl.txt -r -nH -P
d:\virus.update\kaspersky

Well, after wget, and before -N, I typed TWO spaces.

After deleting one space, wget works well again.

Hope this can help.

:)

-- 
from:baalchina
---End Message---


Re: [fwd] Wget Bug: recursive get from ftp with a port in the url fails

2007-09-17 Thread Micah Cowan
Hrvoje Niksic wrote:
 Subject: Re: Wget Bug: recursive get from ftp with a port in the url fails
 From: baalchina [EMAIL PROTECTED]
 Date: Mon, 17 Sep 2007 19:56:20 +0800
 To: [EMAIL PROTECTED]
 
 
 Hi, I am using wget 1.10.2 on Windows 2003, and have the same problem as
 Cantara. The file system is NTFS.
 Well, I found my problem: I wrote the command in scheduled tasks like this:
  
 wget  -N -i D:\virus.update\scripts\kavurl.txt -r -nH -P
 d:\virus.update\kaspersky
  
 Well, after wget, and before -N, I typed TWO spaces.
  
 After deleting one space, wget works well again.
  
 Hope this can help.
  
 :)

Hi baalchina,

Hrvoje forwarded your message to the Wget discussion mailing list, where
such questions are really more appropriate, especially since Hrvoje is
not maintaining Wget any longer, but has left that responsibility for
others.

What you're describing does not appear to be a bug in Wget; it's the
shell's (or task scheduler's, or whatever) responsibility to split
space-separated elements properly; the words are supposed to already be
split apart (properly) by the time Wget sees them.

Also, you didn't really describe what was going wrong with Wget, or what
message about its failure you were seeing (perhaps you'd need to
specify a log file with -o log, or via redirection, if the command
interpreter supports it). However, if the problem is that Wget was
somehow seeing the space, as a separate argument or as part of another
one, then the bug lies with your task scheduler (or whatever is
interpreting the command line).

-- 
HTH,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/





ftp-ls.c - filesize parsing bug

2007-08-29 Thread Jason Mancini

Hello,
What the heck was this code supposed to do in ftp-ls.c?  If there is only a
single space between the previous token and the filesize, then t points at
the NULL character, and the filesize is thought to be 0, resulting in a
mismatch every time.  ptok is already pointing at the start of the token; I
don't understand the need to try to decrement the pointer.  I commented out
the two lines to fix the issue.

Thanks!  (P.S. Where is the ftp chdir bugfix?!  No wget releases...)
Jason


/* Back up to the beginning of the previous token
   and parse it with str_to_wgint.  */
char *t = ptok;
while (t > line && ISDIGIT (*t)) // useless and buggy
  --t;                           // useless and buggy
if (t == line)





Re: bug and patch: blank spaces in filenames causes looping

2007-07-15 Thread Rich Cook


On Jul 13, 2007, at 12:29 PM, Micah Cowan wrote:

 sprintf(filecopy, "\"%.2047s\"", file);

 This fix breaks the FTP protocol, making wget instantly stop working
 with many conforming servers, but apparently start working with yours;
 the RFCs are very clear that the file name argument starts right after
 the string "RETR "; the very next character is part of the file name,
 including if the next character is a space (or a quote). The file name
 is terminated by the CR LF sequence (which implies that the sequence CR
 LF may not occur in the filename). Therefore, if you ask for a file
 "file.txt", a conforming server will attempt to find and deliver a file
 whose name begins and ends with double-quotes.

 Therefore, this seems like a server problem.

I think you may well be correct.  I am now unable to reproduce the
problem where the server does not recognize a filename unless I give
it quotes.  In fact, as you say, the server ONLY recognizes filenames
WITHOUT quotes and quoting breaks it.  I had to revert to the
non-quoted code to get proper behavior.  I am very confused now.  I
apologize profusely for wasting your time.  How embarrassing!

I'll save this email, and if I see the behavior again, I will provide
you with the details you requested below.

 Could you please provide the following:
   1. The version of wget you are running (wget --version)
   2. The exact command line you are using to invoke wget
   3. The output of that same command line, run with --debug



--
Rich wealthychef Cook
925-784-3077
--
 it takes many small steps to climb a mountain, but the view gets  
better all the time.





Re: bug and patch: blank spaces in filenames causes looping

2007-07-15 Thread Josh Williams

On 7/15/07, Rich Cook [EMAIL PROTECTED] wrote:

I think you may well be correct.  I am now unable to reproduce the
problem where the server does not recognize a filename unless I give
it quotes.  In fact, as you say, the server ONLY recognizes filenames
WITHOUT quotes and quoting breaks it.  I had to revert to the non-
quoted code to get proper behavior.  I am very confused now.  I
apologize profusely for wasting your time.  How embarrassing!

I'll save this email, and if I see the behavior again, I will provide
you with the details you requested below.


I wouldn't say it was a waste of time. Actually, I think it's good for
us to know that this problem exists on some servers. We're considering
writing a patch to recognise servers that do not support spaces. If
the standard method fails, then it will retry as an escaped character.

Nothing has been written for this yet, but it has been discussed, and
may be implemented in the future.


Re: bug and patch: blank spaces in filenames causes looping

2007-07-15 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Rich Cook wrote:
 
 On Jul 13, 2007, at 12:29 PM, Micah Cowan wrote:
 

 sprintf(filecopy, "\"%.2047s\"", file);

 This fix breaks the FTP protocol, making wget instantly stop working
 with many conforming servers, but apparently start working with yours;
 the RFCs are very clear that the file name argument starts right after
 the string "RETR "; the very next character is part of the file name,
 including if the next character is a space (or a quote). The file name
 is terminated by the CR LF sequence (which implies that the sequence CR
 LF may not occur in the filename). Therefore, if you ask for a file
 "file.txt", a conforming server will attempt to find and deliver a file
 whose name begins and ends with double-quotes.

 Therefore, this seems like a server problem.
 
 I think you may well be correct.  I am now unable to reproduce the
 problem where the server does not recognize a filename unless I give it
 quotes.  In fact, as you say, the server ONLY recognizes filenames
 WITHOUT quotes and quoting breaks it.  I had to revert to the non-quoted
 code to get proper behavior.  I am very confused now.  I apologize
 profusely for wasting your time.  How embarrassing!

No worries, it happens! Sometimes the tests we run go other than we
think they did. :)
 
 I'll save this email, and if I see the behavior again, I will provide
 you with the details you requested below.

That would be terrific, thanks.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGmpOD7M8hyUobTrERCA7FAJ4oygvX7rpQy1k5FL7j3R12LUdWUACfVHrc
sk1tpS12pDYBvVbD4Nv7/I4=
=KCxk
-END PGP SIGNATURE-


Re: bug and patch: blank spaces in filenames causes looping

2007-07-13 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Rich Cook wrote:
 On OS X, if a filename on the FTP server contains spaces, and the remote
 copy of the file is newer than the local, then wget gets thrown into a
 loop of "No such file or directory" endlessly.  I have changed the
 following in ftp-simple.c, and this fixes the error.
 Sorry, I don't know how to use the proper patch formatting, but it
 should be clear.

I and another developer could not reproduce this problem, either in the
current trunk or in wget 1.10.2.

 sprintf(filecopy, "\"%.2047s\"", file);

This fix breaks the FTP protocol, making wget instantly stop working
with many conforming servers, but apparently start working with yours;
the RFCs are very clear that the file name argument starts right after
the string "RETR "; the very next character is part of the file name,
including if the next character is a space (or a quote). The file name
is terminated by the CR LF sequence (which implies that the sequence CR
LF may not occur in the filename). Therefore, if you ask for a file
"file.txt", a conforming server will attempt to find and deliver a file
whose name begins and ends with double-quotes.

Therefore, this seems like a server problem.
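
To make the framing concrete, a hedged sketch using the ftp_request
helper seen elsewhere in this thread ("my file.txt" is a made-up name):

  /* RFC 959 framing: the argument is everything between "RETR " and
     CR LF, spaces included, so a name with spaces needs no quoting. */
  request = ftp_request ("RETR", "my file.txt");
  /* sends: RETR my file.txt\r\n
     Wrapping the name in quotes would instead request a file literally
     named "my file.txt" -- quotes and all. */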

Could you please provide the following:
  1. The version of wget you are running (wget --version)
  2. The exact command line you are using to invoke wget
  3. The output of that same command line, run with --debug

Thank you very much.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGl9KT7M8hyUobTrERCJfoAJ91z9c2GniuoaX0mj9oqzHrrpNCtQCePQnm
lvbVe0i5/jVy9V10uQpYgmk=
=iQq1
-END PGP SIGNATURE-


Re: [bug #20323] Wget issues HEAD before GET, even when the file doesn't exist locally.

2007-07-12 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Mauro Tortonesi wrote:
 Micah Cowan ha scritto:
 Update of bug #20323 (project wget):

   Status:  Ready For Test => In Progress
 ___

 Follow-up Comment #3:

 Moving back to In Progress until some questions about the logic are
 answered:

 http://addictivecode.org/pipermail/wget-notify/2007-July/75.html
 http://addictivecode.org/pipermail/wget-notify/2007-July/77.html
 
 thanks micah.
 
 i have partly misunderstood the logic behind preliminary HEAD request.
 in my code, HEAD is skipped if -O or --no-content-disposition are given,
 but if -N is given HEAD is always sent. this is wrong, as HEAD should be
 skipped even if -N and --no-content-disposition are given (no need to
 care about the deprecated -N -O combination). can't think of any other
 case in which HEAD should be skipped, though.

Cc'ing wget ML, as it's probably important to open up discussion of the
current logic.

What about the case when nothing is given on the command line except
- --no-content-disposition? What do we need HEAD for then?

Also: I don't believe HEAD should be sent if no options are given on the
command line. What purpose would that serve? If it's to find a possible
Content-Disposition header, we can get that (and more reliably) at GET
time (though I believe we may currently be requiring the file name
before we fetch, which, if true, should definitely be changed, but not
for 1.11, in which case the HEAD will be allowed for the time being);
and since we're not matching against potential accept/reject lists, we
don't really need it.

I think it really makes much more sense to enumerate those few cases
where we need to issue a HEAD, rather than try to determine all the
cases where we don't: if I have to choose a side to err on, I'd rather
not send HEAD in a case or two where we needed it, rather than send it
in a few where we didn't, as any request-response cycle eats up time. I
also believe that the cases where we want a HEAD are/should be fewer
than the cases where we don't want them.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGlol+7M8hyUobTrERCOT0AJwNt2dm/80zL7UYbadBaiaPrMvSUQCePKmS
WO77ltxl0vr0Pcgd8H1bIY8=
=zCTU
-END PGP SIGNATURE-


Re: [wget-notify] [bug #20466] --delete-after and --spider should not create (and leave) directories

2007-07-12 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Joshua David Williams wrote:
 URL:
   http://savannah.gnu.org/bugs/?20466

...

 Details:
 
 This patch forces the --no-directories option if we're not actually keeping
 the files we're downloading (as in the --delete-after and --spider options).
 This way, we don't leave a mess of empty directories.

This seems like a reasonable idea, but I'd like to get some discussion
on it first.

The downside, of course, is that there's no short option to reverse the
implied -nd; they'll have to use --directories (at the time I was
discussing it with Josh, I'd been thinking -e would be needed, but this
seems to be untrue).

It seems to me that by far the most common intention would be not to
leave any files around; this behavior seems fairly reasonable to me. Thoughts?
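
(Under the proposed behavior, restoring today's directory-creating
behavior would take something like

  wget -r --delete-after --directories http://example.com/

where example.com is just a placeholder.)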

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGlpx17M8hyUobTrERCKP5AJ4rHtoA7xy9FNidKS7WooTwmF5xGACfYHv2
fIwxjHVH/t3H6/xkVk4Yqio=
=ZbKt
-END PGP SIGNATURE-


[Fwd: Bug#281201: wget prints it's progress even when background]

2007-07-11 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

The following bug was submitted to Debian's bug tracker.
I'm curious what people think about this suggestion.

Don't we already check for something like redirected output (and force
the progress indicator to dots)? It seems to me that if that is
appropriate, then a case could be made for this as well.

Perhaps instead of shutting up, though, wget should attempt to redirect
its output to a file? Perhaps with one last message to the terminal
(assuming the terminal doesn't have TOSTOP set--it should ignore SIGTTOU
and handle EIO to deal with that case), to indicate that it's doing this.

- -Micah


-  Original Message 
Subject: Bug#281201: wget prints it's progress even when background
Resent-Date: Tue, 10 Jul 2007 13:57:01 +0000,   Tue, 10 Jul 2007 13:57:02
+0000
Resent-From: Ilya Anfimov [EMAIL PROTECTED]
Resent-To: [EMAIL PROTECTED]
Resent-CC: Noèl Köthe [EMAIL PROTECTED]
Date: Tue, 10 Jul 2007 17:54:51 +0400
From: Ilya Anfimov [EMAIL PROTECTED]
Reply-To: Ilya Anfimov [EMAIL PROTECTED], [EMAIL PROTECTED]
To: Peter Eisentraut [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]


 My suggestion is to stop printing verbose progress messages
when the job is resumed in the background. This could be checked
by (successful) getpgrp() not being equal to (successful) tcgetpgrp(1)
in a SIGCONT signal handler.
 Something like this is used in some console applications,
for example in lftp.
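
A sketch of that check (not wget code; registration of the handler is
omitted):

  #include <signal.h>
  #include <unistd.h>

  static volatile sig_atomic_t backgrounded;

  /* On SIGCONT, compare our process group with the terminal's
     foreground process group; if they differ, we were resumed in
     the background. */
  static void
  sigcont_handler (int sig)
  {
    pid_t tpgrp = tcgetpgrp (STDOUT_FILENO);
    (void) sig;
    if (tpgrp != -1 && tpgrp != getpgrp ())
      backgrounded = 1;   /* then suppress the progress display */
  }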


-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGlThP7M8hyUobTrERCA4sAJ0RwfVIsL5UcafLkfm5qihERnRNvQCeIABc
t+Y3FeNYctJsdPcPbTwYukk=
=eBSi
-END PGP SIGNATURE-


Re: wget bug?

2007-07-09 Thread Mauro Tortonesi
On Mon, 9 Jul 2007 15:06:52 +1200
[EMAIL PROTECTED] wrote:

 wget under win2000/win XP
 I get "No such file or directory" error messages when using the following 
 command line.
 
 wget -s --save-headers 
 "http://www.nndc.bnl.gov/ensdf/browseds.jsp?nuc=%1&class=Arc"
 
 %1 = 212BI
 Any ideas?

hi nikolaus,

in windows, you're supposed to use %VARIABLE_NAME% for variable substitution. 
try using %1% instead of %1.

-- 
Mauro Tortonesi [EMAIL PROTECTED]


Re: wget bug?

2007-07-09 Thread Matthias Vill

Mauro Tortonesi schrieb:

On Mon, 9 Jul 2007 15:06:52 +1200
[EMAIL PROTECTED] wrote:


wget under win2000/win XP
I get "No such file or directory" error messages when using the following 
command line.


wget -s --save-headers 
"http://www.nndc.bnl.gov/ensdf/browseds.jsp?nuc=%1&class=Arc"

%1 = 212BI
Any ideas?


hi nikolaus,

in windows, you're supposed to use %VARIABLE_NAME% for variable substitution. 
try using %1% instead of %1.



AFAIK it's OK to use %1, because it is a special case. Also, if the 
variable got substituted in a wrong way (or not at all), wouldn't the 
error be a 404 or some other wget error? (Actually, even then you get 
a 200 response with that URL.)


I just tried using the command inside a batch file and came across 
another problem: you used a lowercase -s, which is not recognized by my 
wget version, but an uppercase -S is. I guess you should change that.


I would guess wget is not in your PATH.
Try using "c:\path\to\the directory\wget.exe" instead of just wget.

If this too does not help, add an explicit --restrict-file-names=windows to 
your options, so wget does not try to use the ? inside a filename. 
(Normally not needed.)


So a should-work-in-all-cases version is

c:\path\wget.exe -S --save-headers --restrict-file-names=windows 
"http://www.nndc.bnl.gov/ensdf/browseds.jsp?nuc=%1&class=Arc"


Of course it is just one line, but my dumb mail editor wrapped it.

Greetings
Matthias


Re: Bug update notifications

2007-07-09 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Matthew Woehlke wrote:
 Micah Cowan wrote:
 The wget-notify mailing list
 (http://addictivecode.org/mailman/listinfo/wget-notify) will now also be
 receiving notifications of bug updates from GNU Savannah, in addition to
  subversion commits.
 
 ...any reason to not CC bug updates here also/instead? That's how e.g.
 kwrite does things (also several other lists AFAIK), and seems to make
 sense. This is 'bug-wget' after all :-).

It is; but it's also 'wget'. While I agree that it probably makes sense
to send it to a bugs discussion list, this list is a combination
bugs/development/support/general discussion list, and I'm not certain
it's appropriate to bump up the traffic level for this.

Still, if there are enough folks that would like to get these updates
(without also seeing commit notifications), perhaps we could craft a
second list for this (or, alternatively, split off the main
discussion/support list from the bugs list)?

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGkrpK7M8hyUobTrERCIMaAKCDG8JN7DmUK7oIuE0fYmgYnZIrlgCghK7n
iV8rIDYe1+cxzrQATM43CEM=
=PKqt
-END PGP SIGNATURE-


Re: Bug update notifications

2007-07-09 Thread Matthew Woehlke

Micah Cowan wrote:

Matthew Woehlke wrote:

Micah Cowan wrote:
...any reason to not CC bug updates here also/instead? That's how e.g.
kwrite does things (also several other lists AFAIK), and seems to make
sense. This is 'bug-wget' after all :-).


It is; but it's also 'wget'.


Hmm, so it is; my bad :-).


While I agree that it probably makes sense
to send it to a bugs discussion list, this list is a combination
bugs/development/support/general discussion list, and I'm not certain
it's appropriate to bump up the traffic level for this.

Still, if there are enough folks that would like to get these updates
(without also seeing commit notifications), perhaps we could craft a
second list for this (or, alternatively, split off the main
discussion/support list from the bugs list)?


I guess a common pattern is:
foo-help
foo-devel
foo-commits

...but of course you're the maintainer, it's your call :-).
(The above aren't necessarily actual names of course, just the 
categories it seems like I'm most used to seeing. e.g. the GNU 
convention is of course bug-foo, not foo-devel.)


--
Matthew
This .sig is false




wget bug?

2007-07-08 Thread Nikolaus_Hermanspahn
wget under win2000/win XP
I get "No such file or directory" error messages when using the following 
command line.

wget -s --save-headers 
"http://www.nndc.bnl.gov/ensdf/browseds.jsp?nuc=%1&class=Arc"

%1 = 212BI
Any ideas?

thank you

Dr Nikolaus Hermanspahn
Advisor (Science)
National Radiation Laboratory
Ministry of Health
DDI: +64 3 366 5059
Fax: +64 3 366 1156

http://www.nrl.moh.govt.nz
mailto:[EMAIL PROTECTED]






Re: wget on gnu.org: Report a Bug

2007-07-07 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Tony Lewis wrote:
 The “Report a Bug” section of http://www.gnu.org/software/wget/ should
 encourage submitters to send as much relevant information as possible
 including wget version, operating system, and command line. The
 submitter should also either send or at least save a copy of the --debug
 output.

This information is currently in the bug submitting form at Savannah:
https://savannah.gnu.org/bugs/?func=additemgroup=wget

But should probably be duplicated at the website as well... let me know
if the current text could use improvement.

 Perhaps we need a --bug option for the command line that runs the
 command and saves important information in a file that can be submitted
 along with the bug report. The saved information would have to be
 sanitized to remove things like user IDs and passwords but could include
 things like the wget version, command line options, and what the command
 tried to do.

I think perhaps such things as the wget version and operating system
ought to be emitted by default anyway (except when -q is given).

Other than that, what kinds of things would --bug provide above and
beyond --debug?

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGj+hk7M8hyUobTrERCHqtAJ9HTIFd3hOJ2R9aQBUqCtsvW2xJ1wCePOfo
67Olfti9HtI+1pYkNiCj7rc=
=/Rhd
-END PGP SIGNATURE-


RE: wget on gnu.org: Report a Bug

2007-07-07 Thread Tony Lewis
Micah Cowan wrote:

 This information is currently in the bug submitting form at Savannah:

That looks good.

 I think perhaps such things as the wget version and operating system
 ought to be emitted by default anyway (except when -q is given).

I'm not convinced that wget should ordinarily emit the operating system. It's 
really only useful to someone other than the person running the command.

 Other than that, what kinds of things would --bug provide above and
 beyond --debug?

It should echo the command line and the contents of .wgetrc to the bug output, 
which even the --debug option does not do. Perhaps we will think of other 
things to include in the output if this option gets added.

However, the big difference would be where the output was directed. When 
invoked as:
wget ... --bug bug_report

all interesting (but sanitized) information would be written to the file 
bug_report whether or not the command included --debug, which would also direct 
the debugging output to STDOUT.

The main reason I had for suggesting this option is that it would be easy to 
tell newbies with problems to run the exact same command with --bug 
bug_report and send the file bug_report to the list (or to whomever is working 
on the problem). The user wouldn't see the command behave any differently, but 
we'd have the information we need to investigate the report.

It might even be that most of us would choose to run with --bug most of the 
time, relying on the normal wget output, and only checking the file when 
something appears to have gone wrong.

Tony



Re: wget on gnu.org: Report a Bug

2007-07-07 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Micah Cowan wrote:
 Tony Lewis wrote:
 The “Report a Bug” section of http://www.gnu.org/software/wget/ should
 encourage submitters to send as much relevant information as possible
 including wget version, operating system, and command line. The
 submitter should also either send or at least save a copy of the --debug
 output.
 
 This information is currently in the bug submitting form at Savannah:
 https://savannah.gnu.org/bugs/?func=additemgroup=wget
 
 But should probably be duplicated at the website as well... let me know
 if the current text could use improvement.

I've copied the text to the website, along with a link to Simon Tatham's
essay on reporting bugs.

I also added a small section regarding the IRC #wget channel on FreeNode.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGkDhh7M8hyUobTrERCDBQAJ4ln3eWsbdbsa5ahfB7kv5tHIc1wACeLSIj
uXkezPuzt7GMoiXvUemMT9U=
=2dVK
-END PGP SIGNATURE-


Bug update notifications

2007-07-07 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

The wget-notify mailing list
(http://addictivecode.org/mailman/listinfo/wget-notify) will now also be
receiving notifications of bug updates from GNU Savannah, in addition to
 subversion commits.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGkG0Q7M8hyUobTrERCLVXAJwP7ru9v88PFF6PgREWTn0XF7XRnwCfY1hd
4W1KLuYYRvZ0pSXOLk6YY/Y=
=TOP4
-END PGP SIGNATURE-


Re: bug and patch: blank spaces in filenames causes looping

2007-07-06 Thread Steven M. Schweda
From various:

 [...]
char filecopy[2048];
if (file[0] != '"') {
  sprintf(filecopy, "\"%.2047s\"", file);
} else {
  strncpy(filecopy, file, 2047);
}
 [...]
 It should be:
 
 sprintf(filecopy, "\"%.2045s\"", file);
 [...]

   I'll admit to being old and grumpy, but am I the only one who
shudders when one small code segment contains 2048, 2047, and 2045
as separate, independent literal constants, instead of using a macro, or
sizeof, or something which would let the next fellow change one buffer
size in one place, instead of hunting all over the code looking for
every 20xx which might be related?

   Just a thought.
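
   (For illustration only, one way to keep the size in a single place;
the names here are invented:)

   #define FILECOPY_SIZE 2048

   char filecopy[FILECOPY_SIZE];
   /* "- 3" reserves room for the two quotes and the terminating NUL,
      so changing FILECOPY_SIZE changes everything at once. */
   snprintf (filecopy, sizeof filecopy, "\"%.*s\"",
             (int) (sizeof filecopy - 3), file);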



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: bug and patch: blank spaces in filenames causes looping

2007-07-06 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Steven M. Schweda wrote:
From various:
 
 [...]
char filecopy[2048];
if (file[0] != '"') {
  sprintf(filecopy, "\"%.2047s\"", file);
} else {
  strncpy(filecopy, file, 2047);
}
 [...]
 It should be:

  sprintf(filecopy, "\"%.2045s\"", file);
 [...]
 
I'll admit to being old and grumpy, but am I the only one who
 shudders when one small code segment contains 2048, 2047, and 2045
 as separate, independent literal constants, instead of using a macro, or
 sizeof, or something which would let the next fellow change one buffer
 size in one place, instead of hunting all over the code looking for
 every 20xx which might be related?

Well, as already mentioned, aprintf() would be much more appropriate, as
it eliminates the need for constants like these.

And yes, magic numbers drive me crazy, too. Of course, when used with
printf's 's' specifier, it needs special handling (crafting a STR()
macro or somesuch).

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjxcX7M8hyUobTrERCHSAAJ9VkQdfhK4/LwByseYH2ZYVzoPqPwCePU1k
2Llybpq/oceXWMyZpBO4bPY=
=Vj/R
-END PGP SIGNATURE-


RE: bug and patch: blank spaces in filenames causes looping

2007-07-05 Thread Tony Lewis
There is a buffer overflow in the following line of the proposed code:

 sprintf(filecopy, "\"%.2047s\"", file);

It should be:

 sprintf(filecopy, "\"%.2045s\"", file);

in order to leave room for the two quotes.

Tony
-Original Message-
From: Rich Cook [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 04, 2007 10:18 AM
To: [EMAIL PROTECTED]
Subject: bug and patch: blank spaces in filenames causes looping

On OS X, if a filename on the FTP server contains spaces, and the
remote copy of the file is newer than the local, then wget gets
thrown into a loop of "No such file or directory" endlessly.  I have
changed the following in ftp-simple.c, and this fixes the error.
Sorry, I don't know how to use the proper patch formatting, but it  
should be clear.

==
the beginning of ftp_retr:
=
/* Sends RETR command to the FTP server.  */
uerr_t
ftp_retr (int csock, const char *file)
{
   char *request, *respline;
   int nwritten;
   uerr_t err;

   /* Send RETR request.  */
   request = ftp_request ("RETR", file);

==
becomes:
==
/* Sends RETR command to the FTP server.  */
uerr_t
ftp_retr (int csock, const char *file)
{
   char *request, *respline;
   int nwritten;
   uerr_t err;
   char filecopy[2048];
   if (file[0] != '"') {
     sprintf(filecopy, "\"%.2047s\"", file);
   } else {
     strncpy(filecopy, file, 2047);
   }

   /* Send RETR request.  */
   request = ftp_request ("RETR", filecopy);






--
Rich wealthychef Cook
925-784-3077
--
  it takes many small steps to climb a mountain, but the view gets  
better all the time.



Re: bug and patch: blank spaces in filenames causes looping

2007-07-05 Thread Rich Cook
Good point, although it's only a POTENTIAL buffer overflow, and it's  
limited to 2 bytes, so at least it's not exploitable.  :-)



On Jul 5, 2007, at 9:05 AM, Tony Lewis wrote:


There is a buffer overflow in the following line of the proposed code:

 sprintf(filecopy, "\"%.2047s\"", file);

It should be:

 sprintf(filecopy, "\"%.2045s\"", file);

in order to leave room for the two quotes.

Tony
-Original Message-
From: Rich Cook [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 04, 2007 10:18 AM
To: [EMAIL PROTECTED]
Subject: bug and patch: blank spaces in filenames causes looping

On OS X, if a filename on the FTP server contains spaces, and the
remote copy of the file is newer than the local, then wget gets
thrown into a loop of "No such file or directory" endlessly.  I have
changed the following in ftp-simple.c, and this fixes the error.
Sorry, I don't know how to use the proper patch formatting, but it
should be clear.

==
the beginning of ftp_retr:
=
/* Sends RETR command to the FTP server.  */
uerr_t
ftp_retr (int csock, const char *file)
{
   char *request, *respline;
   int nwritten;
   uerr_t err;

   /* Send RETR request.  */
   request = ftp_request ("RETR", file);

==
becomes:
==
/* Sends RETR command to the FTP server.  */
uerr_t
ftp_retr (int csock, const char *file)
{
   char *request, *respline;
   int nwritten;
   uerr_t err;
   char filecopy[2048];
   if (file[0] != '"') {
     sprintf(filecopy, "\"%.2047s\"", file);
   } else {
     strncpy(filecopy, file, 2047);
   }

   /* Send RETR request.  */
   request = ftp_request ("RETR", filecopy);






--
Rich wealthychef Cook
925-784-3077
--
  it takes many small steps to climb a mountain, but the view gets
better all the time.


--
Rich wealthychef Cook
925-784-3077
--
 it takes many small steps to climb a mountain, but the view gets  
better all the time.





RE: bug and patch: blank spaces in filenames causes looping

2007-07-05 Thread Virden, Larry W.
 


-Original Message-
From: Hrvoje Niksic [mailto:[EMAIL PROTECTED] 

Tony Lewis [EMAIL PROTECTED] writes:

 Wget has an `aprintf' utility function that allocates the result on
the heap.  Avoids both buffer overruns and 
 arbitrary limits on file name length.

If it uses the heap, then doesn't that open a hole where a particularly
long file name would overflow the heap?

-- 
URL: http://wiki.tcl.tk/ 
Even if explicitly stated to the contrary, nothing in this posting
should be construed as representing my employer's opinions.
URL: mailto:[EMAIL PROTECTED]  URL: http://www.purl.org/NET/lvirden/

 


Re: bug and patch: blank spaces in filenames causes looping

2007-07-05 Thread Hrvoje Niksic
Tony Lewis [EMAIL PROTECTED] writes:

 There is a buffer overflow in the following line of the proposed code:

  sprintf(filecopy, "\"%.2047s\"", file);

Wget has an `aprintf' utility function that allocates the result on
the heap.  Avoids both buffer overruns and arbitrary limits on file
name length.
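
A sketch of the RETR path using it (hedged, not an actual patch):

  /* Heap-allocate exactly what is needed -- no 2048/2047/2045. */
  char *filecopy = aprintf ("\"%s\"", file);
  request = ftp_request ("RETR", filecopy);
  xfree (filecopy);   /* or plain free; see the discussion below */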


Re: bug and patch: blank spaces in filenames causes looping

2007-07-05 Thread Hrvoje Niksic
Rich Cook [EMAIL PROTECTED] writes:

 Trouble is, it's undocumented as to how to free the resulting
 string.  Do I call free on it?

Yes.  "Freshly allocated with malloc" in the function documentation
was supposed to indicate how to free the string.


Re: bug and patch: blank spaces in filenames causes looping

2007-07-05 Thread Hrvoje Niksic
Virden, Larry W. [EMAIL PROTECTED] writes:

 Tony Lewis [EMAIL PROTECTED] writes:

 Wget has an `aprintf' utility function that allocates the result on
 the heap.  Avoids both buffer overruns and 
 arbitrary limits on file name length.

 If it uses the heap, then doesn't that open a hole where a particularly
 long file name would overflow the heap?

No, aprintf tries to allocate as much memory as necessary.  If the
memory is unavailable, malloc returns NULL and Wget exits.


Re: bug and patch: blank spaces in filenames causes looping

2007-07-05 Thread Rich Cook
Trouble is, it's undocumented as to how to free the resulting
string.  Do I call free on it?  I'd use asprintf, but I'm afraid to
suggest that here as it may not be portable.


On Jul 5, 2007, at 10:45 AM, Hrvoje Niksic wrote:


Tony Lewis [EMAIL PROTECTED] writes:

There is a buffer overflow in the following line of the proposed  
code:


 sprintf(filecopy, "\"%.2047s\"", file);


Wget has an `aprintf' utility function that allocates the result on
the heap.  Avoids both buffer overruns and arbitrary limits on file
name length.


--
Rich wealthychef Cook
925-784-3077
--
 it takes many small steps to climb a mountain, but the view gets  
better all the time.





Re: bug and patch: blank spaces in filenames causes looping

2007-07-05 Thread Rich Cook


On Jul 5, 2007, at 11:08 AM, Hrvoje Niksic wrote:


Rich Cook [EMAIL PROTECTED] writes:


Trouble is, it's undocumented as to how to free the resulting
string.  Do I call free on it?


Yes.  "Freshly allocated with malloc" in the function documentation
was supposed to indicate how to free the string.


Oh, I looked in the source and there was this xmalloc thing that  
didn't show up in my man pages, so I punted.  Sorry.


--
✐There's no time to stop for gas, we're already late-- Karin Donker
--
Rich wealthychef Cook
http://5pmharmony.com
925-784-3077
--
✐



RE: bug and patch: blank spaces in filenames causes looping

2007-07-05 Thread Bruso, John
Please remove me from this list. Thanks,
 
John Bruso



From: Rich Cook [mailto:[EMAIL PROTECTED]
Sent: Thu 7/5/2007 12:30 PM
To: Hrvoje Niksic
Cc: Tony Lewis; [EMAIL PROTECTED]
Subject: Re: bug and patch: blank spaces in filenames causes looping




On Jul 5, 2007, at 11:08 AM, Hrvoje Niksic wrote:

 Rich Cook [EMAIL PROTECTED] writes:

 Trouble is, it's undocumented as to how to free the resulting
 string.  Do I call free on it?

 Yes.  "Freshly allocated with malloc" in the function documentation
 was supposed to indicate how to free the string.

Oh, I looked in the source and there was this xmalloc thing that 
didn't show up in my man pages, so I punted.  Sorry.

--
✐There's no time to stop for gas, we're already late-- Karin Donker
--
Rich wealthychef Cook
http://5pmharmony.com
925-784-3077
--
✐





Re: bug and patch: blank spaces in filenames causes looping

2007-07-05 Thread Hrvoje Niksic
Rich Cook [EMAIL PROTECTED] writes:

 On Jul 5, 2007, at 11:08 AM, Hrvoje Niksic wrote:

 Rich Cook [EMAIL PROTECTED] writes:

 Trouble is, it's undocumented as to how to free the resulting
 string.  Do I call free on it?

 Yes.  "Freshly allocated with malloc" in the function documentation
 was supposed to indicate how to free the string.

 Oh, I looked in the source and there was this xmalloc thing that
 didn't show up in my man pages, so I punted.  Sorry.

No problem.  Note that xmalloc isn't entirely specific to Wget, it's a
fairly standard GNU name for a malloc-or-die function.

Now I remembered that Wget also has xfree, so the above advice is not
entirely correct -- you should call xfree instead.  However, in the
normal case xfree is a simple wrapper around free, so even if you used
free, it would have worked just as well.  (The point of xfree is that
if you compile with DEBUG_MALLOC, you get a version that check for
leaks, although it should be removed now that there is valgrind, which
does the same job much better.  There is also the business of barfing
on NULL pointers, which should also be removed.)

I'd have implemented a portable asprintf, but I liked the aprintf
interface better (I first saw it in libcurl).


Re: bug and patch: blank spaces in filenames causes looping

2007-07-05 Thread Rich Cook
So forgive me for a newbie-never-even-lurked kind of question:  will  
this fix make it into wget for other users (and for me in the  
future)?  Or do I need to do more to make that happen, or...?  Thanks!


On Jul 5, 2007, at 12:52 PM, Hrvoje Niksic wrote:


Rich Cook [EMAIL PROTECTED] writes:


On Jul 5, 2007, at 11:08 AM, Hrvoje Niksic wrote:


Rich Cook [EMAIL PROTECTED] writes:


Trouble is, it's undocumented as to how to free the resulting
string.  Do I call free on it?


Yes.  "Freshly allocated with malloc" in the function documentation
was supposed to indicate how to free the string.


Oh, I looked in the source and there was this xmalloc thing that
didn't show up in my man pages, so I punted.  Sorry.


No problem.  Note that xmalloc isn't entirely specific to Wget, it's a
fairly standard GNU name for a malloc-or-die function.

Now I remembered that Wget also has xfree, so the above advice is not
entirely correct -- you should call xfree instead.  However, in the
normal case xfree is a simple wrapper around free, so even if you used
free, it would have worked just as well.  (The point of xfree is that
if you compile with DEBUG_MALLOC, you get a version that checks for
leaks, although it should be removed now that there is valgrind, which
does the same job much better.  There is also the business of barfing
on NULL pointers, which should also be removed.)

I'd have implemented a portable asprintf, but I liked the aprintf
interface better (I first saw it in libcurl).


--
✐There's no time to stop for gas, we're already late-- Karin Donker
--
Rich wealthychef Cook
http://5pmharmony.com
925-784-3077
--
✐



Re: bug and patch: blank spaces in filenames causes looping

2007-07-05 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Rich Cook wrote:
 So forgive me for a newbie-never-even-lurked kind of question:  will
 this fix make it into wget for other users (and for me in the future)? 
 Or do I need to do more to make that happen, or...?  Thanks!

Well, I need a chance to look over the patch, run some tests, etc, to
see if it really covers everything it should (what about other,
non-space characters?).

The fix (or one like it) will probably make it into Wget at some point,
but I wouldn't expect it to come out in the next release (which, itself,
will not be arriving for a couple months); it will probably go into wget
1.12.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjXYj7M8hyUobTrERCI5JAJ0UIDGzQsC8xCI3lK26pzzQ+BkS6ACgj16o
oWDlelFyfvvTlhtlDpLYLXM=
=DZ8v
-END PGP SIGNATURE-


Re: bug and patch: blank spaces in filenames causes looping

2007-07-05 Thread Rich Cook

Thanks for the follow up.  :-)

On Jul 5, 2007, at 3:52 PM, Micah Cowan wrote:


-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Rich Cook wrote:

So forgive me for a newbie-never-even-lurked kind of question:  will
this fix make it into wget for other users (and for me in the  
future)?

Or do I need to do more to make that happen, or...?  Thanks!


Well, I need a chance to look over the patch, run some tests, etc, to
see if it really covers everything it should (what about other,
non-space characters?).

The fix (or one like it) will probably make it into Wget at some  
point,
but I wouldn't expect it to come out in the next release (which,  
itself,
will not be arriving for a couple months); it will probably go into  
wget

1.12.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjXYj7M8hyUobTrERCI5JAJ0UIDGzQsC8xCI3lK26pzzQ+BkS6ACgj16o
oWDlelFyfvvTlhtlDpLYLXM=
=DZ8v
-END PGP SIGNATURE-


--
✐There's no time to stop for gas, we're already late-- Karin Donker
--
Rich wealthychef Cook
http://5pmharmony.com
925-784-3077
--
✐



Bug in the generated manpage

2007-06-12 Thread Stepan Kasal
Hello,

using Wget 1.10.2 I noticed that the man page description for
--no-proxy says:

"For more information about the use of proxies with Wget,"

... and that's all.  The original contains an @xref, which gets
swallowed by texi2pod.

I don't know how/if it should be repaired, but I thought it's worth
reporting.

Have a nice day,
Stepan


Re: bug storing cookies with wget

2007-06-03 Thread Matthias Vill
Mario Ander schrieb:
 Hi everybody,
 
 I think there is a bug storing cookies with wget.
 
 See this command line:
 
C:\Programme\wget\wget --user-agent="Opera/8.5 (X11;
U; en)" --no-check-certificate --keep-session-cookies
--save-cookies=cookie.txt --output-document=-
--debug --output-file=debug.txt
--post-data=name=xxxpassword=dummy=Internetkennwortlogin.x=0login.y=0
"https://www.vodafone.de/proxy42/portal/login.po"
[..]
 Set-Cookie:
 JSESSIONID=GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE;
 path=/jsp 
 Set-Cookie: VODAFONELOGIN=1; domain=.vodafone.de;
 expires=Friday, 01-Jun-2007 15:05:16 GMT; path=/ 
 Set-Cookie:
 JSESSIONID=GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE!1180705316338;
 path=/proxy42
[..]
 ---response end---
 200 OK
 Attempt to fake the path: /jsp,
 /proxy42/portal/login.po

So the problem seems to be that wget rejects cookies for paths which
don't fit the request URL. The script you call is in
/proxy42/portal/, which is a subdirectory of /proxy42 and of /, so wget
accepts those cookies; but the request path is not related to /jsp,
so that cookie is rejected.

So it seems to be wget sticking to the strict RFC and the script doing
wrong. To get this working you would need to patch wget to accept
non-RFC-compliant cookies, maybe along with an
--accept-malformed-cookies directive.
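
A minimal sketch of the prefix rule involved (not wget's actual code):

  #include <string.h>

  /* A Set-Cookie path is accepted only if it is a prefix of the
     request path; "/jsp" is not a prefix of "/proxy42/portal/login.po",
     hence "Attempt to fake the path". */
  static int
  cookie_path_ok (const char *cookie_path, const char *request_path)
  {
    return strncmp (request_path, cookie_path,
                    strlen (cookie_path)) == 0;
  }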

Hope this helps you

Matthias


Re: bug storing cookies with wget

2007-06-03 Thread Matthias Vill
Matthias Vill schrieb:
 Mario Ander schrieb:
 Hi everybody,

 I think there is a bug storing cookies with wget.

 See this command line:

 C:\Programme\wget\wget --user-agent="Opera/8.5 (X11;
 U; en)" --no-check-certificate --keep-session-cookies
 --save-cookies=cookie.txt --output-document=-
 --debug --output-file=debug.txt
 --post-data=name=xxxpassword=dummy=Internetkennwortlogin.x=0login.y=0
 "https://www.vodafone.de/proxy42/portal/login.po"
 [..]
 Set-Cookie:
 JSESSIONID=GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE;
 path=/jsp 
 Set-Cookie: VODAFONELOGIN=1; domain=.vodafone.de;
 expires=Friday, 01-Jun-2007 15:05:16 GMT; path=/ 
 Set-Cookie:
 JSESSIONID=GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE!1180705316338;
 path=/proxy42
 [..]
 ---response end---
 200 OK
 Attempt to fake the path: /jsp,
 /proxy42/portal/login.po
 
 So the problem seems to be that wget rejects cookies for paths which
 don't fit the request URL. The script you call is in
 /proxy42/portal/, which is a subdirectory of /proxy42 and of /, so wget
 accepts those cookies; but the request path is not related to /jsp,
 so that cookie is rejected.
 
 So it seems to be wget sticking to the strict RFC and the script doing
 wrong. To get this working you would need to patch wget to accept
 non-RFC-compliant cookies, maybe along with an
 --accept-malformed-cookies directive.
 
 Hope this helps you
 
 Matthias
 

So I thought of a second solution: If you have cygwin (or at least
bash+grep) you can run this small script to duplicate and truncate the
cookie.
--- CUT here ---
#!/bin/bash
#Author: Matthias Vill; feel free to change and use

#get the line for the proxy42 path in $temp
temp=$(grep proxy42 cookies.txt)

#remove everything after the last !
temp=${temp%!*}

#replace proxy42 by jsp
temp=${temp/proxy42/jsp}

#append newline to file
#echo "" >> cookies.txt

#add new cookie to cookies.txt
echo "$temp" >> cookies.txt
--- CUT here ---
Maybe you need to remove the # in front of the 'echo "" >> cookies.txt'
line to compensate for a missing trailing newline; otherwise you may end
up changing the value of the previous cookie.

Maybe this helps even more

Matthias


bug storing cookies with wget

2007-06-01 Thread Mario Ander
Hi everybody,

I think there is a bug storing cookies with wget.

See this command line:

C:\Programme\wget\wget --user-agent=Opera/8.5 (X11;
U; en) --no-check-certificate --keep-session-cookies
--save-cookies=cookie.txt --output-document=-
--debug --output-file=debug.txt
--post-data=name=xxxpassword=dummy=Internetkennwortlogin.x=0login.y=0
https://www.vodafone.de/proxy42/portal/login.po;



wget answer this way:



DEBUG output created by Wget 1.10.2 on Windows.

--15:41:58-- 
https://www.vodafone.de/proxy42/portal/login.po
   => `-'
Resolving www.vodafone.de... seconds 0.00,
139.7.147.41
Caching www.vodafone.de = 139.7.147.41
Connecting to www.vodafone.de|139.7.147.41|:443...
seconds 0.00, connected.
Created socket 1844.
Releasing 0x003a5a90 (new refcount 1).
Initiating SSL handshake.
Handshake successful; connected socket 1844 to SSL
handle 0x00931758
certificate:
  subject: /C=DE/ST=NRW/L=Duesseldorf/O=Vodafone D2
GmbH/OU=TOP-A/OU=Terms of use at www.verisign.com/rpa
(c)00/CN=www.vodafone.de
  issuer:  /O=VeriSign Trust Network/OU=VeriSign,
Inc./OU=VeriSign International Server CA - Class
3/OU=www.verisign.com/CPS Incorp.by Ref. LIABILITY
LTD.(c)97 VeriSign
WARNING: Certificate verification error for
www.vodafone.de: unable to get local issuer
certificate

---request begin---
POST /proxy42/portal/login.po HTTP/1.0 
User-Agent: Opera/8.5 (X11; U; en) 
Accept: */* 
Host: www.vodafone.de 
Connection: Keep-Alive 
Content-Type: application/x-www-form-urlencoded 
Content-Length: 77 
 
---request end---
[POST data:
name=xxxpassword=dummy=Internetkennwortlogin.x=0login.y=0]
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 200 OK 
Date: Fri, 01 Jun 2007 13:41:56 GMT 
Server: Apache 
Pragma: No-cache 
Expires: Thu, 01 Jan 1970 00:00:00 GMT 
Set-Cookie:
JSESSIONID=GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE;
path=/jsp 
Set-Cookie: VODAFONELOGIN=1; domain=.vodafone.de;
expires=Friday, 01-Jun-2007 15:05:16 GMT; path=/ 
Set-Cookie:
JSESSIONID=GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE!1180705316338;
path=/proxy42 
Cache-Control: no-cache,no-store,max-age=0 
P3P: CP=NOI ADM DEV PSAi COM NAV OUR OTR STP IND DEM

Connection: close 
Content-Type: text/html; charset=ISO-8859-1 
Via: 1.1 www.vodafone.de (Alteon iSD-SSL/6.0.5) 
 
---response end---
200 OK
Attempt to fake the path: /jsp,
/proxy42/portal/login.po
cdm: 1 2 3 4 5 6 7 8
Stored cookie vodafone.de -1 (ANY) / permanent
insecure [expiry 2007-06-01 17:05:16] VODAFONELOGIN
1

Stored cookie www.vodafone.de -1 (ANY) /proxy42
session insecure [expiry none] JSESSIONID
GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE!1180705316338
Length: unspecified [text/html]

0K .. .. .. ...   
338.67 KB/s

Closed 1844/SSL 0x931758
15:41:58 (338.67 KB/s) - `-' saved [34644]

Saving cookies to cookie.txt.
Done saving cookies.




The cookie.txt looks this way:



# HTTP cookie file.
# Generated by Wget on 2007-06-01 15:33:23.
# Edit at your own risk.

www.vodafone.de FALSE   /proxy42FALSE   0   JSESSIONID
GggBMfxV9vGqGwtyQGJFXsyCr6vQvGSh9KGgDt7xgLycdc5MTQps!1467361027!NONE!1180704801023
.vodafone.deTRUE/   FALSE   1180709801  VODAFONELOGIN   1



and should look like this (but does not):


# HTTP cookie file.
# Generated by Wget on 2007-06-01 15:47:31.
# Edit at your own risk.

www.vodafone.de FALSE   /proxy42FALSE   0   JSESSIONID
GgjRT1NTfspwH1cJCVPlGv37c4JKgkTDPYJNsTM2l1RJG0CJQ8Rp!-249032648!NONE!1180705649205
www.vodafone.de FALSE   /jspFALSE   0   JSESSIONID
GgjRT1NTfspwH1cJCVPlGv37c4JKgkTDPYJNsTM2l1RJG0CJQ8Rp!-249032648!NONE
.vodafone.deTRUE/   FALSE   1180710649  VODAFONELOGIN   1


That’s all.
Bye.




   



possible bug in wget-1.10.2 and earlier

2007-05-30 Thread Harrington, Paul
Hi,
wget appears to be confused by FTP servers that put only one space
before the file-size information in listings. We only came across this
problem today, so I don't know how common it is.
 
pjjH
 




From: Harrington, Paul 
Sent: Thursday, May 31, 2007 12:06 AM
To:  recipient-removed 
Subject: RE: File issue using WGET


Your FTP server must have changed the output of the listing format or,
more precisely, the string representation of some of the components has
changed such that only one space separates the group name from the
file-size. The bug is, of course, with wget but it is one that hitherto
had not been observed when interacting with your FTP server.
 
 
pjjH
 
 
 
[EMAIL PROTECTED] diff -u ftp-ls.c ~/tmp
--- ftp-ls.c    2005-08-04 17:52:33.0 -0400
+++ /u/harringp/tmp/ftp-ls.c    2007-05-31 00:02:07.209955000 -0400
@@ -229,6 +229,18 @@
   break;
 }
   errno = 0;
+  /* after the while loop terminates, t may not always
+     point to a space character. In the case when
+     there is only one space between the user/group
+     information and the file-size, the space will
+     have been overwritten by a \0 via strtok().  So,
+     if you have been through the loop at least once,
+     advance forward one character.
+  */
+
+  if (t > ptok)
+    t++;
+
   size = str_to_wgint (t, NULL, 10);
   if (size == WGINT_MAX && errno == ERANGE)
     /* Out of range -- ignore the size.   Should



 



RE: wget bug

2007-05-24 Thread Tony Lewis
Highlord Ares wrote:

 

 it tries to download web pages named similar to

  http://site.com?variable=yes&mode=awesome

Since & is a reserved character in many command shells, you need to quote
the URL on the command line:

wget "http://site.com?variable=yes&mode=awesome"

 

Tony

 



wget bug

2007-05-23 Thread Highlord Ares

when I run wget on certain sites, it tries to download web pages named
similar to http://site.com?variable=yes&mode=awesome.  However, wget isn't
saving any of these files, no doubt because of some file naming issue?  This
problem exists in both the Windows & unix versions.

hope this helps


RE: wget bug

2007-05-23 Thread Willener, Pat
This does not look like a valid URL to me - shouldn't there be a slash at the 
end of the domain name?
 
Also, when talking about a bug (or anything else), it is always helpful if you 
specify the wget version (number).



From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Highlord Ares
Sent: Thursday, May 24, 2007 11:41
To: [EMAIL PROTECTED]
Subject: wget bug


when I run wget on certain sites, it tries to download web pages named 
similar to http://site.com?variable=yes&mode=awesome.  However, wget isn't 
saving any of these files, no doubt because of some file naming issue?  This 
problem exists in both the Windows & unix versions. 

hope this helps



Bug using recursive get and stdout

2007-04-17 Thread Jonathan A. Zdziarski

Greetings,

Stumbled across a bug yesterday reproduced in both v1.8.2 and 1.10.2.

Apparently, recursive get tries to open the file for reading after
downloading, to download subsequent files. Problem is, when used with
-O - to deliver to stdout, it cannot open that file, so you get the
output below (note the "No such file or directory" error). In 1.10,
it appears that they removed this error message, but wget still fails
to recursively fetch.


I realize it seems like there wouldn't be much reason to send more  
than one page to stdout, but I'm feeding it all into a statistical  
filter to classify website data, so it doesn't really matter to the  
filter. Do you know of any workaround for this, other than opening  
the files after reading (won't scale with thousands per minute).


Thanks!

$ wget -O - -r http://www.zdziarski.com > out
--15:40:06--  http://www.zdziarski.com/
   => `-'
Resolving www.zdziarski.com... done.
Connecting to www.zdziarski.com[209.51.159.242]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 24,275 [text/html]

100%[] 24,275   163.49K/s 
ETA 00:00


15:40:06 (163.49 KB/s) - `-' saved [24275/24275]

www.zdziarski.com/index.html: No such file or directory

FINISHED --15:40:06--
Downloaded: 24,275 bytes in 1 files





Jonathan




Re: Bug using recursive get and stdout

2007-04-17 Thread Steven M. Schweda
   A quick search at "http://www.mail-archive.com/wget@sunsite.dk/" for
"-O" found:

  http://www.mail-archive.com/wget@sunsite.dk/msg08746.html
  http://www.mail-archive.com/wget@sunsite.dk/msg08748.html

   The way -O is implemented, there are all kinds of things which are
incompatible with it, -r among them.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: FW: think you have a bug in CSS processing

2007-04-13 Thread J . F . Groff
Neil wrote:
 When giving it some thought I think a
 valid argument could be made that the string in the CSS document is not 
 exactly
 an URL but it is certainly URL-like.

The URL-like strings in CSS are actually standard URLs, either absolute or
relative, so they shouldn't be a big deal to handle. A caveat for the parser:
they can be quoted or unquoted and still work (url(img.png), url('img.png')
and url("img.png") are all equivalent).
See http://www.w3.org/TR/CSS21/syndata.html#uri

Amazingly I found this feature request in a 2003 message to this very mailing
list. Are there only a few lunatics like me who think this should be included?

Cheers,

  JFG




RE: FW: think you have a bug in CSS processing

2007-04-13 Thread Tony Lewis
J.F.Groff wrote:

 Amazingly I found this feature request in a 2003 message to this very
mailing
 list. Are there only a few lunatics like me who think this should be
included?

Wget is written and maintained by volunteers. What you need to find is a
lunatic willing to volunteer to write the code to support this feature
request.

Tony



Re: FW: think you have a bug in CSS processing

2007-04-13 Thread J.F. Groff

Hi Tony,


 Amazingly I found this feature request in a 2003 message to this very
mailing
 list. Are there only a few lunatics like me who think this should be
included?

Wget is written and maintained by volunteers. What you need to find is a
lunatic willing to volunteer to write the code to support this feature
request.


Heh, sure ! I'm lunatic enough to try... Fetching the code from svn as
I write this. But the docs page says:

At the moment the GNU Wget development tree has been split in two
branches in order to allow bugfixing releases of the feature-frozen
1.10.x tree while continuing the development for Wget 2.0 on the main
branch.

Anywhere I can look at planned features for the 2.0 branch? There's an
awful lot of items in the project's TODO list but no mention of CSS.
Shall I just add the feature request to the TODO first, or is there a
community process involved in picking candidate features?

Cheers,

 JFG


Re: FW: think you have a bug in CSS processing

2007-04-13 Thread J.F. Groff

Oh wait. Somebody already did the patch!

http://www.mail-archive.com/[EMAIL PROTECTED]/msg09502.html
http://article.gmane.org/gmane.comp.web.wget.patches/1867

I guess it's up to maintainers to decide whether to include this in
the standard wget distribution. In the meantime, hearty thanks to Ted
Mielczarek, you made my day!

 JFG

On 4/13/07, J.F. Groff [EMAIL PROTECTED] wrote:

Hi Tony,

  Amazingly I found this feature request in a 2003 message to this very
 mailing
  list. Are there only a few lunatics like me who think this should be
 included?

 Wget is written and maintained by volunteers. What you need to find is a
 lunatic willing to volunteer to write the code to support this feature
 request.

Heh, sure ! I'm lunatic enough to try... Fetching the code from svn as
I write this. But the docs page says:

At the moment the GNU Wget development tree has been split in two
branches in order to allow bugfixing releases of the feature-frozen
1.10.x tree while continuing the development for Wget 2.0 on the main
branch.

Anywhere I can look at planned features for the 2.0 branch? There's an
awful lot of items in the project's TODO list but no mention of CSS.
Shall I just add the feature request to the TODO first, or is there a
community process involved in picking candidate features?

Cheers,

  JFG



Bug-report: wget with multiple cnames in ssl certificate

2007-04-12 Thread Alex Antener
Hi

If I connect with wget 1.10.2 (Debian Etch & Ubuntu Feisty Fawn) to a
secure host that uses multiple cnames in the certificate, I get the
following error:

[EMAIL PROTECTED]:~$ wget https://host.domain.tld
--10:18:55--  https://host.domain.tld/
   => `index.html'
Resolving host.domain.tld... xxx.xxx.xxx.xxx
Connecting to host.domain.tld|xxx.xxx.xxx.xxx|:443... connected.
ERROR: certificate common name `host0.domain.tld' doesn't match
requested host name `host.domain.tld'.
To connect to host.domain.tld insecurely, use `--no-check-certificate'.
Unable to establish SSL connection.

If I do the same with wget 1.9.1 (Debian Sarge) I do not get that Error.

Kind regards, Alex Antener

-- 
Alex Antener
Dipl. Medienkuenstler FH

[EMAIL PROTECTED] // http://lix.cc // +41 (0)44 586 97 63
GPG Key: 1024D/14D3C7A1 https://lix.cc/gpg_key.php
Fingerprint: BAB6 E61B 17D7 A9C9 6313  5141 3A3C DAA3 14D3 C7A1



think you have a bug in CSS processing

2007-03-30 Thread Neil Smithline

I think I found a bug in CSS processing. The CSS was auto-generated and I'm far
from a CSS expert (quite the opposite). But, as far as I can tell (see
snippet below), the GIF is supposed to be loaded from a directory named "-"
that is off of the main URL. For example, if the origination site is
http://www.foo.com, the GIF will be at
http://www.foo.com/-/includes/styles/swirl/skin_swirl_grey_top.gif. The
text below came from the converted HTML file on the destination site.
You'll notice that the URL was not converted to an absolute URL pointing to
www.foo.com, but neither was the GIF copied to the destination site. I've
done a find and it is nowhere to be found.

This really isn't a big deal for me as it is only one file and I've just
manually copied it over, but it does seem to be a bug worthy of fixing. If
you need more data, you can look at www.smithline.net. The snippet comes
from that page which was created using google page creator (don't ask me why
- it is definitely far from being ready for prime time) and then wget'ed
over to smithline.net.

Feel free to ping me should you need more info - Neil

PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin:/home/neils/bin wget
--mirror --force-html --convert-links --no-parent
--directory-prefix=/home/neils/smithline.net/data --quiet --recursive
--no-host-directories http://www.smithline.net-a.googlepages.com


   #container {
  padding: 0px;
  background:URL(/-/includes/style/swirl/skin_swirl_grey_top.gif)
no-repeat top left;
  background-color:#dfdfdf;
  margin:0px auto;
   }


Re: wget-1.10.2 pwd/cd bug

2007-03-27 Thread Hrvoje Niksic
Hrvoje Niksic [EMAIL PROTECTED] writes:

 [EMAIL PROTECTED] (Steven M. Schweda) writes:

It's starting to look like a consensus.  A Google search for:
 wget DONE_CWD
 finds:

   http://www.mail-archive.com/wget@sunsite.dk/msg08741.html

 That bug is fixed in subversion, revision 2194.

I forgot to add that this means that the patch can be retrieved with
`svn diff -r2193:2194' in Wget's source tree.  If you don't have a
checkout handy, Subversion still allows you to generate a diff using
`svn diff -r2193:2194 http://svn.dotsrc.org/repo/wget/trunk/'.

Also note that the fix is also available on the stable branch, and I
urge the distributors to apply it to their versions until 1.10.3 or
1.11 is released.


Re: wget-1.10.2 pwd/cd bug

2007-03-25 Thread Hrvoje Niksic
[EMAIL PROTECTED] (Steven M. Schweda) writes:

It's starting to look like a consensus.  A Google search for:
 wget DONE_CWD
 finds:

   http://www.mail-archive.com/wget@sunsite.dk/msg08741.html

That bug is fixed in subversion, revision 2194.


wget-1.10.2-5mdv2007.1 pwd/cd bug

2007-03-24 Thread Jason Mancini

Hello,
If wget cannot connect to the FTP server the first time,
it fails to CD properly after checking the path with PWD.
Here is a -d listing when connecting after failing.  Thanks!
Jason

   $cmd = "wget -d --limit-rate=999k --tries=0 --no-remove-listing -N $ftp/*.rpm";



--11:06:12--  
ftp://ftp:[EMAIL PROTECTED]/pub/linux/distributions/mandrivalinux/devel/cooker/i586/media/main/release/*.rpm

 (try: 2) => `.listing'
Found carroll.aset.psu.edu in host_name_addresses_map (0x808bf98)
Connecting to carroll.aset.psu.edu|128.118.2.96|:21... connected.
Created socket 3.
Releasing 0x0808bf98 (new refcount 1).
Logging in as ftp ...

220- <snip big login message>

--> USER ftp

331 Please specify the password.

--> PASS [EMAIL PROTECTED]

230 Login successful.
Logged in!
==> SYST ...
--> SYST

215 UNIX Type: L8
done.==> PWD ...
--> PWD

257 "/"
done.
==> TYPE I ...
--> TYPE I

200 Switching to Binary mode.
done.  ==> CWD not required.
conaddr is: 128.118.2.96
==> PASV ...
--> PASV

227 Entering Passive Mode (128,118,2,96,184,134)
trying to connect to 128.118.2.96 port 47238
Created socket 4.
done.==> LIST ...
--> LIST

150 Here comes the directory listing.
done.

   [ <=>                                  ] 331         --.--K/s

Closed fd 4
226 Directory send OK.
11:11:23 (412.30 KB/s) - `.listing' saved [331]

DIRECTORY; perms 700; month: Sep; day: 8; year: 2005 (no tm);
DIRECTORY; perms 700; month: Sep; day: 23; year: 2005 (no tm);
DIRECTORY; perms 755; month: May; day: 24; year: 2006 (no tm);
PLAINFILE; perms 644; month: Sep; day: 9; year: 2005 (no tm);
PLAINFILE; perms 644; month: Sep; day: 9; year: 2005 (no tm);
No matches on pattern `*.rpm'.
Closed fd 3

_
Get a FREE Web site, company branded e-mail and more from Microsoft Office 
Live! http://clk.atdmt.com/MRT/go/mcrssaub0050001411mrt/direct/01/




wget-1.10.2 pwd/cd bug

2007-03-24 Thread Jason Mancini

I downloaded the 1.10.2 source code.
u->cmd goes from 0x1B to 0x19, dropping DO_CWD on the second call
to ftp.c:getftp() after a connection failure.  I'm trying to debug THE loop.
Jason

_
Watch free concerts with Pink, Rod Stewart, Oasis and more. Visit MSN 
Presents today. 
http://music.msn.com/presents?icid=ncmsnpresentstaglineocid=T002MSN03A07001




wget-1.10.2 pwd/cd bug

2007-03-24 Thread Jason Mancini

This is inverted in ftp.c:

  if (con->csock != -1)
    con->st &= ~DONE_CWD;
  else
    con->st |= DONE_CWD;

If there's no error, it re-requests the CWD?
If there was an error, it marks the CWD as done?

It's backwards.  Changing != to == solves the bug.
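
For clarity, here is a compilable sketch of the corrected test (mock
types; this mirrors the reading above, not the literal upstream patch):

#include <stdio.h>

#define DONE_CWD 0x01

struct conn { int csock; int st; };

static void update_cwd_flag (struct conn *con)
{
  if (con->csock == -1)
    con->st &= ~DONE_CWD;   /* connection lost: CWD must be redone */
  else
    con->st |= DONE_CWD;    /* connection intact: CWD already done */
}

int main (void)
{
  struct conn con = { -1, DONE_CWD };  /* reconnect after a failure */
  update_cwd_flag (&con);
  printf ("DONE_CWD after reconnect: %d\n", con.st & DONE_CWD);  /* 0 */
  return 0;
}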
Thanks!
Jason

_
5.5%* 30 year fixed mortgage rate. Good credit refinance. Up to 5 free 
quotes - *Terms 
https://www2.nextag.com/goto.jsp?product=10035url=%2fst.jsptm=ysearch=mortgage_text_links_88_h2a5ds=4056p=5117disc=yvers=910




wget-1.10.2 pwd/cd bug

2007-03-24 Thread Steven M. Schweda
   It's starting to look like a consensus.  A Google search for:
wget DONE_CWD
finds:

  http://www.mail-archive.com/wget@sunsite.dk/msg08741.html



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: file numbering bug

2007-03-08 Thread Steven M. Schweda
From: Robert Dick

 When serializing successive copies of a page, the serial number appears
 at the end of the extension, i.e., what should be file1.html is called
 file.html.1.  I'm using wget ver. 1.10.2 with the default options on
 Windows ME ...

   I can see how that might annoy a Windows user, but it would probably
be a terrible idea to change the file name as you suggest, because it
would break any HTML links to file.html which might appear in any
other file.

   If you don't like the .nnn suffix, then you'll need to clean it up
later, or else don't download the same file twice into the same
directory.  (Or you could use VMS, where file version numbers are a
natural part of the file system, so the .nnn suffix is not needed, and
this problem does not arise.)



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


RE: file numbering bug

2007-03-08 Thread Sandhu, Ranjit
It wouldn't break on windoze because file.html still exists.  He just
wants a different naming scheme for the newer copies.  There would be no
links to file.html.1 or "file1.html" for that matter, so it really
doesn't matter which way you rename it.

Although if there is already a file called "file1.html" and you downloaded
it again, using your NEW scheme, it would become "file11.html", which
would be somewhat confusing :)

Ranjit Sandhu
703.803.1755
SRA



ntlm already authenticated bug and fix.

2007-02-12 Thread Phill Bertolus
Hi Mauro (I'm guessing here - got this from the web page)

Here is a patch against 1.10.2 which fixes an issue I found when using
NTLM with Microsoft's Intermittent Information Server (IIS).

The issue is not with wget, but rather a bug in IIS. Nevertheless, here
is the fix and a description of the problem.

Essentially IIS has the ability to create "domains" (for want of a better
description; I'm not an IIS expert by any means) within a single
instance of the IIS server.

Each of these domains (I understand) is more or less independent. The
bug manifests itself when a page within one domain links to a page
within another domain on the same IIS instance.

The web address of the server remains the same except the URI points to
some other directory under the server's root.

In this case, when the connection is first setup by wget, NTLM
authenticates correctly. Subsequent recursive gets also work fine
*until* a reference is made to another domain.

When the cross domain reference occurs IIS issues another NTLM
challenge, as if the connection is not authenticated. Now, as you and I
know, NTLM is a connection authentication protocol, meaning you cannot
be connected unless you are authenticated. So IIS's other domains
already know the connection is authenticated because it *is* a
connection, nevertheless, they insist on re-authentication.

This patch addresses the issue by forcing a disconnect and retry when
this circumstance is detected (Actually, this always disconnects in this
rev. The detection bit needs more work).

That is to say, if an NTLM challenge occurs when the connection is
already active *and* NTLM authenticated, the connection is terminated
and restarted (thus invoking the challenge-response code) and ultimately
re-authenticating.
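
As a rough, self-contained illustration of the detection described above
(every name here is an invented stand-in, not one of wget's real flags or
helpers):

#include <stdio.h>

struct conn { int ntlm_authenticated; };

static void close_connection (struct conn *c) { (void) c; puts ("close"); }
static void retry_request (struct conn *c)    { (void) c; puts ("retry"); }

static void on_ntlm_challenge (struct conn *c)
{
  /* A challenge on an already-authenticated connection means IIS has
     crossed a domain boundary: drop the connection and start over.  */
  if (c->ntlm_authenticated)
    {
      close_connection (c);       /* drop the stale association */
      c->ntlm_authenticated = 0;  /* redo the challenge-response */
      retry_request (c);          /* reconnect and re-authenticate */
    }
}

int main (void)
{
  struct conn c = { 1 };   /* connection already NTLM-authenticated */
  on_ntlm_challenge (&c);  /* prints "close" then "retry" */
  return 0;
}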

This is the result of many hours of work and extensive network
debugging with the help of an Australian law enforcement agency.

--- wget-1.10.2.orig/src/http.c 2005-08-09 08:54:16.0 +1000
+++ wget-1.10.2/src/http.c  2006-11-21 12:25:22.0 +1100
@@ -1960,10 +1960,12 @@
  hs->restval, hs->rd_size, hs->len, hs->dltime,
  flags);

+/*
   if (hs->res >= 0)
 CLOSE_FINISH (sock);
   else
-CLOSE_INVALIDATE (sock);
+*/
+  CLOSE_INVALIDATE (sock);

   {
 /* Close or flush the file.  We have to be careful to check for


Cheers
Phill.

P.S. the work was done last year and I'm finally cleaning up the loose ends. 
Hope this helps.

Phill Bertolus
Technical Director
Web Wombat Pty. Ltd.

Ph: +61-3-9675-0900 (Switch)
Ph: +61-3-9675-0901 (Direct)
Mb: +61-4-1632-6853
Fx: +61-3-9675-0999





Re: wget-1.10.2 cookie expiry bug

2007-01-23 Thread Hrvoje Niksic
Thanks for the report and the (correct) analysis.  This patch fixes
the problem in the trunk.


2007-01-23  Hrvoje Niksic  [EMAIL PROTECTED]

* cookies.c (parse_set_cookie): Would erroneously discard cookies
with unparsable expiry time.

Index: src/cookies.c
===
--- src/cookies.c   (revision 2202)
+++ src/cookies.c   (working copy)
@@ -390,17 +390,16 @@
         {
           cookie->permanent = 1;
           cookie->expiry_time = expires;
+          /* According to netscape's specification, expiry time in
+             the past means that discarding of a matching cookie
+             is requested.  */
+          if (cookie->expiry_time < cookies_now)
+            cookie->discard_requested = 1;
         }
       else
         /* Error in expiration spec.  Assume default (cookie doesn't
            expire, but valid only for this session.)  */
         ;
-
-      /* According to netscape's specification, expiry time in the
-         past means that discarding of a matching cookie is
-         requested.  */
-      if (cookie->expiry_time < cookies_now)
-        cookie->discard_requested = 1;
     }
   else if (TOKEN_IS (name, "max-age"))


wget-1.10.2 cookie expiry bug

2007-01-22 Thread Jay Soffian

(Resend as I've received no reply to the original message.)

Kind wget maintainers,

I believe I found a bug in the wget cookie expiry handling. Recently
I was using wget and received back a cookie with an expiration of Sun,
20-Sep-2043 19:37:28 GMT.


This fits inside a 32-bit unsigned long but unfortunately overflows a
32-bit signed long by more than five years.
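
A quick standalone check (my own illustration, not wget code; the
timestamp constant is computed by hand, so treat it as approximate):

#include <stdio.h>
#include <limits.h>

int main (void)
{
  /* Approximate Unix time of Sun, 20-Sep-2043 19:37:28 GMT.  */
  long long expiry = 2326390648LL;
  /* A 32-bit signed time_t tops out at INT_MAX, i.e. 19-Jan-2038.  */
  printf ("past INT_MAX by ~%.1f years\n",
          (expiry - (long long) INT_MAX) / (365.25 * 86400.0));
  return 0;
}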


It would appear that timegm (called from http_atotm) returns -1 when  
it overflows. At least that was the behavior I observed with my  
system's timegm (OS X 10.4.8/i386) and the timegm that ships with  
wget (I recompiled using the wget timegm function to test).


Looking at cookies.c, the intent seems to be to treat a (time_t) -1  
as a session cookie. If this is the case, there is a bug in the logic  
which instead causes wget to discard the cookie entirely:


  expires = http_atotm (value_copy);
  if (expires != (time_t) -1)
    {
      cookie->permanent = 1;
      cookie->expiry_time = expires;
    }
  else
    /* Error in expiration spec.  Assume default (cookie doesn't
       expire, but valid only for this session.)  */
    ;

  /* According to netscape's specification, expiry time in the
     past means that discarding of a matching cookie is
     requested.  */
  if (cookie->expiry_time < cookies_now)
    cookie->discard_requested = 1;

The problem is that when http_atotm returns -1, cookie->expiry_time
does not get set, defaulting to 0 (I think). That then causes the
cookie to be discarded. I've attached the world's smallest patch
which corrects this behavior to what I believe the comments intended.


Thanks,

j.

wget-1.10.2.cookie_expiry.patch
Description: Binary data


Possibly bug

2007-01-17 Thread Yuriy Padlyak

Hi,

I have been downloading slackware-11.0-install-dvd.iso, but it seems wget
downloaded more than the file size, and I found:


-445900K .. .. .. .. ..119%   
18.53 KB/s

in wget-log.

Regards,
Yuriy Padlyak


Re: Possibly bug

2007-01-17 Thread M.
The file was probably still being uploaded when you started downloading it,
so the HTTP server continued sending data beyond the initially reported
file size.

Just stop wget, and start it again with option -c to resume the download.


MT

On Wednesday 17 January 2007 at 18:16 +0200, Yuriy Padlyak wrote:
 Hi,
 
 I have been downloading slackware-11.0-install-dvd.iso, but it seems wget 
 downloaded more than the file size, and I found:
 
 -445900K .. .. .. .. ..119%   
 18.53 KB/s
 in  wget-log.
 
 Regards,
 Yuriy Padlyak



Re: Possibly bug

2007-01-17 Thread Steven M. Schweda
From: Yuriy Padlyak

 I have been downloading slackware-11.0-install-dvd.iso, but it seems wget
 downloaded more than the file size, and I found: 
 
 -445900K .. .. .. .. ..119%
 18.53 KB/s 
 
 in  wget-log.

   As usual, it would help if you provided some basic information. 
Which wget version (wget -V)?  On which system type?  OS and version? 
Guesswork follows.

   Wget versions before 1.10 did not support large files, and a DVD
image could easily exceed 2GB.  Negative file sizes are a common symptom
when using a small-file program with large files.
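
   A tiny standalone illustration (mine, not wget code) of how a large
size shows up negative in a 32-bit signed integer:

#include <stdio.h>

int main (void)
{
  long long iso = 3500LL * 1024 * 1024;  /* a ~3.5 GB DVD image */
  int small = (int) iso;  /* out-of-range conversion is
                             implementation-defined, but wraps
                             negative on common platforms */
  printf ("%lld bytes -> %d\n", iso, small);
  return 0;
}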



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Bug in 1.10.2 vs 1.9.1

2007-01-03 Thread Mauro Tortonesi

Juhana Sadeharju wrote:

Hello. Wget 1.10.2 has the following bug compared to version 1.9.1.
First, the bin/wgetdir is defined as
  wget -p -E -k --proxy=off -e robots=off --passive-ftp \
  -o zlogwget`date +%Y%m%d%H%M%S` -r -l 0 -np -U Mozilla --tries=50 \
  --waitretry=10 "$@"

The download command is
  wgetdir http://udn.epicgames.com

Version 1.9.1 result: download ok
Version 1.10.2 result: only udn.epicgames.com/Main/WebHome downloaded
and other converted urls are of the form
  http://udn.epicgames.com/../Two/WebHome


hi juhana,

could you please try the current version of wget from our subversion
repository?

http://www.gnu.org/software/wget/wgetdev.html#development

this bug should be fixed in the new code.

--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it

