subject:"bug\?"

Re: --mirror and --cut-dirs=2 bug?

2008-11-03 Thread Brock Murch

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Micah,

Many thanks with all your very timely help. I have had no issues since 
following you instructions to upgrade to 1.11.4 and installing it in the /opt 
directory. I used:

$ ./configure --prefix=/opt/wget

And point to ist specifically:

/opt/wget/bin/wget  --tries=10 -r -N -l inf --wait=1\
-nH --cut-dirs=2 ftp://oceans.gsfc.nasa.gov/MODISA/ATTEPH/ \
-o /home1/software/modis/atteph/mirror_a.log \
--directory-prefix=/home1/software/modis/atteph

Thanks again.

Brock


On Monday 27 October 2008 3:06 pm, Micah Cowan wrote:
> Brock Murch wrote:
> > Sorry, 1 quick question? Do you know of anyone providing rpm's of 1.11.4
> > for CentOS?
>
> Not offhand. It may not yet be available; it was only packaged for
> Fedora Core a couple months ago, I think. RPMfind.net just lists 1.11.4
> sources for fc9 and fc10.
>
> > If not, would you recommend uninstalling the current one? Before
> > installing from your src? Many thanks.
>
> I'd advise against that: I believe various important components of Red
> Hat/CentOS rely on wget to fetch things. Sometimes minor changes in the
> output/interface of wget cause problems for automated scripts that form
> an integral part of an operating system. Though really, I think most of
> the changes that would pose such a danger are actually already in the
> "Red Hat modified" 1.10.2 sources (taken from the development sources
> for what was later released as 1.11).
>
> What I tend to do on my systems, is to configure the sources like:
>
>   $ ./configure --prefix=$HOME/opt/wget
>
> and then either add $HOME/opt to my $PATH, or invoke it directly as
> $HOME/opt/wget/bin/wget.
>
> Note that if you want to build wget with support for HTTPS, you'll need
> to have the development package for openssl installed.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQFJDwveMAkzD2qY/pURAmvuAJ9XG784Djq0mwcTu/nN56tPSM+AMQCgm2KX
dzPQ263FF7Gaw4qtE1X0wTI=
=CC9T
-END PGP SIGNATURE-

Re: --mirror and --cut-dirs=2 bug?

2008-10-27 Thread Brock Murch

Micah,

Thanks for your quick attention to this. Yous, I probably forgot to include 
the version #

[EMAIL PROTECTED] atteph]# wget --version
GNU Wget 1.10.2 (Red Hat modified)

Copyright (C) 2005 Free Software Foundation, Inc.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

Originally written by Hrvoje Niksic <[EMAIL PROTECTED]>.

I will see if I can get the newest version for:
[EMAIL PROTECTED] atteph]# cat /etc/redhat-release
CentOS release 4.2 (Final)

I'll let you know how that goes.

Brock

On Monday 27 October 2008 2:19 pm, Micah Cowan wrote:
> Micah Cowan wrote:
> > I believe we made some related fixes more recently. You provided a great
> > amount of useful information, but one thing that seems to be missing (or
> > I missed it) is the Wget version number. Judging from the log, I'd say
> > it's 1.10.2 or older; the most recent version of Wget is 1.11.4; could
> > you please try to verify whether Wget continues to exhibit this problem
> > in the latest release version?
>
> This problem looks like the one that Mike Grant fixed in October of
> 2006: http://hg.addictivecode.org/wget/1.11/rev/161aa64e7e8f, so it
> should definitely be fixed in 1.11.4. Please let me know if it isn't.

Re: --mirror and --cut-dirs=2 bug?

2008-10-27 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Micah Cowan wrote:
> I believe we made some related fixes more recently. You provided a great
> amount of useful information, but one thing that seems to be missing (or
> I missed it) is the Wget version number. Judging from the log, I'd say
> it's 1.10.2 or older; the most recent version of Wget is 1.11.4; could
> you please try to verify whether Wget continues to exhibit this problem
> in the latest release version?

This problem looks like the one that Mike Grant fixed in October of
2006: http://hg.addictivecode.org/wget/1.11/rev/161aa64e7e8f, so it
should definitely be fixed in 1.11.4. Please let me know if it isn't.

- --
Regards,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJBgY+7M8hyUobTrERArrRAJ4p4Y7jwWfic0Wul7UBnBXlSzD2XQCePifc
kWs00JOULkzJmzozK7lmcfA=
=iSL3
-END PGP SIGNATURE-

Re: --mirror and --cut-dirs=2 bug?

2008-10-27 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Brock Murch wrote:
> I try to keep a mirror of NASA atteph ancilliary data for modis processing. I 
> know that means little, but I have a cron script that runs 2 times a day. 
> Sometimes it works, and others, not so much. The sh script is listed at the 
> end of this email below. As is the contents of the remote ftp server's root 
> and portions fo the log. 
> 
> I don't need all the data on the remote server, only some thus I use 
> --cut-dirs.To make matters stranger, the software (also from NASA) that uses 
> these files, looks for them in a single place on the client machine where the 
> software runs, but needs data from 2 different directories on the remote ftp 
> server. If the data is not on the client machine, the software kindly ftp's 
> the files to the local directory. However, I don't allow write access to that 
> directory as many people use the software and when it is d/l'ed it has the 
> wrong perms for others to use it, thus I mirror the data I need from the ftp 
> site locally. In the script below, there are 2 wget commands, but they are to 
> slightly different directories (MODISA & MODIST).

I wouldn't recommend that. Using the same output directory for two
different source directories seems likely to lead to problems. You'd
most likely be better off by pulling to two locations, and then
combining them afterwards.

I don't know for sure that it _will_ cause problems (except if they
happen to have same-named files), as long as .listing files are being
properly removed (there were some recently-fixed bugs related to that, I
think? ...just appending new listings on top of existing files).

> It appears to me that the problem occurs if there is a ftp server error, and 
> wget starts a retry. wget goes to the server root, gets the .listing from 
> there for some reason (as opposed to the directory it should go to on the 
> server), and then goes to the dir it needs to mirror and can't find the files 
> (that are listed in the root dir) and creates dirs, and then I get "No such 
> file" errors and recursive directories created. Any advice would be 
> appreciated.

This snippet seems to be the source of the problem:

> Error in server response, closing control connection.
> Retrying.
> 
> - --14:53:53--  ftp://oceans.gsfc.nasa.gov/MODIST/ATTEPH/2002/110/
>   (try: 2) => `/home1/software/modis/atteph/2002/110/.listing'
> Connecting to oceans.gsfc.nasa.gov|169.154.128.45|:21... connected.
> Logging in as anonymous ... Logged in!
> ==> SYST ... done.==> PWD ... done.
> ==> TYPE I ... done.  ==> CWD not required.
> ==> PASV ... done.==> LIST ... done.

That "CWD not required" bit is erroneous. I'm 90% sure we fixed this
issue recently (though I'm not 100% sure that it went to release: I
believe so).

I believe we made some related fixes more recently. You provided a great
amount of useful information, but one thing that seems to be missing (or
I missed it) is the Wget version number. Judging from the log, I'd say
it's 1.10.2 or older; the most recent version of Wget is 1.11.4; could
you please try to verify whether Wget continues to exhibit this problem
in the latest release version?

I'll also try to look into this as I have time (but it might be awhile
before I can give it some serious attention; it'd be very helpful if you
could do a little more legwork).

- --
Thanks very much,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJBgNh7M8hyUobTrERAuGoAKCCUoBN0sURKA/51x0o4HN59K8+AACfUYuj
i8XW58MvjvbS3oy4OsOmbpc=
=4kpD
-END PGP SIGNATURE-

[bug] wrong speed calculation in (--output-file) logfile

2008-10-25 Thread Peter Volkov

Hello.

During download with wget I've redirected output into file with the
following command: 

$ LC_ALL=C wget -o output 
'ftp://mirror.yandex.ru/gentoo-distfiles/distfiles/OOo_3.0.0rc4_20080930_LinuxIntel_langpack_en-GB.tar.gz'

I've set LC_ALL and LANG explicitly to be sure that this is not locale
related problem. The output I saw in output file was:


--2008-10-25 14:51:17--  
ftp://mirror.yandex.ru/gentoo-distfiles/distfiles/OOo_3.0.0rc4_20080930_LinuxIntel_langpack_en-GB.tar.gz
   => `OOo_3.0.0rc4_20080930_LinuxIntel_langpack_en-GB.tar.gz.13'
Resolving mirror.yandex.ru... 77.88.19.68
Connecting to mirror.yandex.ru|77.88.19.68|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.==> PWD ... done.
==> TYPE I ... done.  ==> CWD /gentoo-distfiles/distfiles ... done.
==> SIZE OOo_3.0.0rc4_20080930_LinuxIntel_langpack_en-GB.tar.gz ... 13633213
==> PASV ... done.==> RETR 
OOo_3.0.0rc4_20080930_LinuxIntel_langpack_en-GB.tar.gz ... done.
Length: 13633213 (13M)

 0K .. .. .. .. ..  0%  131K 1m41s
50K .. .. .. .. ..  0%  132K 1m40s
   100K .. .. .. .. ..  1%  135K 99s
   150K .. .. .. .. ..  1%  132K 99s
   200K .. .. .. .. ..  1%  130K 99s
   250K .. .. .. .. ..  2% 45.9K 2m9s
   300K .. .. .. .. ..  2% 64.3M 1m50s
[snip]
 13250K .. .. .. .. .. 99%  131K 0s
 13300K .. ...100%  134K=1m41s

2008-10-25 14:52:58 (132 KB/s) - 
`OOo_3.0.0rc4_20080930_LinuxIntel_langpack_en-GB.tar.gz.13' saved [13633213]


Note the line above snip:
   300K ..  2% 64.3M 1m50s

This is impossible to download so much Mbytes as file is much less. I
don't know why sometimes this number jumps, but in some cases it cause
the following output at the end of download:

 13300K .. ...  100% 26101G=1m45s

Obviously I don't have possibility to download with such high
(26101G=1m45s) speed. This is reproducible with wget 1.11.4.

-- 
Peter.

--mirror and --cut-dirs=2 bug?

2008-10-24 Thread Brock Murch

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

I try to keep a mirror of NASA atteph ancilliary data for modis processing. I 
know that means little, but I have a cron script that runs 2 times a day. 
Sometimes it works, and others, not so much. The sh script is listed at the 
end of this email below. As is the contents of the remote ftp server's root 
and portions fo the log. 

I don't need all the data on the remote server, only some thus I use 
- --cut-dirs. To make matters stranger, the software (also from NASA) that uses 
these files, looks for them in a single place on the client machine where the 
software runs, but needs data from 2 different directories on the remote ftp 
server. If the data is not on the client machine, the software kindly ftp's 
the files to the local directory. However, I don't allow write access to that 
directory as many people use the software and when it is d/l'ed it has the 
wrong perms for others to use it, thus I mirror the data I need from the ftp 
site locally. In the script below, there are 2 wget commands, but they are to 
slightly different directories (MODISA & MODIST).

It appears to me that the problem occurs if there is a ftp server error, and 
wget starts a retry. wget goes to the server root, gets the .listing from 
there for some reason (as opposed to the directory it should go to on the 
server), and then goes to the dir it needs to mirror and can't find the files 
(that are listed in the root dir) and creates dirs, and then I get "No such 
file" errors and recursive directories created. Any advice would be 
appreciated.

Brock Murch

Here is an example of the "bad" type of dir structure I end up with (there 
should be no EO1 and below):

[EMAIL PROTECTED] atteph]# find . -type d -name "*" | grep EO1
./2002/110/EO1
./2002/110/EO1/CZCS
./2002/110/EO1/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS

Or:
[EMAIL PROTECTED] atteph]# ls /home1/software/modis/atteph/2002/110/EO1/
CZCS  README
[EMAIL PROTECTED] atteph]# ls /home1/software/modis/atteph/2002/110/EO1/CZCS/
CZCS  README
[EMAIL PROTECTED] atteph]# ls 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/
CZCS  README
[EMAIL PROTECTED] atteph]# ls 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/
CZCS  README
[EMAIL PROTECTED] atteph]# ls 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/
COMMON
[EMAIL PROTECTED] atteph]# ls 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/
CZCS  README
[EMAIL PROTECTED] atteph]# ls 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/
CZCS  README
[EMAIL PROTECTED] atteph]# ls 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/
CZCS  README
[EMAIL PROTECTED] atteph]# ls 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/
CZCS  README
[EMAIL PROTECTED] atteph]# ll 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/

And

[EMAIL PROTECTED] atteph]# ll /home1/software/modis/atteph/2002/110/EO1/README 
- -rw-r--r--  1 root root 9499 Aug 20 10:12 
/home1/software/modis/atteph/2002/110/EO1/README
[EMAIL PROTECTED] atteph]# ll 
/home1/software/modis/atteph/2002/110/EO1/CZCS/README 
- -rw-r--r--  1 root root 9499 Aug 20 10:12 
/home1/software/modis/atteph/2002/110/EO1/CZCS/README
[EMAIL PROTECTED] atteph]# ll 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/README 
- -rw-r--r--  1 root root 9499 Aug 20 10:12 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/README
[EMAIL PROTECTED] atteph]# ll 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/README 
- -rw-r--r--  1 root root 9499 Aug 20 10:12 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/README
[EMAIL PROTECTED] atteph]# ll 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/README 
ls: /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/README: No 
such file or directory
[EMAIL PROTECTED] atteph]# ll 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/README 
- -rw-r--r--  1 root root 9499 Aug 20 10:12 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/README


All the README files are all the same, and the same as the one is the ftp 
s

Re: Hello, All and bug #21793

2008-09-08 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

David Coon wrote:
> Hello everyone,
> 
> I thought I'd introduce myself to you all, as I intend to start helping
> out with wget.  This will be my first time contributing to any kind of
> free or open source software, so I may have some basic questions down
> the line about best practices and such, though I'll try to keep that to
> a minimum.
> 
> Anyway, I've been researching unicode and utf-8 recently, so I'm gonna
> try to tackle bug #21793 <https://savannah.gnu.org/bugs/?21793>. 

Hi David, and welcome!

If you haven't already, please see
http://wget.addictivecode.org/HelpingWithWget

I'd encourage you to get a Savannah account, so I can assign that bug to
you. Also, I tend to hang out quite a bit on IRC (#wget @
irc.freenode.net), so you might want to sign on there.

Since you mentioned an interest in Unicode and UTF-8, you might want to
check out Saint Xavier's recent work on IRI and iDNS support in Wget,
which is available at http://hg.addictivecode.org/wget/sxav/.

Among other things, sxav's additions make Wget more aware of the user's
locale, so it might be useful for providing a feature to automatically
transcode filenames to the user's locale, rather than just supporting
UTF-8 only (which should still probably remain an explicit option). If
that sounds like the direction you'd like to take it, you should
probably base your work on sxav's repository, rather than mainline.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIxViR7M8hyUobTrERAv/jAJ9/DxAaPaYpdLJojX9gorHn2hqwSACeK7oD
veVZAIH2NjbYI8dG6DimjRg=
=9Qau
-END PGP SIGNATURE-

Hello, All and bug #21793

2008-09-08 Thread David Coon

Hello everyone,

I thought I'd introduce myself to you all, as I intend to start helping out
with wget.  This will be my first time contributing to any kind of free or
open source software, so I may have some basic questions down the line about
best practices and such, though I'll try to keep that to a minimum.

Anyway, I've been researching unicode and utf-8 recently, so I'm gonna try
to tackle bug #21793 <https://savannah.gnu.org/bugs/?21793>.

-David A Coon

Re: [bug #20329] Make HTTP timestamping use If-Modified-Since

2008-09-02 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Yes, that's what it means.

I'm not yet committed to doing this. I'd like to see first how many
mainstream servers will respect If-Modified-Since when given as part of
an HTTP/1.0 request (in comparison to how they respond when it's part of
an HTTP/1.1 request). If common servers ignore it in HTTP/1.0, but not
in HTTP/1.1, that'd be an excellent case for holding off until we're
doing HTTP/1.1 requests.

Also, I don't think "removing the previous HEAD request" code is
entirely accurate: we probably would want to detect when a server is
feeding us non-new content in response to If-Modified-Since, and adjust
to use the current HEAD method instead as a fallback.

- -Micah

vinothkumar raman wrote:
> This mean we should remove the previous HEAD request code and use
> If-Modified-Since by default and have it to handle all the request and
> store pages if it is not returning a 304 response
> 
> Is it so?
> 
> 
> On Fri, Aug 29, 2008 at 11:06 PM, Micah Cowan <[EMAIL PROTECTED]> wrote:
>> Follow-up Comment #4, bug #20329 (project wget):
>>
>> verbatim-mode's not all that readable.
>>
>> The gist is, we should go ahead and use If-Modified-Since, perhaps even now
>> before there's true HTTP/1.1 support (provided it works in a reasonable
>> percentage of cases); and just ensure that any Last-Modified header is sane.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIvb7t7M8hyUobTrERAsvQAJ4k7fKrsFtfC4MQtuvE3Ouwz6LseACePqt2
8JiRBKtEhmcK3schVVO347A=
=yCJV
-END PGP SIGNATURE-

Re: [BUG:#20329] If-Modified-Since support

2008-09-02 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

vinothkumar raman wrote:
> We need to give out the time stamp the local file in the Request
> header for that we need to pass on the local file's time stamp from
> http_loop() to get_http() . The only way to pass on this without
> altering the signature of the function is to add a field to struct url
> in url.h
> 
> Could we go for it?

That is acceptable.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIvb5B7M8hyUobTrERAv2YAJ0ajYx+pynFLtV2YmEw7fA+vwf8ugCfSaU1
AFkIYSyyyS4egbyXjzBLXBo=
=fIT5
-END PGP SIGNATURE-

Re: [bug #20329] Make HTTP timestamping use If-Modified-Since

2008-09-01 Thread vinothkumar raman

This mean we should remove the previous HEAD request code and use
If-Modified-Since by default and have it to handle all the request and
store pages if it is not returning a 304 response

Is it so?


On Fri, Aug 29, 2008 at 11:06 PM, Micah Cowan <[EMAIL PROTECTED]> wrote:
>
> Follow-up Comment #4, bug #20329 (project wget):
>
> verbatim-mode's not all that readable.
>
> The gist is, we should go ahead and use If-Modified-Since, perhaps even now
> before there's true HTTP/1.1 support (provided it works in a reasonable
> percentage of cases); and just ensure that any Last-Modified header is sane.
>
>___
>
> Reply to this item at:
>
>  <http://savannah.gnu.org/bugs/?20329>
>
> ___
>  Message sent via/by Savannah
>  http://savannah.gnu.org/
>
>

[BUG:#20329] If-Modified-Since support

2008-09-01 Thread vinothkumar raman

Hi all,

We need to give out the time stamp the local file in the Request
header for that we need to pass on the local file's time stamp from
http_loop() to get_http() . The only way to pass on this without
altering the signature of the function is to add a field to struct url
in url.h

Could we go for it?

Thanks,
VinothKumar.R

BUG : 202329 IF-MODIFIED-SINCE

2008-09-01 Thread vinothkumar raman

Hi all,

We need to give out the time stamp the local file in the Request
header for that we need to pass on the local file's time stamp from
http_loop() to get_http() . The only way to pass on this without
altering the signature of the function is to add a field to struct url
in url.h

Could we go for it?

Thanks,
VinothKumar.R

RE: wget-1.11.4 bug

2008-07-26 Thread kuang-cheng chao


Micah Cowan wrote:> The thing is, though, those two threads should be running 
wgets under> separate processes
 
Yes, the two threads are running wgets under seperate processes with "system".
> What operating system are you running? Vista?mipsel-linux with kernel v2.4 
> built from gcc v3.3.5 
 
Best regards,
K.C. Chao
_
Discover the new Windows Vista
http://search.msn.com/results.aspx?q=windows+vista&mkt=en-US&form=QBRE

Re: wget-1.11.4 bug

2008-07-25 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

k.c. chao wrote:
> Micah Cowan wrote:
> > Have you reproduced this, or is this in theory? If the latter, what has
> > led you to this conclusion? I don't see anything in the code that would
> > cause this behavior.
>
> I reproduce this. But I can't make sure the really problem is in
> "resolve_bind_address." In the attached message, both
> api.yougotphogo.com and farm1.static.flickr.com get the same
> ip(74.124.203.218).  The two wget are called from two threads of a
> program.

Yeah, I get 68.142.213.135 for the flickr.com address, currently.

The thing is, though, those two threads should be running wgets under
separate processes (I'm not sure how they couldn't be, but if they
somehow weren't that would be using Wget other than how it was designed
to be used).

This problem sounds much more like an issue with the OS's API than an
issue with Wget, to me. But we'd still want to work around it if it were
feasible.

What operating system are you running? Vista?

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIirT17M8hyUobTrERAjsuAJ0crMPYIQficu1csou8Tt0jDFKvpQCeNYk3
1FhXl3uUYj2IA53qI1oOJ8A=
=DvdG
-END PGP SIGNATURE-

RE: wget-1.11.4 bug

2008-07-25 Thread kuang-cheng chao


Micah Cowan wrote:
> Have you reproduced this, or is this in theory? If the latter, what has> led 
> you to this conclusion? I don't see anything in the code that would> cause 
> this behavior.
I reproduce this. But I can't make sure the really problem is in 
"resolve_bind_address."
In the attached message, both api.yougotphogo.com and farm1.static.flickr.com 
get the same ip(74.124.203.218).
The two wget are called from two threads of a program.
 
Best regards,
k.c. chao
 
p.s. 
 
The log is follworing:
 
wget -4 -t 6 
"http://api.yougotphoto.com/device/?action=get_device_new_photo&api=2.2&api_key=f10df554a958fd10050e2d305241c7a3&device_class=2&serial_no=000E2EE5676F&url_no=24616&cksn=44fe191d6cb4e7807f75938b5d72f07c";
 -O /tmp/webii/ygp_new_photo_list.txt--1999-11-30 00:04:21--  
http://api.yougotphoto.com/device/?action=get_device_new_photo&api=2.2&api_key=f10df554a958fd10050e2d305241c7a3&device_class=2&serial_no=000E2EE5676F&url_no=24616&cksn=44fe191d6cb4e7807f75938b5d72f07cResolving
 api.yougotphoto.com... wget -4 -t 6 
"http://farm1.static.flickr.com/33/49038824_e4b04b7d9f_b.jpg"; -O 
/tmp/webii/24616 74.124.203.218Connecting to 
api.yougotphoto.com|74.124.203.218|:80... --1999-11-30 00:04:22--  
http://farm1.static.flickr.com/33/49038824_e4b04b7d9f_b.jpgResolving 
farm1.static.flickr.com... 74.124.203.218Connecting to 
farm1.static.flickr.com|74.124.203.218|:80... connected. 
_
Discover the new Windows Vista
http://search.msn.com/results.aspx?q=windows+vista&mkt=en-US&form=QBRE

Re: wget-1.11.4 bug

2008-07-25 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

kuang-cheng chao wrote:
> Dear Micah:
>  
> Thanks for your work of wget.
>  
> There is a question about two wgets run simultaneously.
> In method resolve_bind_address, wget assumes that this is called once.
> However, this will cause two domain name with the same ip if two wgets
> run the same method concurrently.

Have you reproduced this, or is this in theory? If the latter, what has
led you to this conclusion? I don't see anything in the code that would
cause this behavior.

Also, please use the mailing list for discussions about Wget. I've added
it to the recipients list.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIiYKF7M8hyUobTrERAr7fAJ0TnkLdEVOMy6wJA3Z1kIYC7dQoMACfZ9hb
x5K6MTzhgVRCdKJwUGnbSRw=
=EcFC
-END PGP SIGNATURE-

Re: WGET bug...

2008-07-11 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

HARPREET SAWHNEY wrote:
> Hi,
> 
> Thanks for the prompt response.
> 
> I am using
> 
> GNU Wget 1.10.2
> 
> I tried a few things on your suggestion but the problem remains.
> 
> 1. I exported the cookies file in Internet Explorer and specified
> that in the Wget command line. But same error occurs.
> 
> 2. I have an open session on the site with my username and password.
> 
> 3. I also tried running wget while I am downloading a file from the
> IE session on the site, but the same error.

Sounds like you'll need to get the appropriate cookie by using Wget to
login to the website. This requires site-specific information from the
user-login form page, though, so I can't help you without that.

If you know how to read some HTML, then you can find the HTML form used
for posting username/password stuff, and use

wget --keep-session-cookies --save-cookies=cookies.txt \
- --post-data='username=foo&password=bar' ACTION

Where ACTION is the value of the form's action field, USERNAME and
PASSWORD (and possibly further required values) are field names from the
HTML form, and FOO and BAR is the username/password.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFId+w97M8hyUobTrERAmLsAJ91231iGeO/albrgRuuUCRp8zFcnwCgiX3H
fDp2J2oTBKlxW17eQ2jaCAA=
=Khmi
-END PGP SIGNATURE-

Re: WGET bug...

2008-07-11 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

HARPREET SAWHNEY wrote:
> Hi,
> 
> I am getting a strange bug when I use wget to download a binary file
> from a URL versus when I manually download.
> 
> The attached ZIP file contains two files:
> 
> 05.upc --- manually downloaded
> dum.upc--- downloaded through wget
> 
> wget adds a number of ascii characters to the head of the file and seems
> to delete a similar number from the tail.
> 
> So the file sizes are the same but the addition and deletion renders
> the file useless.
> 
> Could you please direct me on if I should be using some specific
> option to avoind this problem?

In the future, it's useful to mention which version of Wget you're using.

The problem you're having is that the server is adding the extra HTML at
the front of your session, and then giving you the file contents anyway.
It's a bug in the PHP code that serves the file.

You're getting this extra content because you are not logged in when
you're fetching it. You need to have Wget send a cookie with an
login-session information, and then the server will probably stop
sending the corrupting information at the head of the file. The site
does not appear to use HTTP's authentication mechanisms, so the
<[EMAIL PROTECTED]> bit in the URL doesn't do you any good. It uses
Forms-and-cookies authentication.

Hopefully, you're using a browser that stores its cookies in a text
format, or that is capable of exporting to a text format. In that case,
you can just ensure that you're logged in in your browser, and use the
- --load-cookies= option to Wget to use the same session
information.

Otherwise, you'll need to use --save-cookies with Wget to simulate the
login form post, which is tricky and requires some understanding of HTML
Forms.

- --
HTH,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFId9Vy7M8hyUobTrERAjCWAJ9niSjC5YdBDNcAbnBFWZX6D8AO7gCeM8nE
i8jn5i5Y6wLX1g3Q2hlDgcM=
=uOke
-END PGP SIGNATURE-

Re: bug in wget

2008-06-14 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Sir Vision wrote:
> Hello,
> 
> enterring following command results in an error:
> 
> --- command start ---
> c:\Downloads\wget_v1.11.3b>wget
> "ftp://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-mozilla1.8-l10n/";
> -P c:\Downloads\
> --- command end ---
> 
> wget cant convert ".listing"-file into a "html"-file

As this seems to work fine on Unix, for me, I'll have to leave it to the
Windows porting guy (hi Chris!) to find out what might be going wrong.

...however, it would really help if you would supply the full output you
got, from wget, that leads you to believe Wget couldn't do this
conversion. in fact, it wouldn't hurt to supply the -d flag as well, for
maximum debugging messages.

- --
Cheers,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIVKXx7M8hyUobTrERAo40AKCAmwgDOGgjU2kcTYeEGC3+RkCjzQCeJt6B
dz38DW8jMMZtUxc+FhvIhfI=
=T+mK
-END PGP SIGNATURE-

bug in wget

2008-06-14 Thread Sir Vision


Hello,

enterring following command results in an error:

--- command start ---
c:\Downloads\wget_v1.11.3b>wget 
"ftp://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-mozilla1.8-l10n/";
 
-P c:\Downloads\
--- command end ---

wget cant convert ".listing"-file into a "html"-file

regards


_
Keine Mail mehr verpassen! Jetzt gibt’s Hotmail fürs Handy!
http://www.gowindowslive.com/minisites/mail/mobilemail.aspx?Locale=de-de

Re: .listing bug when using -c

2008-04-03 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Karsten Hopp wrote:
> wget-1.11.1 (and 1.10/1.10.1) don't handle the .listing file properly
> when -c is used.
> It just appends to that file instead of replacing it which means that
> wget tries to download each
> file twice when you run the same command twice. Have a look at this log:
> 
>>wget -m -nd -c ftp://ftp.redhat.com/pub/redhat/linux/rawhide/

Looks like the -m is important too. Note that -m is the equivalent of:
  -r -l inf --no-remove-listing

If you replace -m with -r -l inf (no --no-remove-listing), it works
fine, so fortunately there's a workaround.

I'm surprised that this bug has been around so very long, and yet I
haven't seen or heard of it before now (perhaps previous maintainers did?).

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFH9Qv97M8hyUobTrERArmaAJ0cOxsCm93Na/3mEmpu8PSavekO2wCgjmhN
0ZD4Cd4Q5SH1+WSYbGktsII=
=OPIv
-END PGP SIGNATURE-

.listing bug when using -c

2008-04-03 Thread Karsten Hopp


wget-1.11.1 (and 1.10/1.10.1) don't handle the .listing file properly when -c 
is used.
It just appends to that file instead of replacing it which means that wget 
tries to download each
file twice when you run the same command twice. Have a look at this log:

>wget -m -nd -c ftp://ftp.redhat.com/pub/redhat/linux/rawhide/
--2008-04-03 15:30:17--  ftp://ftp.redhat.com/pub/redhat/linux/rawhide/
   => `.listing'
Resolving ftp.redhat.com... 209.132.176.30
Connecting to ftp.redhat.com|209.132.176.30|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.==> PWD ... done.
==> TYPE I ... done.  ==> CWD /pub/redhat/linux/rawhide ... done.
==> PASV ... done.==> LIST ... done.

[ <=>] 259 --.-K/s   in 0s

2008-04-03 15:30:19 (1.66 MB/s) - `.listing' saved [259]

Already have correct symlink .message -> README

--2008-04-03 15:30:19--  ftp://ftp.redhat.com/pub/redhat/linux/rawhide/README
   => `README'
==> CWD not required.
==> PASV ... done.==> RETR README ... done.
Length: 404

100%[===>] 404 --.-K/s   in 0.007s

2008-04-03 15:30:21 (59.4 KB/s) - `README' saved [404]

FINISHED --2008-04-03 15:30:21--
Downloaded: 2 files, 663 in 0.007s (95.3 KB/s)

>cat .listing
drwxr-xr-x2 ftp  ftp  4096 Nov 10  2003 .
drwxr-xr-x8 ftp  ftp  4096 May 15  2006 ..
lrwxrwxrwx1 ftp  ftp 6 Nov 10  2003 .message -> README
-rw-r--r--1 ftp  ftp   404 Nov 10  2003 README

>wget -m -nd -c ftp://ftp.redhat.com/pub/redhat/linux/rawhide/
--2008-04-03 15:30:26--  ftp://ftp.redhat.com/pub/redhat/linux/rawhide/
   => `.listing'
Resolving ftp.redhat.com... 209.132.176.30
Connecting to ftp.redhat.com|209.132.176.30|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.==> PWD ... done.
==> TYPE I ... done.  ==> CWD /pub/redhat/linux/rawhide ... done.
==> PASV ... done.==> LIST ... done.

100%[++=>] 518 --.-K/s   in 0s

2008-04-03 15:30:28 (2.36 MB/s) - `.listing' saved [518]

Already have correct symlink .message -> README

Remote file no newer than local file `README' -- not retrieving.
Already have correct symlink .message -> README

Remote file no newer than local file `README' -- not retrieving.
FINISHED --2008-04-03 15:30:28--
Downloaded: 1 files, 518 in 0s (4.73 MB/s)

>cat .listing
drwxr-xr-x2 ftp  ftp  4096 Nov 10  2003 .
drwxr-xr-x8 ftp  ftp  4096 May 15  2006 ..
lrwxrwxrwx1 ftp  ftp 6 Nov 10  2003 .message -> README
-rw-r--r--1 ftp  ftp   404 Nov 10  2003 README
drwxr-xr-x2 ftp  ftp  4096 Nov 10  2003 .
drwxr-xr-x8 ftp  ftp  4096 May 15  2006 ..
lrwxrwxrwx1 ftp  ftp 6 Nov 10  2003 .message -> README
-rw-r--r--1 ftp  ftp   404 Nov 10  2003 README



This happens only when -c is used.




   Karsten

Re: Bug

2008-03-03 Thread Mark Pors

ok, thanks for your reply
We have a work-around in place now, but it doesnt scale very good.
Anyways, I'll start looking for another solution

Thanks!
Mark


On Sat, Mar 1, 2008 at 10:15 PM, Micah Cowan <[EMAIL PROTECTED]> wrote:
> -BEGIN PGP SIGNED MESSAGE-
>  Hash: SHA1
>
>
>
>  Mark Pors wrote:
>  > Hi,
>  >
>  > I posted this bug over two years ago:
>  > http://marc.info/?l=wget&m=113252747105716&w=4
>  >>From the release notes I see that this is still not resolved. Are
>  > there any plans to fix this any time soon?
>
>  I'm not sure that's a bug. It's more of an architectural choice.
>
>  Wget currently works by downloading a file, then, if it needs to look
>  for links in that file, it will open it and scan through it. Obviously,
>  it can't do that when you use -O -.
>
>  There are plans to move Wget to a more stream-like process, where it
>  scans links during download. At such time, it's very possible that -p
>  will work the way you want it to. In the meantime, though, it doesn't.
>
>  - --
>  Micah J. Cowan
>  Programmer, musician, typesetting enthusiast, gamer...
>  http://micah.cowan.name/
>  -BEGIN PGP SIGNATURE-
>  Version: GnuPG v1.4.6 (GNU/Linux)
>  Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
>  iD8DBQFHycd/7M8hyUobTrERAqWKAJ40YUOf5aKxvBwahRWBvqhwvqvq+gCePgI9
>  u646lF2Qp0abOw3iuvD0ohg=
>  =Cix9
>  -END PGP SIGNATURE-
>

Re: Bug

2008-03-01 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Mark Pors wrote:
> Hi,
> 
> I posted this bug over two years ago:
> http://marc.info/?l=wget&m=113252747105716&w=4
>>From the release notes I see that this is still not resolved. Are
> there any plans to fix this any time soon?

I'm not sure that's a bug. It's more of an architectural choice.

Wget currently works by downloading a file, then, if it needs to look
for links in that file, it will open it and scan through it. Obviously,
it can't do that when you use -O -.

There are plans to move Wget to a more stream-like process, where it
scans links during download. At such time, it's very possible that -p
will work the way you want it to. In the meantime, though, it doesn't.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHycd/7M8hyUobTrERAqWKAJ40YUOf5aKxvBwahRWBvqhwvqvq+gCePgI9
u646lF2Qp0abOw3iuvD0ohg=
=Cix9
-END PGP SIGNATURE-

Bug

2008-03-01 Thread Mark Pors

Hi,

I posted this bug over two years ago:
http://marc.info/?l=wget&m=113252747105716&w=4
>From the release notes I see that this is still not resolved. Are
there any plans to fix this any time soon?

Thanks
Mark

Re: bug on wget

2007-11-21 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Hrvoje Niksic wrote:
> Generally, if Wget considers a header to be in error (and hence
> ignores it), the user probably needs to know about that.  After all,
> it could be the symptom of a Wget bug, or of an unimplemented
> extension the server generates.  In both cases I as a user would want
> to know.  Of course, Wget should continue to be lenient towards syntax
> violations widely recognized by popular browsers.
> 
> Note that I'm not arguing that Wget should warn in this particular
> case.  It is perfectly fine to not consider an empty `Set-Cookie' to
> be a syntax error and to simply ignore it (and maybe only print a
> warning in debug mode).

That was my thought. I agree with both of your points above: if Wget's
not handling something properly, I want to know about it; but at the
same time, silently ignoring (erroneous) empty headers doesn't seem like
a problem.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHRGqx7M8hyUobTrERCPwQAJ4wGFwPBqyoVDXjrOifNB/fVF1vtACbBnDU
fnSx/Vj+S+DVnfRUbIz5HKU=
=n4yr
-END PGP SIGNATURE-

Re: bug on wget

2007-11-21 Thread Hrvoje Niksic

Micah Cowan <[EMAIL PROTECTED]> writes:

>> The new Wget flags empty Set-Cookie as a syntax error (but only
>> displays it in -d mode; possibly a bug).
>
> I'm not clear on exactly what's possibly a bug: do you mean the fact
> that Wget only calls attention to it in -d mode?

That's what I meant.

> I probably agree with that behavior... most people probably aren't
> interested in being informed that a server breaks RFC 2616 mildly;

Generally, if Wget considers a header to be in error (and hence
ignores it), the user probably needs to know about that.  After all,
it could be the symptom of a Wget bug, or of an unimplemented
extension the server generates.  In both cases I as a user would want
to know.  Of course, Wget should continue to be lenient towards syntax
violations widely recognized by popular browsers.

Note that I'm not arguing that Wget should warn in this particular
case.  It is perfectly fine to not consider an empty `Set-Cookie' to
be a syntax error and to simply ignore it (and maybe only print a
warning in debug mode).

Re: bug on wget

2007-11-20 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Hrvoje Niksic wrote:
> Micah Cowan <[EMAIL PROTECTED]> writes:
> 
>> I was able to reproduce the problem above in the release version of
>> Wget; however, it appears to be working fine in the current
>> development version of Wget, which is expected to release soon as
>> version 1.11.*
> 
> I think the old Wget crashed on empty Set-Cookie headers.  That got
> fixed when I converted the Set-Cookie parser to use extract_param.
> The new Wget flags empty Set-Cookie as a syntax error (but only
> displays it in -d mode; possibly a bug).

I'm not clear on exactly what's possibly a bug: do you mean the fact
that Wget only calls attention to it in -d mode?

I probably agree with that behavior... most people probably aren't
interested in being informed that a server breaks RFC 2616 mildly;
especially if it's not apt to affect the results. Unless of course the
user was expecting that the user send a real cookie, but I'm guessing
that this only happens when the server doesn't have one to send (or
something). But a user in that situation should be using -d (or at least
- -S) to find out what the server is sending.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHQ3N97M8hyUobTrERCCpFAJ9RHcdJ8X4UWpEQIhz+khDWc8MOJwCfZANU
vr2lCTLP04R/PP/cBf7sIpE=
=6csr
-END PGP SIGNATURE-

Re: bug on wget

2007-11-20 Thread Hrvoje Niksic

Micah Cowan <[EMAIL PROTECTED]> writes:

> I was able to reproduce the problem above in the release version of
> Wget; however, it appears to be working fine in the current
> development version of Wget, which is expected to release soon as
> version 1.11.*

I think the old Wget crashed on empty Set-Cookie headers.  That got
fixed when I converted the Set-Cookie parser to use extract_param.
The new Wget flags empty Set-Cookie as a syntax error (but only
displays it in -d mode; possibly a bug).

Re: bug on wget

2007-11-20 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Diego Campo wrote:
> Hi,
> I got a bug on wget when executing:
> 
> wget -a log -x -O search/search-1.html --verbose --wait 3
> --limit-rate=20K --tries=3
> http://www.nepremicnine.net/nepremicninske_agencije.html?id_regije=1
> 
> Segmentation fault (core dumped)

Hi Diego,

I was able to reproduce the problem above in the release version of
Wget; however, it appears to be working fine in the current development
version of Wget, which is expected to release soon as version 1.11.*

* Unfortunately, it has been "expected to release soon" for a few months
now; we got hung up with some legal/licensing issues that are yet to be
resolved. It will almost certainly be released in the next few weeks,
though.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHQypR7M8hyUobTrERCF99AJ4w790h4juXzPwO+csBbSY3KcLOXACdGYgO
Kf4Oawgfjx6WOEzYwkQ47mw=
=8gL2
-END PGP SIGNATURE-

bug on wget

2007-11-20 Thread Diego Campo

Hi,
I got a bug on wget when executing:

wget -a log -x -O search/search-1.html --verbose --wait 3
--limit-rate=20K --tries=3
http://www.nepremicnine.net/nepremicninske_agencije.html?id_regije=1

Segmentation fault (core dumped)


I created directory "search". 
The above creates a file search/search-1.html zero-sized.
Logfile "log":

Resolviendo www.nepremicnine.net... 212.103.144.204
Conectando a www.nepremicnine.net|212.103.144.204|:80... conectado.
Petición HTTP enviada, esperando respuesta... 200 OK
--18:18:28--
http://www.nepremicnine.net/nepremicninske_agencije.html?id_regije=1
   => `search/search-1.html'

(I hope you understand the Spanish above. If not, labels are the usual:
"resolving", "connecting", "HTTP petition sent", "waiting for request")

It happens the same when varying the parameter on the url "id_regije",
just in case it helps.

I'm using Intel CoreDuo E6300, plenty of disk/mem space.
ubuntu 7.10

Should you need any further information don't hesitate to contact.
Regards
 Diego

Re: bug in escaped filename calculation?

2007-10-04 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Hrvoje Niksic wrote:

> In the long run, supporting something like IRL is surely the right
> thing to go for, but I have a feeling that we'll be stuck with the
> current messy URLs for quite some time to come.  So Wget simply needs
> to adapt to the current circumstances.  If the locale includes "UTF-8"
> in any shape or form, it is perfectly safe to assume that it's valid
> to create UTF-8 file names.  Of course, we don't know if a particular
> URL path sequence is really meant to be UTF-8, but there should be no
> harm in allowing valid UTF-8 sequences to pass through.  In other
> words, the default "quote control" policy could simply be smarter
> about what "control" means.

That's true. I had been thinking I'd just deal with it all together, but
there's no reason why we couldn't adjust what "control characters" are
based on the locale today. Still, I think it's a low-priority enough
issue (given that there are work-arounds), that I may save it to address
all in one lump.

BTW, there's a related discussion at
https://savannah.gnu.org/bugs/index.php?20863, though that one is
regarding  translating between the current locale and Unicode (for
command-line arguments) and back again (for file names).

> One consequence would be that Wget creates differently-named files in
> different locales, but it's probably a reasonable price to pay for not
> breaking an important expectation.  Another consequence would be
> making users open to IDN homograph attacks, but I don't know if that's
> a problem in the context of creating file names (IDN is normally
> defined as a misrepresentation of who you communicate with).

Aren't we already open to this? That is, if someone directs us to
www.microsoft.com, where the "o" of "soft" is replaced by its look-alike
in cyrillic, and our DNS server happens to respect IDNs represented
literally (instead of translated into the ASCII "punycode" format, as
they will be when we support IDNs properly), that "o" in UTF-8 would be
0xD0 0xBE, and so wouldn't get percent-encoded on the way in.

One way of dealing with this when we _do_ translate to punycode, would
be to keep the punycode version for creation of the "hostname"
directory. Though that could be ugly in practice, at least for
especially non-latin domain names.

The best way of dealing with homographs, though, is to only use IRIs
from trusted sources (usually: type them in).

> It could be made to recognize UTF-8 character
> sequences in UTF-8 locales and exempt valid UTF-8 chars from being
> treated as "control" characters.  Invalid UTF-8 chars would still pass
> all the checks, and non-canonical UTF-8 sequences would be "rejected"
> (by condemning their byte values to being escaped as %..).  This is
> not much work for someone who understands the basics of UTF-8.

Right. If the high-bit isn't set, it's ASCII; if it is set, then you can
tell by context which high-bits ought to be set in its neighbors.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHBWZj7M8hyUobTrERCEL5AJ9Yh7ctcpGUus67WeakvUxzzcbR6wCfV42N
9LXCwW29U/S5QclTl9UTSGg=
=Giga
-END PGP SIGNATURE-

Re: bug in escaped filename calculation?

2007-10-04 Thread Hrvoje Niksic

Micah Cowan <[EMAIL PROTECTED]> writes:

> It is actually illegal to specify byte values outside the range of
> ASCII characters in a URL, but it has long been historical practice
> to do so anyway. In most cases, the intended meaning was one of the
> latin character sets (usually latin1), so Wget was right to do as it
> does, at that time.

Your explanation is spot-on.  I would only add that Wget's
interpretation of what is a "control" character is not so much geared
toward Latin 1 as it is geared toward maximum safety.  Originally I
planned to simply encode *all* file name characters outside the 32-127
range, but in practice it was very annoying (not to mention
US-centric) to encode perfectly valid Latin 1/2/3/... as %xx.  Since
the codes 128-159 *are* control characters (in those charsets) that
can mess up your screen and that you wouldn't want seen by default, I
decided to encode them by default, but allow for a way to turn it off,
in case someone used a different charset.

In the long run, supporting something like IRL is surely the right
thing to go for, but I have a feeling that we'll be stuck with the
current messy URLs for quite some time to come.  So Wget simply needs
to adapt to the current circumstances.  If the locale includes "UTF-8"
in any shape or form, it is perfectly safe to assume that it's valid
to create UTF-8 file names.  Of course, we don't know if a particular
URL path sequence is really meant to be UTF-8, but there should be no
harm in allowing valid UTF-8 sequences to pass through.  In other
words, the default "quote control" policy could simply be smarter
about what "control" means.

One consequence would be that Wget creates differently-named files in
different locales, but it's probably a reasonable price to pay for not
breaking an important expectation.  Another consequence would be
making users open to IDN homograph attacks, but I don't know if that's
a problem in the context of creating file names (IDN is normally
defined as a misrepresentation of who you communicate with).

For those who want to hack on this, the place to look at is
url.c:append_uri_pathel; that strangely-named function takes a path
element (a directory name or file name component of the URL) and
appends it to the file name.  It takes care not to ever use ".." as a
path component and to respect the --restrict-file-names setting as
specified by the user.  It could be made to recognize UTF-8 character
sequences in UTF-8 locales and exempt valid UTF-8 chars from being
treated as "control" characters.  Invalid UTF-8 chars would still pass
all the checks, and non-canonical UTF-8 sequences would be "rejected"
(by condemning their byte values to being escaped as %..).  This is
not much work for someone who understands the basics of UTF-8.

Re: bug in escaped filename calculation?

2007-10-04 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Brian Keck wrote:
> Hello,
> 
> I'm wondering if I've found a bug in the excellent wget.
> I'm not asking for help, because it turned out not to be the reason
> one of my scripts was failing.
> 
> The possible bug is in the derivation of the filename from a URL which
> contains UTF-8.
> 
> The case is:
> 
>   wget http://en.wikipedia.org/wiki/%C3%87atalh%C3%B6y%C3%BCk
> 
> Of course these are all ascii characters, but underlying it are
> 3 nonascii characters, whose UTF-8 encoding is:
> 
>   hexoctal name
>     ---  -
>   C387  303 274  C-cedilla
>   C3B6  303 266  o-umlaut
>   C3BC  303 274  u-umlaut
> 
> The file created has a name that's almost, but not quite, a valid UTF-8
> bytestring ... 
> 
>   ls *y*k | od -tc
>   000 303   %   8   7   a   t   a   l   h 303 266   y 303 274   k  \n
> 
> Ie the o-umlaut & u-umlaut UTF-8 encodings occur in the bytestring,
> but the UTF-8 encoding of C-cedilla has its 2nd byte replaced by the
> 3-byte string "%87".

Using --restrict=nocontrol will do what you want it to, in this instance.

> I'm guessing this is not intended.  

Actually, it is (more-or-less).

Realize that Wget really has no idea how to tell whether you're trying
to give it UTF-8, or one of the ISO latin charsets. It tends to assume
the latter. It also, by default, will not create filenames with control
characters in them. In ISO latin, characters in the range 0x80-0x9f are
control characters, which is why Wget left %87 escaped, which falls into
that range, but not the others, which don't.

It is actually illegal to specify byte values outside the range of ASCII
characters in a URL, but it has long been historical practice to do so
anyway. In most cases, the intended meaning was one of the latin
character sets (usually latin1), so Wget was right to do as it does, at
that time.

There is now a standard for representing Unicode values in URLs, whose
result is then called IRLs (Internationalized Resource Locators).
Conforming correctly to this standard would require that Wget be
sensitive to the context and encoding of documents in which it finds
URLs; in the case of filenames and command arguments, it would probably
also require sensitivity to the current locale as determined by
environment variables. Wget is simply not equipped to handle IRLs or
encoding issues at the moment, so until it is, a proper fix will not be
in place. Addressing these are considered a "Wget 2.0" (next-generation
Wget functionality) priority, and probably won't be done for a year or
two, given that the number of developers involved with Wget, if you add
up all the part-time helpers (including me), is probably still less than
one full-time dev. :)

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHBSHX7M8hyUobTrERCKRLAJwKiDOo0uO7x/k/iAEB/W0pPQmUJQCfUHaP
c6k2490strgy1Efy1DmiOhA=
=7lvZ
-END PGP SIGNATURE-

Re: bug in escaped filename calculation?

2007-10-04 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Josh Williams wrote:
> On 10/4/07, Brian Keck <[EMAIL PROTECTED]> wrote:
>> I would have sent a fix too, but after finding my way through http.c &
>> retr.c I got lost in url.c.
> 
> You and me both. A lot of the code needs re-written.. there's a lot of
> spaghetti code in there. I hope Micah chooses to do a complete
> re-write for version 2 so I can get my hands dirty and understand the
> code better.

Currently, I'm planning on refactoring what exists, as needed, rather
than going for a complete rewrite. This will be driven by unit-tests, to
try to ensure that we do not lose functionality along the way. This
involves more work overall, but IMO has these key advantages:

 * as mentioned, it's easier to prevent functionality loss,
 * we will be able to use the work as its written, instead of waiting
many months for everything to be finished (especially with the current
number of developers), and
 * AIUI, the wording of employer copyright assignment releases may not
apply to new works that are not _preexisting_ as GPL works. This means
that, if a rewrite ended up using no code whatsoever from the original
work (not likely, but...), there could be legal issues.

After 1.11 is released (or possibly before), one of my top priorities is
to clean up the gethttp and http_loop functions to a degree where they
can be much more readily read and understood (and modified!). This is
important to me because so far (in my
probably-not-statistically-significant 3 months as maintainer) a
majority of the trickier fixes have been in those two functions. Some of
these fixes seem to frequently introduce bugs of their own, and I spend
more time than seems right in trying to understand the code there, which
is why these particular functions are prime targets for refactoring. :)

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHBR7E7M8hyUobTrERCCrbAJ9Jw7LB/YW4myDOyPiHvXLZ13rkNQCeOVbf
5INV0ApmUTuzxp8zO5haVCA=
=EeEd
-END PGP SIGNATURE-

Re: bug in escaped filename calculation?

2007-10-04 Thread Josh Williams

On 10/4/07, Brian Keck <[EMAIL PROTECTED]> wrote:
> I would have sent a fix too, but after finding my way through http.c &
> retr.c I got lost in url.c.

You and me both. A lot of the code needs re-written.. there's a lot of
spaghetti code in there. I hope Micah chooses to do a complete
re-write for version 2 so I can get my hands dirty and understand the
code better.

bug in escaped filename calculation?

2007-10-04 Thread Brian Keck


Hello,

I'm wondering if I've found a bug in the excellent wget.
I'm not asking for help, because it turned out not to be the reason
one of my scripts was failing.

The possible bug is in the derivation of the filename from a URL which
contains UTF-8.

The case is:

  wget http://en.wikipedia.org/wiki/%C3%87atalh%C3%B6y%C3%BCk

Of course these are all ascii characters, but underlying it are
3 nonascii characters, whose UTF-8 encoding is:

  hexoctal name
    ---  -
  C387  303 274  C-cedilla
  C3B6  303 266  o-umlaut
  C3BC  303 274  u-umlaut

The file created has a name that's almost, but not quite, a valid UTF-8
bytestring ... 

  ls *y*k | od -tc
  000 303   %   8   7   a   t   a   l   h 303 266   y 303 274   k  \n

Ie the o-umlaut & u-umlaut UTF-8 encodings occur in the bytestring,
but the UTF-8 encoding of C-cedilla has its 2nd byte replaced by the
3-byte string "%87".

I'm guessing this is not intended.  

I would have sent a fix too, but after finding my way through http.c &
retr.c I got lost in url.c.

Brian Keck

Re: [fwd] Wget Bug: recursive get from ftp with a port in the url fails

2007-09-17 Thread Micah Cowan

Hrvoje Niksic wrote:
> Subject:
> Re: Wget Bug: recursive get from ftp with a port in the url fails
> From:
> baalchina <[EMAIL PROTECTED]>
> Date:
> Mon, 17 Sep 2007 19:56:20 +0800
> To:
> [EMAIL PROTECTED]
> 
> To:
> [EMAIL PROTECTED]
> 
> Message-ID:
> <[EMAIL PROTECTED]>
> MIME-Version:
> 1.0
> Content-Type:
> multipart/alternative; boundary="==-=-="
> 
> 
> Hi,I am using wget 1.10.2 in Windows 2003.And the same problem like
> Cantara. The file system is NTFS.
> Well I find my problem is, I wrote the command in schedule tasks like this:
>  
> wget  -N -i D:\virus.update\scripts\kavurl.txt -r -nH -P
> d:\virus.update\kaspersky
>  
> well, after "wget",and before "-N", I typed TWO spaces.
>  
> After delete one space, wget works well again.
>  
> Hope this can help.
>  
> :)

Hi baalchina,

Hrvoje forwarded your message to the Wget discussion mailing list, where
such questions are really more appropriate, especially since Hrvoje is
not maintaining Wget any longer, but has left that responsibility for
others.

What you're describing does not appear to be a bug in Wget; it's the
shell's (or task scheduler's, or whatever) responsibility to split
space-separated elements properly; the words are supposed to already be
split apart (properly) by the time Wget sees it.

Also, you didn't really describe what was going wrong with Wget, or what
message about it's failure you were seeing (perhaps you'd need to
specify a log file with -o log, or via redirection of the command
interpreter supports it). However, if the problem is that Wget was
somehow seeing the space, as a separate argument or as part of another
one, then the bug lies with your task scheduler (or whatever is
interpreting the command line).

-- 
HTH,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

signature.asc
Description: OpenPGP digital signature

[fwd] Wget Bug: recursive get from ftp with a port in the url fails

2007-09-17 Thread Hrvoje Niksic

--- Begin Message ---
Hi,I am using wget 1.10.2 in Windows 2003.And the same problem like Cantara.
The file system is NTFS.
Well I find my problem is, I wrote the command in schedule tasks like this:

wget  -N -i D:\virus.update\scripts\kavurl.txt -r -nH -P
d:\virus.update\kaspersky

well, after "wget",and before "-N", I typed TWO spaces.

After delete one space, wget works well again.

Hope this can help.

:)

-- 
from:baalchina
--- End Message ---

ftp-ls.c - filesize parsing bug

2007-08-29 Thread Jason Mancini


Hello,
What the heck was this code supposed to do in ftp-ls.c?  If there is only a 
single
space between the previous token and the filesize, then "t" points at the 
NULL
character, and filesize is thought to be 0, resulting in a mismatch 
everytime.
ptok is already pointing at the start of the token, I don't understand the 
need to
try to decrement the pointer.  I commented out the two lines to fix the 
issue.

Thanks!  (ps Where is the ftp chdir bugfix?!  No wget releases...)
Jason


/* Back up to the beginning of the previous token
and parse it with str_to_wgint.  */
char *t = ptok;
while (t > line && ISDIGIT (*t)) // useless and buggy
 --t; // useless and buggy
if (t == line)

_
Learn.Laugh.Share. Reallivemoms is right place! 
http://www.reallivemoms.com?ocid=TXT_TAGHM&loc=us

Re: bug and "patch": blank spaces in filenames causes looping

2007-07-15 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Rich Cook wrote:
> 
> On Jul 13, 2007, at 12:29 PM, Micah Cowan wrote:
> 
>>
>>> sprintf(filecopy, "\"%.2047s\"", file);
>>
>> This fix breaks the FTP protocol, making wget instantly stop working
>> with many conforming servers, but apparently start working with yours;
>> the RFCs are very clear that the file name argument starts right after
>> the string "RETR "; the very next character is part of the file name,
>> including if the next character is a space (or a quote). The file name
>> is terminated by the CR LF sequence (which implies that the sequence CR
>> LF may not occcur in the filename). Therefore, if you ask for a file
>> "file.txt", a conforming server will attempt to find and deliver a file
>> whose name begins and ends with double-quotes.
>>
>> Therefore, this seems like a server problem.
> 
> I think you may well be correct.  I am now unable to reproduce the
> problem where the server does not recognize a filename unless I give it
> quotes.  In fact, as you say, the server ONLY recognizes filenames
> WITHOUT quotes and quoting breaks it.  I had to revert to the non-quoted
> code to get proper behavior.  I am very confused now.  I apologize
> profusely for wasting your time.  How embarrassing!

No worries, it happens! Sometimes the tests we run go other than we
think they did. :)
> 
> I'll save this email, and if I see the behavior again, I will provide
> you with the details you requested below.

That would be terrific, thanks.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGmpOD7M8hyUobTrERCA7FAJ4oygvX7rpQy1k5FL7j3R12LUdWUACfVHrc
sk1tpS12pDYBvVbD4Nv7/I4=
=KCxk
-END PGP SIGNATURE-

Re: bug and "patch": blank spaces in filenames causes looping

2007-07-15 Thread Josh Williams


On 7/15/07, Rich Cook <[EMAIL PROTECTED]> wrote:

I think you may well be correct.  I am now unable to reproduce the
problem where the server does not recognize a filename unless I give
it quotes.  In fact, as you say, the server ONLY recognizes filenames
WITHOUT quotes and quoting breaks it.  I had to revert to the non-
quoted code to get proper behavior.  I am very confused now.  I
apologize profusely for wasting your time.  How embarrassing!

I'll save this email, and if I see the behavior again, I will provide
you with the details you requested below.


I wouldn't say it was a waste of time. Actually, I think it's good for
us to know that this problem exists on some servers. We're considering
writing a patch to recognise servers that do not support spaces. If
the standard method fails, then it will retry as an escaped character.

Nothing has been written for this yet, but it has been discussed, and
may be implemented in the future.

Re: bug and "patch": blank spaces in filenames causes looping

2007-07-15 Thread Rich Cook



On Jul 13, 2007, at 12:29 PM, Micah Cowan wrote:




sprintf(filecopy, "\"%.2047s\"", file);


This fix breaks the FTP protocol, making wget instantly stop working
with many conforming servers, but apparently start working with yours;
the RFCs are very clear that the file name argument starts right after
the string "RETR "; the very next character is part of the file name,
including if the next character is a space (or a quote). The file name
is terminated by the CR LF sequence (which implies that the  
sequence CR

LF may not occcur in the filename). Therefore, if you ask for a file
"file.txt", a conforming server will attempt to find and deliver a  
file

whose name begins and ends with double-quotes.

Therefore, this seems like a server problem.


I think you may well be correct.  I am now unable to reproduce the  
problem where the server does not recognize a filename unless I give  
it quotes.  In fact, as you say, the server ONLY recognizes filenames  
WITHOUT quotes and quoting breaks it.  I had to revert to the non- 
quoted code to get proper behavior.  I am very confused now.  I  
apologize profusely for wasting your time.  How embarrassing!


I'll save this email, and if I see the behavior again, I will provide  
you with the details you requested below.




Could you please provide the following:
  1. The version of wget you are running (wget --version)
  2. The exact command line you are using to invoke wget
  3. The output of that same command line, run with --debug



--
Rich "wealthychef" Cook
925-784-3077
--
 it takes many small steps to climb a mountain, but the view gets  
better all the time.

Re: bug and "patch": blank spaces in filenames causes looping

2007-07-13 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Rich Cook wrote:
> On OS X, if a filename on the FTP server contains spaces, and the remote
> copy of the file is newer than the local, then wget gets thrown into a
> loop of "No such file or directory" endlessly.   I have changed the
> following in ftp-simple.c, and this fixes the error.
> Sorry, I don't know how to use the proper patch formatting, but it
> should be clear.

I and another developer could not reproduce this problem, either in the
current trunk or in wget 1.10.2.

> sprintf(filecopy, "\"%.2047s\"", file);

This fix breaks the FTP protocol, making wget instantly stop working
with many conforming servers, but apparently start working with yours;
the RFCs are very clear that the file name argument starts right after
the string "RETR "; the very next character is part of the file name,
including if the next character is a space (or a quote). The file name
is terminated by the CR LF sequence (which implies that the sequence CR
LF may not occcur in the filename). Therefore, if you ask for a file
"file.txt", a conforming server will attempt to find and deliver a file
whose name begins and ends with double-quotes.

Therefore, this seems like a server problem.

Could you please provide the following:
  1. The version of wget you are running (wget --version)
  2. The exact command line you are using to invoke wget
  3. The output of that same command line, run with --debug

Thank you very much.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGl9KT7M8hyUobTrERCJfoAJ91z9c2GniuoaX0mj9oqzHrrpNCtQCePQnm
lvbVe0i5/jVy9V10uQpYgmk=
=iQq1
-END PGP SIGNATURE-

Re: [wget-notify] [bug #20466] --delete-after and --spider should not create (and leave) directories

2007-07-12 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Joshua David Williams wrote:
> URL:
>   

...

> Details:
> 
> This patch forces the --no-directories option if we're not actually keeping
> the files we're downloading (as in the --delete-after and --spider options).
> This way, we don't leave a mess of empty directories.

This seems like a reasonable idea, but I'd like to get some discussion
on it first.

The downside, of course, is that there's no short option to reverse the
implied -nd; they'll have to use "--directories" (at the time I was
discussing it with Josh, I'd been thinking -e would be needed, but this
seems to be untrue).

It seems to me that by far the most common intention would be not to
leave any files around; this behavior seems fairly reason to me. Thoughts?

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGlpx17M8hyUobTrERCKP5AJ4rHtoA7xy9FNidKS7WooTwmF5xGACfYHv2
fIwxjHVH/t3H6/xkVk4Yqio=
=ZbKt
-END PGP SIGNATURE-

Re: [bug #20323] Wget issues HEAD before GET, even when the file doesn't exist locally.

2007-07-12 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Mauro Tortonesi wrote:
> Micah Cowan ha scritto:
>> Update of bug #20323 (project wget):
>>
>>   Status:  Ready For Test => In
>> Progress   
>> ___
>>
>> Follow-up Comment #3:
>>
>> Moving back to In Progress until some questions about the logic are
>> answered:
>>
>> http://addictivecode.org/pipermail/wget-notify/2007-July/75.html
>> http://addictivecode.org/pipermail/wget-notify/2007-July/77.html
> 
> thanks micah.
> 
> i have partly misunderstood the logic behind preliminary HEAD request.
> in my code, HEAD is skipped if -O or --no-content-disposition are given,
> but if -N is given HEAD is always sent. this is wrong, as HEAD should be
> skipped even if -N and --no-content-disposition are given (no need to
> care about the deprecated -N -O combination). can't think of any other
> case in which HEAD should be skipped, though.

Cc'ing wget ML, as it's probably important to open up discussion of the
current logic.

What about the case when nothing is given on the command line except
- --no-content-disposition? What do we need HEAD for then?

Also: I don't believe HEAD should be sent if no options are given on the
command line. What purpose would that serve? If it's to find a possible
Content-Disposition header, we can get that (and more reliably) at GET
time (though, I believe we may currently be requiring the file name
before we fetch, which if true, should definitely be changed but not for
1.11, in which case the HEAD will be allowed for the time being); and
since we're not matching against potential accept/reject lists, we don't
really need it.

I think it really makes much more sense to enumerate those few cases
where we need to issue a HEAD, rather than try to determine all the
cases where we don't: if I have to choose a side to err on, I'd rather
not send HEAD in a case or two where we needed it, rather than send it
in a few where we didn't, as any request-response cycle eats up time. I
also believe that the cases where we want a HEAD are/should be fewer
than the cases where we don't want them.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGlol+7M8hyUobTrERCOT0AJwNt2dm/80zL7UYbadBaiaPrMvSUQCePKmS
WO77ltxl0vr0Pcgd8H1bIY8=
=zCTU
-END PGP SIGNATURE-

[Fwd: Bug#281201: wget prints it's progress even when background]

2007-07-11 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

The following bug was submitted to Debian's bug tracker.
I'm curious what people think about this suggestion.

Don't we already check for something like redirected output (and force
the progress indicator to dots)? It seems to me that if that is
appropriate, then a case could be made for this as well.

Perhaps instead of "shutting up", though, wget should attempt to direct
to a file? Perhaps with a "one last message" to the terminal (assuming
the terminal doesn't have TOSTOP set--it should ignore SIGTTOU and
handle EIO to handle that case), to indicate that it's doing this.

- -Micah


- ---- Original Message 
Subject: Bug#281201: wget prints it's progress even when background
Resent-Date: Tue, 10 Jul 2007 13:57:01 +,   Tue, 10 Jul 2007 13:57:02
+
Resent-From: Ilya Anfimov <[EMAIL PROTECTED]>
Resent-To: [EMAIL PROTECTED]
Resent-CC: Noèl Köthe <[EMAIL PROTECTED]>
Date: Tue, 10 Jul 2007 17:54:51 +0400
From: Ilya Anfimov <[EMAIL PROTECTED]>
Reply-To: Ilya Anfimov <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
To: Peter Eisentraut <[EMAIL PROTECTED]>
CC: [EMAIL PROTECTED]


 My suggestion is to stop printing verbose progress messages
when the job is resumed in background. It could be checked
by (successful) getpgrp() not equal to (successful) tcgetprp(1)
in SIGCONT signal handler.
 And something like this is used in some console applications,
for example, in lftp.


-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGlThP7M8hyUobTrERCA4sAJ0RwfVIsL5UcafLkfm5qihERnRNvQCeIABc
t+Y3FeNYctJsdPcPbTwYukk=
=eBSi
-END PGP SIGNATURE-

Re: Bug update notifications

2007-07-09 Thread Matthew Woehlke


Micah Cowan wrote:

Matthew Woehlke wrote:

Micah Cowan wrote:
...any reason to not CC bug updates here also/instead? That's how e.g.
kwrite does thing (also several other lists AFAIK), and seems to make
sense. This is 'bug-wget' after all :-).


It is; but it's also 'wget'.


Hmm, so it is; my bad :-).


While I agree that it probably makes sense
to send it to a bugs discussion list, this list is a combination
bugs/development/support/general discussion list, and I'm not certain
it's appropriate to bump up the traffic level for this.

Still, if there are enough folks that would like to get these updates
(without also seeing commit notifications), perhaps we could craft a
second list for this (or, alternatively, split off the "main
discussion/support list" from the "bugs" list)?


I guess a common pattern is:
foo-help
foo-devel
foo-commits

...but of course you're the maintainer, it's your call :-).
(The above aren't necessarily "actual names" of course, just the 
categories it seems like I'm most used to seeing. e.g. the GNU 
convention is of course bug-foo, not foo-devel.)


--
Matthew
This .sig is false

Re: Bug update notifications

2007-07-09 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Matthew Woehlke wrote:
> Micah Cowan wrote:
>> The wget-notify mailing list
>> (http://addictivecode.org/mailman/listinfo/wget-notify) will now also be
>> receiving notifications of bug updates from GNU Savannah, in addition to
>>  subversion commits.
> 
> ...any reason to not CC bug updates here also/instead? That's how e.g.
> kwrite does thing (also several other lists AFAIK), and seems to make
> sense. This is 'bug-wget' after all :-).

It is; but it's also 'wget'. While I agree that it probably makes sense
to send it to a bugs discussion list, this list is a combination
bugs/development/support/general discussion list, and I'm not certain
it's appropriate to bump up the traffic level for this.

Still, if there are enough folks that would like to get these updates
(without also seeing commit notifications), perhaps we could craft a
second list for this (or, alternatively, split off the "main
discussion/support list" from the "bugs" list)?

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGkrpK7M8hyUobTrERCIMaAKCDG8JN7DmUK7oIuE0fYmgYnZIrlgCghK7n
iV8rIDYe1+cxzrQATM43CEM=
=PKqt
-END PGP SIGNATURE-

Re: Bug update notifications

2007-07-09 Thread Matthew Woehlke


Micah Cowan wrote:

The wget-notify mailing list
(http://addictivecode.org/mailman/listinfo/wget-notify) will now also be
receiving notifications of bug updates from GNU Savannah, in addition to
 subversion commits.


...any reason to not CC bug updates here also/instead? That's how e.g. 
kwrite does thing (also several other lists AFAIK), and seems to make 
sense. This is 'bug-wget' after all :-).


--
Matthew
This .sig is false

Re: wget bug?

2007-07-09 Thread Matthias Vill


Mauro Tortonesi schrieb:

On Mon, 9 Jul 2007 15:06:52 +1200
[EMAIL PROTECTED] wrote:


wget under win2000/win XP
I get "No such file or directory" error messages when using the follwing 
command line.


wget -s --save-headers 
"http://www.nndc.bnl.gov/ensdf/browseds.jsp?nuc=%1&class=Arc";

%1 = 212BI
Any ideas?


hi nikolaus,

in windows, you're supposed to use %VARIABLE_NAME% for variable substitution. 
try using %1% instead of %1.



AFAIK it's ok to use %1, because it is a special case. Also the error 
would be a 404 or some wget error in that case the variable gets 
substituted in a wrong way or not? (actually even than you get a 200 
response with that url)


I just tried using the command inside a batch-file and came across 
another problem: You used a lowercase -s wich is not recognized by my 
wget-version, but a uppercase -S is. i guess you should change that.


I would guess wget is not in your PATH.
Try using "c:\path\to\the dircetory\wget.exe" instead of just wget.

If this too does not hel at explicit "--restrict-file-names=windows" to 
your options, so wget does not try to use the ? inside a filename. 
(normally not needed)


So a should-work-for-all-means-version is

"c:\path\wget.exe" -S --save-headers --restrict-file-names=windows 
"http://www.nndc.bnl.gov/ensdf/browseds.jsp?nuc=%1&class=Arc";


Of course just one line, but my dump mail-editor wrapped it.

Greetings
Matthias

Re: wget bug?

2007-07-09 Thread Mauro Tortonesi

On Mon, 9 Jul 2007 15:06:52 +1200
[EMAIL PROTECTED] wrote:

> wget under win2000/win XP
> I get "No such file or directory" error messages when using the follwing 
> command line.
> 
> wget -s --save-headers 
> "http://www.nndc.bnl.gov/ensdf/browseds.jsp?nuc=%1&class=Arc";
> 
> %1 = 212BI
> Any ideas?

hi nikolaus,

in windows, you're supposed to use %VARIABLE_NAME% for variable substitution. 
try using %1% instead of %1.

-- 
Mauro Tortonesi <[EMAIL PROTECTED]>

wget bug?

2007-07-08 Thread Nikolaus_Hermanspahn

wget under win2000/win XP
I get "No such file or directory" error messages when using the follwing 
command line.

wget -s --save-headers 
"http://www.nndc.bnl.gov/ensdf/browseds.jsp?nuc=%1&class=Arc";

%1 = 212BI
Any ideas?

thank you

Dr Nikolaus Hermanspahn
Advisor (Science)
National Radiation Laboratory
Ministry of Health
DDI: +64 3 366 5059
Fax: +64 3 366 1156

http://www.nrl.moh.govt.nz
mailto:[EMAIL PROTECTED]




Statement of confidentiality: This e-mail message and any accompanying
attachments may contain information that is IN-CONFIDENCE and subject to
legal privilege.
If you are not the intended recipient, do not read, use, disseminate,
distribute or copy this message or attachments.
If you have received this message in error, please notify the sender
immediately and delete this message.


*
This e-mail message has been scanned for Viruses and Content and cleared 
by the Ministry of Health's Content and Virus Filtering Gateway
*

Bug update notifications

2007-07-07 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

The wget-notify mailing list
(http://addictivecode.org/mailman/listinfo/wget-notify) will now also be
receiving notifications of bug updates from GNU Savannah, in addition to
 subversion commits.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGkG0Q7M8hyUobTrERCLVXAJwP7ru9v88PFF6PgREWTn0XF7XRnwCfY1hd
4W1KLuYYRvZ0pSXOLk6YY/Y=
=TOP4
-END PGP SIGNATURE-

Re: wget on gnu.org: Report a Bug

2007-07-07 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Micah Cowan wrote:
> Tony Lewis wrote:
>> The “Report a Bug” section of http://www.gnu.org/software/wget/ should
>> encourage submitters to send as much relevant information as possible
>> including wget version, operating system, and command line. The
>> submitter should also either send or at least save a copy of the --debug
>> output.
> 
> This information is currently in the bug submitting form at Savannah:
> https://savannah.gnu.org/bugs/?func=additem&group=wget
> 
> But should probably be duplicated at the website as well... let me know
> if the current text could use improvement.

I've copied the text to the website, along with a link to Simon Tatham's
essay on reporting bugs.

I also added a small section regarding the IRC #wget channel on FreeNode.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGkDhh7M8hyUobTrERCDBQAJ4ln3eWsbdbsa5ahfB7kv5tHIc1wACeLSIj
uXkezPuzt7GMoiXvUemMT9U=
=2dVK
-END PGP SIGNATURE-

RE: wget on gnu.org: Report a Bug

2007-07-07 Thread Tony Lewis

Micah Cowan wrote:

> This information is currently in the bug submitting form at Savannah:

That looks good.

> I think perhaps such things as the wget version and operating system
> ought to be emitted by default anyway (except when -q is given).

I'm not convinced that wget should ordinarily emit the operating system. It's 
really only useful to someone other than the person running the command.

> Other than that, what kinds of things would --bug provide above and
> beyond --debug?

It should echo the command line and the contents of .wgetrc to the bug output, 
which even the --debug option does not do. Perhaps we will think of other 
things to include in the output if this option gets added.

However, the big difference would be where the output was directed. When 
invoked as:
wget ... --bug bug_report

all interesting (but sanitized) information would be written to the file 
bug_report whether or not the command included --debug, which would also direct 
the debugging output to STDOUT.

The main reason I had for suggesting this option is that it would be easy to 
tell newbies with problems to run the exact same command with "--bug 
bug_report" and send the file bug_report to the list (or to whomever is working 
on the problem). The user wouldn't see the command behave any differently, but 
we'd have the information we need to investigate the report.

It might even be that most of us would choose to run with --bug most of the 
time relying on the normal wget output except when something appears to have 
gone wrong and then checking the file when it does.

Tony

Re: wget on gnu.org: Report a Bug

2007-07-07 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Tony Lewis wrote:
> The “Report a Bug” section of http://www.gnu.org/software/wget/ should
> encourage submitters to send as much relevant information as possible
> including wget version, operating system, and command line. The
> submitter should also either send or at least save a copy of the --debug
> output.

This information is currently in the bug submitting form at Savannah:
https://savannah.gnu.org/bugs/?func=additem&group=wget

But should probably be duplicated at the website as well... let me know
if the current text could use improvement.

> Perhaps we need a --bug option for the command line that runs the
> command and saves important information in a file that can be submitted
> along with the bug report. The saved information would have to be
> sanitized to remove things like user IDs and passwords but could include
> things like the wget version, command line options, and what the command
> tried to do.

I think perhaps such things as the wget version and operating system
ought to be emitted by default anyway (except when -q is given).

Other than that, what kinds of things would --bug provide above and
beyond --debug?

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGj+hk7M8hyUobTrERCHqtAJ9HTIFd3hOJ2R9aQBUqCtsvW2xJ1wCePOfo
67Olfti9HtI+1pYkNiCj7rc=
=/Rhd
-END PGP SIGNATURE-

wget on gnu.org: Report a Bug

2007-07-07 Thread Tony Lewis

The "Report a Bug" section of http://www.gnu.org/software/wget/ should
encourage submitters to send as much relevant information as possible
including wget version, operating system, and command line. The submitter
should also either send or at least save a copy of the --debug output.

 

Perhaps we need a --bug option for the command line that runs the command
and saves important information in a file that can be submitted along with
the bug report. The saved information would have to be sanitized to remove
things like user IDs and passwords but could include things like the wget
version, command line options, and what the command tried to do.

 

Tony

Re: bug and "patch": blank spaces in filenames causes looping

2007-07-06 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Steven M. Schweda wrote:
>>From :
> 
>> [...]
>>char filecopy[2048];
>>if (file[0] != '"') {
>>  sprintf(filecopy, "\"%.2047s\"", file);
>>} else {
>>  strncpy(filecopy, file, 2047);
>>}
>> [...]
>> It should be:
>>
>>  sprintf(filecopy, "\"%.2045s\"", file);
>> [...]
> 
>I'll admit to being old and grumpy, but am I the only one who
> shudders when one small code segment contains "2048", "2047", and "2045"
> as separate, independent literal constants, instead of using a macro, or
> "sizeof", or something which would let the next fellow change one buffer
> size in one place, instead of hunting all over the code looking for
> every "20xx" which might be related?

Well, as already mentioned, aprintf() would be much more appropriate, as
it elminates the need for constants like these.

And yes, "magic numbers" drive me crazy, too. Of course, when used with
printf's 's' specifier, it needs special handling (crafting a STR()
macro or somesuch).

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjxcX7M8hyUobTrERCHSAAJ9VkQdfhK4/LwByseYH2ZYVzoPqPwCePU1k
2Llybpq/oceXWMyZpBO4bPY=
=Vj/R
-END PGP SIGNATURE-

Re: bug and "patch": blank spaces in filenames causes looping

2007-07-06 Thread Steven M. Schweda

>From :

> [...]
>char filecopy[2048];
>if (file[0] != '"') {
>  sprintf(filecopy, "\"%.2047s\"", file);
>} else {
>  strncpy(filecopy, file, 2047);
>}
> [...]
> It should be:
> 
>  sprintf(filecopy, "\"%.2045s\"", file);
> [...]

   I'll admit to being old and grumpy, but am I the only one who
shudders when one small code segment contains "2048", "2047", and "2045"
as separate, independent literal constants, instead of using a macro, or
"sizeof", or something which would let the next fellow change one buffer
size in one place, instead of hunting all over the code looking for
every "20xx" which might be related?

   Just a thought.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547

Re: bug and "patch": blank spaces in filenames causes looping

2007-07-05 Thread Rich Cook


Thanks for the follow up.  :-)

On Jul 5, 2007, at 3:52 PM, Micah Cowan wrote:


-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Rich Cook wrote:

So forgive me for a newbie-never-even-lurked kind of question:  will
this fix make it into wget for other users (and for me in the  
future)?

Or do I need to do more to make that happen, or...?  Thanks!


Well, I need a chance to look over the patch, run some tests, etc, to
see if it really covers everything it should (what about other,
non-space characters?).

The fix (or one like it) will probably make it into Wget at some  
point,
but I wouldn't expect it to come out in the next release (which,  
itself,
will not be arriving for a couple months); it will probably go into  
wget

1.12.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjXYj7M8hyUobTrERCI5JAJ0UIDGzQsC8xCI3lK26pzzQ+BkS6ACgj16o
oWDlelFyfvvTlhtlDpLYLXM=
=DZ8v
-END PGP SIGNATURE-


--
✐"There's no time to stop for gas, we're already late"-- Karin Donker
--
Rich "wealthychef" Cook

925-784-3077
--
✐

Re: bug and "patch": blank spaces in filenames causes looping

2007-07-05 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Bruso, John wrote:
> Please remove me from this list. thanks,

Nobody on this list has the ability to do this, unfortunately (Wget
maintainership is separate from the maintainers of this list). To
further confuse the issue, [EMAIL PROTECTED] is actually just an alias to
wget@sunsite.dk, which is the one you're actually subscribed to.

To unsubscribe, send an email to [EMAIL PROTECTED]; it will
send you a confirmation email that you'll need to reply to before you'll
actually be unsubscribed.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjXaB7M8hyUobTrERCPluAJ96Uig9uMkHSeA8G5iRDPT2HDtaEQCffeN/
s+U0CnIY5oHYXWSwa6HXVBg=
=0gDG
-END PGP SIGNATURE-

Re: bug and "patch": blank spaces in filenames causes looping

2007-07-05 Thread Micah Cowan

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Rich Cook wrote:
> So forgive me for a newbie-never-even-lurked kind of question:  will
> this fix make it into wget for other users (and for me in the future)? 
> Or do I need to do more to make that happen, or...?  Thanks!

Well, I need a chance to look over the patch, run some tests, etc, to
see if it really covers everything it should (what about other,
non-space characters?).

The fix (or one like it) will probably make it into Wget at some point,
but I wouldn't expect it to come out in the next release (which, itself,
will not be arriving for a couple months); it will probably go into wget
1.12.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjXYj7M8hyUobTrERCI5JAJ0UIDGzQsC8xCI3lK26pzzQ+BkS6ACgj16o
oWDlelFyfvvTlhtlDpLYLXM=
=DZ8v
-END PGP SIGNATURE-

Re: bug and "patch": blank spaces in filenames causes looping

2007-07-05 Thread Rich Cook

So forgive me for a newbie-never-even-lurked kind of question:  will  
this fix make it into wget for other users (and for me in the  
future)?  Or do I need to do more to make that happen, or...?  Thanks!


On Jul 5, 2007, at 12:52 PM, Hrvoje Niksic wrote:


Rich Cook <[EMAIL PROTECTED]> writes:


On Jul 5, 2007, at 11:08 AM, Hrvoje Niksic wrote:


Rich Cook <[EMAIL PROTECTED]> writes:


Trouble is, it's undocumented as to how to free the resulting
string.  Do I call free on it?


Yes.  "Freshly allocated with malloc" in the function documentation
was supposed to indicate how to free the string.


Oh, I looked in the source and there was this xmalloc thing that
didn't show up in my man pages, so I punted.  Sorry.


No problem.  Note that xmalloc isn't entirely specific to Wget, it's a
fairly standard GNU name for a malloc-or-die function.

Now I remembered that Wget also has xfree, so the above advice is not
entirely correct -- you should call xfree instead.  However, in the
normal case xfree is a simple wrapper around free, so even if you used
free, it would have worked just as well.  (The point of xfree is that
if you compile with DEBUG_MALLOC, you get a version that check for
leaks, although it should be removed now that there is valgrind, which
does the same job much better.  There is also the business of barfing
on NULL pointers, which should also be removed.)

I'd have implemented a portable asprintf, but I liked the aprintf
interface better (I first saw it in libcurl).


--
✐"There's no time to stop for gas, we're already late"-- Karin Donker
--
Rich "wealthychef" Cook

925-784-3077
--
✐

Re: bug and "patch": blank spaces in filenames causes looping

2007-07-05 Thread Hrvoje Niksic

Rich Cook <[EMAIL PROTECTED]> writes:

> On Jul 5, 2007, at 11:08 AM, Hrvoje Niksic wrote:
>
>> Rich Cook <[EMAIL PROTECTED]> writes:
>>
>>> Trouble is, it's undocumented as to how to free the resulting
>>> string.  Do I call free on it?
>>
>> Yes.  "Freshly allocated with malloc" in the function documentation
>> was supposed to indicate how to free the string.
>
> Oh, I looked in the source and there was this xmalloc thing that
> didn't show up in my man pages, so I punted.  Sorry.

No problem.  Note that xmalloc isn't entirely specific to Wget, it's a
fairly standard GNU name for a malloc-or-die function.

Now I remembered that Wget also has xfree, so the above advice is not
entirely correct -- you should call xfree instead.  However, in the
normal case xfree is a simple wrapper around free, so even if you used
free, it would have worked just as well.  (The point of xfree is that
if you compile with DEBUG_MALLOC, you get a version that check for
leaks, although it should be removed now that there is valgrind, which
does the same job much better.  There is also the business of barfing
on NULL pointers, which should also be removed.)

I'd have implemented a portable asprintf, but I liked the aprintf
interface better (I first saw it in libcurl).

RE: bug and "patch": blank spaces in filenames causes looping

2007-07-05 Thread Bruso, John

Please remove me from this list. thanks,
 
John Bruso



From: Rich Cook [mailto:[EMAIL PROTECTED]
Sent: Thu 7/5/2007 12:30 PM
To: Hrvoje Niksic
Cc: Tony Lewis; [EMAIL PROTECTED]
Subject: Re: bug and "patch": blank spaces in filenames causes looping




On Jul 5, 2007, at 11:08 AM, Hrvoje Niksic wrote:

> Rich Cook <[EMAIL PROTECTED]> writes:
>
>> Trouble is, it's undocumented as to how to free the resulting
>> string.  Do I call free on it?
>
> Yes.  "Freshly allocated with malloc" in the function documentation
> was supposed to indicate how to free the string.

Oh, I looked in the source and there was this xmalloc thing that 
didn't show up in my man pages, so I punted.  Sorry.

--
?"There's no time to stop for gas, we're already late"-- Karin Donker
--
Rich "wealthychef" Cook
<http://5pmharmony.com <http://5pmharmony.com/> >
925-784-3077
--
?

Re: bug and "patch": blank spaces in filenames causes looping

2007-07-05 Thread Rich Cook



On Jul 5, 2007, at 11:08 AM, Hrvoje Niksic wrote:


Rich Cook <[EMAIL PROTECTED]> writes:


Trouble is, it's undocumented as to how to free the resulting
string.  Do I call free on it?


Yes.  "Freshly allocated with malloc" in the function documentation
was supposed to indicate how to free the string.


Oh, I looked in the source and there was this xmalloc thing that  
didn't show up in my man pages, so I punted.  Sorry.


--
✐"There's no time to stop for gas, we're already late"-- Karin Donker
--
Rich "wealthychef" Cook

925-784-3077
--
✐

Re: bug and "patch": blank spaces in filenames causes looping

2007-07-05 Thread Hrvoje Niksic

Rich Cook <[EMAIL PROTECTED]> writes:

> Trouble is, it's undocumented as to how to free the resulting
> string.  Do I call free on it?

Yes.  "Freshly allocated with malloc" in the function documentation
was supposed to indicate how to free the string.

Re: bug and "patch": blank spaces in filenames causes looping

2007-07-05 Thread Hrvoje Niksic

"Virden, Larry W." <[EMAIL PROTECTED]> writes:

> "Tony Lewis" <[EMAIL PROTECTED]> writes:
>
>> Wget has an `aprintf' utility function that allocates the result on
> the heap.  Avoids both buffer overruns and 
>> arbitrary limits on file name length.
>
> If it uses the heap, then doesn't that open a hole where a particularly
> long file name would overflow the heap?

No, aprintf tries to allocate as much memory as necessary.  If the
memory is unavailable, malloc returns NULL and Wget exits.

Re: bug and "patch": blank spaces in filenames causes looping

2007-07-05 Thread Rich Cook

Trouble is, it's undocumented as to how to free the resulting  
string.  Do I call free on it?  I'd use asprintf, but I'm afraid to  
suggest that here as it may not be portable.


On Jul 5, 2007, at 10:45 AM, Hrvoje Niksic wrote:


"Tony Lewis" <[EMAIL PROTECTED]> writes:

There is a buffer overflow in the following line of the proposed  
code:


 sprintf(filecopy, "\"%.2047s\"", file);


Wget has an `aprintf' utility function that allocates the result on
the heap.  Avoids both buffer overruns and arbitrary limits on file
name length.


--
Rich "wealthychef" Cook
925-784-3077
--
 it takes many small steps to climb a mountain, but the view gets  
better all the time.

RE: bug and "patch": blank spaces in filenames causes looping

2007-07-05 Thread Virden, Larry W.

 


-Original Message-
From: Hrvoje Niksic [mailto:[EMAIL PROTECTED] 

"Tony Lewis" <[EMAIL PROTECTED]> writes:

> Wget has an `aprintf' utility function that allocates the result on
the heap.  Avoids both buffer overruns and 
> arbitrary limits on file name length.

If it uses the heap, then doesn't that open a hole where a particularly
long file name would overflow the heap?

-- 
http://wiki.tcl.tk/ >
Even if explicitly stated to the contrary, nothing in this posting
should be construed as representing my employer's opinions.
mailto:[EMAIL PROTECTED] > http://www.purl.org/NET/lvirden/
>

Re: bug and "patch": blank spaces in filenames causes looping

2007-07-05 Thread Hrvoje Niksic

"Tony Lewis" <[EMAIL PROTECTED]> writes:

> There is a buffer overflow in the following line of the proposed code:
>
>  sprintf(filecopy, "\"%.2047s\"", file);

Wget has an `aprintf' utility function that allocates the result on
the heap.  Avoids both buffer overruns and arbitrary limits on file
name length.

Re: bug and "patch": blank spaces in filenames causes looping

2007-07-05 Thread Rich Cook

Good point, although it's only a POTENTIAL buffer overflow, and it's  
limited to 2 bytes, so at least it's not exploitable.  :-)



On Jul 5, 2007, at 9:05 AM, Tony Lewis wrote:


There is a buffer overflow in the following line of the proposed code:

 sprintf(filecopy, "\"%.2047s\"", file);

It should be:

 sprintf(filecopy, "\"%.2045s\"", file);

in order to leave room for the two quotes.

Tony
-Original Message-
From: Rich Cook [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 04, 2007 10:18 AM
To: [EMAIL PROTECTED]
Subject: bug and "patch": blank spaces in filenames causes looping

On OS X, if a filename on the FTP server contains spaces, and the
remote copy of the file is newer than the local, then wget gets
thrown into a loop of "No such file or directory" endlessly.   I have
changed the following in ftp-simple.c, and this fixes the error.
Sorry, I don't know how to use the proper patch formatting, but it
should be clear.

==
the beginning of ftp_retr:
=
/* Sends RETR command to the FTP server.  */
uerr_t
ftp_retr (int csock, const char *file)
{
   char *request, *respline;
   int nwritten;
   uerr_t err;

   /* Send RETR request.  */
   request = ftp_request ("RETR", file);

==
becomes:
==
/* Sends RETR command to the FTP server.  */
uerr_t
ftp_retr (int csock, const char *file)
{
   char *request, *respline;
   int nwritten;
   uerr_t err;
   char filecopy[2048];
   if (file[0] != '"') {
 sprintf(filecopy, "\"%.2047s\"", file);
   } else {
 strncpy(filecopy, file, 2047);
   }

   /* Send RETR request.  */
   request = ftp_request ("RETR", filecopy);






--
Rich "wealthychef" Cook
925-784-3077
--
  it takes many small steps to climb a mountain, but the view gets
better all the time.


--
Rich "wealthychef" Cook
925-784-3077
--
 it takes many small steps to climb a mountain, but the view gets  
better all the time.

RE: bug and "patch": blank spaces in filenames causes looping

2007-07-05 Thread Tony Lewis

There is a buffer overflow in the following line of the proposed code:

 sprintf(filecopy, "\"%.2047s\"", file);

It should be:

 sprintf(filecopy, "\"%.2045s\"", file);

in order to leave room for the two quotes.

Tony
-Original Message-
From: Rich Cook [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 04, 2007 10:18 AM
To: [EMAIL PROTECTED]
Subject: bug and "patch": blank spaces in filenames causes looping

On OS X, if a filename on the FTP server contains spaces, and the  
remote copy of the file is newer than the local, then wget gets  
thrown into a loop of "No such file or directory" endlessly.   I have  
changed the following in ftp-simple.c, and this fixes the error.
Sorry, I don't know how to use the proper patch formatting, but it  
should be clear.

==
the beginning of ftp_retr:
=
/* Sends RETR command to the FTP server.  */
uerr_t
ftp_retr (int csock, const char *file)
{
   char *request, *respline;
   int nwritten;
   uerr_t err;

   /* Send RETR request.  */
   request = ftp_request ("RETR", file);

==
becomes:
==
/* Sends RETR command to the FTP server.  */
uerr_t
ftp_retr (int csock, const char *file)
{
   char *request, *respline;
   int nwritten;
   uerr_t err;
   char filecopy[2048];
   if (file[0] != '"') {
 sprintf(filecopy, "\"%.2047s\"", file);
   } else {
 strncpy(filecopy, file, 2047);
   }

   /* Send RETR request.  */
   request = ftp_request ("RETR", filecopy);






--
Rich "wealthychef" Cook
925-784-3077
--
  it takes many small steps to climb a mountain, but the view gets  
better all the time.

bug and "patch": blank spaces in filenames causes looping

2007-07-04 Thread Rich Cook

On OS X, if a filename on the FTP server contains spaces, and the  
remote copy of the file is newer than the local, then wget gets  
thrown into a loop of "No such file or directory" endlessly.   I have  
changed the following in ftp-simple.c, and this fixes the error.
Sorry, I don't know how to use the proper patch formatting, but it  
should be clear.


==
the beginning of ftp_retr:
=
/* Sends RETR command to the FTP server.  */
uerr_t
ftp_retr (int csock, const char *file)
{
  char *request, *respline;
  int nwritten;
  uerr_t err;

  /* Send RETR request.  */
  request = ftp_request ("RETR", file);

==
becomes:
==
/* Sends RETR command to the FTP server.  */
uerr_t
ftp_retr (int csock, const char *file)
{
  char *request, *respline;
  int nwritten;
  uerr_t err;
  char filecopy[2048];
  if (file[0] != '"') {
sprintf(filecopy, "\"%.2047s\"", file);
  } else {
strncpy(filecopy, file, 2047);
  }

  /* Send RETR request.  */
  request = ftp_request ("RETR", filecopy);






--
Rich "wealthychef" Cook
925-784-3077
--
 it takes many small steps to climb a mountain, but the view gets  
better all the time.

Re: wget -P bug

2007-06-21 Thread Mauro Tortonesi

On Wed, 20 Jun 2007 11:01:40 -0300
"Itamar Reis Peixoto" <[EMAIL PROTECTED]> wrote:

> can you fix this bug for me ?
> 
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=229744

hi itamar,

i've just tried both the current (http://svn.dotsrc.org/repo/wget/) and 1.10.2 
versions of wget, and the -P option seems to work correctly. so, this might be 
a fedora-specific problem.

please, let's continue this conversation on the wget ml.

-- 
Mauro Tortonesi <[EMAIL PROTECTED]>

Bug in the generated manpage

2007-06-12 Thread Stepan Kasal

Hello,

using Wget 1.10.2 I noticed that the man page description for
--no-proxy says:

For more information about the use of proxies with Wget,

... and that's all.  The original contains an @xref, which gets
swallowed by texi2pod.

I don't know how/if it should be repaired, but I thought it's worth
reporting.

Have a nice day,
Stepan

Re: bug storing cookies with wget

2007-06-03 Thread Matthias Vill

Matthias Vill schrieb:
> Mario Ander schrieb:
>> Hi everybody,
>>
>> I think there is a bug storing cookies with wget.
>>
>> See this command line:
>>
>> "C:\Programme\wget\wget" --user-agent="Opera/8.5 (X11;
>> U; en)" --no-check-certificate --keep-session-cookies
>> --save-cookies="cookie.txt" --output-document=-
>> --debug --output-file="debug.txt"
>> --post-data="name=xxx&password=&dummy=Internetkennwort&login.x=0&login.y=0"
>> "https://www.vodafone.de/proxy42/portal/login.po";
> [..]
>> Set-Cookie:
>> JSESSIONID=GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE;
>> path=/jsp 
>> Set-Cookie: VODAFONELOGIN=1; domain=.vodafone.de;
>> expires=Friday, 01-Jun-2007 15:05:16 GMT; path=/ 
>> Set-Cookie:
>> JSESSIONID=GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE!1180705316338;
>> path=/proxy42
> [..]
>> ---response end---
>> 200 OK
>> Attempt to fake the path: /jsp,
>> /proxy42/portal/login.po
> 
> So the problem seems to be that wget rejects cookies for paths which
> don't "fit" to the request url. Like the script you call is in
> /proxy42/portal/, which is a subdir of /proxy42 an / so wget accepts
> those cookies, but wich is not related to /jsp
> 
> So it seems to be wget sticking to the strict RFC and the script doing
> wrong.
> To get this working you would need to patch wget for not RFC-compliant
> cookies maybe along with an "--accept-malformed-cookies" directiv.
> 
> Hope this helps you
> 
> Matthias
> 

So I thought of a second solution: If you have cygwin (or at least
bash+grep) you can run this small script to dublicate and truncate the
cookie.
--- CUT here ---
#!/bin/bash
#Author: Matthias Vill; feel free to change and use

#get the line for proxy42-path in $temp
temp=$(grep proxy42 cookies.txt)

#remove everything after last !
temp=${temp%!*}

#replace proxy42 by jsp
temp=${temp/proxy42/jsp}

#append newline to file
#echo >>cookies.txt

#add new cookie to cookies.txt
echo $temp>>cookies.txt
--- CUT here ---
Maybe you need to remove the "#" in front of "echo >>cookies.txt" to
compensate a missing trailing newline; otherwise you may end up changing
the value of the previous cookie.

Maybe this helps even more

Matthias

Re: bug storing cookies with wget

2007-06-03 Thread Matthias Vill

Mario Ander schrieb:
> Hi everybody,
> 
> I think there is a bug storing cookies with wget.
> 
> See this command line:
> 
> "C:\Programme\wget\wget" --user-agent="Opera/8.5 (X11;
> U; en)" --no-check-certificate --keep-session-cookies
> --save-cookies="cookie.txt" --output-document=-
> --debug --output-file="debug.txt"
> --post-data="name=xxx&password=&dummy=Internetkennwort&login.x=0&login.y=0"
> "https://www.vodafone.de/proxy42/portal/login.po";
[..]
> Set-Cookie:
> JSESSIONID=GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE;
> path=/jsp 
> Set-Cookie: VODAFONELOGIN=1; domain=.vodafone.de;
> expires=Friday, 01-Jun-2007 15:05:16 GMT; path=/ 
> Set-Cookie:
> JSESSIONID=GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE!1180705316338;
> path=/proxy42
[..]
> ---response end---
> 200 OK
> Attempt to fake the path: /jsp,
> /proxy42/portal/login.po

So the problem seems to be that wget rejects cookies for paths which
don't "fit" to the request url. Like the script you call is in
/proxy42/portal/, which is a subdir of /proxy42 an / so wget accepts
those cookies, but wich is not related to /jsp

So it seems to be wget sticking to the strict RFC and the script doing
wrong.
To get this working you would need to patch wget for not RFC-compliant
cookies maybe along with an "--accept-malformed-cookies" directiv.

Hope this helps you

Matthias

bug storing cookies with wget

2007-06-01 Thread Mario Ander

Hi everybody,

I think there is a bug storing cookies with wget.

See this command line:

"C:\Programme\wget\wget" --user-agent="Opera/8.5 (X11;
U; en)" --no-check-certificate --keep-session-cookies
--save-cookies="cookie.txt" --output-document=-
--debug --output-file="debug.txt"
--post-data="name=xxx&password=&dummy=Internetkennwort&login.x=0&login.y=0"
"https://www.vodafone.de/proxy42/portal/login.po";



wget answer this way:



DEBUG output created by Wget 1.10.2 on Windows.

--15:41:58-- 
https://www.vodafone.de/proxy42/portal/login.po
   => `-'
Resolving www.vodafone.de... seconds 0.00,
139.7.147.41
Caching www.vodafone.de => 139.7.147.41
Connecting to www.vodafone.de|139.7.147.41|:443...
seconds 0.00, connected.
Created socket 1844.
Releasing 0x003a5a90 (new refcount 1).
Initiating SSL handshake.
Handshake successful; connected socket 1844 to SSL
handle 0x00931758
certificate:
  subject: /C=DE/ST=NRW/L=Duesseldorf/O=Vodafone D2
GmbH/OU=TOP-A/OU=Terms of use at www.verisign.com/rpa
(c)00/CN=www.vodafone.de
  issuer:  /O=VeriSign Trust Network/OU=VeriSign,
Inc./OU=VeriSign International Server CA - Class
3/OU=www.verisign.com/CPS Incorp.by Ref. LIABILITY
LTD.(c)97 VeriSign
WARNING: Certificate verification error for
www.vodafone.de: unable to get local issuer
certificate

---request begin---
POST /proxy42/portal/login.po HTTP/1.0 
User-Agent: Opera/8.5 (X11; U; en) 
Accept: */* 
Host: www.vodafone.de 
Connection: Keep-Alive 
Content-Type: application/x-www-form-urlencoded 
Content-Length: 77 
 
---request end---
[POST data:
name=xxx&password=&dummy=Internetkennwort&login.x=0&login.y=0]
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 200 OK 
Date: Fri, 01 Jun 2007 13:41:56 GMT 
Server: Apache 
Pragma: No-cache 
Expires: Thu, 01 Jan 1970 00:00:00 GMT 
Set-Cookie:
JSESSIONID=GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE;
path=/jsp 
Set-Cookie: VODAFONELOGIN=1; domain=.vodafone.de;
expires=Friday, 01-Jun-2007 15:05:16 GMT; path=/ 
Set-Cookie:
JSESSIONID=GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE!1180705316338;
path=/proxy42 
Cache-Control: no-cache,no-store,max-age=0 
P3P: CP="NOI ADM DEV PSAi COM NAV OUR OTR STP IND DEM"

Connection: close 
Content-Type: text/html; charset=ISO-8859-1 
Via: 1.1 www.vodafone.de (Alteon iSD-SSL/6.0.5) 
 
---response end---
200 OK
Attempt to fake the path: /jsp,
/proxy42/portal/login.po
cdm: 1 2 3 4 5 6 7 8
Stored cookie vodafone.de -1 (ANY) / 
 [expiry 2007-06-01 17:05:16] VODAFONELOGIN
1

Stored cookie www.vodafone.de -1 (ANY) /proxy42
  [expiry none] JSESSIONID
GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE!1180705316338
Length: unspecified [text/html]

0K .. .. .. ...   
338.67 KB/s

Closed 1844/SSL 0x931758
15:41:58 (338.67 KB/s) - `-' saved [34644]

Saving cookies to cookie.txt.
Done saving cookies.




The cookie.txt looks this way:



# HTTP cookie file.
# Generated by Wget on 2007-06-01 15:33:23.
# Edit at your own risk.

www.vodafone.de FALSE   /proxy42FALSE   0   JSESSIONID
GggBMfxV9vGqGwtyQGJFXsyCr6vQvGSh9KGgDt7xgLycdc5MTQps!1467361027!NONE!1180704801023
.vodafone.deTRUE/   FALSE   1180709801  VODAFONELOGIN   1



and should look like this (but does not):


# HTTP cookie file.
# Generated by Wget on 2007-06-01 15:47:31.
# Edit at your own risk.

www.vodafone.de FALSE   /proxy42FALSE   0   JSESSIONID
GgjRT1NTfspwH1cJCVPlGv37c4JKgkTDPYJNsTM2l1RJG0CJQ8Rp!-249032648!NONE!1180705649205
www.vodafone.de FALSE   /jspFALSE   0   JSESSIONID
GgjRT1NTfspwH1cJCVPlGv37c4JKgkTDPYJNsTM2l1RJG0CJQ8Rp!-249032648!NONE
.vodafone.deTRUE/   FALSE   1180710649  VODAFONELOGIN   1


Thats all.
Bye.




   

Boardwalk for $500? In 2007? Ha! Play Monopoly Here and Now (it's updated for 
today's economy) at Yahoo! Games.
http://get.games.yahoo.com/proddesc?gamekey=monopolyherenow

possible bug in wget-1.10.2 and earlier

2007-05-30 Thread Harrington, Paul

Hi,
wget appears to be confused by FTP servers that only have one space
between the file-size information. We only came across this problem
today so I don't know how common it is.
 
pjjH
 




From: Harrington, Paul 
Sent: Thursday, May 31, 2007 12:06 AM
To:   
Subject: RE: File issue using WGET


Your FTP server must have changed the output of the listing format or,
more precisely, the string representation of some of the components has
changed such that only one space separates the group name from the
file-size. The bug is, of course, with wget but it is one that hitherto
had not been observed when interacting with your FTP server.
 
 
pjjH
 
 
 
[EMAIL PROTECTED] diff -u ftp-ls.c  ~/tmp
--- ftp-ls.c2005-08-04 17:52:33.0 -0400
+++ /u/harringp/tmp/ftp-ls.c2007-05-31 00:02:07.209955000 -0400
@@ -229,6 +229,18 @@
  break;
}
  errno = 0;
+  /* after the while loop terminates, t may not always
+ point to a space character. In the case when
+ there is only one-space between the user/group
+ information and the file-size, the space will
+ have been overwritten by a \0 via strok().  So,
+ if you have been through the loop at least once,
+ advance forward one chacter.
+  */
+
+  if (t < ptok)
+  t++;
+
  size = str_to_wgint (t, NULL, 10);
  if (size == WGINT_MAX && errno == ERANGE)
/* Out of range -- ignore the size.   Should

RE: wget bug

2007-05-24 Thread Tony Lewis

Highlord Ares wrote:

 

> it tries to download web pages named similar to

>  
http://site.com?variable=yes&mode=awesome

 

Since "&" is a reserved character in many command shells, you need to quote
the URL on the command line:

 

wget " 
http://site.com?variable=yes&mode=awesome";

 

Tony

RE: wget bug

2007-05-23 Thread Willener, Pat

This does not look like a valid URL to me - shouldn't there be a slash at the 
end of the domain name?
 
Also, when talking about a bug (or anything else), it is always helpful if you 
specify the wget version (number).



From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Highlord Ares
Sent: Thursday, May 24, 2007 11:41
To: [EMAIL PROTECTED]
Subject: wget bug


when I run wget on a certain sites, it tries to download web pages named 
similar to http://site.com?variable=yes&mode=awesome.  However, wget isn't 
saving any of these files, no doubt because of some file naming issue?  this 
problem exists in both the Windows & unix versions. 

hope this helps

wget bug

2007-05-23 Thread Highlord Ares


when I run wget on a certain sites, it tries to download web pages named
similar to http://site.com?variable=yes&mode=awesome.  However, wget isn't
saving any of these files, no doubt because of some file naming issue?  this
problem exists in both the Windows & unix versions.

hope this helps

Re: Bug using recursive get and stdout

2007-04-17 Thread Steven M. Schweda

   A quick search at "http://www.mail-archive.com/wget@sunsite.dk/"; for
"-O" found:

  http://www.mail-archive.com/wget@sunsite.dk/msg08746.html
  http://www.mail-archive.com/wget@sunsite.dk/msg08748.html

   The way "-O" is implemented, there are all kinds of things which are
incompatible with it, "-r" among them.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547

Bug using recursive get and stdout

2007-04-17 Thread Jonathan A. Zdziarski


Greetings,

Stumbled across a bug yesterday reproduced in both v1.8.2 and 1.10.2.

Apparently, recursive get tries to open the file for reading after  
downloading, to download subsequent files. Problem is, when used with  
-O - to deliver to stdout, it cannot open that file, so you get the  
output below (note the "No such file or directory error"). In 1.10,  
it appears that they removed this error message, but wget still fails  
to recursively fetch.


I realize it seems like there wouldn't be much reason to send more  
than one page to stdout, but I'm feeding it all into a statistical  
filter to classify website data, so it doesn't really matter to the  
filter. Do you know of any workaround for this, other than opening  
the files after reading (won't scale with thousands per minute).


Thanks!

$ wget -O - -r http://www.zdziarski.com > out
--15:40:06--  http://www.zdziarski.com/
   => `-'
Resolving www.zdziarski.com... done.
Connecting to www.zdziarski.com[209.51.159.242]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 24,275 [text/html]

100%[>] 24,275   163.49K/s 
ETA 00:00


15:40:06 (163.49 KB/s) - `-' saved [24275/24275]

www.zdziarski.com/index.html: No such file or directory

FINISHED --15:40:06--
Downloaded: 24,275 bytes in 1 files





Jonathan

Re: FW: think you have a bug in CSS processing

2007-04-13 Thread J.F. Groff

Oh wait. Somebody already did the patch!

http://www.mail-archive.com/[EMAIL PROTECTED]/msg09502.html
http://article.gmane.org/gmane.comp.web.wget.patches/1867

I guess it's up to maintainers to decide whether to include this in
the standard wget distribution. In the meantime, hearty thanks to Ted
Mielczarek, you made my day!

 JFG

On 4/13/07, J.F. Groff <[EMAIL PROTECTED]> wrote:

Hi Tony,

> > Amazingly I found this feature request in a 2003 message to this very
> mailing
> > list. Are there only a few lunatics like me who think this should be
> included?
>
> Wget is written and maintained by volunteers. What you need to find is a
> lunatic willing to volunteer to write the code to support this feature
> request.

Heh, sure ! I'm lunatic enough to try... Fetching the code from svn as
I write this. But the docs page says:

At the moment the GNU Wget development tree has been split in two
branches in order to allow bugfixing releases of the feature-frozen
1.10.x tree while continuing the development for Wget 2.0 on the main
branch.

Anywhere I can look at planned features for the 2.0 branch? There's an
awful lot of items in the project's TODO list but no mention of CSS.
Shall I just add the feature request to the TODO first, or is there a
community process involved in picking candidate features?

Cheers,

  JFG

Re: FW: think you have a bug in CSS processing

2007-04-13 Thread J.F. Groff


Hi Tony,


> Amazingly I found this feature request in a 2003 message to this very
mailing
> list. Are there only a few lunatics like me who think this should be
included?

Wget is written and maintained by volunteers. What you need to find is a
lunatic willing to volunteer to write the code to support this feature
request.


Heh, sure ! I'm lunatic enough to try... Fetching the code from svn as
I write this. But the docs page says:

At the moment the GNU Wget development tree has been split in two
branches in order to allow bugfixing releases of the feature-frozen
1.10.x tree while continuing the development for Wget 2.0 on the main
branch.

Anywhere I can look at planned features for the 2.0 branch? There's an
awful lot of items in the project's TODO list but no mention of CSS.
Shall I just add the feature request to the TODO first, or is there a
community process involved in picking candidate features?

Cheers,

 JFG

RE: FW: think you have a bug in CSS processing

2007-04-13 Thread Tony Lewis

J.F.Groff wrote:

> Amazingly I found this feature request in a 2003 message to this very
mailing
> list. Are there only a few lunatics like me who think this should be
included?

Wget is written and maintained by volunteers. What you need to find is a
lunatic willing to volunteer to write the code to support this feature
request.

Tony

Re: FW: think you have a bug in CSS processing

2007-04-13 Thread J . F . Groff

Neil wrote:
> When giving it some thought I think a
> valid argument could be made that the string in the CSS document is not 
> exactly
> an URL but it is certainly URL-like.

The "URL-like strings" in CSS are actually standard URLs, either absolute or
relative, so they shouldn't be a big deal to handle. A caveat for the parser:
they can be quoted or unquoted and still work.
See http://www.w3.org/TR/CSS21/syndata.html#uri

Amazingly I found this feature request in a 2003 message to this very mailing
list. Are there only a few lunatics like me who think this should be included?

Cheers,

  JFG

Bug-report: wget with multiple cnames in ssl certificate

2007-04-12 Thread Alex Antener

Hi

If i connect with wget 1.10.2 (Debian Etch & Ubuntu Feisty Fawn) to a
secure host, that uses multiple cnames in the certificate i get the
following error:

[EMAIL PROTECTED]:~$ wget https://host.domain.tld
--10:18:55--  https://host.domain.tld/
   => `index.html'
Resolving host.domain.tld... xxx.xxx.xxx.xxx
Connecting to host.domain.tld|xxx.xxx.xxx.xxx|:443... connected.
ERROR: certificate common name `host0.domain.tld' doesn't match
requested host name `host.domain.tld'.
To connect to host.domain.tld insecurely, use `--no-check-certificate'.
Unable to establish SSL connection.

If I do the same with wget 1.9.1 (Debian Sarge) I do not get that Error.

Kind regards, Alex Antener

-- 
Alex Antener
Dipl. Medienkuenstler FH

[EMAIL PROTECTED] // http://lix.cc // +41 (0)44 586 97 63
GPG Key: 1024D/14D3C7A1 https://lix.cc/gpg_key.php
Fingerprint: BAB6 E61B 17D7 A9C9 6313  5141 3A3C DAA3 14D3 C7A1

FW: think you have a bug in CSS processing

2007-03-31 Thread Tony Lewis

From: Neil Smithline [mailto:[EMAIL PROTECTED] 
Sent: Saturday, March 31, 2007 9:44 PM
To: Tony Lewis
Subject: Re: think you have a bug in CSS processing

Oh - well if you don't support CSS processing then I guess I am making a new
request. I'm also suggesting a clarification to your documentation so that
this is clear.

-r --recursive

Recursive web-suck. According to the protocol of the URL, this can mean two
things. Recursive retrieval of a HTTP URL means that Wget will download the
URL you want, parse it as an HTML document (if an HTML document it is), and
retrieve the files this document is referring to, down to a certain depth
(default 5; change it with -l). Wget will create a hierarchy of directories
locally, corresponding to the one found on the HTTP server. 

At least at first glance , this seems to mean that the URL in the CSS
portion should be translated and downloaded. When giving it some thought I
think a valid argument could be made that the string in the CSS document is
not exactly an URL but it is certainly URL-like. I think there should be
some explicit documentation stating what is not covered. 

- Neil

On 3/31/07, Tony Lewis <[EMAIL PROTECTED]> wrote:

 <mailto:[EMAIL PROTECTED]> [EMAIL PROTECTED] wrote:

> I think I found a bug in CSS processing.

I think you're making a new feature request (and one that we've seen before)
to ADD processing for CSS.

Tony

think you have a bug in CSS processing

2007-03-30 Thread Neil Smithline


I think I found a bug in CSS processing. This was auto-generated and I'm far
from a CSS expert (quite the opposite). But, as far as I can tell (see
snippet below), it is supposed to be loaded from a directory named "-" that
is off of the main URL. For example, if the origination site is
http://www.foo.com, the GIF will be at
http://www.foo.com/-/includes/styles/swirl/skin_swirl_grey_top.gif. The
below text is came from the converted HTML file on the destination site.
You'll notice that the URL was not converted to an absolute URL pointing to
www.foo.com but neither was the GIF copied to the destination site. I've
done a find and it is nowhere to be found.

This really isn't a big deal for me as it is only one file and I've just
manually copied it over, but it does seem to be a bug worthy of fixing. If
you need more data, you can look at www.smithline.net. The snippet comes
from that page which was created using google page creator (don't ask me why
- it is definitely far from being ready for prime time) and then wget'ed
over to smithline.net.

Feel free to ping me should you need more info - Neil

PATH="/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin:/home/neils/bin" wget
--mirror --force-html --convert-links --no-parent
--directory-prefix=/home/neils/smithline.net/data --quiet --recursive
--no-host-directories http://www.smithline.net-a.googlepages.com


   #container {
  padding: 0px;
  background:URL("/-/includes/style/swirl/skin_swirl_grey_top.gif")
no-repeat top left;
  background-color:#dfdfdf;
  margin:0px auto;
   }

Re: wget-1.10.2 pwd/cd bug

2007-03-26 Thread Hrvoje Niksic

Hrvoje Niksic <[EMAIL PROTECTED]> writes:

> [EMAIL PROTECTED] (Steven M. Schweda) writes:
>
>>It's starting to look like a consensus.  A Google search for:
>> wget DONE_CWD
>> finds:
>>
>>   http://www.mail-archive.com/wget@sunsite.dk/msg08741.html
>
> That bug is fixed in subversion, revision 2194.

I forgot to add that this means that the patch can be retrieved with
`svn diff -r2193:2194' in Wget's source tree.  If you don't have a
checkout handy, Subversion still allows you to generate a diff using
`svn diff -r2193:2194 http://svn.dotsrc.org/repo/wget/trunk/'.

Also note that the fix is also available on the stable branch, and I
urge the distributors to apply it to their versions until 1.10.3 or
1.11 is released.

Re: wget-1.10.2 pwd/cd bug

2007-03-25 Thread Hrvoje Niksic

[EMAIL PROTECTED] (Steven M. Schweda) writes:

>It's starting to look like a consensus.  A Google search for:
> wget DONE_CWD
> finds:
>
>   http://www.mail-archive.com/wget@sunsite.dk/msg08741.html

That bug is fixed in subversion, revision 2194.

wget-1.10.2 pwd/cd bug

2007-03-24 Thread Steven M. Schweda

   It's starting to look like a consensus.  A Google search for:
wget DONE_CWD
finds:

  http://www.mail-archive.com/wget@sunsite.dk/msg08741.html



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547

wget-1.10.2 pwd/cd bug

2007-03-24 Thread Jason Mancini


This is inverted in ftp.c:

 if (con->csock != -1)
con->st &= ~DONE_CWD;
 else
con->st |= DONE_CWD;

If not error, request cwd?
If error, cwd done?

It's backwards.  Changing != to == solves the bug.
Thanks!
Jason

_
5.5%* 30 year fixed mortgage rate. Good credit refinance. Up to 5 free 
quotes - *Terms 
https://www2.nextag.com/goto.jsp?product=10035&url=%2fst.jsp&tm=y&search=mortgage_text_links_88_h2a5d&s=4056&p=5117&disc=y&vers=910

wget-1.10.2 pwd/cd bug

2007-03-24 Thread Jason Mancini


I downloaded 1.10.2 source code.
u->cmd goes from 0x1B to 0x19, dropping DO_CMD on the second call
to ftp.c:getftp() after connection failure.  I'm trying to debug "THE loop".
Jason

_
Watch free concerts with Pink, Rod Stewart, Oasis and more. Visit MSN 
Presents today. 
http://music.msn.com/presents?icid=ncmsnpresentstagline&ocid=T002MSN03A07001

wget-1.10.2-5mdv2007.1 pwd/cd bug

2007-03-24 Thread Jason Mancini


Hello,
If wget cannot connect to the FTP server the first time,
it fails to CD properly after checking the path with PWD.
Here is a -d listing when connecting after failing.  Thanks!
Jason

   $cmd = "wget -d --limit-rate=999k --tries=0 --no-remove-listing -N 
$ftp/*.rpm";



--11:06:12--  
ftp://ftp:[EMAIL PROTECTED]/pub/linux/distributions/mandrivalinux/devel/cooker/i586/media/main/release/*.rpm

 (try: 2) => `.listing'
Found carroll.aset.psu.edu in host_name_addresses_map (0x808bf98)
Connecting to carroll.aset.psu.edu|128.118.2.96|:21... connected.
Created socket 3.
Releasing 0x0808bf98 (new refcount 1).
Logging in as ftp ...

220- snip big login message

--> USER ftp

331 Please specify the password.

--> PASS [EMAIL PROTECTED]

230 Login successful.
Logged in!
==> SYST ...
--> SYST

215 UNIX Type: L8
done.==> PWD ...
--> PWD

257 "/"
done.
==> TYPE I ...
--> TYPE I

200 Switching to Binary mode.
done.  ==> CWD not required.
conaddr is: 128.118.2.96
==> PASV ...
--> PASV

227 Entering Passive Mode (128,118,2,96,184,134)
trying to connect to 128.118.2.96 port 47238
Created socket 4.
done.==> LIST ...
--> LIST

150 Here comes the directory listing.
done.

   [ <=>   ] 331   --.--K/s

Closed fd 4
226 Directory send OK.
11:11:23 (412.30 KB/s) - `.listing' saved [331]

DIRECTORY; perms 700; month: Sep; day: 8; year: 2005 (no tm);
DIRECTORY; perms 700; month: Sep; day: 23; year: 2005 (no tm);
DIRECTORY; perms 755; month: May; day: 24; year: 2006 (no tm);
PLAINFILE; perms 644; month: Sep; day: 9; year: 2005 (no tm);
PLAINFILE; perms 644; month: Sep; day: 9; year: 2005 (no tm);
No matches on pattern `*.rpm'.
Closed fd 3

_
Get a FREE Web site, company branded e-mail and more from Microsoft Office 
Live! http://clk.atdmt.com/MRT/go/mcrssaub0050001411mrt/direct/01/

1 2 3 4 5 6 7 8 9 >

1 - 100 of 879 matches

Mail list logo