Re: [BUG:#20329] If-Modified-Since support

2008-09-02 Thread Micah Cowan

vinothkumar raman wrote:
 We need to give out the time stamp of the local file in the Request
 header. For that, we need to pass on the local file's time stamp from
 http_loop() to get_http(). The only way to pass this on without
 altering the signature of the function is to add a field to struct url
 in url.h.
 
 Could we go for it?

That is acceptable.
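
A minimal sketch of what that could look like (hedged: the field name
and comment are illustrative, not actual wget code):

  /* url.h -- illustrative addition only */
  #include <time.h>

  struct url
  {
    /* ... existing members (host, port, path, etc.) ... */

    time_t local_mtime;   /* mtime of the local copy, or 0 if none;
                             set in http_loop(), read in get_http()
                             to emit If-Modified-Since */
  };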

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/


Re: [bug #20329] Make HTTP timestamping use If-Modified-Since

2008-09-02 Thread Micah Cowan

Yes, that's what it means.

I'm not yet committed to doing this. I'd like to see first how many
mainstream servers will respect If-Modified-Since when given as part of
an HTTP/1.0 request (in comparison to how they respond when it's part of
an HTTP/1.1 request). If common servers ignore it in HTTP/1.0, but not
in HTTP/1.1, that'd be an excellent case for holding off until we're
doing HTTP/1.1 requests.
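
For reference, a minimal sketch of the conditional exchange in question
(URL, host, and date are illustrative):

  GET /index.html HTTP/1.0
  Host: example.com
  If-Modified-Since: Sat, 06 Jul 2008 11:45:31 GMT

A cooperating server answers "HTTP/1.0 304 Not Modified" with no body
when the resource is unchanged; a server that ignores the header in
HTTP/1.0 just answers 200 with the full body, which is exactly the
behavior worth surveying.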

Also, I don't think "removing the previous HEAD request code" is
entirely accurate: we probably would want to detect when a server is
feeding us non-new content in response to If-Modified-Since, and adjust
to use the current HEAD method instead as a fallback.

-Micah

vinothkumar raman wrote:
 This means we should remove the previous HEAD request code, use
 If-Modified-Since by default, have it handle all the requests, and
 store pages when a 304 response is not returned
 
 Is it so?
 
 
 On Fri, Aug 29, 2008 at 11:06 PM, Micah Cowan [EMAIL PROTECTED] wrote:
 Follow-up Comment #4, bug #20329 (project wget):

 verbatim-mode's not all that readable.

 The gist is, we should go ahead and use If-Modified-Since, perhaps even now
 before there's true HTTP/1.1 support (provided it works in a reasonable
 percentage of cases); and just ensure that any Last-Modified header is sane.


Re: [bug #20329] Make HTTP timestamping use If-Modified-Since

2008-09-01 Thread vinothkumar raman
This means we should remove the previous HEAD request code, use
If-Modified-Since by default, have it handle all the requests, and
store pages when a 304 response is not returned

Is it so?


On Fri, Aug 29, 2008 at 11:06 PM, Micah Cowan [EMAIL PROTECTED] wrote:

 Follow-up Comment #4, bug #20329 (project wget):

 verbatim-mode's not all that readable.

 The gist is, we should go ahead and use If-Modified-Since, perhaps even now
 before there's true HTTP/1.1 support (provided it works in a reasonable
 percentage of cases); and just ensure that any Last-Modified header is sane.

___

 Reply to this item at:

  http://savannah.gnu.org/bugs/?20329

 ___
  Message sent via/by Savannah
  http://savannah.gnu.org/




Re: bug in wget

2008-06-14 Thread Micah Cowan

Sir Vision wrote:
 Hello,
 
 entering the following command results in an error:
 
 --- command start ---
 c:\Downloads\wget_v1.11.3b>wget
 "ftp://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-mozilla1.8-l10n/"
 -P c:\Downloads\
 --- command end ---
 
 wget can't convert the .listing file into an html file

As this seems to work fine on Unix, for me, I'll have to leave it to the
Windows porting guy (hi Chris!) to find out what might be going wrong.

...however, it would really help if you would supply the full output you
got from wget that leads you to believe Wget couldn't do this
conversion. In fact, it wouldn't hurt to supply the -d flag as well, for
maximum debugging messages.

--
Cheers,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/


Re: Bug

2008-03-03 Thread Mark Pors
ok, thanks for your reply
We have a work-around in place now, but it doesn't scale very well.
Anyway, I'll start looking for another solution

Thanks!
Mark


On Sat, Mar 1, 2008 at 10:15 PM, Micah Cowan [EMAIL PROTECTED] wrote:



  Mark Pors wrote:
   Hi,
  
   I posted this bug over two years ago:
   http://marc.info/?l=wget&m=113252747105716&w=4
  From the release notes I see that this is still not resolved. Are
   there any plans to fix this any time soon?

  I'm not sure that's a bug. It's more of an architectural choice.

  Wget currently works by downloading a file, then, if it needs to look
  for links in that file, it will open it and scan through it. Obviously,
  it can't do that when you use -O -.

  There are plans to move Wget to a more stream-like process, where it
  scans links during download. At such time, it's very possible that -p
  will work the way you want it to. In the meantime, though, it doesn't.

  --
  Micah J. Cowan
  Programmer, musician, typesetting enthusiast, gamer...
  http://micah.cowan.name/



Re: bug on wget

2007-11-21 Thread Hrvoje Niksic
Micah Cowan [EMAIL PROTECTED] writes:

 The new Wget flags empty Set-Cookie as a syntax error (but only
 displays it in -d mode; possibly a bug).

 I'm not clear on exactly what's "possibly a bug": do you mean the fact
 that Wget only calls attention to it in -d mode?

That's what I meant.

 I probably agree with that behavior... most people probably aren't
 interested in being informed that a server breaks RFC 2616 mildly;

Generally, if Wget considers a header to be in error (and hence
ignores it), the user probably needs to know about that.  After all,
it could be the symptom of a Wget bug, or of an unimplemented
extension the server generates.  In both cases I as a user would want
to know.  Of course, Wget should continue to be lenient towards syntax
violations widely recognized by popular browsers.

Note that I'm not arguing that Wget should warn in this particular
case.  It is perfectly fine to not consider an empty `Set-Cookie' to
be a syntax error and to simply ignore it (and maybe only print a
warning in debug mode).


Re: bug on wget

2007-11-21 Thread Micah Cowan

Hrvoje Niksic wrote:
 Generally, if Wget considers a header to be in error (and hence
 ignores it), the user probably needs to know about that.  After all,
 it could be the symptom of a Wget bug, or of an unimplemented
 extension the server generates.  In both cases I as a user would want
 to know.  Of course, Wget should continue to be lenient towards syntax
 violations widely recognized by popular browsers.
 
 Note that I'm not arguing that Wget should warn in this particular
 case.  It is perfectly fine to not consider an empty `Set-Cookie' to
 be a syntax error and to simply ignore it (and maybe only print a
 warning in debug mode).

That was my thought. I agree with both of your points above: if Wget's
not handling something properly, I want to know about it; but at the
same time, silently ignoring (erroneous) empty headers doesn't seem like
a problem.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: bug on wget

2007-11-20 Thread Micah Cowan

Diego Campo wrote:
 Hi,
 I got a bug on wget when executing:
 
 wget -a log -x -O search/search-1.html --verbose --wait 3
 --limit-rate=20K --tries=3
 http://www.nepremicnine.net/nepremicninske_agencije.html?id_regije=1
 
 Segmentation fault (core dumped)

Hi Diego,

I was able to reproduce the problem above in the release version of
Wget; however, it appears to be working fine in the current development
version of Wget, which is expected to release soon as version 1.11.*

* Unfortunately, it has been expected to release soon for a few months
now; we got hung up with some legal/licensing issues that are yet to be
resolved. It will almost certainly be released in the next few weeks,
though.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: bug on wget

2007-11-20 Thread Hrvoje Niksic
Micah Cowan [EMAIL PROTECTED] writes:

 I was able to reproduce the problem above in the release version of
 Wget; however, it appears to be working fine in the current
 development version of Wget, which is expected to release soon as
 version 1.11.*

I think the old Wget crashed on empty Set-Cookie headers.  That got
fixed when I converted the Set-Cookie parser to use extract_param.
The new Wget flags empty Set-Cookie as a syntax error (but only
displays it in -d mode; possibly a bug).


Re: bug on wget

2007-11-20 Thread Micah Cowan

Hrvoje Niksic wrote:
 Micah Cowan [EMAIL PROTECTED] writes:
 
 I was able to reproduce the problem above in the release version of
 Wget; however, it appears to be working fine in the current
 development version of Wget, which is expected to release soon as
 version 1.11.*
 
 I think the old Wget crashed on empty Set-Cookie headers.  That got
 fixed when I converted the Set-Cookie parser to use extract_param.
 The new Wget flags empty Set-Cookie as a syntax error (but only
 displays it in -d mode; possibly a bug).

I'm not clear on exactly what's "possibly a bug": do you mean the fact
that Wget only calls attention to it in -d mode?

I probably agree with that behavior... most people probably aren't
interested in being informed that a server breaks RFC 2616 mildly;
especially if it's not apt to affect the results. Unless of course the
user was expecting the server to send a real cookie, but I'm guessing
that this only happens when the server doesn't have one to send (or
something). But a user in that situation should be using -d (or at least
-S) to find out what the server is sending.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: bug in escaped filename calculation?

2007-10-04 Thread Josh Williams
On 10/4/07, Brian Keck [EMAIL PROTECTED] wrote:
 I would have sent a fix too, but after finding my way through http.c &
 retr.c I got lost in url.c.

You and me both. A lot of the code needs to be rewritten... there's a
lot of spaghetti code in there. I hope Micah chooses to do a complete
rewrite for version 2 so I can get my hands dirty and understand the
code better.


Re: bug in escaped filename calculation?

2007-10-04 Thread Micah Cowan

Josh Williams wrote:
 On 10/4/07, Brian Keck [EMAIL PROTECTED] wrote:
 I would have sent a fix too, but after finding my way through http.c &
 retr.c I got lost in url.c.
 
 You and me both. A lot of the code needs to be rewritten... there's a
 lot of spaghetti code in there. I hope Micah chooses to do a complete
 rewrite for version 2 so I can get my hands dirty and understand the
 code better.

Currently, I'm planning on refactoring what exists, as needed, rather
than going for a complete rewrite. This will be driven by unit-tests, to
try to ensure that we do not lose functionality along the way. This
involves more work overall, but IMO has these key advantages:

 * as mentioned, it's easier to prevent functionality loss,
 * we will be able to use the work as it's written, instead of waiting
many months for everything to be finished (especially with the current
number of developers), and
 * AIUI, the wording of employer copyright assignment releases may not
apply to new works that are not _preexisting_ as GPL works. This means
that, if a rewrite ended up using no code whatsoever from the original
work (not likely, but...), there could be legal issues.

After 1.11 is released (or possibly before), one of my top priorities is
to clean up the gethttp and http_loop functions to a degree where they
can be much more readily read and understood (and modified!). This is
important to me because so far (in my
probably-not-statistically-significant 3 months as maintainer) a
majority of the trickier fixes have been in those two functions. Some of
these fixes seem to frequently introduce bugs of their own, and I spend
more time than seems right in trying to understand the code there, which
is why these particular functions are prime targets for refactoring. :)

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: bug in escaped filename calculation?

2007-10-04 Thread Micah Cowan

Brian Keck wrote:
 Hello,
 
 I'm wondering if I've found a bug in the excellent wget.
 I'm not asking for help, because it turned out not to be the reason
 one of my scripts was failing.
 
 The possible bug is in the derivation of the filename from a URL which
 contains UTF-8.
 
 The case is:
 
   wget http://en.wikipedia.org/wiki/%C3%87atalh%C3%B6y%C3%BCk
 
 Of course these are all ascii characters, but underlying it are
 3 nonascii characters, whose UTF-8 encoding is:
 
   hex    octal    name
   -----  -------  ---------
   C3 87  303 207  C-cedilla
   C3 B6  303 266  o-umlaut
   C3 BC  303 274  u-umlaut
 
 The file created has a name that's almost, but not quite, a valid UTF-8
 bytestring ... 
 
   ls *y*k | od -tc
   000 303   %   8   7   a   t   a   l   h 303 266   y 303 274   k  \n
 
 I.e. the o-umlaut & u-umlaut UTF-8 encodings occur in the bytestring,
 but the UTF-8 encoding of C-cedilla has its 2nd byte replaced by the
 3-byte string %87.

Using --restrict=nocontrol will do what you want it to, in this instance.
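
(For the record, that is the --restrict-file-names option; assuming
current option spelling, the full command would be something like:

  wget --restrict-file-names=nocontrol \
    http://en.wikipedia.org/wiki/%C3%87atalh%C3%B6y%C3%BCk

which disables the escaping of the 0x80-0x9f "control" range described
below.)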

 I'm guessing this is not intended.  

Actually, it is (more-or-less).

Realize that Wget really has no idea how to tell whether you're trying
to give it UTF-8, or one of the ISO latin charsets. It tends to assume
the latter. It also, by default, will not create filenames with control
characters in them. In ISO latin, characters in the range 0x80-0x9f are
control characters, which is why Wget left %87 escaped, which falls into
that range, but not the others, which don't.

It is actually illegal to specify byte values outside the range of ASCII
characters in a URL, but it has long been historical practice to do so
anyway. In most cases, the intended meaning was one of the latin
character sets (usually latin1), so Wget was right to do as it does, at
that time.

There is now a standard for representing Unicode values in URLs, whose
result is then called IRIs (Internationalized Resource Identifiers).
Conforming correctly to this standard would require that Wget be
sensitive to the context and encoding of documents in which it finds
URLs; in the case of filenames and command arguments, it would probably
also require sensitivity to the current locale as determined by
environment variables. Wget is simply not equipped to handle IRIs or
encoding issues at the moment, so until it is, a proper fix will not be
in place. Addressing these are considered a Wget 2.0 (next-generation
Wget functionality) priority, and probably won't be done for a year or
two, given that the number of developers involved with Wget, if you add
up all the part-time helpers (including me), is probably still less than
one full-time dev. :)

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: bug in escaped filename calculation?

2007-10-04 Thread Hrvoje Niksic
Micah Cowan [EMAIL PROTECTED] writes:

 It is actually illegal to specify byte values outside the range of
 ASCII characters in a URL, but it has long been historical practice
 to do so anyway. In most cases, the intended meaning was one of the
 latin character sets (usually latin1), so Wget was right to do as it
 does, at that time.

Your explanation is spot-on.  I would only add that Wget's
interpretation of what is a control character is not so much geared
toward Latin 1 as it is geared toward maximum safety.  Originally I
planned to simply encode *all* file name characters outside the 32-127
range, but in practice it was very annoying (not to mention
US-centric) to encode perfectly valid Latin 1/2/3/... as %xx.  Since
the codes 128-159 *are* control characters (in those charsets) that
can mess up your screen and that you wouldn't want seen by default, I
decided to encode them by default, but allow for a way to turn it off,
in case someone used a different charset.

In the long run, supporting something like IRIs is surely the right
thing to go for, but I have a feeling that we'll be stuck with the
current messy URLs for quite some time to come.  So Wget simply needs
to adapt to the current circumstances.  If the locale includes UTF-8
in any shape or form, it is perfectly safe to assume that it's valid
to create UTF-8 file names.  Of course, we don't know if a particular
URL path sequence is really meant to be UTF-8, but there should be no
harm in allowing valid UTF-8 sequences to pass through.  In other
words, the default "quote control" policy could simply be smarter
about what "control" means.

One consequence would be that Wget creates differently-named files in
different locales, but it's probably a reasonable price to pay for not
breaking an important expectation.  Another consequence would be
making users open to IDN homograph attacks, but I don't know if that's
a problem in the context of creating file names (a homograph attack is
normally a misrepresentation of who you communicate with).

For those who want to hack on this, the place to look at is
url.c:append_uri_pathel; that strangely-named function takes a path
element (a directory name or file name component of the URL) and
appends it to the file name.  It takes care not to ever use .. as a
path component and to respect the --restrict-file-names setting as
specified by the user.  It could be made to recognize UTF-8 character
sequences in UTF-8 locales and exempt valid UTF-8 chars from being
treated as control characters.  Invalid UTF-8 chars would still pass
all the checks, and non-canonical UTF-8 sequences would be rejected
(by condemning their byte values to being escaped as %..).  This is
not much work for someone who understands the basics of UTF-8.
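
A hedged sketch of the kind of check described above (standalone
illustration, not the actual append_uri_pathel internals; it only
validates canonical 2- and 3-byte sequences and would need extending
for 4-byte sequences and surrogate exclusion):

  #include <stddef.h>

  /* Return the length (2 or 3) of a valid, canonical UTF-8 sequence
     starting at p, or 0 if p does not start one.  The 0xc2 lower
     bound and the 0xe0/0xa0 rule reject overlong encodings.  */
  static size_t
  utf8_sequence_length (const unsigned char *p)
  {
    if (p[0] >= 0xc2 && p[0] <= 0xdf)
      return (p[1] & 0xc0) == 0x80 ? 2 : 0;
    if (p[0] == 0xe0)
      return (p[1] & 0xe0) == 0xa0 && (p[2] & 0xc0) == 0x80 ? 3 : 0;
    if (p[0] >= 0xe1 && p[0] <= 0xef)
      return (p[1] & 0xc0) == 0x80 && (p[2] & 0xc0) == 0x80 ? 3 : 0;
    return 0;
  }

In a UTF-8 locale, append_uri_pathel could call something like this and
skip the control-character escaping for bytes that sit inside a valid
sequence.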


Re: bug and patch: blank spaces in filenames causes looping

2007-07-15 Thread Rich Cook


On Jul 13, 2007, at 12:29 PM, Micah Cowan wrote:




sprintf(filecopy, "\"%.2047s\"", file);


This fix breaks the FTP protocol, making wget instantly stop working
with many conforming servers, but apparently start working with yours;
the RFCs are very clear that the file name argument starts right after
the string "RETR "; the very next character is part of the file name,
including if the next character is a space (or a quote). The file name
is terminated by the CR LF sequence (which implies that the sequence CR
LF may not occur in the filename). Therefore, if you ask for a file
"file.txt", a conforming server will attempt to find and deliver a file
whose name begins and ends with double-quotes.

Therefore, this seems like a server problem.


I think you may well be correct.  I am now unable to reproduce the
problem where the server does not recognize a filename unless I give
it quotes.  In fact, as you say, the server ONLY recognizes filenames
WITHOUT quotes and quoting breaks it.  I had to revert to the
non-quoted code to get proper behavior.  I am very confused now.  I
apologize profusely for wasting your time.  How embarrassing!


I'll save this email, and if I see the behavior again, I will provide  
you with the details you requested below.




Could you please provide the following:
  1. The version of wget you are running (wget --version)
  2. The exact command line you are using to invoke wget
  3. The output of that same command line, run with --debug



--
Rich wealthychef Cook
925-784-3077
--
 it takes many small steps to climb a mountain, but the view gets  
better all the time.





Re: bug and patch: blank spaces in filenames causes looping

2007-07-15 Thread Josh Williams

On 7/15/07, Rich Cook [EMAIL PROTECTED] wrote:

I think you may well be correct.  I am now unable to reproduce the
problem where the server does not recognize a filename unless I give
it quotes.  In fact, as you say, the server ONLY recognizes filenames
WITHOUT quotes and quoting breaks it.  I had to revert to the non-
quoted code to get proper behavior.  I am very confused now.  I
apologize profusely for wasting your time.  How embarrassing!

I'll save this email, and if I see the behavior again, I will provide
you with the details you requested below.


I wouldn't say it was a waste of time. Actually, I think it's good for
us to know that this problem exists on some servers. We're considering
writing a patch to recognise servers that do not support spaces. If
the standard method fails, then it will retry as an escaped character.

Nothing has been written for this yet, but it has been discussed, and
may be implemented in the future.


Re: bug and patch: blank spaces in filenames causes looping

2007-07-15 Thread Micah Cowan

Rich Cook wrote:
 
 On Jul 13, 2007, at 12:29 PM, Micah Cowan wrote:
 

 sprintf(filecopy, "\"%.2047s\"", file);

 This fix breaks the FTP protocol, making wget instantly stop working
 with many conforming servers, but apparently start working with yours;
 the RFCs are very clear that the file name argument starts right after
 the string "RETR "; the very next character is part of the file name,
 including if the next character is a space (or a quote). The file name
 is terminated by the CR LF sequence (which implies that the sequence CR
 LF may not occur in the filename). Therefore, if you ask for a file
 "file.txt", a conforming server will attempt to find and deliver a file
 whose name begins and ends with double-quotes.

 Therefore, this seems like a server problem.
 
 I think you may well be correct.  I am now unable to reproduce the
 problem where the server does not recognize a filename unless I give it
 quotes.  In fact, as you say, the server ONLY recognizes filenames
 WITHOUT quotes and quoting breaks it.  I had to revert to the non-quoted
 code to get proper behavior.  I am very confused now.  I apologize
 profusely for wasting your time.  How embarrassing!

No worries, it happens! Sometimes the tests we run go other than we
think they did. :)
 
 I'll save this email, and if I see the behavior again, I will provide
 you with the details you requested below.

That would be terrific, thanks.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: bug and patch: blank spaces in filenames causes looping

2007-07-13 Thread Micah Cowan

Rich Cook wrote:
 On OS X, if a filename on the FTP server contains spaces, and the remote
 copy of the file is newer than the local, then wget gets thrown into a
 loop of "No such file or directory" endlessly.  I have changed the
 following in ftp-simple.c, and this fixes the error.
 Sorry, I don't know how to use the proper patch formatting, but it
 should be clear.

I and another developer could not reproduce this problem, either in the
current trunk or in wget 1.10.2.

 sprintf(filecopy, "\"%.2047s\"", file);

This fix breaks the FTP protocol, making wget instantly stop working
with many conforming servers, but apparently start working with yours;
the RFCs are very clear that the file name argument starts right after
the string "RETR "; the very next character is part of the file name,
including if the next character is a space (or a quote). The file name
is terminated by the CR LF sequence (which implies that the sequence CR
LF may not occur in the filename). Therefore, if you ask for a file
"file.txt", a conforming server will attempt to find and deliver a file
whose name begins and ends with double-quotes.

Therefore, this seems like a server problem.

Could you please provide the following:
  1. The version of wget you are running (wget --version)
  2. The exact command line you are using to invoke wget
  3. The output of that same command line, run with --debug

Thank you very much.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: [bug #20323] Wget issues HEAD before GET, even when the file doesn't exist locally.

2007-07-12 Thread Micah Cowan

Mauro Tortonesi wrote:
 Micah Cowan ha scritto:
 Update of bug #20323 (project wget):

  Status:  Ready For Test => In Progress
 ___

 Follow-up Comment #3:

 Moving back to In Progress until some questions about the logic are
 answered:

 http://addictivecode.org/pipermail/wget-notify/2007-July/75.html
 http://addictivecode.org/pipermail/wget-notify/2007-July/77.html
 
 thanks micah.
 
 i have partly misunderstood the logic behind the preliminary HEAD request.
 in my code, HEAD is skipped if -O or --no-content-disposition are given,
 but if -N is given HEAD is always sent. this is wrong, as HEAD should be
 skipped even if -N and --no-content-disposition are given (no need to
 care about the deprecated -N -O combination). can't think of any other
 case in which HEAD should be skipped, though.

Cc'ing wget ML, as it's probably important to open up discussion of the
current logic.

What about the case when nothing is given on the command line except
--no-content-disposition? What do we need HEAD for then?

Also: I don't believe HEAD should be sent if no options are given on the
command line. What purpose would that serve? If it's to find a possible
Content-Disposition header, we can get that (and more reliably) at GET
time (though, I believe we may currently be requiring the file name
before we fetch, which if true, should definitely be changed but not for
1.11, in which case the HEAD will be allowed for the time being); and
since we're not matching against potential accept/reject lists, we don't
really need it.

I think it really makes much more sense to enumerate those few cases
where we need to issue a HEAD, rather than try to determine all the
cases where we don't: if I have to choose a side to err on, I'd rather
not send HEAD in a case or two where we needed it, rather than send it
in a few where we didn't, as any request-response cycle eats up time. I
also believe that the cases where we want a HEAD are/should be fewer
than the cases where we don't want them.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: Bug update notifications

2007-07-09 Thread Micah Cowan

Matthew Woehlke wrote:
 Micah Cowan wrote:
 The wget-notify mailing list
 (http://addictivecode.org/mailman/listinfo/wget-notify) will now also be
 receiving notifications of bug updates from GNU Savannah, in addition to
  subversion commits.
 
 ...any reason to not CC bug updates here also/instead? That's how e.g.
 kwrite does things (also several other lists AFAIK), and seems to make
 sense. This is 'bug-wget' after all :-).

It is; but it's also 'wget'. While I agree that it probably makes sense
to send it to a bugs discussion list, this list is a combination
bugs/development/support/general discussion list, and I'm not certain
it's appropriate to bump up the traffic level for this.

Still, if there are enough folks that would like to get these updates
(without also seeing commit notifications), perhaps we could craft a
second list for this (or, alternatively, split off the main
discussion/support list from the bugs list)?

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: Bug update notifications

2007-07-09 Thread Matthew Woehlke

Micah Cowan wrote:

Matthew Woehlke wrote:

Micah Cowan wrote:
...any reason to not CC bug updates here also/instead? That's how e.g.
kwrite does things (also several other lists AFAIK), and seems to make
sense. This is 'bug-wget' after all :-).


It is; but it's also 'wget'.


Hmm, so it is; my bad :-).


While I agree that it probably makes sense
to send it to a bugs discussion list, this list is a combination
bugs/development/support/general discussion list, and I'm not certain
it's appropriate to bump up the traffic level for this.

Still, if there are enough folks that would like to get these updates
(without also seeing commit notifications), perhaps we could craft a
second list for this (or, alternatively, split off the main
discussion/support list from the bugs list)?


I guess a common pattern is:
foo-help
foo-devel
foo-commits

...but of course you're the maintainer, it's your call :-).
(The above aren't necessarily actual names of course, just the 
categories it seems like I'm most used to seeing. e.g. the GNU 
convention is of course bug-foo, not foo-devel.)


--
Matthew
This .sig is false




Re: bug and patch: blank spaces in filenames causes looping

2007-07-06 Thread Steven M. Schweda
From various:

 [...]
char filecopy[2048];
if (file[0] != '"') {
  sprintf(filecopy, "\"%.2047s\"", file);
} else {
  strncpy(filecopy, file, 2047);
}
 [...]
 It should be:
 
  sprintf(filecopy, "\"%.2045s\"", file);
 [...]

   I'll admit to being old and grumpy, but am I the only one who
shudders when one small code segment contains "2048", "2047", and "2045"
as separate, independent literal constants, instead of using a macro, or
sizeof, or something which would let the next fellow change one buffer
size in one place, instead of hunting all over the code looking for
every "20xx" which might be related?

   Just a thought.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: bug and patch: blank spaces in filenames causes looping

2007-07-06 Thread Micah Cowan

Steven M. Schweda wrote:
From various:
 
 [...]
char filecopy[2048];
if (file[0] != '"') {
  sprintf(filecopy, "\"%.2047s\"", file);
} else {
  strncpy(filecopy, file, 2047);
}
 [...]
 It should be:

  sprintf(filecopy, "\"%.2045s\"", file);
 [...]
 
I'll admit to being old and grumpy, but am I the only one who
 shudders when one small code segment contains "2048", "2047", and "2045"
 as separate, independent literal constants, instead of using a macro, or
 sizeof, or something which would let the next fellow change one buffer
 size in one place, instead of hunting all over the code looking for
 every "20xx" which might be related?

Well, as already mentioned, aprintf() would be much more appropriate, as
it eliminates the need for constants like these.

And yes, magic numbers drive me crazy, too. Of course, when used with
printf's 's' specifier, it needs special handling (crafting a STR()
macro or somesuch).
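
A hedged sketch of the sizeof flavor (illustrative only, not a proposed
patch; it supplies the precision at run time via %.*s rather than
through a STR() macro):

  enum { FILECOPY_SIZE = 2048 };
  char filecopy[FILECOPY_SIZE];
  /* Reserve room for the two quotes and the terminating NUL, so the
     one constant above is the only size that ever needs changing.  */
  sprintf (filecopy, "\"%.*s\"", (int) (sizeof filecopy - 3), file);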

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



RE: bug and patch: blank spaces in filenames causes looping

2007-07-05 Thread Tony Lewis
There is a buffer overflow in the following line of the proposed code:

 sprintf(filecopy, "\"%.2047s\"", file);

It should be:

 sprintf(filecopy, "\"%.2045s\"", file);

in order to leave room for the two quotes.

Tony
-Original Message-
From: Rich Cook [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 04, 2007 10:18 AM
To: [EMAIL PROTECTED]
Subject: bug and patch: blank spaces in filenames causes looping

On OS X, if a filename on the FTP server contains spaces, and the
remote copy of the file is newer than the local, then wget gets
thrown into a loop of "No such file or directory" endlessly.  I have
changed the following in ftp-simple.c, and this fixes the error.
Sorry, I don't know how to use the proper patch formatting, but it  
should be clear.

==
the beginning of ftp_retr:
=
/* Sends RETR command to the FTP server.  */
uerr_t
ftp_retr (int csock, const char *file)
{
   char *request, *respline;
   int nwritten;
   uerr_t err;

   /* Send RETR request.  */
   request = ftp_request ("RETR", file);

==
becomes:
==
/* Sends RETR command to the FTP server.  */
uerr_t
ftp_retr (int csock, const char *file)
{
   char *request, *respline;
   int nwritten;
   uerr_t err;
   char filecopy[2048];
   if (file[0] != '"') {
 sprintf(filecopy, "\"%.2047s\"", file);
   } else {
 strncpy(filecopy, file, 2047);
   }

   /* Send RETR request.  */
   request = ftp_request ("RETR", filecopy);






--
Rich wealthychef Cook
925-784-3077
--
  it takes many small steps to climb a mountain, but the view gets  
better all the time.



Re: bug and patch: blank spaces in filenames causes looping

2007-07-05 Thread Rich Cook
Good point, although it's only a POTENTIAL buffer overflow, and it's  
limited to 2 bytes, so at least it's not exploitable.  :-)



On Jul 5, 2007, at 9:05 AM, Tony Lewis wrote:


There is a buffer overflow in the following line of the proposed code:

 sprintf(filecopy, "\"%.2047s\"", file);

It should be:

 sprintf(filecopy, "\"%.2045s\"", file);

in order to leave room for the two quotes.

Tony
-Original Message-
From: Rich Cook [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 04, 2007 10:18 AM
To: [EMAIL PROTECTED]
Subject: bug and patch: blank spaces in filenames causes looping

On OS X, if a filename on the FTP server contains spaces, and the
remote copy of the file is newer than the local, then wget gets
thrown into a loop of "No such file or directory" endlessly.  I have
changed the following in ftp-simple.c, and this fixes the error.
Sorry, I don't know how to use the proper patch formatting, but it
should be clear.

==
the beginning of ftp_retr:
=
/* Sends RETR command to the FTP server.  */
uerr_t
ftp_retr (int csock, const char *file)
{
   char *request, *respline;
   int nwritten;
   uerr_t err;

   /* Send RETR request.  */
   request = ftp_request ("RETR", file);

==
becomes:
==
/* Sends RETR command to the FTP server.  */
uerr_t
ftp_retr (int csock, const char *file)
{
   char *request, *respline;
   int nwritten;
   uerr_t err;
   char filecopy[2048];
   if (file[0] != '"') {
 sprintf(filecopy, "\"%.2047s\"", file);
   } else {
 strncpy(filecopy, file, 2047);
   }

   /* Send RETR request.  */
   request = ftp_request ("RETR", filecopy);






--
Rich wealthychef Cook
925-784-3077
--
  it takes many small steps to climb a mountain, but the view gets
better all the time.


--
Rich wealthychef Cook
925-784-3077
--
 it takes many small steps to climb a mountain, but the view gets  
better all the time.





RE: bug and patch: blank spaces in filenames causes looping

2007-07-05 Thread Virden, Larry W.
 


-Original Message-
From: Hrvoje Niksic [mailto:[EMAIL PROTECTED] 

Tony Lewis [EMAIL PROTECTED] writes:

 Wget has an `aprintf' utility function that allocates the result on
the heap.  Avoids both buffer overruns and 
 arbitrary limits on file name length.

If it uses the heap, then doesn't that open a hole where a particularly
long file name would overflow the heap?

-- 
URL: http://wiki.tcl.tk/ 
Even if explicitly stated to the contrary, nothing in this posting
should be construed as representing my employer's opinions.
URL: mailto:[EMAIL PROTECTED]  URL: http://www.purl.org/NET/lvirden/

 


Re: bug and patch: blank spaces in filenames causes looping

2007-07-05 Thread Hrvoje Niksic
Tony Lewis [EMAIL PROTECTED] writes:

 There is a buffer overflow in the following line of the proposed code:

  sprintf(filecopy, \%.2047s\, file);

Wget has an `aprintf' utility function that allocates the result on
the heap.  Avoids both buffer overruns and arbitrary limits on file
name length.
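
A hedged sketch of what the quoting fix could look like on top of that
(illustrative only; it assumes wget's aprintf/xfree helpers and that
the quoting behavior is actually wanted):

  char *filecopy = aprintf ("\"%s\"", file);  /* heap-allocated */
  request = ftp_request ("RETR", filecopy);
  xfree (filecopy);

With no fixed buffer there is no length constant left to get wrong.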


Re: bug and patch: blank spaces in filenames causes looping

2007-07-05 Thread Hrvoje Niksic
Rich Cook [EMAIL PROTECTED] writes:

 Trouble is, it's undocumented as to how to free the resulting
 string.  Do I call free on it?

Yes.  "Freshly allocated with malloc" in the function documentation
was supposed to indicate how to free the string.


Re: bug and patch: blank spaces in filenames causes looping

2007-07-05 Thread Hrvoje Niksic
Virden, Larry W. [EMAIL PROTECTED] writes:

 Tony Lewis [EMAIL PROTECTED] writes:

 Wget has an `aprintf' utility function that allocates the result on
 the heap.  Avoids both buffer overruns and 
 arbitrary limits on file name length.

 If it uses the heap, then doesn't that open a hole where a particularly
 long file name would overflow the heap?

No, aprintf tries to allocate as much memory as necessary.  If the
memory is unavailable, malloc returns NULL and Wget exits.


Re: bug and patch: blank spaces in filenames causes looping

2007-07-05 Thread Rich Cook
Trouble is, it's undocumented as to how to free the resulting  
string.  Do I call free on it?  I'd use asprintf, but I'm afraid to  
suggest that here as it may not be portable.


On Jul 5, 2007, at 10:45 AM, Hrvoje Niksic wrote:


Tony Lewis [EMAIL PROTECTED] writes:

There is a buffer overflow in the following line of the proposed  
code:


 sprintf(filecopy, "\"%.2047s\"", file);


Wget has an `aprintf' utility function that allocates the result on
the heap.  Avoids both buffer overruns and arbitrary limits on file
name length.


--
Rich wealthychef Cook
925-784-3077
--
 it takes many small steps to climb a mountain, but the view gets  
better all the time.





Re: bug and patch: blank spaces in filenames causes looping

2007-07-05 Thread Rich Cook


On Jul 5, 2007, at 11:08 AM, Hrvoje Niksic wrote:


Rich Cook [EMAIL PROTECTED] writes:


Trouble is, it's undocumented as to how to free the resulting
string.  Do I call free on it?


Yes.  "Freshly allocated with malloc" in the function documentation
was supposed to indicate how to free the string.


Oh, I looked in the source and there was this xmalloc thing that  
didn't show up in my man pages, so I punted.  Sorry.


--
✐There's no time to stop for gas, we're already late-- Karin Donker
--
Rich wealthychef Cook
http://5pmharmony.com
925-784-3077
--
✐



RE: bug and patch: blank spaces in filenames causes looping

2007-07-05 Thread Bruso, John
Please remove me from this list. thanks,
 
John Bruso



From: Rich Cook [mailto:[EMAIL PROTECTED]
Sent: Thu 7/5/2007 12:30 PM
To: Hrvoje Niksic
Cc: Tony Lewis; [EMAIL PROTECTED]
Subject: Re: bug and patch: blank spaces in filenames causes looping




On Jul 5, 2007, at 11:08 AM, Hrvoje Niksic wrote:

 Rich Cook [EMAIL PROTECTED] writes:

 Trouble is, it's undocumented as to how to free the resulting
 string.  Do I call free on it?

  Yes.  "Freshly allocated with malloc" in the function documentation
 was supposed to indicate how to free the string.

Oh, I looked in the source and there was this xmalloc thing that 
didn't show up in my man pages, so I punted.  Sorry.

--
✐There's no time to stop for gas, we're already late-- Karin Donker
--
Rich wealthychef Cook
http://5pmharmony.com http://5pmharmony.com/ 
925-784-3077
--
?





Re: bug and patch: blank spaces in filenames causes looping

2007-07-05 Thread Hrvoje Niksic
Rich Cook [EMAIL PROTECTED] writes:

 On Jul 5, 2007, at 11:08 AM, Hrvoje Niksic wrote:

 Rich Cook [EMAIL PROTECTED] writes:

 Trouble is, it's undocumented as to how to free the resulting
 string.  Do I call free on it?

  Yes.  "Freshly allocated with malloc" in the function documentation
 was supposed to indicate how to free the string.

 Oh, I looked in the source and there was this xmalloc thing that
 didn't show up in my man pages, so I punted.  Sorry.

No problem.  Note that xmalloc isn't entirely specific to Wget, it's a
fairly standard GNU name for a malloc-or-die function.

Now I remembered that Wget also has xfree, so the above advice is not
entirely correct -- you should call xfree instead.  However, in the
normal case xfree is a simple wrapper around free, so even if you used
free, it would have worked just as well.  (The point of xfree is that
if you compile with DEBUG_MALLOC, you get a version that checks for
leaks, although it should be removed now that there is valgrind, which
does the same job much better.  There is also the business of barfing
on NULL pointers, which should also be removed.)

I'd have implemented a portable asprintf, but I liked the aprintf
interface better (I first saw it in libcurl).


Re: bug and patch: blank spaces in filenames causes looping

2007-07-05 Thread Rich Cook
So forgive me for a newbie-never-even-lurked kind of question:  will  
this fix make it into wget for other users (and for me in the  
future)?  Or do I need to do more to make that happen, or...?  Thanks!


On Jul 5, 2007, at 12:52 PM, Hrvoje Niksic wrote:


Rich Cook [EMAIL PROTECTED] writes:


On Jul 5, 2007, at 11:08 AM, Hrvoje Niksic wrote:


Rich Cook [EMAIL PROTECTED] writes:


Trouble is, it's undocumented as to how to free the resulting
string.  Do I call free on it?


Yes.  "Freshly allocated with malloc" in the function documentation
was supposed to indicate how to free the string.


Oh, I looked in the source and there was this xmalloc thing that
didn't show up in my man pages, so I punted.  Sorry.


No problem.  Note that xmalloc isn't entirely specific to Wget, it's a
fairly standard GNU name for a malloc-or-die function.

Now I remembered that Wget also has xfree, so the above advice is not
entirely correct -- you should call xfree instead.  However, in the
normal case xfree is a simple wrapper around free, so even if you used
free, it would have worked just as well.  (The point of xfree is that
if you compile with DEBUG_MALLOC, you get a version that checks for
leaks, although it should be removed now that there is valgrind, which
does the same job much better.  There is also the business of barfing
on NULL pointers, which should also be removed.)

I'd have implemented a portable asprintf, but I liked the aprintf
interface better (I first saw it in libcurl).


--
✐There's no time to stop for gas, we're already late-- Karin Donker
--
Rich wealthychef Cook
http://5pmharmony.com
925-784-3077
--
✐



Re: bug and patch: blank spaces in filenames causes looping

2007-07-05 Thread Micah Cowan

Rich Cook wrote:
 So forgive me for a newbie-never-even-lurked kind of question:  will
 this fix make it into wget for other users (and for me in the future)? 
 Or do I need to do more to make that happen, or...?  Thanks!

Well, I need a chance to look over the patch, run some tests, etc, to
see if it really covers everything it should (what about other,
non-space characters?).

The fix (or one like it) will probably make it into Wget at some point,
but I wouldn't expect it to come out in the next release (which, itself,
will not be arriving for a couple months); it will probably go into wget
1.12.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: bug and patch: blank spaces in filenames causes looping

2007-07-05 Thread Rich Cook

Thanks for the follow up.  :-)

On Jul 5, 2007, at 3:52 PM, Micah Cowan wrote:


Rich Cook wrote:

So forgive me for a newbie-never-even-lurked kind of question:  will
this fix make it into wget for other users (and for me in the future)?
Or do I need to do more to make that happen, or...?  Thanks!


Well, I need a chance to look over the patch, run some tests, etc, to
see if it really covers everything it should (what about other,
non-space characters?).

The fix (or one like it) will probably make it into Wget at some point,
but I wouldn't expect it to come out in the next release (which, itself,
will not be arriving for a couple months); it will probably go into wget
1.12.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


--
✐There's no time to stop for gas, we're already late-- Karin Donker
--
Rich wealthychef Cook
http://5pmharmony.com
925-784-3077
--
✐



Re: bug storing cookies with wget

2007-06-03 Thread Matthias Vill
Mario Ander schrieb:
 Hi everybody,
 
 I think there is a bug storing cookies with wget.
 
 See this command line:
 
 C:\Programme\wget\wget --user-agent="Opera/8.5 (X11;
 U; en)" --no-check-certificate --keep-session-cookies
 --save-cookies=cookie.txt --output-document=-
 --debug --output-file=debug.txt
 --post-data="name=xxx&password=dummy=Internetkennwort&login.x=0&login.y=0"
 "https://www.vodafone.de/proxy42/portal/login.po"
[..]
 Set-Cookie:
 JSESSIONID=GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE;
 path=/jsp 
 Set-Cookie: VODAFONELOGIN=1; domain=.vodafone.de;
 expires=Friday, 01-Jun-2007 15:05:16 GMT; path=/ 
 Set-Cookie:
 JSESSIONID=GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE!1180705316338;
 path=/proxy42
[..]
 ---response end---
 200 OK
 Attempt to fake the path: /jsp,
 /proxy42/portal/login.po

So the problem seems to be that wget rejects cookies for paths which
don't fit the request URL. The script you call is in /proxy42/portal/,
which is a subdirectory of /proxy42 and of /, so wget accepts those
cookies; but the request is not related to /jsp, so that cookie is
rejected.

So it seems that wget is sticking to the strict RFC and the script is
doing something wrong. To get this working you would need to patch wget
to accept non-RFC-compliant cookies, maybe along with an
--accept-malformed-cookies directive.
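
For reference, a hedged sketch of the RFC-style path check involved
(illustrative, not wget's actual cookies.c code):

  #include <string.h>

  /* A cookie's Path must be a prefix of the request path:
     "/proxy42/portal/login.po" matches "/proxy42" and "/", but not
     "/jsp" -- hence the "Attempt to fake the path" message.  */
  static int
  cookie_path_matches (const char *request_path, const char *cookie_path)
  {
    return strncmp (request_path, cookie_path, strlen (cookie_path)) == 0;
  }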

Hope this helps you

Matthias


Re: bug storing cookies with wget

2007-06-03 Thread Matthias Vill
Matthias Vill schrieb:
 Mario Ander schrieb:
 Hi everybody,

 I think there is a bug storing cookies with wget.

 See this command line:

 C:\Programme\wget\wget --user-agent="Opera/8.5 (X11;
 U; en)" --no-check-certificate --keep-session-cookies
 --save-cookies=cookie.txt --output-document=-
 --debug --output-file=debug.txt
 --post-data="name=xxx&password=dummy=Internetkennwort&login.x=0&login.y=0"
 "https://www.vodafone.de/proxy42/portal/login.po"
 [..]
 Set-Cookie:
 JSESSIONID=GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE;
 path=/jsp 
 Set-Cookie: VODAFONELOGIN=1; domain=.vodafone.de;
 expires=Friday, 01-Jun-2007 15:05:16 GMT; path=/ 
 Set-Cookie:
 JSESSIONID=GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE!1180705316338;
 path=/proxy42
 [..]
 ---response end---
 200 OK
 Attempt to fake the path: /jsp,
 /proxy42/portal/login.po
 
 So the problem seems to be that wget rejects cookies for paths which
 don't fit the request URL. The script you call is in /proxy42/portal/,
 which is a subdirectory of /proxy42 and of /, so wget accepts those
 cookies; but the request is not related to /jsp, so that cookie is
 rejected.
 
 So it seems that wget is sticking to the strict RFC and the script is
 doing something wrong. To get this working you would need to patch wget
 to accept non-RFC-compliant cookies, maybe along with an
 --accept-malformed-cookies directive.
 
 Hope this helps you
 
 Matthias
 

So I thought of a second solution: If you have cygwin (or at least
bash+grep) you can run this small script to duplicate and truncate the
cookie.
--- CUT here ---
#!/bin/bash
#Author: Matthias Vill; feel free to change and use

#get the line for proxy42-path in $temp
temp=$(grep proxy42 cookies.txt)

#remove everything after last !
temp=${temp%!*}

#replace proxy42 by jsp
temp=${temp/proxy42/jsp}

#append newline to file
#echo "" >> cookies.txt

#add new cookie to cookies.txt
echo "$temp" >> cookies.txt
--- CUT here ---
Maybe you need to remove the # in front of the echo "" >> cookies.txt
line to compensate for a missing trailing newline; otherwise you may end
up changing the value of the previous cookie.

Maybe this helps even more

Matthias


Re: Bug using recursive get and stdout

2007-04-17 Thread Steven M. Schweda
   A quick search at "http://www.mail-archive.com/wget@sunsite.dk/" for
"-O" found:

  http://www.mail-archive.com/wget@sunsite.dk/msg08746.html
  http://www.mail-archive.com/wget@sunsite.dk/msg08748.html

   The way -O is implemented, there are all kinds of things which are
incompatible with it, -r among them.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Bug in 1.10.2 vs 1.9.1

2007-01-03 Thread Mauro Tortonesi

Juhana Sadeharju wrote:

Hello. Wget 1.10.2 has the following bug compared to version 1.9.1.
First, the bin/wgetdir is defined as
  wget -p -E -k --proxy=off -e robots=off --passive-ftp
  -o zlogwget`date +%Y%m%d%H%M%S` -r -l 0 -np -U Mozilla --tries=50
  --waitretry=10 $@

The download command is
  wgetdir http://udn.epicgames.com

Version 1.9.1 result: download ok
Version 1.10.2 result: only udn.epicgames.com/Main/WebHome downloaded
and other converted urls are of the form
  http://udn.epicgames.com/../Two/WebHome


hi juhana,

could you please try the current version of wget from our subversion 
repository:


http://www.gnu.org/software/wget/wgetdev.html#development

?

this bug should be fixed in the new code.

--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


Re: BUG - .listing has sprung into existence

2006-10-30 Thread Steven M. Schweda
From: Sebastian

   "Doctor, it hurts when I do this."

   "Don't do that."



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Bug

2006-09-15 Thread Mauro Tortonesi

Reece ha scritto:

Found a bug (sort of).

When trying to get all the images in the directory below:
http://www.netstate.com/states/maps/images/

It gives 403 Forbidden errors for most of the images even after
setting the agent string to firefox's, and setting -e robots=off

After a packet capture, it appears that the site will give the
forbidden error if the Referer is not exactly correct.  However,
since wget actually uses the domain www.netstate.com:80 instead of
the version without the port, it screws it all up.  I've been unable
to find any way to tell wget not to insert the port in the requesting
URL and referrer URL.

Here is the full command I was using:

wget -r -l 1 -H -U "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT
5.0)" -e robots=off -d -nh http://www.netstate.com/states/maps/images/


hi reece,

that's an interesting bug. i've just added it to my THINGS TO FIX list.

--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


Re: bug/feature request

2006-07-27 Thread Marc Schoechlin
Hi !

Maybe you can add this patch to your mainline-tree:

http://www.mail-archive.com/wget%40sunsite.dk/msg09142.html

Best regards

Marc Schoechlin

On Wed, Jul 26, 2006 at 07:26:45AM +0200, Marc Schoechlin wrote:
 Date: Wed, 26 Jul 2006 07:26:45 +0200
 From: Marc Schoechlin [EMAIL PROTECTED]
 Subject: bug/feature request
 To: [EMAIL PROTECTED]
 
 Hi,
 
 I'm not sure if that is a feature request or a bug.
 Wget does not collect all page requisites of a given URL.
 Many sites reference components of these sites in cascading style
 sheets, but wget does not collect these components as page requisites.
 
 A example:
 ---
 $ wget -q -p -k -nc -x --convert-links \
   http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/496901
 $ find . -name '*.css'
 ./aspn.activestate.com/ASPN/static/aspn.css
 $ grep 'url(' ./aspn.activestate.com/ASPN/static/aspn.css
 list-style-image: url(/ASPN/img/dot_A68C53_8x8_.gif);
background-image: url(/ASPN/img/ads/ASPN_banner_bg.gif);
background-image: url('/ASPN/img/ads/ASPN_komodo_head.gif');
 background-image: url('/ASPN/img/ads/ASPN_banner_bottom.gif');
 $ find . -name ASPN_banner_bg.gif || echo not found
 ---
 
 A solution for this problem would be to parse all collected *.css files
 for lines which match for url(.*) and to collect these files.
 
 Best regards
 
 Marc Schoechlin
 -- 
 I prefer non-proprietary document-exchange.
 http://sector7g.wurzel6.de/pdfcreator/
 http://www.prooo-box.org/
 Contact me via jabber: [EMAIL PROTECTED]

-- 
I prefer non-proprietary document-exchange.
http://sector7g.wurzel6.de/pdfcreator/
http://www.prooo-box.org/
Contact me via jabber: [EMAIL PROTECTED]
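
A hedged sketch of the CSS scan suggested above (standalone
illustration, not a wget patch; a real fix would resolve each extracted
reference against the style sheet's own URL and queue it for download):

  #include <stdio.h>
  #include <string.h>

  /* Print every url(...) reference found in a .css file, stripping
     optional single or double quotes around the reference.  */
  static void
  scan_css_for_requisites (const char *fname)
  {
    char line[4096];
    FILE *fp = fopen (fname, "r");
    if (!fp)
      return;
    while (fgets (line, sizeof line, fp))
      {
        char *p = line;
        while ((p = strstr (p, "url(")) != NULL)
          {
            char *start = p + 4, *end = strchr (start, ')');
            if (!end)
              break;
            if (*start == '\'' || *start == '"')
              start++;
            if (end > start && (end[-1] == '\'' || end[-1] == '"'))
              end--;
            printf ("%.*s\n", (int) (end - start), start);
            p = end + 1;
          }
      }
    fclose (fp);
  }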


Re: Bug in wget 1.10.2 makefile

2006-07-17 Thread Mauro Tortonesi

Daniel Richard G. ha scritto:

Hello,

The MAKEDEFS value in the top-level Makefile.in also needs to include 
DESTDIR='$(DESTDIR)'.


fixed, thanks.

--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


Re: BUG

2006-07-10 Thread Mauro Tortonesi

Tony Lewis ha scritto:


Run the command with -d and post the output here.


in this case, -S can provide more useful information than -d. be careful to 
 obfuscate passwords, though!!!


--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


RE: BUG

2006-07-03 Thread Tony Lewis

Run the command with -d and post the output here.


Tony

_ 

From:  Junior + Suporte [mailto:[EMAIL PROTECTED]] 

Sent: Monday, July 03, 2006 2:00 PM

To: [EMAIL PROTECTED]

Subject: BUG


Dear,


I am using wget to send a login request to a site; when wget is saving the cookies, the following error message appears:


Error in Set-Cookie, field `Path'
Syntax error in Set-Cookie: tu=661541|802400391

@TERRA.COM.BR; Expires=Thu, 14-Oct-2055 20:52:46 GMT; Path= at position 78.

Location: http://www.tramauniversitario.com.br/servlet/login.jsp?username=802400

391%40terra.com.brpass=123qwerd=http%3A%2F%2Fwww.tramauniversitario.com.br%2Ft

uv2%2Fenquete%2Fcb%2Fsul%2Farte.jsp [following]


I trying to access URL http://www.tramauniversitario.com.br/tuv2/participe/login.jsp?rd=http://www.tramauniversitario.com.br/tuv2/enquete/cb/sul/arte.jsp[EMAIL PROTECTED]pass=123qweSubmit.x=6Submit.y=1

In Internet Explorer, this URL works correctly and the cookie is saved on the local machine, but in WGET, this cookie returns an error. 

Thanks,


Luiz Carlos Zancanella Junior





RE: Bug in GNU Wget 1.x (Win32)

2006-06-22 Thread Herold Heiko
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Þröstur
 Sent: Wednesday, June 21, 2006 4:35 PM

There have been some reports in the past but I don't think it has been acted
upon; one of the problems is that the list of names can be extended at will
(beside the standard comx, lptx, con, prn). Maybe it is possible to query
the os about the currently active device names and rename the output files
if necessary?

   I reproduced the bug with Win32 versions 1.5.dontremember,
 1.10.1 and 1.10.2. I did also test version 1.6 on Linux but it
 was not affected.

That is since the problem is generated by the dos/windows filesystem drivers
(or whatever those should be called), basically com1* and so on are
equivalent of unix device drivers, with the unfortunate difference of acting
in every directory. 

 
 Example URLs that reproduce the bug :
 wget g/nul
 wget http://www.gnu.org/nul
 wget http://www.gnu.org/nul.html
 wget -o loop.end "http://www.gnu.org/nul.html"
 
   I know that the bug is associated with words which are
 devices in the windows console, but i don't understand
 why, since I tried to set the output file to something else.

I think you meant to use -O, not -o.
Doesn't solve the real problem but at least a workaround.

Heiko 

-- 
-- PREVINET S.p.A. www.previnet.it
-- Heiko Herold [EMAIL PROTECTED] [EMAIL PROTECTED]
-- +39-041-5907073 / +39-041-5917073 ph
-- +39-041-5907472 / +39-041-5917472 fax


Re: BUG: wget with option -O creates empty files even if the remote file does not exist

2006-06-01 Thread Steven M. Schweda
From: Eduardo M KALINOWSKI

 wget http://www.somehost.com/nonexistant.html -O localfile.html
 
 then file localfile.html will always be created, and will have length
 of zero even if the remote file does not exist.

   Because with -O, Wget opens the output file before it does any
network activity, and after it's done, it closes the file and leaves it
there, regardless of its content (or lack of content).

   You could avoid -O, and rename the file after the Wget command. 
You could keep the -O, and check the status of the Wget command
(and/or check the output file size), and delete the file if it's no
good.  (And probably many other things, as well.)
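
For illustration, here is a toy C sketch of the sequence described above --
open the sink first, attempt the download, close unconditionally, then
delete on failure.  retrieve_url() is a made-up stand-in, not Wget's actual
function:

#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for the download; fails like a 404 would. */
static int
retrieve_url (const char *url, FILE *out)
{
  (void) url; (void) out;
  return 1;                       /* nonzero == download failed */
}

int
main (void)
{
  const char *path = "localfile.html";
  FILE *out = fopen (path, "wb"); /* opened before any network activity */
  int err = retrieve_url ("http://www.somehost.com/nonexistant.html", out);
  fclose (out);                   /* zero-length file exists either way */
  if (err)
    unlink (path);                /* the cleanup suggested above */
  return err;
}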

   If you look through "http://www.mail-archive.com/wget@sunsite.dk/",
you can find many people who think that -O should do something else,
but (for now) it does what it does.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: bug?

2006-05-16 Thread Hrvoje Niksic
yy :) [EMAIL PROTECTED] writes:

 I ran wget -P /tmp/.test "http://192.168.1.10" in a SUSE system (SLES 9)
 and found that it saved the file in /tmp/_test.
 This command works fine in RedHat, is it a bug?

I believe the bug is introduced by SuSE in an attempt to protect the
user.  Try reporting it to them.


Re: Bug in ETA code on x64

2006-04-03 Thread Thomas Braby


- Original Message -
From: Hrvoje Niksic [EMAIL PROTECTED]
Date: Tuesday, March 28, 2006 7:23 pm

  in progress.c line 880:
 
 eta_hrs = (int)(eta / 3600, eta %= 3600);
 eta_min = (int)(eta / 60, eta %= 60);
 eta_sec = (int)(eta);
 
 This is weird.  Did you compile the code yourself, or did you get it

Yes that is strange. I got the code from one of the GNU mirrors, but 
I'm afraid I can't remember which one.

 from a Windows download site?  I'm asking because the code in
 progress.c doesn't look like that; it in fact looks like this:
 
  eta_hrs = eta / 3600, eta %= 3600;
  eta_min = eta / 60,   eta %= 60;
  eta_sec = eta;
 
 The cast to int looks like someone was trying to remove a warning and
 botched operator precedence in the process.  If you must insert the
 cast, try:
 
 eta_hrs = (int) (eta / 3600), eta %= 3600;

Yes that also works. The cast is needed on Windows x64 because eta is 
a wgint (which is 64-bit) but a regular int is 32-bit so otherwise a 
warning is issued. Oh well. Perhaps it would be better changed to use 
a semicolon for clarity anyway?

cheers,


Re: Bug in ETA code on x64

2006-04-03 Thread Hrvoje Niksic
Thomas Braby [EMAIL PROTECTED] writes:

 eta_hrs = (int) (eta / 3600), eta %= 3600;

 Yes that also works. The cast is needed on Windows x64 because eta is 
 a wgint (which is 64-bit) but a regular int is 32-bit so otherwise a 
 warning is issued.

The same is the case on 32-bit Windows, and also on Linux.  I don't
see the value in that warning.  Maybe we can disable it with a
compiler flag?

 Oh well. Perhaps it would be better changed to use a semicolon for
 clarity anyway?

Note that, without the cast, both semicolon and comma work equally well.
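
For illustration, a standalone C sketch of the precedence pitfall discussed
in this thread (toy program, not the wget sources):

#include <stdio.h>

int
main (void)
{
  long eta = 5000;              /* 1 hour, 23 minutes, 20 seconds */
  int eta_hrs, eta_min, eta_sec;

  /* Comma binds looser than assignment, so this parses as
     (eta_hrs = (int) (eta / 3600)), (eta %= 3600) -- correct.  */
  eta_hrs = (int) (eta / 3600), eta %= 3600;
  eta_min = (int) (eta / 60),   eta %= 60;
  eta_sec = (int) eta;
  printf ("%d:%02d:%02d\n", eta_hrs, eta_min, eta_sec);  /* 1:23:20 */

  /* The broken variant casts the whole comma expression, whose value
     is the *right* operand, so the hour slot gets eta % 3600.  */
  eta = 5000;
  eta_hrs = (int) (eta / 3600, eta %= 3600);
  printf ("%d\n", eta_hrs);     /* prints 1400, not 1 */
  return 0;
}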


Re: Bug report

2006-04-01 Thread Frank McCown

Gary Reysa wrote:

Hi,

I don't really know if this is a Wget bug, or some problem with my 
website, but, either way, maybe you can help.


I have a web site ( www.BuildItSolar.com ) with perhaps a few hundred 
pages (260MB of storage total).  Someone did a Wget on my site, and 
managed to log 111,000 hits and 58,000 page views (using more than a GB 
of bandwidth).


I am wondering how this can happen, since the number of page views is 
about 200 times the number of pages on my site??


Is there something I can do to prevent this?  Is there something about 
the organization of my website that is causing Wget to get stuck in a loop?


I've never used Wget, but I am guessing that this guy really did not 
want 50,000+ pages -- do you provide some way for the user to shut 
itself down when it reaches some reasonable limit?


My website is non-commercial, and provides a lot of information that 
people find useful in building renewable energy projects.  It generates 
zero income, and I can't really afford to have a lot of people come in 
and burn up GBs of bandwidth to no useful end.  Help!


Gary Reysa


Bozeman, MT
[EMAIL PROTECTED]



Hello Gary,

From a quick look at your site, it appears to be mainly static html 
that would not generate a lot of extra crawls.  If you have some dynamic 
portion of your site, like a calendar, that could make wget go into an 
infinite loop.  It would be much easier to tell if you could look at the 
server logs that show what pages were requested.  They would easily tell 
you what wget was getting hung on.


One problem I did notice is that your site is generating soft 404s. 
In other words, it is sending back a http 200 response when it should be 
sending back a 404 response.  So if wget tries to access


http://www.builditsolar.com/blah

your web server is telling wget that the page actually exists.  This 
*could* cause more crawls than necessary, but not likely.  This problem 
should be fixed though.


It's possible the wget user did not know what they were doing and ran 
the crawler several times.  You could try to block traffic from that 
particular IP address or create a robots.txt file that tells crawlers to 
stay away from your site or just certain pages.  Wget respects 
robots.txt.  For more info:


http://www.robotstxt.org/wc/robots.html

Regards,
Frank



Re: Bug in ETA code on x64

2006-03-29 Thread Greg Hurrell

On 28/03/2006, at 20:43, Tony Lewis wrote:


Hrvoje Niksic wrote:


The cast to int looks like someone was trying to remove a warning and
botched operator precedence in the process.


I can't see any good reason to use , here. Why not write the line  
as:

  eta_hrs = eta / 3600; eta %= 3600;


Because that's not equivalent. The sequence or comma operator , has  
two operands: first the left operand is evaluated, then the right.  
The result has the type and value of the right operand. Note that a  
comma in a list of initializations or arguments is not an operator,  
but simply a punctuation mark!


Cheers,
Greg






Re: Bug in ETA code on x64

2006-03-29 Thread Hrvoje Niksic
Greg Hurrell [EMAIL PROTECTED] writes:

 On 28/03/2006, at 20:43, Tony Lewis wrote:

 Hrvoje Niksic wrote:

 The cast to int looks like someone was trying to remove a warning and
 botched operator precedence in the process.

 I can't see any good reason to use , here. Why not write the line
 as:
   eta_hrs = eta / 3600; eta %= 3600;

 Because that's not equivalent.

Well, it should be, because the comma operator has lower precedence
than the assignment operator (see http://tinyurl.com/evo5a,
http://tinyurl.com/ff4pp and numerous other locations).

I'd still like to know where Thomas got his version of progress.c
because it seems that the change has introduced the bug.


Re: Bug in ETA code on x64

2006-03-28 Thread Hrvoje Niksic
Thomas Braby [EMAIL PROTECTED] writes:

 With wget 1.10.2 compiled using Visual Studio 2005 for Windows XP x64 
 I was getting no ETA until late in the transfer, when I'd get things 
 like:

 49:49:49 then 48:48:48 then 47:47:47 etc.

 So I checked the eta value in seconds and it was correct, so the code 
 in progress.c line 880:

eta_hrs = (int)(eta / 3600, eta %= 3600);
eta_min = (int)(eta / 60, eta %= 60);
eta_sec = (int)(eta);

This is weird.  Did you compile the code yourself, or did you get it
from a Windows download site?  I'm asking because the code in
progress.c doesn't look like that; it in fact looks like this:

  eta_hrs = eta / 3600, eta %= 3600;
  eta_min = eta / 60,   eta %= 60;
  eta_sec = eta;

The cast to int looks like someone was trying to remove a warning and
botched operator precedence in the process.  If you must insert the
cast, try:

eta_hrs = (int) (eta / 3600), eta %= 3600;
...


RE: Bug in ETA code on x64

2006-03-28 Thread Tony Lewis
Hrvoje Niksic wrote:

 The cast to int looks like someone was trying to remove a warning and
 botched operator precedence in the process.

I can't see any good reason to use , here. Why not write the line as:
  eta_hrs = eta / 3600; eta %= 3600;

This makes it much less likely that someone will make a coding error while
editing that section of code.

Tony



Re: Bug in TOLOWER macro when STANDALONE (?)

2006-03-06 Thread Hrvoje Niksic
Beni Serfaty [EMAIL PROTECTED] writes:

 I Think I found a bug when STANDALONE is defined on hash.c
 I hope I'm not missing something here...

Good catch, thanks.  I've applied a slightly different fix, appended
below.

By the way, are you using hash.c in a project?  I'd like to hear if
you're satisfied with it and would be very interested in any
suggestions and, of course, bugs.  hash.c was written to be
reuse-friendly.

Also note that you can get the latest version of the file (this fix
included) from http://svn.dotsrc.org/repo/wget/trunk/src/hash.c .


2006-03-06  Hrvoje Niksic  [EMAIL PROTECTED]

* hash.c (TOLOWER): Fix definition when STANDALONE.
Reported by Beni Serfaty.

Index: src/hash.c
===
--- src/hash.c  (revision 2119)
+++ src/hash.c  (working copy)
@@ -53,7 +53,8 @@
 # ifndef countof
 #  define countof(x) (sizeof (x) / sizeof ((x)[0]))
 # endif
-# define TOLOWER(x) ('A' <= (x) && (x) <= 'Z' ? (x) - 32 : (x))
+# include <ctype.h>
+# define TOLOWER(x) tolower ((unsigned char) x)
 # if __STDC_VERSION__ >= 199901L
 #  include <stdint.h>  /* for uintptr_t */
 # else
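
As an aside, the (unsigned char) cast in the fix matters: passing a plain
char with a negative value to tolower() is undefined behavior on platforms
where char is signed.  A standalone sketch of the same idiom (toy code, not
part of the patch):

#include <ctype.h>
#include <stdio.h>

/* Same shape as the fixed macro: the cast keeps bytes like 0xE9
   (negative as a signed char) out of undefined territory.  */
#define TOLOWER(x) tolower ((unsigned char) (x))

int
main (void)
{
  const char *s = "Wget-1.9";
  for (; *s; ++s)
    putchar (TOLOWER (*s));
  putchar ('\n');               /* prints "wget-1.9" */
  return 0;
}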


Re: bug retrieving embedded images with --page-requisites

2005-11-09 Thread Jean-Marc MOLINA
Tony Lewis wrote:
 The --convert-links option changes the website path to a local file
 system path. That is, it changes the directory, not the file name.

Thanks I didn't understand it that way.

 IMO, your suggestion has merit, but it would require wget to maintain
 a list of MIME types and corresponding renaming rules.

Well, it seems implementing the Content-Type header has been planned for a
long time, and there are two items about it in the TODO document of the wget
distrib.

Maintaining a list of MIME types is not an issue as there are already lists
around :
* File suffixes and MIME types at Duke University :
http://www.duke.edu/websrv/file-extensions.html
* MIME Types category at Google :
http://www.google.com/Top/Computers/Data_Formats/MIME_Types
* ...

Just a word about how HTTrack handles MIME types and extensions. It has a
powerful --assume option that allows users to assign a MIME type to
extensions. For example : All .php files are PNG images. Everything is
explained on the Option panel : MIME Types page at
http://www.httrack.com/html/step9_opt11.html. I think wget could use such an
option.

JM.





Re: bug in wget windows

2005-10-14 Thread Mauro Tortonesi

Tobias Koeck wrote:

done.
==> PORT ... done.  ==> RETR SUSE-10.0-EvalDVD-i386-GM.iso ... done.

[  <=>   ] -673,009,664  113,23K/s

Assertion failed: bytes >= 0, file retr.c, line 292

This application has requested the Runtime to terminate it in an unusual 
way.

Please contact the application's support team for more information.


you are probably using an older version of wget, without long file 
support. please upgrade to wget 1.10.2.


--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


Re: Bug rpt

2005-09-20 Thread Hrvoje Niksic
HonzaCh [EMAIL PROTECTED] writes:

 My localeconv()->thousands_sep (as well as many other struct
 members) reveals to be an empty string ("") (MSVC6.0).

 How do you know?  I mean, what program did you use to check this?

 My quick'n'dirty one. See the source below.

Your source neglects to setlocale(LC_ALL, ), which you must do
before locale goes into effect.  Otherwise you're getting values from
the C locale, which doesn't define thousand separators.
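
For illustration, a minimal test program that calls setlocale() first (a
hypothetical replacement for the quick'n'dirty one, not the original
source):

#include <locale.h>
#include <stdio.h>

int
main (void)
{
  /* Without this call we stay in the "C" locale, whose
     thousands_sep is defined to be the empty string.  */
  setlocale (LC_ALL, "");

  struct lconv *lc = localeconv ();
  printf ("thousands_sep = \"%s\"\n", lc->thousands_sep);
  printf ("decimal_point = \"%s\"\n", lc->decimal_point);
  return 0;
}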


Re: Bug rpt

2005-09-19 Thread Hrvoje Niksic
HonzaCh [EMAIL PROTECTED] writes:

 Latest version (1.10.1) turns out an UI bug: the thousand separator
 (space according to my local settings) displays as á (character
 code 0xA0, see attch.)

 Although it does not affect the primary function of WGET, it looks
 quite ugly.

 Env.: Win2k Pro/Czech (CP852 for console apps, CP1250 for windowed
 ones).

Thanks for the report.  Is this a natively compiled Wget or one
compiled on Cygwin?

Wget obtains the thousand separator from the operating system using
the `localeconv' function.  According to MSDN
(http://tinyurl.com/cumk2 and http://tinyurl.com/chubg), Wget's usage
appears to be correct.  I'd be surprised if that function didn't
function properly on Windows.

Can other Windows testers repeat this problem?


Re: Bug handling session cookies

2005-06-24 Thread Hrvoje Niksic
Mark Street [EMAIL PROTECTED] writes:

 I'm not sure why this [catering for paths without a leading /] is
 done in the code.

rfc1808 declared that the leading / is not really part of path, but
merely a separator, presumably to be consistent with its treatment
of ;params, ?queries, and #fragments.  The author of the code found it
appealing to disregard common sense and implement rfc1808 semantics.

In most cases the user shouldn't notice the difference, but it has
lead to all kinds of implementation problems with code that assumes
that URL paths naturally begin with /.  Because of that it will be
changed later.

 Note that the forward slash is stripped from prefix, hence never
 matches full_path.  I'm not sure why this is done in the code.

Because PREFIX is the path declared by the cookie, which always begins
with /, and FULL_PATH is the URL path coming from the URL parsing
code, which doesn't begin with a /.  To match them, one must indeed
strip the leading / off PREFIX.

But paths without a slash still caused subtle problems.  For example,
cookies without a path attribute still had to be stored with the
correct cookie-path (with a leading slash).  To account for this, the
invocation of cookie_handle_set_cookie was modified to prepend the /
before the path.  This lead to path_match unexpectedly receiving two
/-prefixed paths and being unable to match them.

The attached patch fixes the problem by:

* Making sure that path consistently gets prepended in all entry
  points to cookie code;

* Removing the special logic from path_match.

With that change your test case seems to work, and so do all the other
tests I could think of.

Please let me know if it works for you, and thanks for the detailed
bug report.


2005-06-24  Hrvoje Niksic  [EMAIL PROTECTED]

* http.c (gethttp): Don't prepend / here.

* cookies.c (cookie_handle_set_cookie): Prepend / to PATH.
(cookie_header): Ditto.

Index: src/http.c
===
--- src/http.c  (revision 1794)
+++ src/http.c  (working copy)
@@ -1706,7 +1706,6 @@
   /* Handle (possibly multiple instances of) the Set-Cookie header. */
   if (opt.cookies)
 {
-  char *pth = NULL;
   int scpos;
   const char *scbeg, *scend;
   /* The jar should have been created by now. */
@@ -1717,15 +1716,8 @@
   ++scpos)
{
  char *set_cookie; BOUNDED_TO_ALLOCA (scbeg, scend, set_cookie);
- if (pth == NULL)
-   {
- /* u->path doesn't begin with /, which cookies.c expects. */
- pth = (char *) alloca (1 + strlen (u->path) + 1);
- pth[0] = '/';
- strcpy (pth + 1, u->path);
-   }
- cookie_handle_set_cookie (wget_cookie_jar, u->host, u->port, pth,
-   set_cookie);
+ cookie_handle_set_cookie (wget_cookie_jar, u->host, u->port,
+   u->path, set_cookie);
}
 }
 
Index: src/cookies.c
===
--- src/cookies.c   (revision 1794)
+++ src/cookies.c   (working copy)
@@ -822,6 +822,17 @@
 {
   return path_matches (path, cookie_path) != 0;
 }
+
+/* Prepend '/' to string S.  S is copied to fresh stack-allocated
+   space and its value is modified to point to the new location.  */
+
+#define PREPEND_SLASH(s) do {  \
+  char *PS_newstr = (char *) alloca (1 + strlen (s) + 1);  \
+  *PS_newstr = '/';\
+  strcpy (PS_newstr + 1, s);   \
+  s = PS_newstr;   \
+} while (0)
+
 
 /* Process the HTTP `Set-Cookie' header.  This results in storing the
cookie or discarding a matching one, or ignoring it completely, all
@@ -835,6 +846,11 @@
   struct cookie *cookie;
   cookies_now = time (NULL);
 
+  /* Wget's paths don't begin with '/' (blame rfc1808), but cookie
+ usage assumes /-prefixed paths.  Until the rest of Wget is fixed,
+ simply prepend slash to PATH.  */
+  PREPEND_SLASH (path);
+
   cookie = parse_set_cookies (set_cookie, update_cookie_field, false);
   if (!cookie)
 goto out;
@@ -977,17 +993,8 @@
 static int
 path_matches (const char *full_path, const char *prefix)
 {
-  int len;
+  int len = strlen (prefix);
 
-  if (*prefix != '/')
-/* Wget's HTTP paths do not begin with '/' (the URL code treats it
-   as a mere separator, inspired by rfc1808), but the '/' is
-   assumed when matching against the cookie stuff.  */
-return 0;
-
-  ++prefix;
-  len = strlen (prefix);
-
   if (0 != strncmp (full_path, prefix, len))
 /* FULL_PATH doesn't begin with PREFIX. */
 return 0;
@@ -1149,6 +1156,7 @@
   int count, i, ocnt;
   char *result;
   int result_size, pos;
+  PREPEND_SLASH (path);/* see cookie_handle_set_cookie */
 
   /* First, find the cookie chains whose domains 
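
For illustration, a standalone C sketch of the convention the patch
establishes -- both the request path and the cookie path carry their
leading '/', so a plain prefix comparison suffices (toy code, not wget's
full path_matches):

#include <stdio.h>
#include <string.h>

static int
path_matches (const char *full_path, const char *prefix)
{
  size_t len = strlen (prefix);
  return strncmp (full_path, prefix, len) == 0;
}

int
main (void)
{
  printf ("%d\n", path_matches ("/cgi-bin/login", "/cgi-bin"));  /* 1 */
  printf ("%d\n", path_matches ("/index.html", "/cgi-bin"));     /* 0 */
  return 0;
}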

Re: Bug handling session cookies

2005-06-24 Thread Mark Street

Hrvoje,

Many thanks for the explanation and the patch.
Yes, this patch successfully resolves the problem for my particular test
case.

Best regards,

Mark Street.




Re: Bug handling session cookies

2005-06-24 Thread Hrvoje Niksic
Mark Street [EMAIL PROTECTED] writes:

 Many thanks for the explanation and the patch.  Yes, this patch
 successfully resolves the problem for my particular test case.

Thanks for testing it.  It has been applied to the code and will be in
Wget 1.10.1 and later.


Re: Bug: wget cannot handle quote

2005-06-21 Thread Hrvoje Niksic
Will Kuhn [EMAIL PROTECTED] writes:

 Apparently wget does not handle single quotes or double quotes very well.
 wget with the following arguments gives an error.

  wget
  --user-agent='Mozilla/5.0' --cookies=off --header
  'Cookie: testbounce=testing;
  ih=b'!!!0T#8G(5A!!#c`#8HWsH!!#wt#8I0HY!!#yf#8I0G3;
  cf=b$y~!!!D)#; hi=b#!!!D)8I=C]'
  'ad.yieldmanager.com/imp?z=12n=2E=01-329I=508S=508-1'
  -O /home/admin/http/wwwscanfile.YYO3Cy

You haven't stated which error you get, but on my system the error
comes from the shell and not from Wget.  The problem is that you used
single quotes to quote a string that contains, among other things,
single quotes.  This effectively turned off the quoting for some
portions of the text, causing the shell to interpret the bangs (!) 
as (invalid) history events.

To correct the problem, replace ' within single quotes with something
like '\'':

wget --user-agent='Mozilla/5.0' --cookies=off --header 'Cookie: 
testbounce=testing; 
ih=b'\''!!!0T#8G(5A!!#c`#8HWsH!!#wt#8I0HY!!#yf#8I0G3; 
cf=b$y~!!!D)#; hi=b#!!!D)8I=C]' 
'ad.yieldmanager.com/imp?z=12n=2E=01-329I=508S=508-1' -O 
/home/admin/http/wwwscanfile.YYO3Cy


RE: bug with password containing @

2005-05-26 Thread Andrew Gargan




Hi 

wget ftp://someuser:[EMAIL PROTECTED]@www.somedomain.com/some_file.tgz

is splitting on the first @, not the second.

Is this a problem with the URL standard or a wget issue?

Regards

Andrew Gargan




Re: bug with password containing @

2005-05-26 Thread Hrvoje Niksic
Andrew Gargan [EMAIL PROTECTED] writes:

 wget ftp://someuser:[EMAIL PROTECTED]@www.somedomain.com/some_file.tgz

 is splitting on the first @, not the second.

Encode the '@' as %40 and this will work.  For example:

wget ftp://someuser:[EMAIL PROTECTED]/some_file.tgz

 Is this a problem with the URL standard or a wget issue?

Neither, but maybe URL could be smarter about handling the above case.
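
For illustration, a minimal C sketch of the percent-encoding applied above:
reserved bytes in the userinfo part become %XX escapes, so '@' turns into
%40 (hypothetical helper, not wget's URL code):

#include <stdio.h>
#include <string.h>

/* Percent-encode characters that are reserved in the userinfo part of a
   URL.  OUT must have room for the worst case (3x the input length).  */
static void
encode_userinfo (const char *in, char *out)
{
  const char *reserved = "@:/?#[]";
  for (; *in; ++in)
    if (strchr (reserved, *in))
      out += sprintf (out, "%%%02X", (unsigned char) *in);
    else
      *out++ = *in;
  *out = '\0';
}

int
main (void)
{
  char buf[128];
  encode_userinfo ("p@ss", buf);
  printf ("%s\n", buf);         /* prints "p%40ss" */
  return 0;
}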


Re: bug in static build of wget with socks

2005-05-16 Thread Hrvoje Niksic
Seemant Kulleen [EMAIL PROTECTED] writes:

 I wanted to alert you all to a bug in wget, reported by one of our
 (gentoo) users at:

 https://bugs.gentoo.org/show_bug.cgi?id=69827

 I am the maintainer for the Gentoo ebuild for wget.

 If someone would be willing to look at and help us with that bug,
 it'd be much appreciated.

Since I don't use Gentoo, I'll need more details to fix this.

For one, I haven't tried Wget with socks for a while now.  Older
versions of Wget supported of --with-socks option, but the procedure
for linking a program with socks changed since then, and the option
was removed due to bitrot.  I don't know how the *dynamic* linking
against socks works in Gentoo, either.

Secondly, I have very little experience with creating static binaries,
since I personally don't need them.  I don't even know what flags
USE=static causes to be passed to the compiler and the linker.
Likewise, I don't have a clue why there is a difference between Wget
1.8 and Wget 1.9 in this, nor why the presence of socks makes the
slightest difference.

I don't even know if this is a bug in Wget or in the way that the
build is attempted by the Gentoo package mechanism.  Providing the
actual build output might shed some light on this.


Re: bug in static build of wget with socks

2005-05-16 Thread Hrvoje Niksic
Seemant Kulleen [EMAIL PROTECTED] writes:

 Since I don't use Gentoo, I'll need more details to fix this.
 
 For one, I haven't tried Wget with socks for a while now.  Older
 versions of Wget supported of --with-socks option, but the procedure
 for linking a program with socks changed since then, and the option
 was removed due to bitrot.  I don't know how the *dynamic* linking
 against socks works in Gentoo, either.

 Ah ok, ./configure --help still shows the option, so this is fairly
 undocumented then.

I spoke too soon: it turns out that --with-socks is only removed in
Wget 1.10 (now in beta).

But --with-socks in 1.9.1 doesn't really force linking with the socks
library, it merely checks for a Rconnect function in -lsocks.  If
that is not found, the build is continued as usual.  You should check
the configure output (along with `ldd' on the resulting executable) to
see if that really worked.

 I don't even know if this is a bug in Wget or in the way that the
 build is attempted by the Gentoo package mechanism.  Providing the
 actual build output might shed some light on this.

 if use static; then
 emake LDFLAGS=--static || die

I now tried `LDFLAGS=--static ./configure', and it seems to work in
1.10.  Linking does produce two warnings, but the resulting executable
is static.


Re: Bug when downloading large files (over 2 gigs) from proftpd server.

2005-04-27 Thread Hrvoje Niksic
This problem has been fixed for the upcoming 1.10 release.  If you
want to try it, it's available at
ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10-alpha2.tar.bz2


Re: Bug

2005-03-20 Thread Jens Rösner
Hi Jorge!

Current wget versions do not support large files (> 2GB). 
However, the CVS version does and the fix will be introduced 
to the normal wget source. 

Jens
(just another user)

 When downloading a file of 2GB and more, the counter gets crazy; probably
 it should have a long instead of an int number.

-- 
DSL Komplett von GMX +++ Supergünstig und stressfrei einsteigen!
AKTION Kein Einrichtungspreis nutzen: http://www.gmx.net/de/go/dsl


RE: bug-wget still useful

2005-03-15 Thread Post, Mark K
I don't know why you say that.  I see bug reports and discussion of fixes
flowing through here on a fairly regular basis.


Mark Post


-Original Message-
From: Dan Jacobson [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, March 15, 2005 3:04 PM
To: [EMAIL PROTECTED]
Subject: bug-wget still useful


Is it still useful to mail to [EMAIL PROTECTED]?  I don't think anybody's
home.  Shall the address be closed?


Re: bug-wget still useful

2005-03-15 Thread Hrvoje Niksic
Dan Jacobson [EMAIL PROTECTED] writes:

 Is it still useful to mail to [EMAIL PROTECTED]?  I don't think
 anybody's home.  Shall the address be closed?

If you're referring to Mauro being busy, I don't see it as a reason to
close the bug reporting address.


Re: bug-wget still useful

2005-03-15 Thread Dan Jacobson
P I don't know why you say that.  I see bug reports and discussion of fixes
P flowing through here on a fairly regular basis.

All I know is my reports for the last few months didn't get the usual (any!)
cheery replies. However, I saw them on Gmane, yes.


Re: Bug: really large files cause problems with status text

2005-02-02 Thread Ulf Härnhammar
Quoting Alan Robinson [EMAIL PROTECTED]:

 When downloading a 4.2 gig file (such as from
 ftp://movies06.archive.org/2/movies/abe_lincoln_of_the_4th_ave/abe_lincoln_o
 f_the_4th_ave.mpeg ) cause the status text (i.e.
 100%[+===] 38,641,328   213.92K/s    ETA
 00:00) to print invalid things (in this case, that 100% of the file has been
 downloaded, even though only 40MB really has.

It is a Frequently Asked Question, with the answer that people are working on
it.

// Ulf



Re: Bug (wget 1.8.2): Wget downloads files rejected with -R.

2005-01-22 Thread jens . roesner
Hi Jason!

If I understood you correctly, this quote from the manual should help you:
***
Note that these two options [accept and reject based on filenames] do not
affect the downloading of HTML files; Wget must load all the HTMLs to know
where to go at all--recursive retrieval would make no sense otherwise.
***

If you are seeing wget behaviour different from this, please a) update your
wget and b) provide more details where/how it happens.

CU  good luck!
Jens (just another user)



 When the -R option is specified to reject files by name in recursive mode,
 wget downloads them anyway then deletes them after downloading. This is a
 problem when you are trying to be picky about the files you are
downloading
 to save bandwidth. Since wget appears to know the name of the file it is
 downloading before it is downloaded (even if the specified URL is
redirected
 to a different filename), then it should not bother downloading the file
 at all if it is going to delete it immediately after downloading it.
 
 - Jason Cipriani
 

-- 
GMX im TV ... Die Gedanken sind frei ... Schon gesehen?
Jetzt Spot online ansehen: http://www.gmx.net/de/go/tv-spot


Re: Bug#261755: Control sequences injection patch

2004-08-23 Thread Jan Minar
On Sun, Aug 22, 2004 at 08:02:54PM +0200, Jan Minar wrote:
 +/* vasprintf() requires _GNU_SOURCE.  Which is OK with Debian. */
 +#ifndef _GNU_SOURCE
 +#define _GNU_SOURCE

This must be done before stdio.h is included.

 +#endif
 +#include <ctype.h>
 +
  #ifndef errno
  extern int errno;
  #endif
 @@ -345,7 +351,49 @@
int expected_size;
int allocated;
  };
 +
 +/* XXX Where does the declaration belong?? */
 +void escape_buffer (char **src);
  
 +/*
 + * escape_untrusted  -- escape using '\NNN'.  To be used wherever we want to
 + * print untrusted data.
 + *
 + * Syntax: escape_buffer (buf-to-escape);
 + */
 +void escape_buffer (char **src)
 +{
 + char *dest;
 + int i, j;
 +
 + /* We encode each byte using at most 4 bytes, + trailing '\0'. */
 + dest = xmalloc (4 * strlen (*src) + 1);
 +
 + for (i = j = 0; (*src)[i] != '\0'; ++i) {
 + /*
 +  * We allow any non-control character, because LINE TABULATION
 +* & friends can't do more harm than SPACE.  And someone
 +  * somewhere might be using these, so unless we actually can't
 +  * protect against spoofing attacks, we don't pretend we can.
 +  *
 +  * Note that '\n' is included both in the isspace() *and*
 +  * iscntrl() range.
 +  */
 + if (isprint((*src)[i]) || isspace((*src)[i])) {

This lets '\r' thru, not good.  BTW, (*src)[i] is quite a cypher.

 + dest[j++] = (*src)[i];
 + } else {
 + dest[j++] = '\\';
 + dest[j++] = '0' + (((*src)[i] & 0xff) >> 6);
 + dest[j++] = '0' + (((*src)[i] & 0x3f) >> 3);
 + dest[j++] = '0' + ((*src)[i] & 7);
 + }
 + }
 + dest[j] = '\0';
 +
 + xfree (*src);
 + *src = dest;
 +}


Attached is version 2, which solves these problems.

Please keep me CC'd.

Jan.

-- 
   To me, clowns aren't funny. In fact, they're kind of scary. I've wondered
 where this started and I think it goes back to the time I went to the circus,
  and a clown killed my dad.
--- wget-1.9.1.ORIG/src/log.c   2004-08-22 13:42:33.0 +0200
+++ wget-1.9.1-jan/src/log.c2004-08-24 02:38:38.0 +0200
@@ -42,6 +42,12 @@
 # endif
 #endif /* not WGET_USE_STDARG */
 
+/* vasprintf() requires _GNU_SOURCE.  Which is OK with Debian. */
+/* This *must* be defined before stdio.h is included. */
+#ifndef _GNU_SOURCE
+# define _GNU_SOURCE
+#endif
+
 #include <stdio.h>
 #ifdef HAVE_STRING_H
 # include <string.h>
@@ -63,6 +69,8 @@
 #include <wget.h>
 #include <utils.h>
 
+#include <ctype.h>
+
 #ifndef errno
 extern int errno;
 #endif
@@ -345,7 +353,69 @@
   int expected_size;
   int allocated;
 };
+
+/* XXX Where does the declaration belong?? */
+void escape_buffer (char **src);
 
+/*
+ * escape_buffer  -- escape using '\NNN'.  To be used wherever we want to print
+ * untrusted data.
+ *
+ * Syntax: escape_buffer (buf-to-escape);
+ */
+void escape_buffer (char **src)
+{
+   char *dest, c;
+   int i, j;
+
+   /* We encode each byte using at most 4 bytes, + trailing '\0'. */
+   dest = xmalloc (4 * strlen (*src) + 1);
+
+   for (i = j = 0; (c = (*src)[i]) != '\0'; ++i) {
+   /*
+* We allow any non-control character, because '\t' & friends
+* can't do more harm than SPACE.  And someone somewhere might
+* be using these, so unless we actually can protect against
+* spoofing attacks, we don't pretend it.
+*
+* Note that '\n' is included both in the isspace() *and*
+* iscntrl() range.
+*
+* We try not to allow '\r' & friends by using isblank()
+* instead of isspace().  Let's hope noone will complain about
+* '\v' & similar being filtered (the characters we may still
+* let thru can vary among locales, so there is not much we can
+* do about this *from within logvprintf()*.
+*/
+   if (c == '\r' && *(c + 1) == '\n') {
+   /*
+* I've spotted wget printing CRLF line terminators
+* while communicating with ftp://ftp.debian.org.  This
+* is a bug: wget should print whatever the platform
+* line terminator is (CR on Mac, CRLF on CP/M, LF on
+* Un*x, etc.)
+*
+* We work around this bug here by taking CRLF for a
+* line terminator.  A lone CR is still treated as a
+* control character.
+*/
+   i++;
+   dest[j++] = '\n';
+   } else if (isprint(c) || isblank(c) || c == '\n') {
+   dest[j++] = c;
+  
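
For illustration, a standalone sketch of the '\NNN' octal escaping being
discussed (toy code, not Jan's exact patch):

#include <ctype.h>
#include <stdio.h>

/* Print S, replacing control bytes other than '\n' and '\t' with
   backslashed octal escapes so they cannot reach the terminal.  */
static void
print_escaped (const char *s)
{
  for (; *s; ++s)
    {
      unsigned char c = (unsigned char) *s;
      if (isprint (c) || c == '\n' || c == '\t')
        putchar (c);
      else
        printf ("\\%03o", c);   /* e.g. ESC (0x1b) becomes \033 */
    }
}

int
main (void)
{
  print_escaped ("evil\033[2Jheader\n");  /* prints evil\033[2Jheader */
  return 0;
}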

Re: Bug#261755: Control sequences injection patch

2004-08-22 Thread Jan Minar
tags 261755 +patch
thanks

On Sun, Aug 22, 2004 at 11:39:07AM +0200, Thomas Hood wrote:
 The changes contemplated look very invasive.  How quickly can this
 bug be fixed?

Here we go:  Hacky, non-portable, but pretty slick & non-invasive,
whatever that means.  Now I'm going to check whether it is going to
catch all the cases where malicious characters could be possibly
injected.

This patch (hopefully) solves the problem of a remote attacker (server or
otherwise) injecting malicious control sequences into the HTTP headers.  It
by no means solves the spoofing bug, which is by nature tricky to address
well.

Cheers,
Jan.

-- 
   To me, clowns aren't funny. In fact, they're kind of scary. I've wondered
 where this started and I think it goes back to the time I went to the circus,
  and a clown killed my dad.
--- wget-1.9.1.WORK/debian/changelog2004-08-22 19:34:16.0 +0200
+++ wget-1.9.1-jan/debian/changelog 2004-08-22 19:39:48.0 +0200
@@ -1,3 +1,12 @@
+wget (1.9.1-4.local-1) unstable; urgency=medium
+
+  * Local build
+  * Hopeless attempt to filter control chars in log output (see
+Bug#267393)
+  * This probably SHOULD make it in Sarge revision 0
+
 -- Jan Minář [EMAIL PROTECTED]  Sun, 22 Aug 2004 19:39:02 +0200
+
 wget (1.9.1-4) unstable; urgency=low
 
   * made passive the default. sorry forgot again.:(
--- wget-1.9.1.WORK/src/log.c   2004-08-22 19:34:16.0 +0200
+++ wget-1.9.1-jan/src/log.c2004-08-22 19:31:33.0 +0200
@@ -63,6 +63,12 @@
 #include <wget.h>
 #include <utils.h>
 
+/* vasprintf() requires _GNU_SOURCE.  Which is OK with Debian. */
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
+#include <ctype.h>
+
 #ifndef errno
 extern int errno;
 #endif
@@ -345,7 +351,49 @@
   int expected_size;
   int allocated;
 };
+
+/* XXX Where does the declaration belong?? */
+void escape_buffer (char **src);
 
+/*
+ * escape_untrusted  -- escape using '\NNN'.  To be used wherever we want to
+ * print untrusted data.
+ *
+ * Syntax: escape_buffer (buf-to-escape);
+ */
+void escape_buffer (char **src)
+{
+   char *dest;
+   int i, j;
+
+   /* We encode each byte using at most 4 bytes, + trailing '\0'. */
+   dest = xmalloc (4 * strlen (*src) + 1);
+
+   for (i = j = 0; (*src)[i] != '\0'; ++i) {
+   /*
+* We allow any non-control character, because LINE TABULATION
+* & friends can't do more harm than SPACE.  And someone
+* somewhere might be using these, so unless we actually can't
+* protect against spoofing attacks, we don't pretend we can.
+*
+* Note that '\n' is included both in the isspace() *and*
+* iscntrl() range.
+*/
+   if (isprint((*src)[i]) || isspace((*src)[i])) {
+   dest[j++] = (*src)[i];
+   } else {
+   dest[j++] = '\\';
+   dest[j++] = '0' + (((*src)[i] & 0xff) >> 6);
+   dest[j++] = '0' + (((*src)[i] & 0x3f) >> 3);
+   dest[j++] = '0' + ((*src)[i] & 7);
+   }
+   }
+   dest[j] = '\0';
+
+   xfree (*src);
+   *src = dest;
+}
+
 /* Print a message to the log.  A copy of message will be saved to
saved_log, for later reusal by log_dump_context().
 
@@ -364,15 +412,28 @@
   int available_size = sizeof (smallmsg);
   int numwritten;
   FILE *fp = get_log_fp ();
+  char *buf;
+
+  /* int vasprintf(char **strp, const char *fmt, va_list ap); */
+  if (vasprintf (&buf, fmt, args) == -1) {
+    perror (_("Error"));
+    exit (1);
+  }
+
+  escape_buffer (&buf);
 
   if (!save_context_p)
 {
   /* In the simple case just call vfprintf(), to avoid needless
  allocation and games with vsnprintf(). */
-  vfprintf (fp, fmt, args);
-  goto flush;
-}
 
+  /* vfprintf() didn't check return value, neither will we */
+  (void) fprintf(fp, "%s", buf);
+}
+  else /* goto flush; */ /* There's no need to use goto here */
+/* This else-clause purposefully shifted 4 columns to the left, so that the
+ * diff is easy to read --Jan */
+{
   if (state->allocated != 0)
 {
   write_ptr = state->bigmsg;
@@ -384,8 +445,12 @@
  missing from legacy systems.  Therefore I consider it safe to
  assume that its return value is meaningful.  On the systems where
  vsnprintf() is not available, we use the implementation from
- snprintf.c which does return the correct value.  */
-  numwritten = vsnprintf (write_ptr, available_size, fmt, args);
+ snprintf.c which does return the correct value.
+ 
+ With snprintf(), this probably doesn't hold anymore.  But this is Debian,
+ so who cares. */
+
+  numwritten = snprintf (write_ptr, available_size, "%s", buf);
 
   /* vsnprintf() will not step over the limit given by available_size.
  If it fails, it will return either -1 (POSIX?) or the number of
@@ -420,7 +485,7 @@

Re: Bug in wget 1.9.1 documentation

2004-07-12 Thread Hrvoje Niksic
Tristan Miller [EMAIL PROTECTED] writes:

 There appears to be a bug in the documentation (man page, etc.) for
 wget 1.9.1.

I think this is a bug in the man page generation process.



Re: [BUG] wget 1.9.1 and below can't download =2G file on 32bits system

2004-05-27 Thread Hrvoje Niksic
Yup; 1.9.1 cannot download large files.  I hope to fix this by the
next release.



Re: Bug report

2004-03-24 Thread Hrvoje Niksic
Juhana Sadeharju [EMAIL PROTECTED] writes:

 Command: wgetdir "http://liarliar.sourceforge.net".
 Problem: Files are named as
   content.php?content.2
   content.php?content.3
   content.php?content.4
 which are interpreted, e.g., by Nautilus as manual pages and are
 displayed as plain texts. Could the files and the links to them
 renamed as the following?
   content.php?content.2.html
   content.php?content.3.html
   content.php?content.4.html

Use the option `--html-extension' (-E).

 After all, are those pages still php files or generated html files?
 If they are html files produced by the php files, then it could be a
 good idea to add a new extension to the files.

They're the latter -- HTML files produced by the server-side PHP code.

 Command: wgetdir 
 "http://www.newtek.com/products/lightwave/developer/lscript2.6/index.html"
 Problem: Images are not downloaded. Perhaps because the image links
 are the following:
   <image src="v26_2.jpg">

I've never seen this tag, but it seems to be the same as IMG.  Mozilla
seems to grok it and its DOM inspector thinks it has seen IMG.  Is
this tag documented anywhere?  Does IE understand it too?



Re: Bug in wget: cannot request urls with double-slash in the query string

2004-03-05 Thread Hrvoje Niksic
D Richard Felker III [EMAIL PROTECTED] writes:

 The request log shows that the slashes are apparently respected.

 I retried a test case and found the same thing -- the slashes were
 respected.

OK.

 Then I remembered that I was using -i. Wget seems to work fine with
 the url on the command line; the bug only happens when the url is
 passed in with:

 cat <<EOF | wget -i -
 http://...
 EOF

But I cannot repeat that, either.  As long as the consecutive slashes
are in the query string, they're not stripped.

 Using this method is necessary since it is the ONLY secure way I
 know of to do a password-protected http request from a shell script.

Yes, that is the best way to do it.



Re: bug in use index.html

2004-03-04 Thread Hrvoje Niksic
The whole matter of conversion of / to /index.html on the file
system is a hack.  But I really don't know how to better represent
empty trailing file name on the file system.



Re: bug in use index.html

2004-03-04 Thread Dražen Kačar
Hrvoje Niksic wrote:
 The whole matter of conversion of / to /index.html on the file
 system is a hack.  But I really don't know how to better represent
 empty trailing file name on the file system.

Another, for now rather limited, hack: on file systems which support some
sort of file attributes you can mark index.html as an unwanted child of an
empty trailing file name. AFAIK, that should work at least on Solaris and
Linux. Others will join the club one day, I hope.

-- 
 .-.   .-.Yes, I am an agent of Satan, but my duties are largely
(_  \ /  _)   ceremonial.
 |
 |[EMAIL PROTECTED]


Re: Bug in wget: cannot request urls with double-slash in the query string

2004-03-04 Thread D Richard Felker III
On Mon, Mar 01, 2004 at 07:25:52PM +0100, Hrvoje Niksic wrote:
   Removing the offending code fixes the problem, but I'm not sure if
   this is the correct solution. I expect it would be more correct to
  remove multiple slashes only before the first occurrence of ?, but
   not afterwards.
  
  That's exactly what should happen.  Please give us more details, if
  possible accompanied by `-d' output.
 
  If you'd still like details now that you know the version I was
  using, let me know and I'll be happy to do some tests.
 
 Yes please.  For example, this is how it works for me:
 
 $ /usr/bin/wget -d "http://www.xemacs.org/something?redirect=http://www.cnn.com"
 DEBUG output created by Wget 1.8.2 on linux-gnu.
 
 --19:23:02--  http://www.xemacs.org/something?redirect=http://www.cnn.com
 => `something?redirect=http:%2F%2Fwww.cnn.com'
 Resolving www.xemacs.org... done.
 Caching www.xemacs.org = 199.184.165.136
 Connecting to www.xemacs.org[199.184.165.136]:80... connected.
 Created socket 3.
 Releasing 0x8080b40 (new refcount 1).
 ---request begin---
 GET /something?redirect=http://www.cnn.com HTTP/1.0
 User-Agent: Wget/1.8.2
 Host: www.xemacs.org
 Accept: */*
 Connection: Keep-Alive
 
 ---request end---
 HTTP request sent, awaiting response...
 ...
 
 The request log shows that the slashes are apparently respected.

I retried a test case and found the same thing -- the slashes were
respected. Then I remembered that I was using -i. Wget seems to work
fine with the url on the command line; the bug only happens when the
url is passed in with:

cat <<EOF | wget -i -
http://...
EOF

Using this method is necessary since it is the ONLY secure way I know
of to do a password-protected http request from a shell script.
Otherwise the password appears on the command line...

Rich



Re: Bug in wget: cannot request urls with double-slash in the query string

2004-03-01 Thread Hrvoje Niksic
D Richard Felker III [EMAIL PROTECTED] writes:

 The following code in url.c makes it impossible to request urls that
 contain multiple slashes in a row in their query string:
[...]

That code is removed in CVS, so multiple slashes now work correctly.

 Think of something like http://foo/bar/redirect.cgi?http://...
 wget translates this into: [...]

Which version of Wget are you using?  I think even Wget 1.8.2 didn't
collapse multiple slashes in query strings, only in paths.

 Removing the offending code fixes the problem, but I'm not sure if
 this is the correct solution. I expect it would be more correct to
 remove multiple slashes only before the first occurrence of ?, but
 not afterwards.

That's exactly what should happen.  Please give us more details, if
possible accompanied by `-d' output.



Re: Bug in wget: cannot request urls with double-slash in the query string

2004-03-01 Thread D Richard Felker III
On Mon, Mar 01, 2004 at 03:36:55PM +0100, Hrvoje Niksic wrote:
 D Richard Felker III [EMAIL PROTECTED] writes:
 
  The following code in url.c makes it impossible to request urls that
  contain multiple slashes in a row in their query string:
 [...]
 
 That code is removed in CVS, so multiple slashes now work correctly.
 
  Think of something like http://foo/bar/redirect.cgi?http://...
  wget translates this into: [...]
 
 Which version of Wget are you using?  I think even Wget 1.8.2 didn't
 collapse multiple slashes in query strings, only in paths.

I was using 1.8.2 and noticed the problem, so I upgraded to 1.9.1 and
it persisted.

  Removing the offending code fixes the problem, but I'm not sure if
  this is the correct solution. I expect it would be more correct to
  remove multiple slashes only before the first occurrence of ?, but
  not afterwards.
 
 That's exactly what should happen.  Please give us more details, if
 possible accompanied by `-d' output.

If you'd still like details now that you know the version I was using,
let me know and I'll be happy to do some tests.

Rich



Re: Bug in wget: cannot request urls with double-slash in the query string

2004-03-01 Thread Hrvoje Niksic
D Richard Felker III [EMAIL PROTECTED] writes:

  Think of something like http://foo/bar/redirect.cgi?http://...
  wget translates this into: [...]
 
 Which version of Wget are you using?  I think even Wget 1.8.2 didn't
 collapse multiple slashes in query strings, only in paths.

 I was using 1.8.2 and noticed the problem, so I upgraded to 1.9.1
 and it persisted.

OK.

  Removing the offending code fixes the problem, but I'm not sure if
  this is the correct solution. I expect it would be more correct to
  remove multiple slashes only before the first occurrence of ?, but
  not afterwards.
 
 That's exactly what should happen.  Please give us more details, if
 possible accompanied by `-d' output.

 If you'd still like details now that you know the version I was
 using, let me know and I'll be happy to do some tests.

Yes please.  For example, this is how it works for me:

$ /usr/bin/wget -d "http://www.xemacs.org/something?redirect=http://www.cnn.com"
DEBUG output created by Wget 1.8.2 on linux-gnu.

--19:23:02--  http://www.xemacs.org/something?redirect=http://www.cnn.com
   => `something?redirect=http:%2F%2Fwww.cnn.com'
Resolving www.xemacs.org... done.
Caching www.xemacs.org = 199.184.165.136
Connecting to www.xemacs.org[199.184.165.136]:80... connected.
Created socket 3.
Releasing 0x8080b40 (new refcount 1).
---request begin---
GET /something?redirect=http://www.cnn.com HTTP/1.0
User-Agent: Wget/1.8.2
Host: www.xemacs.org
Accept: */*
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
...

The request log shows that the slashes are apparently respected.



Re: bug in connect.c

2004-02-06 Thread Manfred Schwarb
Interesting.  Is it really necessary to zero out sockaddr/sockaddr_in
before using it?  I see that some sources do it, and some don't.  I
was always under the impression that, as long as you fill the relevant
members (sin_family, sin_addr, sin_port), other initialization is not
necessary.  Was I mistaken, or is this something specific to FreeBSD?
Do others have experience with this?


e.g. look at http://cvs.tartarus.org/putty/unix/uxnet.c

putty encountered the very same problem ...

regards
manfred


Re: bug in connect.c

2004-02-06 Thread Hrvoje Niksic
Manfred Schwarb [EMAIL PROTECTED] writes:

 Interesting.  Is it really necessary to zero out sockaddr/sockaddr_in
 before using it?  I see that some sources do it, and some don't.  I
 was always under the impression that, as long as you fill the relevant
 members (sin_family, sin_addr, sin_port), other initialization is not
 necessary.  Was I mistaken, or is this something specific to FreeBSD?

 Do others have experience with this?

 e.g. look at http://cvs.tartarus.org/putty/unix/uxnet.c

 putty encountered the very same problem ...

Amazing.  This obviously doesn't show up when binding to remote
addresses, or it would have been noticed ages ago.

Thanks for the pointer.  This patch should fix the problem in the CVS
version:

2004-02-06  Hrvoje Niksic  [EMAIL PROTECTED]

* connect.c (sockaddr_set_data): Zero out
sockaddr_in/sockaddr_in6.  Apparently BSD-derived stacks need this
when binding a socket to local address.

Index: src/connect.c
===
RCS file: /pack/anoncvs/wget/src/connect.c,v
retrieving revision 1.62
diff -u -r1.62 connect.c
--- src/connect.c   2003/12/12 14:14:53 1.62
+++ src/connect.c   2004/02/06 16:59:01
@@ -87,6 +87,7 @@
 case IPV4_ADDRESS:
   {
struct sockaddr_in *sin = (struct sockaddr_in *)sa;
+   xzero (*sin);
	sin->sin_family = AF_INET;
	sin->sin_port = htons (port);
	sin->sin_addr = ADDRESS_IPV4_IN_ADDR (ip);
@@ -96,6 +97,7 @@
 case IPV6_ADDRESS:
   {
struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)sa;
+   xzero (*sin6);
	sin6->sin6_family = AF_INET6;
	sin6->sin6_port = htons (port);
	sin6->sin6_addr = ADDRESS_IPV6_IN6_ADDR (ip);


Re: bug in connect.c

2004-02-04 Thread Hrvoje Niksic
francois eric [EMAIL PROTECTED] writes:

 after some tests:
 bug is when: ftp, with username and password, with bind address specified
 bug is not when: http, ftp without username and password
 looks like memory leaks. so i made some modifications before bind:
 src/connect.c:
 --
 ...
   /* Bind the client side to the requested address. */
   wget_sockaddr bsa;
 //!
  memset (&bsa, 0, sizeof (bsa));
 /!!
  wget_sockaddr_set_address (&bsa, ip_default_family, 0, bind_address);
  if (bind (sock, &bsa.sa, sockaddr_len ()))
 ..
 --
 after it all downloads become successful.
 i think it is better to do the memset in wget_sockaddr_set_address, but it is
 for you to choose.

Interesting.  Is it really necessary to zero out sockaddr/sockaddr_in
before using it?  I see that some sources do it, and some don't.  I
was always under the impression that, as long as you fill the relevant
members (sin_family, sin_addr, sin_port), other initialization is not
necessary.  Was I mistaken, or is this something specific to FreeBSD?

Do others have experience with this?
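
For illustration, a minimal standalone sketch of the idiom the patch
adopts -- clear the whole sockaddr_in before filling it in and binding
(toy code, not wget's connect.c):

#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Bind SOCK to the local IPv4 address ADDR on an ephemeral port.  */
static int
bind_local (int sock, const char *addr)
{
  struct sockaddr_in sin;
  memset (&sin, 0, sizeof (sin));     /* the crucial step on BSD stacks */
  sin.sin_family = AF_INET;
  sin.sin_port = htons (0);           /* any local port */
  if (inet_pton (AF_INET, addr, &sin.sin_addr) != 1)
    return -1;
  return bind (sock, (struct sockaddr *) &sin, sizeof (sin));
}

int
main (void)
{
  int sock = socket (AF_INET, SOCK_STREAM, 0);
  return (sock >= 0 && bind_local (sock, "127.0.0.1") == 0) ? 0 : 1;
}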



Re: Bug: Support of charcters like '\', '?', '*', ':' in URLs

2003-10-21 Thread Hrvoje Niksic
Frank Klemm [EMAIL PROTECTED] writes:

 Wget doesn't work properly when the URL contains characters which are
 not allowed in file names on the file system which is currently
 used. These are often '\', '?', '*' and ':'.

 Affected are at least:
 - Windows and related OS
 - Linux when using FAT or Samba as file system
[...]

Thanks for the report.  This has been fixed in Wget 1.9-beta.  It
doesn't use characters that FAT can't handle by default, and if you
use a mounted FAT filesystem, you can tell Wget to assume behavior as
if it were under Windows.



Re: bug in 1.8.2 with

2003-10-14 Thread Hrvoje Niksic
You're right -- that code was broken.  Thanks for the patch; I've now
applied it to CVS with the following ChangeLog entry:

2003-10-15  Philip Stadermann  [EMAIL PROTECTED]

* ftp.c (ftp_retrieve_glob): Correctly loop through the list whose
elements might have been deleted.




RE: Bug in Windows binary?

2003-10-06 Thread Herold Heiko
 From: Gisle Vanem [mailto:[EMAIL PROTECTED]

 Jens Rösner [EMAIL PROTECTED] said:
 
...
 
 I assume Heiko didn't notice it because he doesn't have that function
 in his kernel32.dll. Heiko and Hrvoje, will you correct this ASAP?
 
 --gv

Probably.
Currently I'm compiling and testing on NT 4.0 only.
Besides that, I'm VERY tight on time at the moment, so testing usually means
"does it run? Does it download one sample http and one https site? Yes?
Put it up for testing!".

Heiko

-- 
-- PREVINET S.p.A. www.previnet.it
-- Heiko Herold [EMAIL PROTECTED]
-- +39-041-5907073 ph
-- +39-041-5907472 fax


Re: Bug in Windows binary?

2003-10-05 Thread Gisle Vanem
Jens Rösner [EMAIL PROTECTED] said:

 I downloaded
 wget 1.9 beta 2003/09/29 from Heiko
 http://xoomer.virgilio.it/hherold/
...
 wget -d http://www.google.com
 DEBUG output created by Wget 1.9-beta on Windows.

 set_sleep_mode(): mode 0x8001, rc 0x8000

 I disabled my wgetrc as well and the output was exactly the same.

 I then tested
 wget 1.9 beta 2003/09/18 (earlier build!)
 from the same place and it works smoothly.

 Can anyone reproduce this bug?

Yes, but the MSVC version crashed on my machine.  But I've found
the cause: it was caused by my recent change :(

A simple case of wrong calling-convention:

--- mswindows.c.org Mon Sep 29 11:46:06 2003
+++ mswindows.c Sun Oct 05 17:34:48 2003
@@ -306,7 +306,7 @@
 DWORD set_sleep_mode (DWORD mode)
 {
   HMODULE mod = LoadLibrary ("kernel32.dll");
-  DWORD (*_SetThreadExecutionState) (DWORD) = NULL;
+  DWORD (WINAPI *_SetThreadExecutionState) (DWORD) = NULL;
   DWORD rc = (DWORD)-1;

I assume Heiko didn't notice it because he doesn't have that function
in his kernel32.dll. Heiko and Hrvoje, will you correct this ASAP?

--gv




Re: Bug in Windows binary?

2003-10-05 Thread Hrvoje Niksic
Gisle Vanem [EMAIL PROTECTED] writes:

 --- mswindows.c.org Mon Sep 29 11:46:06 2003
 +++ mswindows.c Sun Oct 05 17:34:48 2003
 @@ -306,7 +306,7 @@
  DWORD set_sleep_mode (DWORD mode)
  {
   HMODULE mod = LoadLibrary ("kernel32.dll");
 -  DWORD (*_SetThreadExecutionState) (DWORD) = NULL;
 +  DWORD (WINAPI *_SetThreadExecutionState) (DWORD) = NULL;
DWORD rc = (DWORD)-1;

 I assume Heiko didn't notice it because he doesn't have that
 function in his kernel32.dll. Heiko and Hrvoje, will you correct
 this ASAP?

I've now applied the patch, thanks.  I use the following ChangeLog
entry:

2003-10-05  Gisle Vanem  [EMAIL PROTECTED]

* mswindows.c (set_sleep_mode): Fix type of
_SetThreadExecutionState.



Re: BUG in --timeout (exit status)

2003-10-02 Thread Hrvoje Niksic
This problem is not specific to timeouts, but to recursive download (-r).

When downloading recursively, Wget expects some of the specified
downloads to fail and does not propagate that failure to the code that
sets the exit status.  This unfortunately includes the first download,
which should probably be an exception.


Re: BUG in --timeout (exit status)

2003-10-02 Thread Manfred Schwarb
OK, I see.
But I do not agree.
And I don't think it is a good idea to treat the first download specially.

In my opinion, exit status 0 means everything during the whole 
retrieval went OK.
My prefered solution would be to set the final exit status to the highest
exit status of all individual downloads. Of course, retries which are 
triggered by --tries should erase the exit status of the previous attempt.
A non-zero exit status does not mean "nothing went OK", but that some
individual downloads failed somehow.
And setting a non-zero exit status does not mean wget has to stop
retrieval immediately, it is OK to continue.

Again, wget's behaviour is not what the user expects.

And the user has always the possibility to make combinations of
--accept, --reject, --domains, etc. so in normal cases all 
individual downloads succeed, if he needs a exit status 0.
If he does not care about exit status, there is no problem at all,
of course...


regards
Manfred


Quoting Hrvoje Niksic [EMAIL PROTECTED]:

 This problem is not specific to timeouts, but to recursive download (-r).
 
 When downloading recursively, Wget expects some of the specified
 downloads to fail and does not propagate that failure to the code that
 sets the exit status.  This unfortunately includes the first download,
 which should probably be an exception.
 




This message was sent using IMP, the Internet Messaging Program.
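
For illustration, a minimal C sketch of the policy proposed above -- keep
the highest per-download status as the final exit status (hypothetical
names, not wget's actual retrieval loop):

#include <stdlib.h>

/* 0 == everything went OK; higher values == worse failures.  */
static int final_status = 0;

/* Called once per URL with the status of its last attempt, so a retry
   (--tries) that succeeds erases the earlier failure of that URL.  */
static void
record_download (int url_status)
{
  if (url_status > final_status)
    final_status = url_status;
}

int
main (void)
{
  record_download (0);   /* first page OK */
  record_download (4);   /* one requisite failed (e.g. network error) */
  record_download (0);   /* rest OK */
  exit (final_status);   /* exits 4, not 0 */
}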


Re: bug maybe?

2003-09-23 Thread Hrvoje Niksic
Randy Paries [EMAIL PROTECTED] writes:

 Not sure if this is a bug or not.

I guess it could be called a bug, although it's no simple oversight.
Wget currently doesn't support large files.



RE: bug maybe?

2003-09-23 Thread Matt Pease
How do I get off this list?  I tried a few times before and
got no response from the server.

thank you-
Matt

 -Original Message-
 From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, September 23, 2003 8:53 PM
 To: Randy Paries
 Cc: [EMAIL PROTECTED]
 Subject: Re: bug maybe?
 
 
 Randy Paries [EMAIL PROTECTED] writes:
 
  Not sure if this is a bug or not.
 
 I guess it could be called a bug, although it's no simple oversight.
 Wget currently doesn't support large files.
 

