Re: Bug in wget: cannot request urls with double-slash in the query string

2004-03-04 Thread D Richard Felker III
On Mon, Mar 01, 2004 at 07:25:52PM +0100, Hrvoje Niksic wrote:
> >> > Removing the offending code fixes the problem, but I'm not sure if
> >> > this is the correct solution. I expect it would be more correct to
> >> > remove multiple slashes only before the first occurrence of ?, but
> >> > not afterwards.
> >> 
> >> That's exactly what should happen.  Please give us more details, if
> >> possible accompanied by `-d' output.
> >
> > If you'd still like details now that you know the version I was
> > using, let me know and I'll be happy to do some tests.
> 
> Yes please.  For example, this is how it works for me:
> 
> $ /usr/bin/wget -d "http://www.xemacs.org/something?redirect=http://www.cnn.com"
> DEBUG output created by Wget 1.8.2 on linux-gnu.
> 
> --19:23:02--  http://www.xemacs.org/something?redirect=http://www.cnn.com
>            => `something?redirect=http:%2F%2Fwww.cnn.com'
> Resolving www.xemacs.org... done.
> Caching www.xemacs.org => 199.184.165.136
> Connecting to www.xemacs.org[199.184.165.136]:80... connected.
> Created socket 3.
> Releasing 0x8080b40 (new refcount 1).
> ---request begin---
> GET /something?redirect=http://www.cnn.com HTTP/1.0
> User-Agent: Wget/1.8.2
> Host: www.xemacs.org
> Accept: */*
> Connection: Keep-Alive
> 
> ---request end---
> HTTP request sent, awaiting response...
> ...
> 
> The request log shows that the slashes are apparently respected.

I retried a test case and found the same thing -- the slashes were
respected. Then I remembered that I was using -i. Wget seems to work
fine with the url on the command line; the bug only happens when the
url is passed in with:

cat <

Re: Bug in wget: cannot request urls with double-slash in the query string

2004-03-01 Thread D Richard Felker III
On Mon, Mar 01, 2004 at 03:36:55PM +0100, Hrvoje Niksic wrote:
> D Richard Felker III <[EMAIL PROTECTED]> writes:
> 
> > The following code in url.c makes it impossible to request urls that
> > contain multiple slashes in a row in their query string:
> [...]
> 
> That code is removed in CVS, so multiple slashes now work correctly.
> 
> > Think of something like http://foo/bar/redirect.cgi?http://...
> > wget translates this into: [...]
> 
> Which version of Wget are you using?  I think even Wget 1.8.2 didn't
> collapse multiple slashes in query strings, only in paths.

I was using 1.8.2 and noticed the problem, so I upgraded to 1.9.1 and
it persisted.

> > Removing the offending code fixes the problem, but I'm not sure if
> > this is the correct solution. I expect it would be more correct to
> > remove multiple slashes only before the first occurrence of ?, but
> > not afterwards.
> 
> That's exactly what should happen.  Please give us more details, if
> possible accompanied by `-d' output.

If you'd still like details now that you know the version I was using,
let me know and I'll be happy to do some tests.

Rich



Bug in wget: cannot request urls with double-slash in the query string

2004-02-29 Thread D Richard Felker III
The following code in url.c makes it impossible to request urls that
contain multiple slashes in a row in their query string:

else if (*h == '/')
{
  /* Ignore empty path elements.  Supporting them well is hard
     (where do you save "http://x.com///y.html"?), and they
     don't bring any practical gain.  Plus, they break our
     filesystem-influenced assumptions: allowing them would
     make "x/y//../z" simplify to "x/y/z", whereas most people
     would expect "x/z".  */
  ++h;
}

Think of something like http://foo/bar/redirect.cgi?http://...
wget translates this into:

http://foo/bar/redirect.cgi?http:/...

and then the web server of course gives an error. Note that the
problem occurs even if the slashes were url escaped, since wget
unescapes them.

Removing the offending code fixes the problem, but I'm not sure if
this is the correct solution. I expect it would be more correct to
remove multiple slashes only before the first occurrence of ?, but not
afterwards.
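
To make the "only before the first ?" idea concrete, here is a rough
sketch. It is not the code from url.c and not a tested patch: the
function name collapse_path_slashes is made up for illustration, and it
assumes it is handed only the path-and-query part of the URL (scheme
and host already stripped), as a writable string.

```c
#include <stddef.h>

/* Hypothetical helper, not part of wget: collapse runs of '/' in the
   path part of a path-and-query string, in place.  Everything from
   the first '?' onward is copied through unchanged.  */
static void
collapse_path_slashes (char *s)
{
  char *in = s;    /* read position */
  char *out = s;   /* write position, never ahead of IN */

  while (*in != '\0' && *in != '?')
    {
      /* Skip a '/' that directly follows one we have already kept.  */
      if (*in == '/' && out > s && out[-1] == '/')
        {
          ++in;
          continue;
        }
      *out++ = *in++;
    }

  /* Copy the query string (if any) verbatim, including the NUL.  */
  while ((*out++ = *in++) != '\0')
    ;
}
```

With this, "/bar//redirect.cgi?http://x//y" would become
"/bar/redirect.cgi?http://x//y": the empty path element is dropped but
the slashes after the ? are left alone. It does not attempt any of the
"." or ".." handling that the real path simplification does.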

Rich