Re: Bug in wget: cannot request urls with double-slash in the query string
D Richard Felker III <[EMAIL PROTECTED]> writes:

>> The request log shows that the slashes are apparently respected.
>
> I retried a test case and found the same thing -- the slashes were
> respected.

OK.

> Then I remembered that I was using -i. Wget seems to work fine with
> the url on the command line; the bug only happens when the url is
> passed in with:
>
> cat < http://...
> EOF

But I cannot repeat that, either. As long as the consecutive slashes
are in the query string, they're not stripped.

> Using this method is necessary since it is the ONLY secure way I
> know of to do a password-protected http request from a shell script.

Yes, that is the best way to do it.
Re: Bug in wget: cannot request urls with double-slash in the query string
On Mon, Mar 01, 2004 at 07:25:52PM +0100, Hrvoje Niksic wrote:
> >> > Removing the offending code fixes the problem, but I'm not sure if
> >> > this is the correct solution. I expect it would be more correct to
> >> > remove multiple slashes only before the first occurrence of ?, but
> >> > not afterwards.
> >>
> >> That's exactly what should happen. Please give us more details, if
> >> possible accompanied by `-d' output.
> >
> > If you'd still like details now that you know the version I was
> > using, let me know and I'll be happy to do some tests.
>
> Yes please. For example, this is how it works for me:
>
> $ /usr/bin/wget -d "http://www.xemacs.org/something?redirect=http://www.cnn.com"
> DEBUG output created by Wget 1.8.2 on linux-gnu.
>
> --19:23:02-- http://www.xemacs.org/something?redirect=http://www.cnn.com
>            => `something?redirect=http:%2F%2Fwww.cnn.com'
> Resolving www.xemacs.org... done.
> Caching www.xemacs.org => 199.184.165.136
> Connecting to www.xemacs.org[199.184.165.136]:80... connected.
> Created socket 3.
> Releasing 0x8080b40 (new refcount 1).
> ---request begin---
> GET /something?redirect=http://www.cnn.com HTTP/1.0
> User-Agent: Wget/1.8.2
> Host: www.xemacs.org
> Accept: */*
> Connection: Keep-Alive
>
> ---request end---
> HTTP request sent, awaiting response...
> ...
>
> The request log shows that the slashes are apparently respected.

I retried a test case and found the same thing -- the slashes were
respected.

Then I remembered that I was using -i. Wget seems to work fine with
the url on the command line; the bug only happens when the url is
passed in with:

cat <
Re: Bug in wget: cannot request urls with double-slash in the query string
D Richard Felker III <[EMAIL PROTECTED]> writes:

>> > Think of something like http://foo/bar/redirect.cgi?http://...
>> > wget translates this into: [...]
>>
>> Which version of Wget are you using? I think even Wget 1.8.2 didn't
>> collapse multiple slashes in query strings, only in paths.
>
> I was using 1.8.2 and noticed the problem, so I upgraded to 1.9.1
> and it persisted.

OK.

>> > Removing the offending code fixes the problem, but I'm not sure if
>> > this is the correct solution. I expect it would be more correct to
>> > remove multiple slashes only before the first occurrence of ?, but
>> > not afterwards.
>>
>> That's exactly what should happen. Please give us more details, if
>> possible accompanied by `-d' output.
>
> If you'd still like details now that you know the version I was
> using, let me know and I'll be happy to do some tests.

Yes please. For example, this is how it works for me:

$ /usr/bin/wget -d "http://www.xemacs.org/something?redirect=http://www.cnn.com"
DEBUG output created by Wget 1.8.2 on linux-gnu.

--19:23:02-- http://www.xemacs.org/something?redirect=http://www.cnn.com
           => `something?redirect=http:%2F%2Fwww.cnn.com'
Resolving www.xemacs.org... done.
Caching www.xemacs.org => 199.184.165.136
Connecting to www.xemacs.org[199.184.165.136]:80... connected.
Created socket 3.
Releasing 0x8080b40 (new refcount 1).
---request begin---
GET /something?redirect=http://www.cnn.com HTTP/1.0
User-Agent: Wget/1.8.2
Host: www.xemacs.org
Accept: */*
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
...

The request log shows that the slashes are apparently respected.
Re: Bug in wget: cannot request urls with double-slash in the query string
On Mon, Mar 01, 2004 at 03:36:55PM +0100, Hrvoje Niksic wrote:
> D Richard Felker III <[EMAIL PROTECTED]> writes:
>
> > The following code in url.c makes it impossible to request urls that
> > contain multiple slashes in a row in their query string:
> [...]
>
> That code is removed in CVS, so multiple slashes now work correctly.
>
> > Think of something like http://foo/bar/redirect.cgi?http://...
> > wget translates this into: [...]
>
> Which version of Wget are you using? I think even Wget 1.8.2 didn't
> collapse multiple slashes in query strings, only in paths.

I was using 1.8.2 and noticed the problem, so I upgraded to 1.9.1
and it persisted.

> > Removing the offending code fixes the problem, but I'm not sure if
> > this is the correct solution. I expect it would be more correct to
> > remove multiple slashes only before the first occurrence of ?, but
> > not afterwards.
>
> That's exactly what should happen. Please give us more details, if
> possible accompanied by `-d' output.

If you'd still like details now that you know the version I was
using, let me know and I'll be happy to do some tests.

Rich
Re: Bug in wget: cannot request urls with double-slash in the query string
D Richard Felker III <[EMAIL PROTECTED]> writes:

> The following code in url.c makes it impossible to request urls that
> contain multiple slashes in a row in their query string:
[...]

That code is removed in CVS, so multiple slashes now work correctly.

> Think of something like http://foo/bar/redirect.cgi?http://...
> wget translates this into: [...]

Which version of Wget are you using? I think even Wget 1.8.2 didn't
collapse multiple slashes in query strings, only in paths.

> Removing the offending code fixes the problem, but I'm not sure if
> this is the correct solution. I expect it would be more correct to
> remove multiple slashes only before the first occurrence of ?, but
> not afterwards.

That's exactly what should happen. Please give us more details, if
possible accompanied by `-d' output.
Bug in wget: cannot request urls with double-slash in the query string
The following code in url.c makes it impossible to request urls that
contain multiple slashes in a row in their query string:

      else if (*h == '/')
        {
          /* Ignore empty path elements.  Supporting them well is hard
             (where do you save "http://x.com///y.html"?), and they
             don't bring any practical gain.  Plus, they break our
             filesystem-influenced assumptions: allowing them would
             make "x/y//../z" simplify to "x/y/z", whereas most people
             would expect "x/z".  */
          ++h;
        }

Think of something like http://foo/bar/redirect.cgi?http://...
wget translates this into: http://foo/bar/redirect.cgi?http:/...
and then the web server of course gives an error.

Note that the problem occurs even if the slashes were url escaped,
since wget unescapes them.

Removing the offending code fixes the problem, but I'm not sure if
this is the correct solution. I expect it would be more correct to
remove multiple slashes only before the first occurrence of ?, but
not afterwards.

Rich
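
The behaviour suggested above (collapse runs of slashes only before the
first ?, leave the query string alone) could be sketched roughly as
follows. This is only an illustration, not wget's url.c: the function
name collapse_path_slashes is hypothetical, and it assumes its input is
just the path-and-query part of a url (everything after the host), so
the "//" of the scheme never reaches it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, NOT taken from wget: collapse runs of '/' in
   place, but only in the part of S before the first '?'.  Everything
   from the '?' onward (the query string) is kept unchanged.
   Assumption: S is the path-and-query portion of a url, with the
   scheme and host already removed.  */
void
collapse_path_slashes (char *s)
{
  char *in;   /* read position */
  char *out;  /* write position */

  assert (s != NULL);
  in = out = s;

  while (*in != '\0' && *in != '?')
    {
      *out++ = *in;
      if (*in == '/')
        {
          /* Skip the remaining slashes of this run.  */
          while (in[1] == '/')
            ++in;
        }
      ++in;
    }

  /* Move the query string (if any) down verbatim, terminator included.  */
  memmove (out, in, strlen (in) + 1);
}
```

With this sketch, "/bar//redirect.cgi?http://www.example.com//x" becomes
"/bar/redirect.cgi?http://www.example.com//x": the doubled slash in the
path is collapsed and the slashes after the ? are untouched. Whether
empty path elements should be collapsed at all is a separate question
that this sketch does not try to answer.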