Re: Weird 302 problem with wget 1.7
John Levon <[EMAIL PROTECTED]> writes:

> Thanks very much (wouldn't it be good to refer to the clause in the
> RFC in the comments ?)

Uh, I suppose so.  But it doesn't matter that much -- someone looking
for it will find it anyway.  Besides, it's not clear which RFC Wget
conforms to.  Web standards are messy.
Re: Weird 302 problem with wget 1.7
On Mon, Jan 14, 2002 at 04:16:24PM +0100, Hrvoje Niksic wrote:

> Ok, how about this patch:
>
> 2002-01-14  Hrvoje Niksic  <[EMAIL PROTECTED]>
>
>     * headers.c (header_get): Strip trailing whitespace from the
>     header.

I've tested this patch, works great.

Thanks very much (wouldn't it be good to refer to the clause in the
RFC in the comments ?)

regards
john

--
"Now why did you have to go and mess up the child's head, so you can get
 another gold waterbed ? You fake-hair contact-wearing liposuction
 carnival exhibit, listen to my rhyme ..."
Re: Weird 302 problem with wget 1.7
John Levon <[EMAIL PROTECTED]> writes:

> The field-content does not include any leading or trailing LWS:
> linear white space occurring before the first non-whitespace
> character of the field-value or after the last non-whitespace
> character of the field-value. Such leading or trailing LWS MAY be
> removed without changing the semantics of the field value. Any LWS
> that occurs between field-content MAY be replaced with a single SP
> before interpreting the field value or forwarding the message
> downstream.

Ok, how about this patch:

2002-01-14  Hrvoje Niksic  <[EMAIL PROTECTED]>

    * headers.c (header_get): Strip trailing whitespace from the
    header.

Index: src/headers.c
===
RCS file: /pack/anoncvs/wget/src/headers.c,v
retrieving revision 1.6
diff -u -r1.6 headers.c
--- src/headers.c 2001/11/16 19:57:43 1.6
+++ src/headers.c 2002/01/14 15:13:31
@@ -64,8 +64,8 @@
    as much memory as necessary for it to fit.  It need not contain a
    `:', thus you can use it to retrieve, say, HTTP status line.
 
-   The trailing CRLF or LF are stripped from the header, and it is
-   zero-terminated.  Is this well-behaved?  */
+   All trailing whitespace is stripped from the header, and it is
+   zero-terminated.  */
 int
 header_get (struct rbuf *rbuf, char **hdr, enum header_get_flags flags)
 {
@@ -101,11 +101,13 @@
               if (next == '\t' || next == ' ')
                 continue;
             }
-          /* The header ends.  */
+
+          /* Strip trailing whitespace.  (*hdr)[i] is the newline;
+             decrement I until it points to the last available
+             whitespace.  */
+          while (i > 0 && ISSPACE ((*hdr)[i - 1]))
+            --i;
           (*hdr)[i] = '\0';
-          /* Get rid of '\r'.  */
-          if (i > 0 && (*hdr)[i - 1] == '\r')
-            (*hdr)[i - 1] = '\0';
           break;
         }
     }
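[Illustration] The trailing-whitespace loop from this patch can be tried
outside Wget. What follows is only a minimal sketch and not Wget's code:
strip_trailing_space is a hypothetical helper that works on a plain
NUL-terminated string, and the standard isspace() stands in for Wget's
ISSPACE macro.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Wget: cut all trailing whitespace
   (spaces, tabs, CR, LF) from a NUL-terminated header line in place
   and return the new length.  */
static size_t
strip_trailing_space (char *hdr)
{
  size_t i = strlen (hdr);

  /* Step back over whitespace; the cast avoids undefined behavior
     when char is signed and the byte is >= 0x80.  */
  while (i > 0 && isspace ((unsigned char) hdr[i - 1]))
    --i;
  hdr[i] = '\0';
  return i;
}
```

Given a writable buffer holding the `Location' line with its trailing
space and CRLF, the call leaves the header ending at the last
non-whitespace character of the URL.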
Re: Weird 302 problem with wget 1.7
On Mon, Jan 14, 2002 at 02:30:54PM +0100, Hrvoje Niksic wrote:

> > moz wget-1.7 188 wget http://www.movementarian.org/oprofile-0.0.8.tar.gz
> > --20:35:51--  http://www.movementarian.org/oprofile-0.0.8.tar.gz
> >            => `oprofile-0.0.8.tar.gz'
> > Connecting to www.movementarian.org:80... connected!
> > HTTP request sent, awaiting response... 302 Moved
> > Location: http://www.movement.uklinux.net/oprofile-0.0.8.tar.gz [following]
> > --20:35:52--  http://www.movement.uklinux.net/oprofile-0.0.8.tar.gz%20
> >            => `oprofile-0.0.8.tar.gz '
>
> If you examine this log carefully, you'll notice that their `Location'
> header contains a trailing space.  Wget even reencodes the space as
> %20 to make the URL more readable, but it still retrieves the "wrong"
> URL.

indeed.

> Does someone else know if this is legal?  I guess removing trailing
> spaces from `Location' shouldn't be too harmful.

Someone pointed out :

http://www.ietf.org/rfc/rfc2616.txt

4.2

...
   The field-content does not include any leading or trailing LWS:
   linear white space occurring before the first non-whitespace
   character of the field-value or after the last non-whitespace
   character of the field-value. Such leading or trailing LWS MAY be
   removed without changing the semantics of the field value. Any LWS
   that occurs between field-content MAY be replaced with a single SP
   before interpreting the field value or forwarding the message
   downstream.

So wget should always remove it IMHO

regards
john

--
"Now why did you have to go and mess up the child's head, so you can get
 another gold waterbed ? You fake-hair contact-wearing liposuction
 carnival exhibit, listen to my rhyme ..."
Re: Weird 302 problem with wget 1.7
John Levon <[EMAIL PROTECTED]> writes:

> moz wget-1.7 188 wget http://www.movementarian.org/oprofile-0.0.8.tar.gz
> --20:35:51--  http://www.movementarian.org/oprofile-0.0.8.tar.gz
>            => `oprofile-0.0.8.tar.gz'
> Connecting to www.movementarian.org:80... connected!
> HTTP request sent, awaiting response... 302 Moved
> Location: http://www.movement.uklinux.net/oprofile-0.0.8.tar.gz [following]
> --20:35:52--  http://www.movement.uklinux.net/oprofile-0.0.8.tar.gz%20
>            => `oprofile-0.0.8.tar.gz '

If you examine this log carefully, you'll notice that their `Location'
header contains a trailing space.  Wget even reencodes the space as
%20 to make the URL more readable, but it still retrieves the "wrong"
URL.

Does someone else know if this is legal?  I guess removing trailing
spaces from `Location' shouldn't be too harmful.
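[Illustration] To see how a trailing space in `Location' turns into the
%20 visible in the log, here is a rough sketch. It is not Wget's URL
encoder: encode_spaces is a hypothetical helper that rewrites only the
space character, whereas a real encoder covers many more characters.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of Wget: return a malloc'ed copy of
   URL in which every space is written as "%20".  The caller frees
   the result.  Returns NULL if allocation fails.  */
static char *
encode_spaces (const char *url)
{
  size_t spaces = 0;
  const char *p;
  char *out, *q;

  /* Each space grows from one byte to three.  */
  for (p = url; *p; p++)
    if (*p == ' ')
      ++spaces;

  out = malloc (strlen (url) + 2 * spaces + 1);
  if (!out)
    return NULL;

  for (p = url, q = out; *p; p++)
    {
      if (*p == ' ')
        {
          memcpy (q, "%20", 3);
          q += 3;
        }
      else
        *q++ = *p;
    }
  *q = '\0';
  return out;
}
```

Fed the redirect target with its trailing space, this yields a URL
ending in ".tar.gz%20", which is the name the second request in the log
asks for.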
Weird 302 problem with wget 1.7
moz wget-1.7 188 wget http://www.movementarian.org/oprofile-0.0.8.tar.gz
--20:35:51--  http://www.movementarian.org/oprofile-0.0.8.tar.gz
           => `oprofile-0.0.8.tar.gz'
Connecting to www.movementarian.org:80... connected!
HTTP request sent, awaiting response... 302 Moved
Location: http://www.movement.uklinux.net/oprofile-0.0.8.tar.gz [following]
--20:35:52--  http://www.movement.uklinux.net/oprofile-0.0.8.tar.gz%20
           => `oprofile-0.0.8.tar.gz '
Connecting to www.movement.uklinux.net:80... connected!
HTTP request sent, awaiting response... 404 Not Found
20:35:52 ERROR 404: Not Found.

Where is the space (%20) coming from ? Is it perhaps a bug with my domain
registrar (www.123-reg.co.uk) ?

This is wget 1.7 on RedHat Linux 7.0

I'm not subscribed to the list, please Cc:

thanks
john

--
"I went to set up a Yahoo ID for my dog. (Don't ask, but the DOG'S email
 was cluttering my inbox)."
        - Ruthless Advisorette
Re: problem with wget 1.7
Arkadiusz Miskiewicz <[EMAIL PROTECTED]> writes:

> please try:
> wget --mirror http://www.ire.pw.edu.pl/zejim/rois/

Thanks for the report.  I believe this patch should fix the problem.

2001-06-14  Hrvoje Niksic  <[EMAIL PROTECTED]>

    * recur.c (recursive_retrieve): Also check undesirable_urls with
    canonicalized URL.

Index: src/recur.c
===
RCS file: /pack/anoncvs/wget/src/recur.c,v
retrieving revision 1.21
diff -u -r1.21 recur.c
--- src/recur.c 2001/05/27 19:35:09 1.21
+++ src/recur.c 2001/06/14 21:43:21
@@ -381,7 +381,13 @@
         }
       xfree (constr);
       constr = xstrdup (u->url);
-      string_set_add (undesirable_urls, constr);
+      /* After we have canonicalized the URL, check if we have it
+         on the black list. */
+      if (string_set_contains (undesirable_urls, constr))
+        inl = 1;
+      /* This line is bogus. */
+      /*string_set_add (undesirable_urls, constr);*/
+
       if (!inl && !((u->proto == URLFTP) && !this_url_ftp))
         if (!opt.spanhost && this_url && !same_host (this_url, constr))
           {
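[Illustration] The idea in this patch, looking a URL up in the black
list only after it has been canonicalized, can be sketched on its own.
Everything here is a stand-in and an assumption, not Wget's code:
struct url_set replaces the string_set_* functions, and canonicalize
merely drops a "#fragment", which is far less than real URL
canonicalization does.

```c
#include <stddef.h>
#include <string.h>

#define MAX_URLS 64
#define MAX_LEN 256

/* Tiny fixed-size string set; a stand-in for Wget's string sets.  */
struct url_set
{
  char items[MAX_URLS][MAX_LEN];
  int count;
};

/* Stand-in canonicalization: copy URL into OUT without its
   "#fragment", truncating to OUTSIZE - 1 bytes.  */
static void
canonicalize (const char *url, char *out, size_t outsize)
{
  size_t n = strcspn (url, "#");
  if (n >= outsize)
    n = outsize - 1;
  memcpy (out, url, n);
  out[n] = '\0';
}

/* Return 1 if URL is stored in S, 0 otherwise (linear search).  */
static int
set_contains (const struct url_set *s, const char *url)
{
  int i;
  for (i = 0; i < s->count; i++)
    if (strcmp (s->items[i], url) == 0)
      return 1;
  return 0;
}

/* Store a copy of URL in S.  Return 1 on success, 0 if S is full.  */
static int
set_add (struct url_set *s, const char *url)
{
  if (s->count >= MAX_URLS)
    return 0;
  strncpy (s->items[s->count], url, MAX_LEN - 1);
  s->items[s->count][MAX_LEN - 1] = '\0';
  s->count++;
  return 1;
}

/* Canonicalize RAW_URL first, then consult the black list, so that
   two spellings of one URL give the same answer.  */
static int
is_undesirable (const struct url_set *blacklist, const char *raw_url)
{
  char canon[MAX_LEN];
  canonicalize (raw_url, canon, sizeof canon);
  return set_contains (blacklist, canon);
}
```

With "http://example.org/a/" (a made-up URL) on the list, a lookup of
"http://example.org/a/#top" is rejected as well, because the comparison
uses the canonical form.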
problem with wget 1.7
please try:
wget --mirror http://www.ire.pw.edu.pl/zejim/rois/

and wget loops ;( 1.6 version works fine.

--
Arkadiusz Miśkiewicz, AM2-6BONE, 1024/3DB19BBD
http://www.t17.ds.pwr.wroc.pl/~misiek/ipv6/
IPv6 ready PLD/Linux at http://www.pld.org.pl/