Re: Weird 302 problem with wget 1.7

2002-01-14 Thread Hrvoje Niksic

John Levon <[EMAIL PROTECTED]> writes:

> Thanks very much (wouldn't it be good to refer to the clause in the
> RFC in the comments ?)

Uh, I suppose so.  But it doesn't matter that much -- someone looking
for it will find it anyway.  Besides, it's not clear which RFC Wget
conforms to.  Web standards are messy.



Re: Weird 302 problem with wget 1.7

2002-01-14 Thread John Levon

On Mon, Jan 14, 2002 at 04:16:24PM +0100, Hrvoje Niksic wrote:

> Ok, how about this patch:
> 
> 2002-01-14  Hrvoje Niksic  <[EMAIL PROTECTED]>
> 
>   * headers.c (header_get): Strip trailing whitespace from the
>   header.

I've tested this patch, works great.

Thanks very much (wouldn't it be good to refer to the clause in the RFC
in the comments ?)

regards
john

-- 
"Now why did you have to go and mess up the child's head, so you can get another gold 
waterbed ?
 You fake-hair contact-wearing liposuction carnival exhibit, listen to my rhyme ..."



Re: Weird 302 problem with wget 1.7

2002-01-14 Thread Hrvoje Niksic

John Levon <[EMAIL PROTECTED]> writes:

> The field-content does not include any leading or trailing LWS:
>    linear white space occurring before the first non-whitespace
>    character of the field-value or after the last non-whitespace
>    character of the field-value. Such leading or trailing LWS MAY be
>    removed without changing the semantics of the field value. Any LWS
>    that occurs between field-content MAY be replaced with a single SP
>    before interpreting the field value or forwarding the message
>    downstream.

Ok, how about this patch:

2002-01-14  Hrvoje Niksic  <[EMAIL PROTECTED]>

        * headers.c (header_get): Strip trailing whitespace from the
        header.

Index: src/headers.c
===================================================================
RCS file: /pack/anoncvs/wget/src/headers.c,v
retrieving revision 1.6
diff -u -r1.6 headers.c
--- src/headers.c   2001/11/16 19:57:43 1.6
+++ src/headers.c   2002/01/14 15:13:31
@@ -64,8 +64,8 @@
    as much memory as necessary for it to fit.  It need not contain a
    `:', thus you can use it to retrieve, say, HTTP status line.
 
-   The trailing CRLF or LF are stripped from the header, and it is
-   zero-terminated.    Is this well-behaved?  */
+   All trailing whitespace is stripped from the header, and it is
+   zero-terminated.  */
 int
 header_get (struct rbuf *rbuf, char **hdr, enum header_get_flags flags)
 {
@@ -101,11 +101,13 @@
               if (next == '\t' || next == ' ')
                 continue;
             }
-          /* The header ends.  */
+
+          /* Strip trailing whitespace.  (*hdr)[i] is the newline;
+             decrement I until it points to the last available
+             whitespace.  */
+          while (i > 0 && ISSPACE ((*hdr)[i - 1]))
+            --i;
           (*hdr)[i] = '\0';
-          /* Get rid of '\r'.  */
-          if (i > 0 && (*hdr)[i - 1] == '\r')
-            (*hdr)[i - 1] = '\0';
           break;
         }
     }
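
For readers who want to try the idea outside the Wget tree, here is a minimal standalone sketch of the same backward scan. It is not the Wget code: `strip_trailing_ws` is a hypothetical name, and the standard `isspace` stands in for Wget's `ISSPACE` macro.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Wget: remove trailing whitespace
   (spaces, tabs, CR, LF) from S in place and return the new length.  */
size_t
strip_trailing_ws (char *s)
{
  size_t i;

  assert (s != NULL);
  i = strlen (s);
  /* Walk back from the end while the previous character is
     whitespace.  The cast keeps isspace() well-defined for bytes
     with the high bit set.  */
  while (i > 0 && isspace ((unsigned char) s[i - 1]))
    --i;
  s[i] = '\0';
  return i;
}
```

Applied to a `Location' line that ends in " \r\n", this leaves the URL without the stray space, so nothing is left to be re-encoded as %20.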



Re: Weird 302 problem with wget 1.7

2002-01-14 Thread John Levon

On Mon, Jan 14, 2002 at 02:30:54PM +0100, Hrvoje Niksic wrote:

> > moz wget-1.7 188 wget http://www.movementarian.org/oprofile-0.0.8.tar.gz
> > --20:35:51--  http://www.movementarian.org/oprofile-0.0.8.tar.gz
> >    => `oprofile-0.0.8.tar.gz'
> > Connecting to www.movementarian.org:80... connected!
> > HTTP request sent, awaiting response... 302 Moved
> > Location: http://www.movement.uklinux.net/oprofile-0.0.8.tar.gz  [following]
> > --20:35:52--  http://www.movement.uklinux.net/oprofile-0.0.8.tar.gz%20
> >    => `oprofile-0.0.8.tar.gz '
> 
> If you examine this log carefully, you'll notice that their `Location'
> header contains a trailing space.  Wget even reencodes the space as
> %20 to make the URL more readable, but it still retrieves the "wrong"
> URL.

indeed.

> Does someone else know if this is legal?  I guess removing trailing
> spaces from `Location' shouldn't be too harmful.

Someone pointed out :

http://www.ietf.org/rfc/rfc2616.txt

4.2

...

The field-content does not include any leading or trailing LWS:
   linear white space occurring before the first non-whitespace
   character of the field-value or after the last non-whitespace
   character of the field-value. Such leading or trailing LWS MAY be
   removed without changing the semantics of the field value. Any LWS
   that occurs between field-content MAY be replaced with a single SP
   before interpreting the field value or forwarding the message
   downstream.

So wget should always remove it IMHO
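
As a rough illustration of what that clause permits, the sketch below trims leading and trailing LWS from a field value and collapses each internal run to a single SP. The names `is_lws` and `normalize_field_value` are made up for this example, and it makes two simplifying assumptions: any SP, HT, CR or LF counts as LWS, and quoted strings inside the value get no special treatment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption for this sketch: SP, HT, CR and LF all count as LWS.  */
static int
is_lws (char c)
{
  return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* Hypothetical helper: normalize VALUE in place and return its new
   length.  Leading and trailing LWS are dropped; each internal run
   of LWS becomes one SP.  */
size_t
normalize_field_value (char *value)
{
  size_t in = 0, out = 0;

  assert (value != NULL);
  /* Skip leading LWS.  */
  while (is_lws (value[in]))
    ++in;
  while (value[in] != '\0')
    {
      if (is_lws (value[in]))
        {
          /* Consume the whole run of LWS.  */
          while (is_lws (value[in]))
            ++in;
          /* Emit one SP only if more content follows; otherwise the
             run was trailing and is dropped.  */
          if (value[in] != '\0')
            value[out++] = ' ';
        }
      else
        value[out++] = value[in++];
    }
  value[out] = '\0';
  return out;
}
```

The RFC only says such whitespace MAY be removed, so this is one permitted reading rather than a required behaviour.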

regards
john


-- 
"Now why did you have to go and mess up the child's head, so you can get another gold 
waterbed ?
 You fake-hair contact-wearing liposuction carnival exhibit, listen to my rhyme ..."



Re: Weird 302 problem with wget 1.7

2002-01-14 Thread Hrvoje Niksic

John Levon <[EMAIL PROTECTED]> writes:

> moz wget-1.7 188 wget http://www.movementarian.org/oprofile-0.0.8.tar.gz
> --20:35:51--  http://www.movementarian.org/oprofile-0.0.8.tar.gz
>    => `oprofile-0.0.8.tar.gz'
> Connecting to www.movementarian.org:80... connected!
> HTTP request sent, awaiting response... 302 Moved
> Location: http://www.movement.uklinux.net/oprofile-0.0.8.tar.gz  [following]
> --20:35:52--  http://www.movement.uklinux.net/oprofile-0.0.8.tar.gz%20
>    => `oprofile-0.0.8.tar.gz '

If you examine this log carefully, you'll notice that their `Location'
header contains a trailing space.  Wget even reencodes the space as
%20 to make the URL more readable, but it still retrieves the "wrong"
URL.

Does someone else know if this is legal?  I guess removing trailing
spaces from `Location' shouldn't be too harmful.
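
To make the %20 in the log concrete, here is a small sketch of how a trailing space ends up in the requested URL once spaces are percent-encoded. `encode_spaces` is a hypothetical helper that handles only the space character; it is not Wget's actual URL encoder, which covers more characters.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of URL with
   every space replaced by "%20".  The caller frees the result.
   Returns NULL if allocation fails.  */
char *
encode_spaces (const char *url)
{
  size_t spaces = 0, len;
  const char *p;
  char *out, *q;

  assert (url != NULL);
  for (p = url; *p != '\0'; ++p)
    if (*p == ' ')
      ++spaces;
  len = (size_t) (p - url);
  /* Each space grows from one byte to three.  */
  out = malloc (len + 2 * spaces + 1);
  if (out == NULL)
    return NULL;
  for (p = url, q = out; *p != '\0'; ++p)
    {
      if (*p == ' ')
        {
          memcpy (q, "%20", 3);
          q += 3;
        }
      else
        *q++ = *p;
    }
  *q = '\0';
  return out;
}
```

Feeding it the `Location' value with its trailing space gives a URL ending in ".tar.gz%20", which matches the second request in the log.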



Weird 302 problem with wget 1.7

2002-01-08 Thread John Levon


moz wget-1.7 188 wget http://www.movementarian.org/oprofile-0.0.8.tar.gz
--20:35:51--  http://www.movementarian.org/oprofile-0.0.8.tar.gz
   => `oprofile-0.0.8.tar.gz'
Connecting to www.movementarian.org:80... connected!
HTTP request sent, awaiting response... 302 Moved
Location: http://www.movement.uklinux.net/oprofile-0.0.8.tar.gz  [following]
--20:35:52--  http://www.movement.uklinux.net/oprofile-0.0.8.tar.gz%20
   => `oprofile-0.0.8.tar.gz '
Connecting to www.movement.uklinux.net:80... connected!
HTTP request sent, awaiting response... 404 Not Found
20:35:52 ERROR 404: Not Found.

Where is the space (%20) coming from ? Is it perhaps a bug with my domain
registrar (www.123-reg.co.uk) ?

This is wget 1.7 on RedHat Linux 7.0

I'm not subscribed to the list, please Cc:

thanks
john

-- 
"I went to set up a Yahoo ID for my dog. (Don't ask, but the DOG'S email was 
cluttering my inbox)." 
- Ruthless Advisorette



Re: problem with wget 1.7

2001-06-14 Thread Hrvoje Niksic

Arkadiusz Miskiewicz <[EMAIL PROTECTED]> writes:

> please try:
> wget --mirror http://www.ire.pw.edu.pl/zejim/rois/

Thanks for the report.  I believe this patch should fix the problem.

2001-06-14  Hrvoje Niksic  <[EMAIL PROTECTED]>

        * recur.c (recursive_retrieve): Also check undesirable_urls with
        canonicalized URL.

Index: src/recur.c
===================================================================
RCS file: /pack/anoncvs/wget/src/recur.c,v
retrieving revision 1.21
diff -u -r1.21 recur.c
--- src/recur.c 2001/05/27 19:35:09 1.21
+++ src/recur.c 2001/06/14 21:43:21
@@ -381,7 +381,13 @@
             }
           xfree (constr);
           constr = xstrdup (u->url);
-          string_set_add (undesirable_urls, constr);
+          /* After we have canonicalized the URL, check if we have it
+             on the black list. */
+          if (string_set_contains (undesirable_urls, constr))
+            inl = 1;
+          /* This line is bogus. */
+          /*string_set_add (undesirable_urls, constr);*/
+
           if (!inl && !((u->proto == URLFTP) && !this_url_ftp))
             if (!opt.spanhost && this_url && !same_host (this_url, constr))
               {
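
The general idea of the fix, looking a URL up in the skip list only after it has been canonicalized, can be sketched on its own as follows. Everything here is hypothetical and much simpler than Wget's `string_set_*` functions: the set is a fixed-size array, and "canonical" is assumed to mean only "scheme and host lowercased" for absolute URLs of the form scheme://host/path.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_URLS 64
#define MAX_URL_LEN 256

/* Hypothetical fixed-size set of URLs that should be skipped.  */
struct url_set
{
  char items[MAX_URLS][MAX_URL_LEN];
  size_t count;
};

/* Assumed canonical form for this sketch: everything before the
   third slash ("scheme://host") is lowercased, the path is copied
   unchanged.  Over-long URLs are truncated.  */
void
canonicalize (const char *url, char *out, size_t outsize)
{
  size_t i;
  int slashes = 0;

  assert (url != NULL && out != NULL && outsize > 0);
  for (i = 0; url[i] != '\0' && i + 1 < outsize; ++i)
    {
      if (url[i] == '/')
        ++slashes;
      out[i] = slashes < 3 ? (char) tolower ((unsigned char) url[i]) : url[i];
    }
  out[i] = '\0';
}

/* Store the canonical form of URL.  Returns 0 if the set is full.  */
int
url_set_add (struct url_set *set, const char *url)
{
  if (set->count >= MAX_URLS)
    return 0;
  canonicalize (url, set->items[set->count], MAX_URL_LEN);
  ++set->count;
  return 1;
}

/* Canonicalize URL first, then compare against the stored entries.  */
int
url_set_contains (const struct url_set *set, const char *url)
{
  char canon[MAX_URL_LEN];
  size_t i;

  canonicalize (url, canon, sizeof canon);
  for (i = 0; i < set->count; ++i)
    if (strcmp (set->items[i], canon) == 0)
      return 1;
  return 0;
}
```

Because both insertion and lookup go through the same canonical form, two spellings of one URL cannot slip past the check separately, which is the kind of mismatch that can keep a recursive retrieval looping.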



problem with wget 1.7

2001-06-06 Thread Arkadiusz Miskiewicz


please try:
wget --mirror http://www.ire.pw.edu.pl/zejim/rois/

and wget loops ;( 1.6 version works fine.

-- 
Arkadiusz Miśkiewicz, AM2-6BONE, 1024/3DB19BBD
 http://www.t17.ds.pwr.wroc.pl/~misiek/ipv6/
IPv6 ready PLD/Linux at http://www.pld.org.pl/