bug? different behavior of wget and lwp-request (GET)

2003-12-17 Thread Diego Puppin
Dear all,

I found that some pages give different replies when queried with
different tools. For instance:

wget http://groups.yahoo.com/group/sammydavisjr/message/56

returns a redirection (HTTP 302), which wget then follows. On the
other hand,

GET http://groups.yahoo.com/group/sammydavisjr/message/56

returns the page directly (HTTP 200).
Is this a bug (in GET or in wget?) or a feature?
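The difference described above shows up in the status line of the first response: 302 means "go elsewhere", 200 means "here is the page". A minimal sketch of how a client might read that status code and decide whether to follow a redirect; `http_status_code' and `is_redirect' are hypothetical helpers for illustration, not code from wget or lwp-request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper (not part of wget or lwp-request): extract the
   status code from an HTTP status line such as "HTTP/1.1 302 Found".
   Returns -1 if the line does not look like a status line. */
int http_status_code(const char *status_line)
{
    int major, minor, code;

    assert(status_line != NULL);
    if (sscanf(status_line, "HTTP/%d.%d %d", &major, &minor, &code) != 3)
        return -1;
    if (code < 100 || code > 599)
        return -1;
    return code;
}

/* Common redirect status codes.  A client that follows redirects (as
   wget does here) sends a new request to the URL given in the
   response's `Location' header when it sees one of these. */
bool is_redirect(int code)
{
    return code == 301 || code == 302 || code == 303 ||
           code == 307 || code == 308;
}
```

Feeding it "HTTP/1.1 302 Found" gives 302, which `is_redirect' accepts; "HTTP/1.0 200 OK" gives 200, which it does not.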

I noticed this problem while testing two different Java programs that
download pages from a URL. One uses a raw Java socket, the other uses
Java's URLConnection. Well, **even though the request and the headers
sent are the same, byte for byte**, the two programs get different
results. In particular, the raw socket version behaves like wget, while
the URLConnection version behaves like GET.
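Whether two requests really are identical can be checked mechanically rather than by eye; a library such as URLConnection may add headers or use a different HTTP version without the program asking for it. A minimal sketch, assuming both raw requests have already been captured (for example through a logging proxy); `first_difference' is a hypothetical helper, not part of any tool mentioned in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the offset of the first byte at which two
   captured requests differ, or -1 if they are identical.  The captures
   themselves must come from elsewhere, e.g. a logging proxy or packet
   sniffer placed between the client and the server. */
long first_difference(const unsigned char *a, size_t len_a,
                      const unsigned char *b, size_t len_b)
{
    size_t n = len_a < len_b ? len_a : len_b;
    size_t i;

    assert((a != NULL || len_a == 0) && (b != NULL || len_b == 0));
    for (i = 0; i < n; i++)
        if (a[i] != b[i])
            return (long) i;
    /* Equal up to the shorter length: the requests differ only if one
       is longer (e.g. an extra header appended by a library). */
    return len_a == len_b ? -1 : (long) n;
}
```

Comparing "GET / HTTP/1.0" against "GET / HTTP/1.1" reports offset 13, the version digit.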

What do you think?
Thank you, guys
Diego




RE: POST trouble

2003-12-17 Thread Herold Heiko
Works OK.
Windows MSVC binary at http://xoomer.virgilio.it/hherold/
Heiko

-- 
-- PREVINET S.p.A. www.previnet.it
-- Heiko Herold [EMAIL PROTECTED]
-- +39-041-5907073 ph
-- +39-041-5907472 fax

 -----Original Message-----
 From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, December 16, 2003 9:41 PM
 To: Herold Heiko
 Cc: List Wget (E-mail); [EMAIL PROTECTED]
 Subject: Re: POST trouble
 
 
 Herold Heiko [EMAIL PROTECTED] writes:
 
  Content-Length: Content-Length: 35
 [...]
  The line
  Content-Length: Content-Length: 35
  certainly seems strange.
 
 Yup, that's where the bug is.  This should fix it:
 
 2003-12-16  Hrvoje Niksic  [EMAIL PROTECTED]
 
   * http.c (gethttp): Fix generation of `Content-Length'.
 
 Index: src/http.c
 ===================================================================
 RCS file: /pack/anoncvs/wget/src/http.c,v
 retrieving revision 1.137
 diff -u -r1.137 http.c
 --- src/http.c	2003/12/12 22:55:19	1.137
 +++ src/http.c	2003/12/16 20:39:30
 @@ -1253,8 +1253,7 @@
  	}
        }
        request_set_header (req, "Content-Length",
 -			  aprintf ("Content-Length: %ld", post_data_size),
 -			  rel_value);
 +			  aprintf ("%ld", post_data_size), rel_value);
      }
  
/* Add the user headers. */
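The fix makes sense once you note that the header name and its value are evidently serialized separately as "Name: value", so a value that already starts with the name ends up carrying the name twice. A minimal sketch that reproduces both the buggy and the fixed output; `format_header' and `value_repeats_name' are hypothetical helpers, not wget's actual `request_set_header' code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: serialize one header as "Name: value".  Returns
   the length that snprintf reports (output is truncated if that
   length does not fit in size). */
int format_header(char *buf, size_t size, const char *name,
                  const char *value)
{
    assert(buf != NULL && name != NULL && value != NULL);
    return snprintf(buf, size, "%s: %s", name, value);
}

/* Sanity check for the bug above: nonzero if VALUE already begins with
   "NAME:", which would be emitted as "NAME: NAME: ...". */
int value_repeats_name(const char *name, const char *value)
{
    size_t n = strlen(name);

    return strncmp(value, name, n) == 0 && value[n] == ':';
}
```

A value built like the old `aprintf ("Content-Length: %ld", ...)' yields "Content-Length: Content-Length: 35"; one built like the patched `aprintf ("%ld", ...)' yields "Content-Length: 35".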
 


why is wget modifying contents of the file?

2003-12-17 Thread Marco Correia

hi,

I'm trying to download this file 
http://www.cotonete.iol.pt/radio_playlist.asp?audio_sub_type_id=31
but wget is modifying paths inside this file. For example, an entry such as

<REF HREF="http://212.113.174.198/96Kbps/Chico Buarque - A Noite Dos Mascarados.wma" />

is modified to

<REF HREF="http://212.113.174.198/Chico Buarque - A Noite Dos Mascarados.wma" />

I've already tried a number of options, but perhaps I've missed the right
ones...

Thanks
Marco

PS: I'm not on the list, so please CC me...

-- 
Marco Correia [EMAIL PROTECTED]



Re: why is wget modifying contents of the file?

2003-12-17 Thread Hrvoje Niksic
What options are you using to download the file?  As far as I'm aware,
Wget will not touch the contents of the files it downloads by default.


Re: why is wget modifying contents of the file?

2003-12-17 Thread Marco Correia
On Wednesday 17 December 2003 14:59, you wrote:
 What options are you using to download the file?  As far as I'm aware,
 Wget will not touch the contents of the files it downloads by default.

I've tried with no options at all and with several options, with the same
result. You can test it yourself: just go to the mentioned URL and choose
"View Source" in your browser. Then compare it with the file you get when
using wget, and you'll see differences in the URL entries.

thanks
Marco
-- 
Marco Correia [EMAIL PROTECTED]



Re: why is wget modifying contents of the file?

2003-12-17 Thread Hrvoje Niksic
Marco Correia [EMAIL PROTECTED] writes:

 On Wednesday 17 December 2003 14:59, you wrote:
 What options are you using to download the file?  As far as I'm
 aware, Wget will not touch the contents of the files it downloads
 by default.

 I've tried with no options at all and with several options, with
 the same result.

With no options at all, Wget will simply download the URL you gave it
-- no modifications, no recursive downloads.

 You can test it yourself: just go to the mentioned URL and choose
 "View Source" in your browser. Then compare it with the file you get
 when using wget, and you'll see differences in the URL entries.

That doesn't necessarily mean that Wget changed the file.  The server
can deliver one thing to a browser, and another to Wget, possibly
making the distinction based on the contents of `User-Agent' and
`Referer' headers.  Some sites intentionally do this kind of thing,
blocking or crippling Wget (and other non-browser HTTP clients) to
prevent leeching.  Modifying `User-Agent' and/or `Referer' header
fields takes care of 99% of such protections.

See the documentation of `--user-agent' and `--referer' options;
experimenting with them might help you.
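Changing `User-Agent' and `Referer' only changes two lines of the request the server sees. A sketch of such a request, assuming a hypothetical `build_get_request' helper and the placeholder host www.example.com (neither comes from the thread); real clients, Wget included, send further headers too.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not wget code): build a minimal HTTP/1.0 GET
   request with explicit `User-Agent' and `Referer' headers, roughly
   the two lines that `wget -U AGENT --referer=URL' changes.  Returns
   the length that snprintf reports, or -1 if a value contains CR/LF. */
int build_get_request(char *buf, size_t size, const char *host,
                      const char *path, const char *user_agent,
                      const char *referer)
{
    assert(buf && host && path && user_agent && referer);
    /* A CR or LF inside a value would end the header line early. */
    if (strpbrk(host, "\r\n") || strpbrk(user_agent, "\r\n") ||
        strpbrk(referer, "\r\n"))
        return -1;
    return snprintf(buf, size,
                    "GET %s HTTP/1.0\r\n"
                    "Host: %s\r\n"
                    "User-Agent: %s\r\n"
                    "Referer: %s\r\n"
                    "\r\n",
                    path, host, user_agent, referer);
}
```

If a server still answers differently after both lines match a browser's, it is distinguishing on something else in the request.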


Re: why is wget modifying contents of the file?

2003-12-17 Thread Marco Correia
 That doesn't necessarily mean that Wget changed the file.  The server
 can deliver one thing to a browser, and another to Wget, possibly
 making the distinction based on the contents of `User-Agent' and
 `Referer' headers.  Some sites intentionally do this kind of thing,
 blocking or crippling Wget (and other non-browser HTTP clients) to
 prevent leeching.  Modifying `User-Agent' and/or `Referer' header
 fields takes care of 99% of such protections.

 See the documentation of `--user-agent' and `--referer' options;
 experimenting with them might help you.

No luck. This

wget -U Mozilla/5.0 --referer=http://www.cotonete.iol.pt/ \
  'http://www.cotonete.iol.pt/listen/radio_playlist.asp?audio_sub_type_id=31'

doesn't work either, and my browser identifies itself as Mozilla/5.0.

thanks anyway
Marco

-- 
Marco Correia [EMAIL PROTECTED]



Re: why is wget modifying contents of the file?

2003-12-17 Thread Hrvoje Niksic
Marco Correia [EMAIL PROTECTED] writes:

 No luck. This

 wget -U Mozilla/5.0 --referer=http://www.cotonete.iol.pt/ \
   'http://www.cotonete.iol.pt/listen/radio_playlist.asp?audio_sub_type_id=31'

 doesn't work either, and my browser identifies itself as
 Mozilla/5.0.

Then I must assume that the server is telling the clients apart by some
other feature of the HTTP request.

I'm positive that Wget does not modify the HTML unless explicitly told
to do so.  To verify this, use the `-S' flag (print the server
response) and check that the size of the resulting HTML file matches
the `Content-Length' sent by the server.
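That check can be automated. A sketch, assuming hypothetical helpers `parse_content_length' and `file_size' (not part of Wget): take the `Content-Length' line from the `-S' output and compare its value with the size of the saved file. A match means Wget wrote exactly as many bytes as the server announced. This only works when the server sends a `Content-Length' header at all.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Case-insensitive prefix test (HTTP header names are case-insensitive). */
static int has_prefix_ci(const char *s, const char *prefix)
{
    for (; *prefix != '\0'; s++, prefix++)
        if (tolower((unsigned char) *s) != tolower((unsigned char) *prefix))
            return 0;
    return 1;
}

/* Hypothetical helper: parse a "Content-Length: N" header line.
   Returns N, or -1 if the line is not a valid Content-Length header. */
long parse_content_length(const char *line)
{
    static const char name[] = "Content-Length:";
    const char *value;
    char *end;
    long n;

    assert(line != NULL);
    if (!has_prefix_ci(line, name))
        return -1;
    value = line + (sizeof name - 1);
    n = strtol(value, &end, 10);   /* strtol skips leading whitespace */
    return (end == value || n < 0) ? -1 : n;
}

/* Size of a downloaded file in bytes, or -1 on error. */
long file_size(const char *path)
{
    FILE *f = fopen(path, "rb");
    long size = -1;

    if (f == NULL)
        return -1;
    if (fseek(f, 0, SEEK_END) == 0)
        size = ftell(f);
    fclose(f);
    return size;
}
```

The header lines printed by `-S' may be indented or prefixed, so strip that before passing a line to `parse_content_length'.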



subscribe aaron lynch

2003-12-17 Thread Aaron Lynch