bug? different behavior of wget and lwp-request (GET)
Dear all,

I found that some pages give a different reply when queried with different tools. For instance:

wget http://groups.yahoo.com/group/sammydavisjr/message/56

retrieves a redirection header (HTTP 302). Then wget follows the redirection. On the other hand,

GET http://groups.yahoo.com/group/sammydavisjr/message/56

retrieves a standard page (HTTP 200).

Is this a bug (of GET, of wget?) or a feature?

I noticed this problem when testing two different Java programs that download pages from a URL. One uses a Java socket, the other uses Java URLConnection. Well, **even if the request and the headers sent are the same, byte by byte**, the two programs get different results. In particular, the one using raw socket communication behaves like wget, and the other, using Java URLConnection, behaves like GET.

What do you think?

Thank you, guys
Diego
RE: POST trouble
Works ok. Windows MSVC binary at http://xoomer.virgilio.it/hherold/

Heiko

--
-- PREVINET S.p.A. www.previnet.it
-- Heiko Herold [EMAIL PROTECTED]
-- +39-041-5907073 ph
-- +39-041-5907472 fax

-Original Message-
From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 16, 2003 9:41 PM
To: Herold Heiko
Cc: List Wget (E-mail); [EMAIL PROTECTED]
Subject: Re: POST trouble

Herold Heiko [EMAIL PROTECTED] writes:

> Content-Length: Content-Length: 35
> [...]
> The line Content-Length: Content-Length: 35 certainly seems strange.

Yup, that's where the bug is. This should fix it:

2003-12-16  Hrvoje Niksic  [EMAIL PROTECTED]

    * http.c (gethttp): Fix generation of `Content-Length'.

Index: src/http.c
===================================================================
RCS file: /pack/anoncvs/wget/src/http.c,v
retrieving revision 1.137
diff -u -r1.137 http.c
--- src/http.c  2003/12/12 22:55:19     1.137
+++ src/http.c  2003/12/16 20:39:30
@@ -1253,8 +1253,7 @@
            }
        }
       request_set_header (req, "Content-Length",
-                         aprintf ("Content-Length: %ld", post_data_size),
-                         rel_value);
+                         aprintf ("%ld", post_data_size), rel_value);
     }
 
   /* Add the user headers. */
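To make the nature of the bug concrete, here is a minimal sketch in C. It is not Wget's code: `struct header`, `write_header` and the two `set_content_length_*` functions are hypothetical stand-ins, assuming only what the patch suggests, namely that the header name and value are stored separately and joined with ": " when the request is written out. Under that assumption, a value that already contains the name produces the doubled line seen in the debug output.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one request header: name and value are kept
   apart and only joined with ": " when the request is serialized. */
struct header
{
  const char *name;
  char value[64];
};

/* Write "Name: value\r\n" into buf; returns what snprintf reports. */
int
write_header (char *buf, size_t size, const struct header *h)
{
  assert (buf != NULL && h != NULL);
  return snprintf (buf, size, "%s: %s\r\n", h->name, h->value);
}

/* Buggy variant: the value repeats the header name. */
void
set_content_length_buggy (struct header *h, long post_data_size)
{
  h->name = "Content-Length";
  snprintf (h->value, sizeof h->value, "Content-Length: %ld", post_data_size);
}

/* Fixed variant: the value is just the number. */
void
set_content_length_fixed (struct header *h, long post_data_size)
{
  h->name = "Content-Length";
  snprintf (h->value, sizeof h->value, "%ld", post_data_size);
}
```

Serializing the buggy header for a 35-byte body yields "Content-Length: Content-Length: 35", while the fixed one yields "Content-Length: 35".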
why is wget modifying contents of the file?
Hi,

I'm trying to download this file

http://www.cotonete.iol.pt/radio_playlist.asp?audio_sub_type_id=31

but wget is modifying paths inside this file. For example, entries such as

<REF HREF="http://212.113.174.198/96Kbps/Chico Buarque - A Noite Dos Mascarados.wma" />

are modified to

<REF HREF="http://212.113.174.198/Chico Buarque - A Noite Dos Mascarados.wma" />

I've already tried a number of options, but perhaps I've missed the correct ones...

Thanks
Marco

PS: I'm not on the list, so please CC me...

--
Marco Correia [EMAIL PROTECTED]
Re: why is wget modifying contents of the file?
What options are you using to download the file? As far as I'm aware, Wget will not touch the contents of the files it downloads by default.
Re: why is wget modifying contents of the file?
On Wednesday 17 December 2003 14:59, you wrote:
> What options are you using to download the file? As far as I'm aware, Wget will not touch the contents of the files it downloads by default.

I've tried with no options at all and with several options, with the same result.

You can test it yourself: just go to the mentioned URL and choose "view source" in your browser. Then compare it with the file you get when using wget, and you'll see differences in the URL entries.

thanks
Marco

--
Marco Correia [EMAIL PROTECTED]
Re: why is wget modifying contents of the file?
Marco Correia [EMAIL PROTECTED] writes:

> On Wednesday 17 December 2003 14:59, you wrote:
>> What options are you using to download the file? As far as I'm aware, Wget will not touch the contents of the files it downloads by default.
>
> I've tried with no options at all and with several options, with the same result.

With no options at all, Wget will simply download the URL you gave it -- no modifications, no recursive downloads.

> You can test it yourself: just go to the mentioned URL and choose "view source" in your browser. Then compare it with the file you get when using wget, and you'll see differences in the URL entries.

That doesn't necessarily mean that Wget changed the file. The server can deliver one thing to a browser, and another to Wget, possibly making the distinction based on the contents of `User-Agent' and `Referer' headers. Some sites intentionally do this kind of thing, blocking or crippling Wget (and other non-browser HTTP clients) to prevent leeching.

Modifying `User-Agent' and/or `Referer' header fields takes care of 99% of such protections. See the documentation of `--user-agent' and `--referer' options; experimenting with them might help you.
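As an illustration of what the server gets to see, the following C sketch builds a minimal HTTP/1.0 GET request as text. It is not taken from Wget: `build_get_request` and `append_header` are hypothetical helpers, real clients send more headers than shown here, and the `User-Agent` strings used with it (such as "Wget/1.9" or "Mozilla/5.0") are only assumed example values. The point is that two requests for the same URL can differ in nothing but these header lines, which is enough for a server to tell clients apart.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append "name: value\r\n" at offset used; returns the new offset, or -1
   if the buffer is too small. */
static int
append_header (char *buf, size_t size, int used,
               const char *name, const char *value)
{
  int n;
  if (used < 0 || (size_t) used >= size)
    return -1;
  n = snprintf (buf + used, size - (size_t) used, "%s: %s\r\n", name, value);
  if (n < 0 || (size_t) n >= size - (size_t) used)
    return -1;
  return used + n;
}

/* Format a minimal GET request into buf.  user_agent and referer may be
   NULL, in which case that header is left out.  Returns the request
   length, or -1 if buf is too small. */
int
build_get_request (char *buf, size_t size, const char *host, const char *path,
                   const char *user_agent, const char *referer)
{
  int used;
  assert (buf != NULL && host != NULL && path != NULL);

  used = snprintf (buf, size, "GET %s HTTP/1.0\r\n", path);
  if (used < 0 || (size_t) used >= size)
    return -1;
  used = append_header (buf, size, used, "Host", host);
  if (used >= 0 && user_agent != NULL)
    used = append_header (buf, size, used, "User-Agent", user_agent);
  if (used >= 0 && referer != NULL)
    used = append_header (buf, size, used, "Referer", referer);
  if (used < 0 || (size_t) used + 2 >= size)
    return -1;
  memcpy (buf + used, "\r\n", 3);   /* blank line plus terminating NUL */
  return used + 2;
}
```

Calling it once with a Wget-style `User-Agent` and no `Referer`, and once with a browser-style `User-Agent` plus a `Referer`, gives two requests with the same request line and `Host` but different header lines.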
Re: why is wget modifying contents of the file?
> That doesn't necessarily mean that Wget changed the file. The server can deliver one thing to a browser, and another to Wget, possibly making the distinction based on the contents of `User-Agent' and `Referer' headers. Some sites intentionally do this kind of thing, blocking or crippling Wget (and other non-browser HTTP clients) to prevent leeching.
>
> Modifying `User-Agent' and/or `Referer' header fields takes care of 99% of such protections. See the documentation of `--user-agent' and `--referer' options; experimenting with them might help you.

No luck. Doing this

wget -U Mozilla/5.0 --referer=http://www.cotonete.iol.pt/ http://www.cotonete.iol.pt/listen/radio_playlist.asp?audio_sub_type_id=31

also doesn't work, and my browser is identifying itself as Mozilla/5.0.

thanks anyway
Marco

--
Marco Correia [EMAIL PROTECTED]
Re: why is wget modifying contents of the file?
Marco Correia [EMAIL PROTECTED] writes:

> No luck. Doing this
>
> wget -U Mozilla/5.0 --referer=http://www.cotonete.iol.pt/ http://www.cotonete.iol.pt/listen/radio_playlist.asp?audio_sub_type_id=31
>
> also doesn't work, and my browser is identifying itself as Mozilla/5.0.

Then I must assume that the server is distinguishing on another feature of the HTTP request. I'm positive that Wget does not modify the HTML if not explicitly told to do so. For example, use the `-S' flag and check that the size of the resulting HTML file corresponds to the `Content-Length' sent by the server.
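The size check suggested above can also be done programmatically. The C sketch below assumes the response headers are available as a text string, one header per line; how they are captured, and the exact layout of `-S' output, are not covered here. `parse_content_length` and `size_matches_content_length` are hypothetical names chosen for this example, not functions from Wget.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return the value of the first Content-Length header in the given header
   text (lines separated by "\n" or "\r\n"), or -1 if it is missing or not
   a non-negative number.  The header name is matched case-insensitively,
   and leading blanks are skipped so indented header dumps are accepted. */
long
parse_content_length (const char *headers)
{
  static const char name[] = "content-length:";
  const size_t name_len = sizeof name - 1;
  const char *line = headers;
  assert (headers != NULL);

  while (*line != '\0')
    {
      const char *p = line;
      size_t i = 0;
      while (*p == ' ' || *p == '\t')
        p++;
      while (i < name_len && tolower ((unsigned char) p[i]) == name[i])
        i++;
      if (i == name_len)
        {
          char *end;
          long value = strtol (p + name_len, &end, 10);
          if (end != p + name_len && value >= 0)
            return value;
          return -1;
        }
      line = strchr (line, '\n');
      if (line == NULL)
        break;
      line++;                   /* move past the newline to the next line */
    }
  return -1;
}

/* Nonzero if the headers announce exactly body_size bytes. */
int
size_matches_content_length (const char *headers, long body_size)
{
  long expected = parse_content_length (headers);
  return expected >= 0 && expected == body_size;
}
```

If the saved file's size matches the announced length, the client stored what the server sent. A result of -1 only means the server did not announce a length, so the comparison cannot be made that way.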