phil curb wrote:

> I am downloading a page with wget -r -l 1, so it also downloads
> the URLs linked on that page, and some of them are like this
> 
> http://www.theregister.co.uk/content/4/23517.html
> 
> if I try to download it with wget, I get a 404. Which
> is probably technically correct, the URL probably does
> not exist.
> 
> But when I go to that URL in a browser, I get redirected.
> I was told it is a server-side thing, probably ASP,
> where, given that wrong URL, server-side ASP code
> generates the page.
> 
> It redirects me to
> 
> http://www.theregister.co.uk/2001/12/31/winxp_hole_misrepresented_by_fbi/
> which is probably
> http://www.theregister.co.uk/2001/12/31/winxp_hole_misrepresented_by_fbi/index.html
> 
> wget can get that, but the HTML page with all the
> links does not use that URL. It uses the old one, and
> wget does not seem to be able to download it.
> 

It seems that the wget in Cygwin does download it, as does the wget that
Linux users are running.

It is the Windows port of wget (the one you find by googling "wget
interlog") that does not work with it.

Somebody suggested reading man wget and sending a fake User-Agent header,
since browsers do get the page. But I doubt that is the cause.


The working one prints output like this:
$ wget http://www.theregister.co.uk/content/4/23517.html
--02:37:20--  http://www.theregister.co.uk/content/4/23517.html
,,,,
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://www.theregister.co.uk/2001/12/31/winxp_hole_misrepresented_by_fbi/ [following]
...
02:37:21 (952.75 KB/s) - `index.html' saved [32688]



The Windows port (the wget interlog one) printed:
......
Connecting to www.theregister.co.uk:80... connected!
HTTP request sent, awaiting response... 404 Not Found
02:32:09 ERROR 404: Not Found.



I guess the Windows port doesn't deal with the 301 "error" (which is really
a redirect, not an error), or something like that.
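
For reference, a client that handles a 301 reads the Location header from
the response and requests that URL instead. Below is a minimal Python
sketch of that loop, not wget's actual code. The names follow_redirects,
fetch and fake_fetch are made up for illustration; fake_fetch is only a
stand-in for the server and replays the two responses seen in the working
log above.

```python
from urllib.parse import urljoin

# Status codes that tell the client to look at the Location header.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def follow_redirects(fetch, url, max_hops=5):
    """Follow Location headers until a non-redirect status comes back.

    fetch is a hypothetical callable: fetch(url) -> (status, headers_dict).
    Returns (final_url, final_status, list_of_urls_visited).
    """
    chain = [url]
    for _ in range(max_hops + 1):
        status, headers = fetch(url)
        if status not in REDIRECT_CODES:
            return url, status, chain
        location = headers.get("Location")
        if location is None:
            # Redirect status without a target: nothing more to follow.
            return url, status, chain
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
        chain.append(url)
    raise RuntimeError("too many redirects")


OLD_URL = "http://www.theregister.co.uk/content/4/23517.html"
NEW_URL = ("http://www.theregister.co.uk/2001/12/31/"
           "winxp_hole_misrepresented_by_fbi/")


def fake_fetch(url):
    """Stand-in for the server (assumption): replays the working log."""
    table = {
        OLD_URL: (301, {"Location": NEW_URL}),
        NEW_URL: (200, {}),
    }
    return table.get(url, (404, {}))


if __name__ == "__main__":
    final_url, status, chain = follow_redirects(fake_fetch, OLD_URL)
    print(status, final_url)
```

A client that stops at the first response never asks for the new URL, so
whether it ends up with the page depends entirely on what the server sends
for the old one.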
