Today I tried to download a copyright-free book from Google Books,
http://books.google.com/books?ie=UTF-8&vid=OCLC12252663&id=LR8E7T-pomAC&num=100&dq=intitle:swine&lpg=PA3&pg=PP1&printsec=2
but I consistently got
"HTTP request sent, awaiting response... 403 Forbidden"
from the server.

I opened the page in my browser (Mozilla Firefox), checked 'Page Info',
and saw this in the General tab:

Name Content
robots.txt noarchive

so I thought setting 'robots = off' in my .wgetrc might solve the
problem. It didn't.
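For reference, a minimal sketch of that setting, assuming the per-user startup file ~/.wgetrc. wget's -e (--execute) option runs the same command for a single invocation instead, e.g. `wget -e robots=off ...`.

```
# ~/.wgetrc (per-user wget startup file)
# Stop wget from honoring the robot exclusion (norobots) convention.
robots = off
```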
Next, while the book was still open in the browser, I pointed wget at
Firefox's cookies.txt file with --load-cookies, but that didn't work
either, as you can see:

$ wget --load-cookies /home/<user>/.mozilla/firefox/j6tp5xi9.default/cookies.txt -pnp \
    "http://books.google.com/books?ie=UTF-8&id=LR8E7T-pomAC&q=intitle:swine&pg=PA227&img=1&zoom=3&sig=o11ksUM51x54geLI15_sAVgm6Tc"
--13:58:02--  http://books.google.com/books?ie=UTF-8&id=LR8E7T-pomAC&q=intitle:swine&pg=PA227&img=1&zoom=3&sig=o11ksUM51x54geLI15_sAVgm6Tc
           => `books.google.com/books?ie=UTF-8&id=LR8E7T-pomAC&q=intitle:swine&pg=PA227&img=1&zoom=3&sig=o11ksUM51x54geLI15_sAVgm6Tc'
Resolving books.google.com... 72.14.203.133
Connecting to books.google.com|72.14.203.133|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
13:58:03 ERROR 403: Forbidden.
FINISHED --13:58:03--
Downloaded: 0 bytes in 0 files


Then I tried adding a referer to the command, but that didn't work
either:

$ wget --load-cookies /home/ario/.mozilla/firefox/j6tp5xi9.default/cookies.txt \
    --referer "http://books.google.com/books?ie=UTF-8&vid=OCLC12252663&id=LR8E7T-pomAC&num=100&dq=intitle:swine&lpg=PA3&pg=PA227&printsec=2" \
    "http://books.google.com/books?ie=UTF-8&id=LR8E7T-pomAC&q=intitle:swine&pg=PA227&img=1&zoom=3&sig=o11ksUM51x54geLI15_sAVgm6Tc"
--14:01:13--  http://books.google.com/books?ie=UTF-8&id=LR8E7T-pomAC&q=intitle:swine&pg=PA227&img=1&zoom=3&sig=o11ksUM51x54geLI15_sAVgm6Tc
           => `books?ie=UTF-8&id=LR8E7T-pomAC&q=intitle:swine&pg=PA227&img=1&zoom=3&sig=o11ksUM51x54geLI15_sAVgm6Tc'
Resolving books.google.com... 72.14.203.133
Connecting to books.google.com|72.14.203.133|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
14:01:14 ERROR 403: Forbidden.
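At the HTTP level, --load-cookies and --referer amount to two extra request headers: a Cookie header built from the entries in the Netscape-format cookies.txt that match the URL, and a Referer header. The sketch below reproduces that with Python's standard library (http.cookiejar and urllib.request, not wget itself) and only builds the request without sending it, so the headers can be inspected. The names build_request and write_sample_cookie_file and the EXAMPLE=abc123 cookie are hypothetical, made up for illustration.

```python
import http.cookiejar
import os
import tempfile
import urllib.request


def build_request(url: str, referer: str, cookie_file: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying a Referer header and
    the cookies from a Netscape-format cookies.txt that match the URL."""
    # Comparable to wget's --referer: an explicit Referer header.
    request = urllib.request.Request(url, headers={"Referer": referer})
    # Comparable to wget's --load-cookies: read a Netscape-format cookie file.
    jar = http.cookiejar.MozillaCookieJar()
    jar.load(cookie_file)
    # Attach only cookies whose domain, path and expiry match this URL.
    jar.add_cookie_header(request)
    return request


def write_sample_cookie_file(directory: str) -> str:
    """Write a made-up cookies.txt (hypothetical cookie name and value)."""
    path = os.path.join(directory, "cookies.txt")
    # Fields: domain, domain_specified, path, secure, expiry (epoch), name, value
    fields = [".google.com", "TRUE", "/", "FALSE", "4102444800", "EXAMPLE", "abc123"]
    with open(path, "w") as f:
        f.write("# Netscape HTTP Cookie File\n")
        f.write("\t".join(fields) + "\n")
    return path


if __name__ == "__main__":
    page = ("http://books.google.com/books?ie=UTF-8&id=LR8E7T-pomAC&q=intitle:swine"
            "&pg=PA227&img=1&zoom=3&sig=o11ksUM51x54geLI15_sAVgm6Tc")
    ref = ("http://books.google.com/books?ie=UTF-8&vid=OCLC12252663&id=LR8E7T-pomAC"
           "&num=100&dq=intitle:swine&lpg=PA3&pg=PA227&printsec=2")
    with tempfile.TemporaryDirectory() as tmp:
        req = build_request(page, ref, write_sample_cookie_file(tmp))
        # Print the headers that would go out with the request.
        for name, value in req.header_items():
            print(f"{name}: {value}")
```

Run as a script, it prints a Cookie and a Referer header; the sample cookie is attached only because books.google.com falls under the cookie's .google.com domain.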

So why can't wget fetch these pages when Firefox can?

Thanks,
arie

