not getting a non-copyrighted book from google books

coucinha palavrante Sun, 26 Mar 2006 13:18:43 -0800
Today I tried to download a copyright-free book from Google Books,

http://books.google.com/books?ie=UTF-8&vid=OCLC12252663&id=LR8E7T-pomAC&num=100&dq=intitle:swine&lpg=PA3&pg=PP1&printsec=2

but consistently got a

"HTTP request sent, awaiting response... 403 Forbidden"

from them.

I opened the page in the browser (Mozilla Firefox), checked the 'Page
Info' and saw in the General tab:

Name		Content
robots.txt	noarchive

so I thought maybe setting 'robots = off' in the .wgetrc would be a
solution. It wasn't.
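
For reference, the 'robots = off' setting can be supplied either through a startup file or per invocation with -e. The sketch below assumes a scratch startup file (the path is hypothetical, chosen so the real ~/.wgetrc is left alone) and uses "<url>" as a placeholder. As far as the wget manual describes it, this setting only controls whether wget itself honours robots.txt and the robots meta tag, so it would not be expected to change a 403 that the server sends.

```shell
# Write the setting to a scratch startup file.
# Hypothetical path, so the real ~/.wgetrc is not modified.
WGETRC_FILE="${TMPDIR:-/tmp}/wgetrc-robots-off"
printf 'robots = off\n' > "$WGETRC_FILE"

# wget reads the file named by the WGETRC environment variable:
#   WGETRC="$WGETRC_FILE" wget -pnp "<url>"
# The same setting for a single run, without any startup file:
#   wget -e robots=off -pnp "<url>"
```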
Also, while the book was open in the browser, I passed the path of the
browser's cookies.txt file to wget, but this didn't work either, as you
can see:

$ wget --load-cookies /home/<user>/.mozilla/firefox/j6tp5xi9.default/cookies.txt -pnp \
    "http://books.google.com/books?ie=UTF-8&id=LR8E7T-pomAC&q=intitle:swine&pg=PA227&img=1&zoom=3&sig=o11ksUM51x54geLI15_sAVgm6Tc"

--13:58:02--  http://books.google.com/books?ie=UTF-8&id=LR8E7T-pomAC&q=intitle:swine&pg=PA227&img=1&zoom=3&sig=o11ksUM51x54geLI15_sAVgm6Tc
           => `books.google.com/books?ie=UTF-8&id=LR8E7T-pomAC&q=intitle:swine&pg=PA227&img=1&zoom=3&sig=o11ksUM51x54geLI15_sAVgm6Tc'
Resolving books.google.com... 72.14.203.133
Connecting to books.google.com|72.14.203.133|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
13:58:03 ERROR 403: Forbidden.

FINISHED --13:58:03--
Downloaded: 0 bytes in 0 files



Then I tried including a referer in the command, but this didn't work
either:

$ wget --load-cookies /home/ario/.mozilla/firefox/j6tp5xi9.default/cookies.txt \
    --referer "http://books.google.com/books?ie=UTF-8&vid=OCLC12252663&id=LR8E7T-pomAC&num=100&dq=intitle:swine&lpg=PA3&pg=PA227&printsec=2" \
    "http://books.google.com/books?ie=UTF-8&id=LR8E7T-pomAC&q=intitle:swine&pg=PA227&img=1&zoom=3&sig=o11ksUM51x54geLI15_sAVgm6Tc"
--14:01:13--  http://books.google.com/books?ie=UTF-8&id=LR8E7T-pomAC&q=intitle:swine&pg=PA227&img=1&zoom=3&sig=o11ksUM51x54geLI15_sAVgm6Tc
           => `books?ie=UTF-8&id=LR8E7T-pomAC&q=intitle:swine&pg=PA227&img=1&zoom=3&sig=o11ksUM51x54geLI15_sAVgm6Tc'
Resolving books.google.com... 72.14.203.133
Connecting to books.google.com|72.14.203.133|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
14:01:14 ERROR 403: Forbidden.


So, why is wget not able to get those pages, while Mozilla is?
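
One way to narrow this down is to compare the request headers wget sends with those of the browser. The sketch below is only a diagnostic idea, not a confirmed fix: it assumes the server may treat requests differently depending on the User-Agent header, which nothing above establishes. The UA string is a placeholder to be replaced with the browser's real one, and show_headers is a hypothetical helper name.

```shell
# The page-image URL from the attempts above.
URL='http://books.google.com/books?ie=UTF-8&id=LR8E7T-pomAC&q=intitle:swine&pg=PA227&img=1&zoom=3&sig=o11ksUM51x54geLI15_sAVgm6Tc'

# Placeholder browser identification string (assumption); copy the real
# value from the browser actually used.
UA='Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8) Gecko/20051111 Firefox/1.5'

# Hypothetical helper: fetch the URL, discard the body, and print
# the request headers wget sends (-d) and the server's reply headers (-S).
# Extra wget options can be passed as arguments.
show_headers() {
    wget -d -S -O /dev/null "$@" "$URL"
}
```

Running show_headers with no arguments shows what wget sends by default; running show_headers --user-agent="$UA" repeats the request with the browser-like identification, so the two responses can be compared.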

thanks,
arie