On 1/29/11, Steven D'Aprano <st...@pearwood.info> wrote:
> A few more comments...
>
> Alex Hall wrote:
>> Hello,
>> I am continuing to work on that api wrapper... I never realized how
>> little I know about urllib/urllib2! The idea of downloading from the
>> api is pretty easy: give it a url and a password and it gives you the
>> book. Here is a quote from the api documentation:
>> In addition the MD5 hash of the end user password must be passed in
>> the request via a "X-password" HTTP header.
>
> You might like to mention where this API comes from.

Sorry. http://api.bookshare.org.

>
>
>> Here is what I am doing. I use hashlib.md5(password).hexdigest() to
>> get the md5 of the password. "base" is just the base url, and
>> "destination" is just a local path. If it matters, this is an https
>> url.
>
> It may matter. urllib has some problems with https.

Wonderful... Time to find another package?

>
> What makes you think you should use the *hex* digest of the password,
> rather than some other format?

Honestly, it seemed the logical choice, and the api docs do not say
anything except to md5Sum() the password. I have tried it with and
without the hexdigest() and nothing changed. I will look to see what
else hashlib provides.

>
>
>> user=urllib.quote(user) #user is an email address, so make it useable in a url
>> req=urllib2.Request(base+"download/for/"+user+"/content/"+str(id),
>>     None, {"X-password":password})
>> try:
>>     book=urllib2.urlopen(req)
>>     local=open(destination+str(id), "w") #name the file
>
> You should open binary files in binary. This may not matter, depending
> on your OS, but it never hurts to use "rb" and "wb" even when it doesn't
> matter.

Great point!

>
>>     local.write(book.read()) #save the blob to the local file
>>     local.close()
>> except urllib2.HTTPError, e:
>>     print "HTTP error "+str(e.code)
>> except urllib2.URLError, e:
>>     print "URL error: "+e.reason
>
>
> There is absolutely no point in catching an exception, only to print it.

True.
Currently, I am trying to get this to work. Once it does, I will improve
my error-handling code. Still, I suppose the traceback would help even
more...

> You should only catch exceptions if you intended to *do something* other
> than print the error message which would have been printed anyway.
>
> In this case, there is good useful information in the HTTP exception,
> but not in the URL error. I recommend you change your code to:
>
>     try:
>         book = urllib2.urlopen(req)  # HTTPError is raised here
>     except urllib2.HTTPError, e:
>         print "HTTP error:",
>         print e.code  # 401 = unauthorized, 403 = forbidden, 404 = not found, etc.
>         print e.msg  # this may give you a clue why the request was rejected
>         # uncomment the next line if you need more info
>         # print e.hdrs
>     else:
>         local = open(destination+str(id), "wb")  #name the file
>         try:
>             local.write(book.read())  #save the blob to the local file
>         finally:
>             local.close()

Makes sense.

>
> If any other exception, including URLError, happens, Python will
> automatically print the traceback, including the exception.
>
> But other than these quibbles, the code looks fine to me.
>
>
>> I keep getting an error 403, which the api defines as a bad login
>> attempt. I am sure my password is right, though, so while I
>> investigate, I thought I would check that I am not only going about
>> this http header thing right but also getting the binary object right.
>> I am following an example I found pretty closely.
>
> The HTTP standard is that error 403 is request forbidden. This
> *strongly* suggests that either your username or password is wrong.

Could this be due to the wrong encoding, as you mentioned above? What
about that urllib.quote(user) for an email address?

>
> Or perhaps there are restrictions on how many times you can connect in a
> day, and you've exceeded it. Or your account has been closed. Or the
> website doesn't like the tool you are using to connect (Python). Or
> you've tried downloading too many files too quickly, and the webserver
> has locked you out.
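On the encoding question: a quick way to compare the candidates is to print every common form of the MD5 sum and look at what quote() does to the email address. This is a Python 3 sketch (in Python 2 the calls are hashlib.md5(password).hexdigest() and urllib.quote(user)); the address and password are made up, and UTF-8 for the password is an assumption. The thread does not say which form the Bookshare API expects, so treat this as a way to compare options, not as the answer.

```python
import base64
import hashlib
from urllib.parse import quote

# Hypothetical credentials, for illustration only.
email = "reader@example.com"
password = "secret"

# Python 3's hashlib works on bytes, so the password must be encoded first.
# UTF-8 is an assumption; the API docs quoted in the thread do not say.
md5 = hashlib.md5(password.encode("utf-8"))

hex_form = md5.hexdigest()                             # 32 lowercase hex characters
raw_form = md5.digest()                                # 16 raw bytes, not header-safe
b64_form = base64.b64encode(raw_form).decode("ascii")  # 24 characters, "=="-padded

# quote() leaves only letters, digits, "_.-~" and (by default) "/" alone,
# so the "@" in an email address becomes "%40" in the URL path.
quoted_email = quote(email)
# Passing safe="@" keeps the address exactly as typed, for comparison.
quoted_keep_at = quote(email, safe="@")

print("hex:       ", hex_form)
print("base64:    ", b64_form)
print("quoted:    ", quoted_email)    # reader%40example.com
print("keep '@':  ", quoted_keep_at)  # reader@example.com
```

If one combination works from wget and another does not, that narrows down whether the hash format or the escaped "@" is the problem.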
I will change the user-agent. The api documentation says that each api
key is limited to three requests per second, with no hourly or daily
limits.

>
> My suggestion is:
>
> * Double check, *triple* check, that your username and password
>   are correct.

I am as sure as I can be about the plaintext; the encoding of the MD5
or the urllib.quote() call may be causing problems.

>
> * Write out the URL by hand (you can use Python for calculating
>   the MD5 sum, I'm not that cruel *grins*).

The url should be right. I am now getting an error 500 instead of 403,
which is rather strange. I know 500 = internal server error, but as far
as I know the api is not down.

>
> * Try using another command-line tool. If you're on Linux, you can
>   use curl or wget:
>
>   wget --header="X-password:<PASSWORD>" <URL>
>
>   with <PASSWORD> and <URL> replaced by the correct values.
>
>   curl will probably be similar.

Windows...

>
> * If wget works, great, go back to trying it from Python! If
>   not, inspect the error messages it prints. Try changing the
>   user-agent. Try setting the referer [sic] to the website's
>   home page.
>
>
> --
> Steven
> _______________________________________________
> Tutor maillist - Tutor@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
>
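Putting Steven's suggestions together (header set explicitly, user-agent and referer changed, urlopen() inside the try so its HTTPError is actually caught, binary output), a Python 3 sketch might look like the following. The URL layout and the X-password header come from the thread; the User-Agent string, the Referer value and the helper names build_request() and download() are hypothetical. The thread mentions an API key but not how it is passed, so it is left out. The throttle only enforces the three-requests-per-second limit Alex quotes; it is a simple stand-in, not anything the API provides.

```python
import hashlib
import time
import urllib.error
import urllib.request
from urllib.parse import quote

# Base URL from the thread; the User-Agent and Referer are hypothetical.
BASE = "https://api.bookshare.org/"
USER_AGENT = "bookshare-wrapper-example/0.1"
MIN_INTERVAL = 1.0 / 3  # at most three requests per second, per the API docs

_last_request = 0.0  # time.monotonic() value of the previous request


def build_request(user, book_id, password):
    """Build the download request without sending it (handy for checking)."""
    url = BASE + "download/for/" + quote(user) + "/content/" + str(book_id)
    headers = {
        # Hex MD5 of the password, as in the original code.
        "X-password": hashlib.md5(password.encode("utf-8")).hexdigest(),
        "User-Agent": USER_AGENT,
        "Referer": BASE,
    }
    return urllib.request.Request(url, headers=headers)


def download(user, book_id, password, destination):
    """Fetch one book, save it in binary mode, and report HTTP errors."""
    global _last_request
    # Sleep until MIN_INTERVAL has passed since the previous request.
    wait = MIN_INTERVAL - (time.monotonic() - _last_request)
    if wait > 0:
        time.sleep(wait)
    _last_request = time.monotonic()

    req = build_request(user, book_id, password)
    try:
        with urllib.request.urlopen(req) as book:  # HTTPError is raised here
            data = book.read()
    except urllib.error.HTTPError as e:
        # e.g. 401 = unauthorized, 403 = forbidden, 500 = server error
        print("HTTP error:", e.code, e.reason)
        print(e.headers)  # response headers may explain the rejection
        raise  # keep the traceback, as Steven recommends
    with open(destination + str(book_id), "wb") as local:
        local.write(data)
```

Printing build_request(...).full_url and its headers, without calling download(), gives something to compare against the URL written out by hand and the wget command.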
-- 
Have a great day,
Alex (msg sent from GMail website)
mehg...@gmail.com; http://www.facebook.com/mehgcap