Re: urllib2 and threading

2009-05-01 Thread Aahz
robean wrote: > Here's the problem: the script simply crashes after getting a couple of urls and takes a long time to run (slower than a non-threaded version that I wrote and ran). Can anyone figure out what I am doing wrong? I am new to both threading and urllib2, so it's possi

Re: urllib2 and threading

2009-05-01 Thread Piet van Oostrum
> robean (R) wrote:

>R> def get_info_from_url(url):
>R>     """ A dummy version of the function simply visits urls and prints
>R>     the url of the page. """
>R>     try:
>R>         page = urllib2.urlopen(url)
>R>     except urllib2.URLError, e:
>R>         print " error ", e.reason
>R>     except urll
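For readers following along on a current interpreter, here is a minimal sketch of the same kind of function in Python 3, where urllib2 was split into urllib.request and urllib.error. It does not reconstruct the truncated except clause above. It only illustrates one general point: HTTPError is a subclass of URLError, so if both are caught, the HTTPError clause has to come first. The opener parameter and the returned strings are hypothetical, added so the function can be exercised without network access.

```python
import urllib.error
import urllib.request


def get_info_from_url(url, opener=urllib.request.urlopen):
    """Return the final URL of the page, or a short error description.

    `opener` is a hypothetical hook for illustration; it defaults to
    urllib.request.urlopen, the Python 3 successor of urllib2.urlopen.
    """
    try:
        page = opener(url)
    except urllib.error.HTTPError as e:
        # HTTPError subclasses URLError, so it must be caught first.
        return "HTTP error %d" % e.code
    except urllib.error.URLError as e:
        return "URL error: %s" % e.reason
    else:
        return page.geturl()
```

Returning a value instead of printing also keeps the function free of shared output, which matters once several threads call it at the same time.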

Re: urllib2 and threading

2009-05-01 Thread shailen . tuli
For better performance, lxml easily outperforms Beautiful Soup. For what it's worth, the code runs fine if you switch from urllib2 to urllib (different exceptions are raised, obviously). I have no experience using urllib2 in a threaded environment, so I'm not sure why it breaks; urllib does OK, tho

Re: urllib2 and threading

2009-05-01 Thread Stefan Behnel
robean wrote: > I am writing a program that involves visiting several hundred webpages and extracting specific information from the contents. I've written a modest 'test' example here that uses a multi-threaded approach to reach the urls with urllib2. The actual program will involve fairly

Re: urllib2 and threading

2009-05-01 Thread robean
Thanks for your reply. Obviously you make several good points about Beautiful Soup and Queue. But here's the problem: even if I do nothing whatsoever with the threads beyond just visiting the urls with urllib2, the program chokes. If I replace else: ulock.acquire() print page.geturl() #
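The pattern under discussion (worker threads, a Queue of urls, a lock around shared output) can be sketched roughly as follows. This is not the script from the thread, which is not shown in full here. It is a small Python 3 illustration in which visit_all, fetch, and num_workers are all hypothetical names, and fetch stands in for whatever actually retrieves the page so the example runs without a network.

```python
import queue
import threading


def visit_all(urls, fetch, num_workers=4):
    """Call fetch(url) for every url using worker threads.

    Returns a dict mapping each url to fetch's return value, or to the
    exception it raised. All names here are illustrative assumptions.
    """
    tasks = queue.Queue()
    for url in urls:
        tasks.put(url)  # the queue is fully loaded before any worker starts

    results = {}
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                url = tasks.get_nowait()
            except queue.Empty:
                return  # no work left, let the thread finish
            try:
                outcome = fetch(url)
            except Exception as exc:
                # Record the failure so one bad url does not kill the worker.
                outcome = exc
            with results_lock:  # released even if the block raises
                results[url] = outcome

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Using the lock as a context manager avoids the case where an exception between acquire() and release() leaves the lock held and every other thread blocked.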

Re: urllib2 and threading

2009-05-01 Thread Paul Rubin
robean writes: > reach the urls with urllib2. The actual program will involve fairly elaborate scraping and parsing (I'm using Beautiful Soup for that) but the example shown here is simplified and just confirms the url of the site visited. Keep in mind Beautiful Soup is pretty slow, so if y

urllib2 and threading

2009-04-30 Thread robean
I am writing a program that involves visiting several hundred webpages and extracting specific information from the contents. I've written a modest 'test' example here that uses a multi-threaded approach to reach the urls with urllib2. The actual program will involve fairly elaborate scraping and p