Hi Aviv,

Since your bottleneck is fetching the URLs, I suggest you look at using 
workerpool <http://code.google.com/p/workerpool/> with urllib3. It helps you 
do exactly what Michael describes. (Disclaimer: I wrote both workerpool and 
urllib3; they were built to complement each other.)

There are even examples of how to use workerpool to download things in a 
multithreaded fashion:

http://code.google.com/p/workerpool/wiki/MassDownloader

Just substitute urllib with a urllib3 connection pool, and off you go. 
Experiment with different numbers of workers, depending on your server (5 to 
15 is usually a good range for high-throughput servers).
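Putting the two together looks roughly like this (an untested sketch; 
"urls.txt", the save-to-basename naming, and the worker count are just 
placeholders, and it assumes a urllib3 version that has PoolManager):

    import os
    import urllib3
    import workerpool

    # Shared manager of keep-alive connection pools, reused by every worker
    # thread instead of opening a fresh socket per request.
    http = urllib3.PoolManager(maxsize=10)

    class DownloadJob(workerpool.Job):
        """Fetch one URL and save the body to the current directory."""
        def __init__(self, url):
            self.url = url

        def run(self):
            r = http.request("GET", self.url)
            filename = os.path.basename(self.url) or "index.html"
            with open(filename, "wb") as f:
                f.write(r.data)

    pool = workerpool.WorkerPool(size=10)  # try anywhere from 5 to 15
    for line in open("urls.txt"):          # one URL per line
        pool.put(DownloadJob(line.strip()))

    pool.shutdown()
    pool.wait()

The point of the split is that workerpool handles the threading and job 
queueing while urllib3 keeps connections to the server alive between 
requests, which is where most of the speedup comes from.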

For other examples of how to do multithreaded IO stuff, have a look at 
s3funnel <http://code.google.com/p/s3funnel/> (another tool I wrote using 
workerpool). s3funnel uses a slightly more interesting setup with 
EquippedWorkers.

- Andrey
