If the URL fetch is an IO-bound operation (i.e. the time is spent waiting for 
IO), it might work to use a standard producer/consumer model built on 
Queue.Queue (queue.Queue in Python 3). One thread retrieves data from each URL 
and places the datasets into the Queue. The other thread pulls off items and 
loads them into the DB.
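
A minimal sketch of that producer/consumer arrangement, written for Python 3 (so the module is `queue` rather than `Queue`). The `fetch` function, the `_DONE` sentinel and the `store` list are hypothetical stand-ins: a real version would call the HTTP library in `fetch` and add objects to the database session in the consumer.

```python
import queue
import threading

_DONE = object()  # sentinel telling the consumer to stop (hypothetical name)


def fetch(url):
    """Hypothetical stand-in for the real HTTP fetch (e.g. via urllib3)."""
    return {"url": url, "body": "data for " + url}


def producer(urls, q):
    # Fetch each URL and hand the result to the consumer.
    for url in urls:
        q.put(fetch(url))
    q.put(_DONE)


def consumer(q, store):
    # Pull datasets off the queue and "load" them; a real version would
    # write to the database here instead of appending to a list.
    while True:
        item = q.get()
        if item is _DONE:
            break
        store.append(item)


def run(urls):
    q = queue.Queue(maxsize=100)  # bounded, so fetching cannot run far ahead
    store = []
    t_prod = threading.Thread(target=producer, args=(urls, q))
    t_cons = threading.Thread(target=consumer, args=(q, store))
    t_prod.start()
    t_cons.start()
    t_prod.join()
    t_cons.join()
    return store
```

With one producer and one consumer the items arrive in fetch order, and the database work overlaps with the time spent waiting on the network.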
 

Or the same idea, using the multiprocessing module instead of threading if the 
GIL is still getting in the way. Or using Celery. Or maybe a deferred approach 
like that of Twisted. There are lots of ways to offload slow IO operations 
while other work continues.
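
As one possible illustration of running several fetches at once, here is a sketch using `multiprocessing.pool.ThreadPool`, which has the same interface as `multiprocessing.Pool` but uses threads. That choice is an assumption made to keep the example small; moving to real processes would mean swapping in `multiprocessing.Pool`. The `fetch` and `fetch_all` names are hypothetical.

```python
from multiprocessing.pool import ThreadPool


def fetch(url):
    """Hypothetical stand-in for the real HTTP fetch."""
    return url, "data for " + url


def fetch_all(urls, workers=8):
    # ThreadPool exposes the same interface as multiprocessing.Pool but
    # runs its workers as threads, which suits IO-bound fetches.
    with ThreadPool(processes=workers) as pool:
        # imap_unordered yields results as they finish, in no fixed order;
        # dict() consumes them all before the pool is shut down.
        return dict(pool.imap_unordered(fetch, urls))
```

The results come back keyed by URL, so the load into the DB can still happen in a single thread and a single transaction afterwards.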

On Apr 20, 2011, at 8:33 PM, Aviv Giladi wrote:

> Thank you for your responses everyone.
> I have one more question - the really time heavy task here is
> retrieving the URLs over HTTP (it takes almost a second per URL).
> I am using urllib3 that has connection pooling, but other than that,
> is there any other way to speed this up? Perhaps multi-threading?
> 
> On Apr 20, 3:26 pm, Michael Bayer <mike...@zzzcomputing.com> wrote:
>> my practices with this kind of situation are:
>> 
>> 1. There's just one commit() at the end.  I'd like the whole operation in one 
>> transaction.
>> 2. There are flush() calls every 100-1000 or so.  10 is very low.
>> 3. I frequently will disable autoflush, if there are many flushes occurring 
>> due to queries for related data as the bulk proceeds.
>> 4. I don't use try/except to find duplicates - this invalidates the 
>> transaction (SQLAlchemy does this but many DBs force it anyway).   I use a 
>> SELECT to get things ahead of time, preferably loading the entire database 
>> worth of keys into a set, or loading the keys that I know we're dealing 
>> with, so that individual per-key SELECTs are not needed.    Or if the set of 
>> data I'm working with is the whole thing at once, I store the keys in a set 
>> as I get them, then I know which ones I've got as I go along.
>> 5. If I really need to do try/except, I use savepoints, i.e. begin_nested().
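
The key-set idea from points 1, 2 and 4 above can be sketched roughly as follows. To stay self-contained this uses the standard-library `sqlite3` module rather than a SQLAlchemy session, so the batched `executemany()` calls only loosely stand in for periodic `flush()` calls. The `items` table, the `bulk_load` function and the batch size are all assumptions for illustration.

```python
import sqlite3


def bulk_load(conn, rows, batch_size=500):
    """Insert (key, value) rows, skipping keys already present."""
    # Load the existing keys once, instead of one SELECT per incoming row.
    seen = {k for (k,) in conn.execute("SELECT key FROM items")}
    batch, inserted = [], 0
    for key, value in rows:
        if key in seen:
            continue  # duplicate: skip without touching the database
        seen.add(key)
        batch.append((key, value))
        if len(batch) >= batch_size:
            # Send a batch to the database, still inside the transaction.
            conn.executemany("INSERT INTO items VALUES (?, ?)", batch)
            inserted += len(batch)
            batch = []
    if batch:
        conn.executemany("INSERT INTO items VALUES (?, ?)", batch)
        inserted += len(batch)
    conn.commit()  # a single commit for the whole load
    return inserted
```

Because duplicates are filtered in Python, no INSERT is expected to fail on a key conflict, so the transaction is never invalidated partway through.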
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> To unsubscribe from this group, send email to 
> sqlalchemy+unsubscr...@googlegroups.com.
> For more options, visit this group at 
> http://groups.google.com/group/sqlalchemy?hl=en.
> 

