I do a lot of processing on large amounts of data.

The common pattern we follow is: 

1. Iterate through a large data set 
2. Do some sort of processing (e.g. NLP steps like tokenization,
capitalization, regex parsing, ...)
3. Insert the new results into another table.

Right now we are doing something like this: 

for x in session.query(Foo).yield_per(10000):
    bar = Bar()
    bar.hello = x.world.lower()
    session.add(bar)
    session.flush()  # flush every single row as it is added
session.commit()     # one commit at the very end of the run

This works, but not well: we typically have to wait 30 minutes to an hour
before any `bar`s are committed.

My question is: is there a way to commit as we iterate, without breaking
`yield_per`?
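
For concreteness, the kind of thing we have in mind is streaming rows on
one session and committing batches on a second session, so that committing
cannot interfere with the result set `yield_per` is still iterating over.
A rough sketch (the two-session split and the batch size are our guesses,
not something we have verified, and the models are simplified stand-ins
for ours):

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Foo(Base):
    __tablename__ = 'foo'
    id = Column(Integer, primary_key=True)
    world = Column(String)

class Bar(Base):
    __tablename__ = 'bar'
    id = Column(Integer, primary_key=True)
    hello = Column(String)

engine = create_engine('postgresql://...')  # placeholder DSN
Session = sessionmaker(bind=engine)

read_session = Session()   # streams Foo rows via yield_per
write_session = Session()  # collects and commits new Bar rows

for i, x in enumerate(read_session.query(Foo).yield_per(10000)):
    bar = Bar()
    bar.hello = x.world.lower()
    write_session.add(bar)
    if (i + 1) % 10000 == 0:
        write_session.commit()  # commit a batch; read_session is untouched
write_session.commit()  # commit whatever is left in the final batch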

If not, what is the recommended way of doing this? 

**NOTE:** There are many answers on Stack Overflow that involve writing
custom pagination functions for `session.query`. Their efficiency and
effectiveness have not been benchmarked.
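
(For reference, the pattern those answers describe is usually keyset
pagination on the primary key, something like the sketch below; the helper
name is made up, and it assumes Foo has an integer primary key id, as in
the sketch above.)

def windowed_query(session, batch_size=10000):
    # Keyset pagination: fetch batch_size rows at a time, ordered by
    # primary key, remembering the last id seen instead of using OFFSET.
    last_id = 0
    while True:
        batch = (session.query(Foo)
                 .filter(Foo.id > last_id)
                 .order_by(Foo.id)
                 .limit(batch_size)
                 .all())
        if not batch:
            return
        for row in batch:
            yield row
        last_id = batch[-1].id
        session.expunge_all()  # drop the finished batch from the identity map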

