On Jul 22, 2011, at 1:21 PM, Moch Ramis wrote:

> SQLAlchemy 0.7.1
> ----------------------------
> 
> I'm currently using SQLAlchemy with a lot of begin/insert/commit sequences, 
> which lets me make sure each of my actions is saved atomically (I only need 
> a snapshot of my objects at one point in time: my requests are consistent 
> enough to avoid problems from concurrent database changes). The whole thing 
> runs pretty fast since I set the expire_on_commit argument of the sessions 
> to False (I don't need my objects to be refreshed).
> 
> I'm also relying on database integrity constraint violations to avoid doing 
> some useless (and costly) SELECTs.

> Here's the problem: when an IntegrityError occurs, a rollback is done and 
> every SQLAlchemy record is expired; as a result, the insert takes 3.42 
> seconds... and 3.41 of those seconds are spent in the "restore snapshot" 
> function...

well, the reason for that is twofold.  One is that some other concurrent 
operation might modify the data in between when your transaction is rolled 
back and when the new one starts.  The other is that the flush() process may 
have modified in-memory state based on database state that is now invalid - 
that is, newly generated primary key identifiers have been assigned to 
objects, foreign key attributes have been synchronized, column-level defaults 
have been populated.  When a ROLLBACK occurs, all of those values are invalid, 
hence they're expired.  Objects that you add()'ed are expunged - these may 
have DB-generated defaults and identifiers that are now invalid.  This is all 
so that, proceeding after the ROLLBACK, the Session is in a consistent state.  
When the transaction accounting flag is turned off, the behavior of continuing 
to use the same objects with the Session after a rollback is essentially 
undefined (even though it might work fine in your case).
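
Here's a minimal sketch of that expunge-on-rollback behavior (the Widget class 
and all names here are made up for illustration) - an object added and flushed 
inside a transaction is removed from the Session once that transaction is 
rolled back, since its INSERT no longer exists:

    from sqlalchemy import create_engine, Column, Integer, String
    from sqlalchemy.orm import sessionmaker
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Widget(Base):
        __tablename__ = 'widget'
        id = Column(Integer, primary_key=True)
        name = Column(String(50), unique=True)

    engine = create_engine('sqlite://')
    Base.metadata.create_all(engine)
    Session = sessionmaker(bind=engine)
    session = Session()

    w = Widget(name='spanner')
    session.add(w)
    session.flush()         # the INSERT runs; w.id is populated from the DB
    session.rollback()      # that INSERT is gone, so w.id can't be trusted
    print(w in session)     # False - the pending object was expunged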

> 
> Setting the Session's "_enable_transaction_accounting" argument to False 
> solves the problem (the whole thing then takes 0.012 s), but the "legacy" 
> status of this argument is problematic, as it may disappear, or stop being 
> maintained/tested, in future versions of the library...

it's unlikely that flag will ever be removed, but it does open you up to the 
consistency problems mentioned above.  The flag is underscored so that users 
aren't confronted with the question of whether or not they should be using it 
(i.e. if they're not sure, the answer is no).
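
For completeness, here's a sketch of what that configuration presumably looks 
like in your case - this is just illustrating the flag you mentioned, not a 
recommendation, and the engine URL is made up:

    from sqlalchemy import create_engine
    from sqlalchemy.orm import sessionmaker

    engine = create_engine('sqlite://')   # placeholder URL
    Session = sessionmaker(
        bind=engine,
        expire_on_commit=False,
        # underscored flag: skips the expire/expunge bookkeeping on rollback
        _enable_transaction_accounting=False,
    )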

> 
> Does anyone have a better/less ugly way to do this with comparable 
> performance?

I approach the "SELECT...then INSERT" issue by SELECTing the full expanse of 
data I'm about to work on in one big statement first, placing the results in a 
dictionary or similar, using keys tailored to the operation in progress.  I 
then consult that dictionary for rows that already exist.  I've written many 
big data-movement operations that use this technique, and it is faster than 
the "insert, then explode, then try again" approach.  INSERTs can be batched, 
for one thing: the ORM will batch inserts into single execute() calls to as 
great a degree as possible if you provide primary key values yourself, or 
session.execute() with an insert() and a list of parameter sets will do it.
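
A rough sketch of that pattern - the table, column and variable names here are 
made up, and the key you'd use for the lookup depends on your actual 
constraint:

    from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String
    from sqlalchemy.orm import sessionmaker

    engine = create_engine('sqlite://')
    metadata = MetaData()
    widget_table = Table('widget', metadata,
        Column('id', Integer, primary_key=True),
        Column('name', String(50), unique=True),
    )
    metadata.create_all(engine)
    session = sessionmaker(bind=engine)()

    incoming = [{'name': 'spanner'}, {'name': 'wrench'}, {'name': 'spanner'}]

    # one big SELECT up front, keyed by the natural key in play
    existing = set(row.name for row in session.execute(widget_table.select()))

    to_insert = []
    for rec in incoming:
        if rec['name'] not in existing:
            existing.add(rec['name'])   # also de-dupes within the incoming batch
            to_insert.append(rec)

    if to_insert:
        # one execute() call with a list of parameter sets -> batched INSERT
        session.execute(widget_table.insert(), to_insert)
    session.commit()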

I also use window functions to "chunk" the above operation into groups of rows, 
using a recipe like the one at 
http://www.sqlalchemy.org/trac/wiki/UsageRecipes/WindowedRangeQuery .
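
In condensed form (see the linked recipe for the full version), the idea is to 
use row_number() to pick off every Nth key value, then process the rows that 
fall between those values; this sketch assumes the Widget class and session 
from the sketch earlier in this mail:

    from sqlalchemy import func

    def window_ranges(session, column, windowsize):
        # number every row by `column`, then keep every windowsize-th value
        subq = session.query(
            column.label('val'),
            func.row_number().over(order_by=column).label('rownum')
        ).subquery()

        keys = [row.val for row in
                session.query(subq.c.val).
                filter(subq.c.rownum % windowsize == 1)]

        # yield (start, end) bounds; the last window is open-ended
        for i, start in enumerate(keys):
            end = keys[i + 1] if i + 1 < len(keys) else None
            yield start, end

    for start, end in window_ranges(session, Widget.id, 1000):
        q = session.query(Widget).filter(Widget.id >= start)
        if end is not None:
            q = q.filter(Widget.id < end)
        for widget in q:
            pass   # process this chunk of rows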

Basically I never do the "insert, then get an integrity error" case at all.  
If I did, I'd at least be using begin_nested() - Postgresql in particular 
doesn't otherwise let a transaction continue after an integrity error unless a 
ROLLBACK is called.
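
A minimal sketch of that variant, again assuming the made-up Widget mapping 
from above - each INSERT runs inside a SAVEPOINT, so an IntegrityError only 
rolls back that one statement and the enclosing transaction keeps going:

    from sqlalchemy.exc import IntegrityError

    for name in ['spanner', 'wrench', 'spanner']:
        try:
            session.begin_nested()          # SAVEPOINT
            session.add(Widget(name=name))
            session.flush()                 # INSERT inside the savepoint
            session.commit()                # RELEASE SAVEPOINT
        except IntegrityError:
            session.rollback()              # ROLLBACK TO SAVEPOINT only
    session.commit()                        # commit the enclosing transaction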
