2011/7/22 Michael Bayer <mike...@zzzcomputing.com>

>> Here comes the problem: when an IntegrityError occurs, a rollback is
>> done and every SQLAlchemy record is expired; thus the insert takes 3.42
>> seconds... and 3.41 seconds are spent in the "restore snapshot" function.
>
> Well, the reason for that is twofold. One is that some other concurrent
> operation might have modified the data between when your transaction is
> removed and when the new one starts. The other is that the flush() process
> may have modified the state in memory based on database state that is now
> invalid - that is, newly generated primary key identifiers have been
> assigned to objects, foreign key attributes have been synchronized, and
> column-level defaults applied. When a ROLLBACK occurs, all of those values
> are invalid, hence they're expired. Objects that you add()'ed are expunged -
> these may have DB-generated defaults and identifiers that are now invalid.
> This is all so that, proceeding after the ROLLBACK, the Session is in a
> consistent state.
The point is that I don't need up-to-date objects: I could copy them into my
own structures (if I had such) and delete them, and the algorithm would still
work. What I need is a "snapshot" of a group of objects at a given instant.
If it were possible to have the session forget about the objects I retrieved,
I think it would do the job.

> When the transaction accounting flag is turned off, the behavior of
> continuing on the Session after a rollback with the same objects is
> essentially undefined (even though it might work fine in your case).

Until I do an expunge_all(), I guess...

>> Setting the Session's argument "_enable_transaction_accounting" to False
>> solves the problem (the whole thing taking 0.012 s), but the "legacy" side
>> of this argument is problematic, as it may disappear / not be maintained /
>> not be tested in future versions of the library...
>
> It's unlikely that flag will ever be removed, but it does open you up to
> the consistency problems mentioned above. The flag is underscored so that
> users aren't confused by the question of whether they should be using it or
> not (i.e. if they are not sure, the answer is no).

My main problem is that if I use this flag, I would need multiple kinds of
sessions, which may complicate the code.

>> Does anyone have a better/less ugly way to do this with comparable
>> performance?
>
> I approach the "SELECT...THEN INSERT" issue by SELECTing the full expanse
> of data I'm to work on in one big statement first, placing the results in a
> dictionary or similar, using keys tailored towards the operation in
> progress. I then use the dictionary to consult for rows that already exist.

Here come the problems: I'm not the only user of the database, so its state
may have been updated in between. And I can't lock that many rows during the
whole process either.

> I've written many big data movement operations that use this technique and
> it is faster than the "insert/then explode/then try again" approach.
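The "one big SELECT, then consult a dictionary" technique Michael describes can be sketched as follows. For brevity this uses the stdlib sqlite3 module rather than the ORM, and the table and key column are made up:

```python
# Sketch of the "SELECT first, consult a dict" pattern: load the full
# expanse of existing keys up front, then insert only genuinely new rows,
# so no IntegrityError (and no rollback/snapshot-restore) ever happens.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
conn.executemany("INSERT INTO item (name) VALUES (?)", [("a",), ("b",)])

incoming = ["a", "b", "c", "d"]        # rows we want to end up with

# One big SELECT over the data the operation touches, keyed for O(1)
# existence checks.
existing = {name for (name,) in conn.execute("SELECT name FROM item")}

# Only rows absent from the dict get INSERTed, and they can be batched.
conn.executemany(
    "INSERT INTO item (name) VALUES (?)",
    [(name,) for name in incoming if name not in existing],
)
conn.commit()

rows = {name for (name,) in conn.execute("SELECT name FROM item")}
assert rows == {"a", "b", "c", "d"}
```

As noted above, this is only safe to the extent that other users cannot insert conflicting rows between the SELECT and the INSERTs, or if such races are handled separately.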
> INSERTs can be batched, for one thing. The ORM will batch inserts into
> single execute() calls to as much a degree as possible if you provide
> primary key values yourself. Or session.execute() with an insert() and a
> list of parameter sets will do it.

This will probably not work for me, as I'm not sure of the database state.
And if an integrity error pops up in a batch of inserts, I would be unable to
know which rows were wrong and which weren't.

> I also use window functions to "chunk" the above operation into groups of
> rows, using a recipe like that at
> http://www.sqlalchemy.org/trac/wiki/UsageRecipes/WindowedRangeQuery .

Thanks, this will be very useful, as I was planning to implement such a
function (due to my growing memory footprint). Still, I may need to implement
my own comparison function (I need a particular sort), and it will be quite
painful to try to stay compatible with multiple databases.

> Basically I never do the "insert, then get an integrity error" case at
> all. If I did, I'd at least be using begin_nested() - PostgreSQL in
> particular doesn't let a transaction continue after an integrity error
> unless a ROLLBACK is called.

Well, each time I get an integrity error, I call the rollback method; I don't
plan to continue the transaction or keep being in an unstable state. The only
problem is that I need some minimum performance, and the current rollback (or
even commit, if I had not set expire_on_commit to False) durations are not
acceptable. Splitting the object requests into smaller chunks may help, but I
still feel that expiring objects that I do not need to update is a kind of
waste.

--
You received this message because you are subscribed to the Google Groups
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at
http://groups.google.com/group/sqlalchemy?hl=en.