2011/7/22 Michael Bayer <mike...@zzzcomputing.com>

>
>
> > Here comes the problem: when an IntegrityError occurs, a rollback is
> > done and every SQLAlchemy record is expired; thus the insert takes 3.42
> > seconds... and 3.41 of those are spent in the "restore snapshot" function...
>
> well the reason for that is twofold:  one is that some other concurrent
> operation might have modified the data in between when your transaction is
> removed and when the new one starts.  Another is that the flush() process
> may have modified the state in memory based on database state that is now
> invalid - that is, newly generated primary key identifiers have been
> assigned to objects, foreign key attributes have been synchronized, and
> column-level defaults have been applied.  When a ROLLBACK occurs, all of
> those values are invalid.  Hence they're expired.   Objects that you
> add()'ed are expunged - these may have DB-generated defaults and
> identifiers that are now invalid.   This is all so that, proceeding after
> the ROLLBACK, the Session is in a consistent state.


The point is that I don't need up-to-date objects: I could copy them into my
own structures (if I had such) and delete them, and the algorithm would still
work. What I need is a "snapshot" of a group of objects at a given instant.
If it were possible to have the session forget about the objects I retrieved,
I think it would do the job.
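
Something like this sketch is what I have in mind (Thing, gid and session are
placeholder names, assuming an open Session and a mapped class):

    # copy the values I need into plain Python structures first...
    snapshot = {}
    for obj in session.query(Thing).filter(Thing.group_id == gid):
        # a plain dict; rollback/expiry can't touch it
        snapshot[obj.id] = {"name": obj.name, "value": obj.value}

    # ...then have the Session forget every loaded instance, so a later
    # ROLLBACK has nothing to expire
    session.expunge_all()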


> When the transaction accounting flag is turned off, the behavior of
> continuing on the Session after a rollback with the same objects is
> essentially undefined (even though it might work fine in your case).
>

until I do an expunge_all(), I guess...


>
> >
> > Setting the Session's argument "_enable_transaction_accounting" to False
> > solves the problem (the whole thing taking 0.012 s), but the "legacy" side
> > of this argument is problematic, as it may disappear/not be
> > maintained/tested in future versions of the library...
>
> it's unlikely that flag will ever be removed, but it does open you up to the
> consistency problems mentioned above.  The flag is underscored so that users
> aren't confused by the question of whether they should be using it or not.
>  (i.e. if they are not sure, the answer is no).
>

My main problem is that if I use this flag, I would need multiple kinds of
sessions, which may complicate the code.
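
For what it's worth, the two-factory setup I would end up with looks roughly
like this (URL and names are placeholders; the flag is the underscored one
from SQLAlchemy 0.6/0.7, so it may change):

    from sqlalchemy import create_engine
    from sqlalchemy.orm import sessionmaker

    engine = create_engine("postgresql://user:pass@localhost/mydb")

    # ordinary sessions everywhere else in the code
    Session = sessionmaker(bind=engine)

    # a second factory only for the bulk-load path: no transaction
    # accounting, so rollback() skips the expensive snapshot restore
    BulkSession = sessionmaker(bind=engine,
                               _enable_transaction_accounting=False)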


> >
> > Does anyone have a better/less ugly way to do this with comparable
> > performance?
>
> I approach the "SELECT...THEN INSERT" issue by SELECTING the full expanse
> of data I'm to work on in one big statement first, placing the results in a
> dictionary or similar using keys tailored towards the operation in progress.
>  I then use the dictionary to consult for rows that already exist.
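
If I understand correctly, that technique is roughly this (Record,
natural_key and incoming are placeholder names):

    # one big SELECT keyed on the natural key...
    existing = dict(session.query(Record.natural_key, Record.id))

    # ...then only INSERT the rows whose key is absent
    to_insert = [row for row in incoming
                 if row["natural_key"] not in existing]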



Here come the problems: I'm not the only user of the database, so its state
may have been updated in between. And I can't lock that many rows for the
whole process either.


> I've written many big data movement operations that use this technique and
> it is faster than the "insert/then explode/then try again" approach.
> INSERTs can be batched, for one thing.   The ORM will batch inserts into
> single execute() calls to as great a degree as possible if you provide
> primary key values yourself.  Or session.execute() with an insert() and a
> list of parameter sets will do it.
>
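
If I read that right, the session.execute() variant is something like this
(same placeholder Record table as above); the list of parameter dicts goes
out as a single batched executemany:

    session.execute(
        Record.__table__.insert(),
        [{"natural_key": r["natural_key"], "value": r["value"]}
         for r in to_insert],
    )
    session.commit()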

This will probably not work for me, as I'm not sure of the database state.
And if an integrity error pops up in a batch of inserts, I would be unable to
tell which row was wrong and which wasn't.


> I also use window functions to "chunk" the above operation into groups of
> rows, using a recipe like that at
> http://www.sqlalchemy.org/trac/wiki/UsageRecipes/WindowedRangeQuery .
>
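
Condensed from that recipe (placeholder names again; it needs a database with
window functions, e.g. Postgresql): row_number() marks every Nth value of an
ordered column as a boundary, and the caller pages through half-open ranges
instead of holding one huge result set:

    from sqlalchemy import func

    def window_ranges(session, column, windowsize):
        # number every row by the ordered column, keep every Nth value
        subq = session.query(
            column.label("col"),
            func.row_number().over(order_by=column).label("rownum"),
        ).subquery()
        starts = [value for (value,) in
                  session.query(subq.c.col)
                         .filter(subq.c.rownum % windowsize == 1)]
        # pair each boundary with the next; the last range is open-ended
        for lower, upper in zip(starts, starts[1:] + [None]):
            yield lower, upper

    # usage: process one bounded chunk at a time, keeping memory flat
    for lower, upper in window_ranges(session, Record.id, 1000):
        q = session.query(Record).filter(Record.id >= lower)
        if upper is not None:
            q = q.filter(Record.id < upper)
        for rec in q:
            pass  # work on the chunk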

Thanks, this will be very useful, as I was planning to implement such a
function (due to my growing memory footprint). Still, I may need to implement
my own comparison function (I need a particular sort order), and it will be
quite painful to try to stay compatible with multiple databases.


>
> Basically I never do the "insert, then get an integrity error" case at all.
>   If I did, I'd at least be using begin_nested() - Postgresql in particular
> doesn't otherwise let a transaction continue after an integrity error unless
> a ROLLBACK is called.
>
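
For the record, that pattern would be (placeholder Record rows again): each
INSERT runs inside a SAVEPOINT, so an IntegrityError rolls back only that one
row and the outer transaction survives, even on Postgresql:

    from sqlalchemy.exc import IntegrityError

    for row in to_insert:
        session.begin_nested()        # SAVEPOINT
        try:
            session.add(Record(**row))
            session.commit()          # flush + RELEASE SAVEPOINT
        except IntegrityError:
            session.rollback()        # ROLLBACK TO SAVEPOINT; outer txn intact
    session.commit()                  # the real COMMIT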

Well, each time I get an integrity error, I call the rollback method, and I
don't plan to continue the transaction or stay in an unstable state. The only
problem is that I need a minimum level of performance, and the current
rollback (or even commit, had I not set expire_on_commit to False) durations
are not acceptable. Splitting the object queries into smaller chunks may
help, but I still feel that expiring objects I don't need to update is a kind
of waste.
