On Sep 9, 2011, at 5:11 PM, Russ wrote:

> when you say query(Class).filter(<criterion>).one(), that always emits SQL.  
> It's an open-ended query, and SQLAlchemy doesn't know that s2c1 is the object 
> which corresponds to the SQL you're about to emit.   When the row comes back, 
> it then extracts the primary key from the incoming row, determines that 
> identity is already in the identity map and is unexpired, and the row is 
> skipped.
> 
> It ignores new data?  I'm clearly going to need to be even more careful about 
> this when I expect 'read committed' behaviour from the database.
>  
> (that's the disconnect here - incoming rows are not used if the existing 
> identity exists and is unexpired).
> 
> Is there a global, or per session or something, way to change this 
> ignore-new-data behavior (aside from manually expiring objects prior to 
> querying for data?).  Or even more brutal, can the identity map feature be 
> turned off altogether on a per session basis?

There's a method on query() called populate_existing().  This method was added 
back in the 0.3 days, before we had the model of the "transaction" representing 
the scope of data and before we'd figured out the best boundary at which data 
should be expired - people wanted to get at new data and this was what we gave 
them.  It sets the "overwrite" flag for all rows received: every incoming row 
that matches an existing identity will overwrite that object as if it were 
freshly loaded.  Note this includes pending changes on the object, so 
autoflushing beforehand is usually a good idea (i.e., leave autoflush on).  
It's likely this will solve the exact usage you're looking for.
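To make that concrete, here's a minimal sketch against an in-memory SQLite 
database - the model, table, and names are invented for illustration, not 
taken from your code:

```python
# Sketch of populate_existing(): an ordinary query skips rows whose
# identity is already present and unexpired; populate_existing() overwrites.
from sqlalchemy import create_engine, Column, Integer, String, text
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Item(Base):
    __tablename__ = "item"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

session.add(Item(id=1, name="old"))
session.commit()

item = session.query(Item).filter_by(id=1).one()  # identity now in the map

# the row changes underneath us (same connection here, for brevity)
session.execute(text("UPDATE item SET name = 'new' WHERE id = 1"))

session.query(Item).filter_by(id=1).one()
stale_name = item.name        # still "old": the unexpired identity wins

session.query(Item).populate_existing().filter_by(id=1).one()
fresh_name = item.name        # "new": the incoming row overwrote the object
```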

I can sense this whole thing is surprising to you, so I'd ask that you consider 
the rationale behind how this works.  The model is based on that of other ORMs 
such as Hibernate (which calls it the "transaction-level cache") and Storm.  
It's not trivial to go out and repeatedly fetch fresh data from the database 
for a graph of objects currently in memory.  The assumption that the field of 
objects is "isolated" by default, with manual expiration as the hedge for 
specific cases where de-isolated communication is desired, is pretty standard.  
De-isolated communication between ongoing transactions is not how most data 
operations usually proceed.
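The "manual expiration" hedge looks like this in practice - again a sketch 
with an invented model against in-memory SQLite:

```python
# Sketch of manual expiration: expire() marks an object's attributes stale,
# so the next attribute access re-emits SELECT and loads current data.
from sqlalchemy import create_engine, Column, Integer, String, text
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Widget(Base):
    __tablename__ = "widget"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

session.add(Widget(id=1, name="old"))
session.commit()

w = session.query(Widget).filter_by(id=1).one()

# data changes underneath us (same connection here, for brevity)
session.execute(text("UPDATE widget SET name = 'new' WHERE id = 1"))

session.expire(w)       # or session.expire_all() for every object
refreshed = w.name      # attribute access re-emits SELECT -> "new"
```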

> For those cases when 'read committed' and the associated 
> transaction-transaction contamination is actually the desired behavior, 
> having to manually tell SQLAlchemy to expire everything all the time so that 
> it doesn't ignore new incoming data is quite an extra layer to worry about on 
> top of the already treacherous concurrency issues around 'read 
> committed'.  I always knew the identity map was something to be wary of with 
> concurrency since it is effectively another layer of isolation on top of what 
> the DB is set up to do, but didn't consider the fact that new data that is 
> read would be ignored.
> 
> From earlier you said:
>  
> If a concurrent transaction were to come in from underneath and change "1" to 
> "222" while you were still in your ongoing operation, you might be pretty 
> upset
> 
> But I shouldn't be!  That is exactly how 'read committed' is supposed to 
> behave.  If I don't want that, I should be setting "repeatable read" or 
> "serializable".  Although I guess it's true that most developers don't quite 
> know about transaction isolation levels (I sure didn't until fairly recently) 
> know about transaction isolation levels (I sure didn't until fairly recently) 
> and might be upset with SQLAlchemy at the first layer, if it did happen to 
> them...

OK, but this is the basic notion of "optimistic" concurrency - most 
applications would *want* transactions to be SERIALIZABLE - that is, I can do 
anything I want in a transaction and it's perfectly isolated.  But then the 
application performs like crap.  The lowering of isolation levels is usually a 
tradeoff we make for performance reasons - not typically as an 
inter-transaction communication system, as you're looking for here (though in 
some cases, as with counters, yes).  The ORM defaults to the former assumption 
as the usual way of working.  Features like the "version id", which we got 
from Hibernate, exist as a means of optimistic concurrency control - for when 
you want changes to data to be "isolated" without the expense of locking that 
SERIALIZABLE incurs.  SERIALIZABLE is not only expensive, it's often not even 
available, such as for an app using MySQL MyISAM tables.
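A hedged sketch of what the "version id" feature looks like (the model and 
names here are invented): the mapper adds the version column to each UPDATE's 
WHERE clause and raises StaleDataError when no row matches, i.e. when someone 
else modified the row since we loaded it.

```python
# Sketch of version_id_col optimistic concurrency control.
from sqlalchemy import create_engine, Column, Integer, text
from sqlalchemy.orm import declarative_base, sessionmaker
from sqlalchemy.orm.exc import StaleDataError

Base = declarative_base()

class Account(Base):
    __tablename__ = "account"
    id = Column(Integer, primary_key=True)
    balance = Column(Integer)
    version_id = Column(Integer, nullable=False)
    __mapper_args__ = {"version_id_col": version_id}

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

session.add(Account(id=1, balance=100))
session.commit()

acct = session.query(Account).filter_by(id=1).one()  # version_id == 1

# simulate a concurrent writer bumping the version under us
session.execute(text("UPDATE account SET version_id = 2 WHERE id = 1"))

acct.balance = 150
caught = False
try:
    # UPDATE ... WHERE id = 1 AND version_id = 1 -> matches zero rows
    session.flush()
except StaleDataError:
    caught = True
    session.rollback()
```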

I'd also wonder if a system that promises "exact mirroring of rows down to the 
transaction isolation behavior" would need to be in completely constant 
communication with the database for every single operation, down to the access 
of an individual attribute.  Only refreshing data when the user explicitly 
emits SELECT might be just as arbitrary as only refreshing data when COMMIT is 
emitted.  I've always felt that the COMMIT boundary was a great place to do 
the expiration, and in the old days I didn't even want it to do that (most 
apps are hindered by excessive SELECTs).
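That COMMIT-boundary expiration is what expire_on_commit=True (the default) 
gives you - a sketch, with an invented model:

```python
# Sketch of expire_on_commit: after COMMIT every loaded object is expired,
# so the next attribute access re-SELECTs in a new transaction.
from sqlalchemy import create_engine, inspect, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Note(Base):
    __tablename__ = "note"
    id = Column(Integer, primary_key=True)
    body = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine, expire_on_commit=True)()

note = Note(id=1, body="hello")
session.add(note)
session.commit()                      # COMMIT boundary: everything expires

was_expired = inspect(note).expired   # True: next access must re-load
body = note.body                      # lazy refresh, re-emits SELECT
```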


-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.
