On Sep 9, 2011, at 5:11 PM, Russ wrote:

>> when you say query(Class).filter(&lt;criterion&gt;).one(), that always emits
>> SQL. It's an open-ended query, and SQLAlchemy doesn't know that s2c1 is
>> the object which corresponds to the SQL you're about to emit. When the
>> row comes back, it then extracts the primary key from the incoming row,
>> determines that identity is already in the identity map and is unexpired,
>> and the row is skipped
>
> It ignores new data? I'm clearly going to need to be even more careful
> about this when I expect 'read committed' behaviour from the database.
>
>> (that's the disconnect here - incoming rows are not used if the existing
>> identity exists and is unexpired).
>
> Is there a global, or per-session, way to change this ignore-new-data
> behavior (aside from manually expiring objects prior to querying for
> data)? Or, even more brutal, can the identity map feature be turned off
> altogether on a per-session basis?
There's a method on query() called populate_existing(). This method was added back in the 0.3 days, before we had the model of the "transaction" representing the scope of data, and before we had figured out the best boundary upon which data should be expired - people wanted to get at new data, and this is what we gave them. It sets the "overwrite" flag for all rows received: every row that comes in matching an existing identity will overwrite that object as if it were freshly loaded. Note that this includes pending changes on the object, so it's usually a good idea to autoflush beforehand (i.e., leave autoflush on). It's likely that this will solve the exact usage you're looking for.

I can sense this whole thing is surprising to you, so I'd ask that you consider the rationale behind how this works. The model is based on that of other ORMs like Hibernate (they call it the "transaction-level cache") and Storm. It's not trivial to go out and repeatedly fetch new data from the database for a graph of objects currently in memory. The assumption that the field of objects is "isolated" by default, with manual expiration as a hedge for the specific cases where de-isolated communication is desired, is pretty standard. De-isolated communication between ongoing transactions is not the usual way most data operations proceed.

> For those cases when 'read committed' and the associated
> transaction-transaction contamination is actually the desired behavior,
> having to manually tell SQLAlchemy to expire everything all the time so
> that it doesn't ignore new incoming data is quite an extra layer to worry
> about, on top of the already treacherous concurrency issues around 'read
> committed'. I always knew the identity map was something to be wary of
> with concurrency, since it is effectively another layer of isolation on
> top of what the DB is set up to do, but I didn't consider the fact that
> new data that is read would be ignored.
> From earlier you said:
>
>> If a concurrent transaction were to come in from underneath and change
>> "1" to "222" while you were still in your ongoing operation, you might
>> be pretty upset
>
> But I shouldn't be! That is exactly how 'read committed' is supposed to
> behave. If I don't want that, I should be setting "repeatable read" or
> "serializable". Although I guess it's true that most developers don't
> quite know about transaction isolation levels (I sure didn't until fairly
> recently) and might be upset with SQLAlchemy at the first layer, if it
> did happen to them...

OK, but this is the basic notion of "optimistic" concurrency - most applications would *want* transactions to be SERIALIZABLE - that is, "I can do anything I want in a transaction and it's perfectly isolated." But then the application performs like crap. The lowering of isolation levels is usually a tradeoff we make for performance reasons - not typically an inter-transaction communication system, as you're looking for here (though in some cases, as with counters, yes). The ORM defaults to the former assumption as the usual way of working.

Features like "version id", a feature we got from Hibernate, exist as a means of optimistic concurrency control - for when one wants changes to data to be "isolated" without the expense of locking that SERIALIZABLE incurs. SERIALIZABLE is not only expensive, it often isn't even available, such as for an app using MySQL MyISAM tables.

I'd also wonder whether a system that promises "exact mirroring of rows down to the transaction isolation behavior" would need completely constant communication with the database for every single operation, down to the access of an individual attribute. Only refreshing data when the user explicitly emits SELECT might be just as arbitrary as only refreshing data when COMMIT is emitted.
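The "version id" feature mentioned above can be sketched like this: the mapper adds a `WHERE version = <expected>` criterion to UPDATE statements and raises StaleDataError when another transaction got there first. A minimal example, assuming SQLAlchemy 1.4+ and an in-memory SQLite database (the `Account` model is hypothetical):

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.orm.exc import StaleDataError

Base = declarative_base()

class Account(Base):
    __tablename__ = "accounts"
    id = Column(Integer, primary_key=True)
    balance = Column(Integer)
    version = Column(Integer, nullable=False)
    # Tell the mapper to maintain and check this column on every UPDATE.
    __mapper_args__ = {"version_id_col": version}

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

session = Session(engine, expire_on_commit=False)
acct = Account(id=1, balance=100)
session.add(acct)
session.commit()  # the mapper sets version to 1 on INSERT

# A concurrent writer updates the row and bumps its version.
with engine.begin() as conn:
    conn.exec_driver_sql(
        "UPDATE accounts SET balance = 50, version = 2 WHERE id = 1"
    )

acct.balance = 75
caught = False
try:
    # Flush emits: UPDATE accounts SET balance=?, version=?
    #              WHERE id = 1 AND version = 1  -> matches zero rows
    session.commit()
except StaleDataError:
    caught = True  # detected the concurrent change; caller should retry
    session.rollback()
```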
I've always felt that the COMMIT boundary was a great place to do the expiration; in the old days I didn't even want it to do that (most apps are hindered by excessive SELECTs).

--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
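The commit-boundary expiration discussed in this thread can be sketched as follows: with the default expire_on_commit=True, every COMMIT expires all loaded instances, so the next attribute access emits a SELECT and sees data committed by other transactions. A minimal example assuming SQLAlchemy 1.4+ and an in-memory SQLite database (the `Item` model is hypothetical):

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Item(Base):
    __tablename__ = "items"
    id = Column(Integer, primary_key=True)
    value = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

session = Session(engine)  # expire_on_commit defaults to True
item = Item(id=1, value="old")
session.add(item)
session.commit()  # item is now expired

# A concurrent writer changes the row after our COMMIT.
with engine.begin() as conn:
    conn.exec_driver_sql("UPDATE items SET value = 'new' WHERE id = 1")

# Accessing the expired attribute refreshes the object from the
# database inside a new transaction, so the fresh value is visible.
refreshed = item.value
# refreshed is "new"
```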