On Oct 20, 2011, at 1:49 AM, Russ wrote:

> 
> I understand this somewhat and had done that, but am still confused by one 
> thing: in my pure ORM case (for "joe"), how did the identity map get a hold 
> of the auto-increment ids when there was no apparent emitted SQL?  It seems 
> there wasn't one done, although I suppose it could have been done behind the 
> scenes and not logged?  

the examples you showed me on pastebin all seemed to involve the ORM going out 
to fetch rows, so when they're fetched via SELECT, all the info (including the 
autoincrement ids) is already there.

> 
> I had been looking at Query.update and it looked promising and dug around for 
> an insert equivalent, but there wasn't one so I ended up dropping down to the 
> SQL expression language.

so really you're only concerned about a bulk INSERT, ok

> 
> 1. expire bob.addresses ( see below, it appears you weren't aware of this)
> 2. use lazy="dynamic" on bob.addresses, so that SQL is emitted each time the 
> relationship is accessed.
> 
> I'm not sure how those apply to doing a bulk insert with the ORM...

assuming you had "bob" and then did a mass-insert of Address objects, both of 
the above methods would ensure that the "addresses" collection on bob reflects 
the current data.
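
a minimal sketch of both, assuming "bob" and the Session from your pastebin and 
a typical User/Address mapping:

    # 1. expire just that collection; the next access of bob.addresses
    #    re-SELECTs from the database
    session.expire(bob, ['addresses'])

    # 2. or declare the relationship as "dynamic" on the User mapping, so that
    #    bob.addresses is a query object that emits SQL each time it's used
    addresses = relationship("Address", lazy="dynamic")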

> 
> Thanks - I'm not sure I 100% get it.  I think it is simply that I can make an 
> ORM object instance not associated with a session (I think you call this 
> "detached"), directly assign the PK (if I know it), and then when I call 
> Session.add() the ORM will trust me and use the provided PK for integrating 
> the object into the identity-map.  Is that right?

Correct, but I've never had anyone do this before.    I'm not yet sure I like 
advising someone to manipulate the Session directly in this way.   We need to 
be careful what we advise, because in two weeks it will show up on tutorials 
everywhere and then we get complaints that "SQLAlchemy is too hard" because 
everyone thinks they're supposed to put things straight into the identity map 
(exaggerating here, but only slightly).

> If this is the case, I still have the unfortunate problem of not knowing the 
> autoincrement PK values without a fresh query.  

well, therein lies why the ORM most of the time can't use bulk INSERT calls in a 
backend-neutral way.   Your approach, which is essentially to swap in some 
optimized code where flush() would normally take place, is subject to the same 
rules of the road.  Assuming the backend is one that uses a "get the most 
recent ID" approach, which includes SQLite, MySQL and MSSQL, the Python DBAPI - 
and in fact every database client API I've ever worked with - only supports 
returning the "last inserted id" one INSERT at a time, that is, only via 
cursor.execute(), and not cursor.executemany()  (background on the DBAPI at 
http://www.python.org/dev/peps/pep-0249/).   On Postgresql, Oracle and Firebird, 
the underlying construct used to generate ids is the sequence - on those, you 
could run the sequence externally to the inserts for the full range of integer 
IDs first.
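
to illustrate the DBAPI side of that with plain sqlite3 (a sketch, nothing 
SQLAlchemy-specific here):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE address (id INTEGER PRIMARY KEY, email TEXT)")

    # single-row INSERT: the generated id comes back via cursor.lastrowid
    cur = conn.execute("INSERT INTO address (email) VALUES (?)",
                       ("joe@example.com",))
    print(cur.lastrowid)   # 1

    # bulk INSERT via executemany(): pep-249 leaves lastrowid undefined here,
    # so there's no backend-neutral way to get all the generated ids back
    cur = conn.executemany("INSERT INTO address (email) VALUES (?)",
                           [("a@example.com",), ("b@example.com",),
                            ("c@example.com",)])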


> 
> As to why there is a mix of ORM and emitted SQL, it is really only an 
> optimization thing, and not a small one.  The large batch inserts are not an 
> initial data setup, but are a result of the application requirements, and 
> they occur frequently.  All business logic is initially coded using the ORM, 
> since it is by far the easiest way to get it coded up (recall the weeping?).  
> The SQL expression language is only hacked in to address issues found during 
> profiling, and so far this has only been a requirement for bulk inserts.  For 
> example, a recent case had a particular function configuration taking ~8.5s 
> to complete the operation.  Checking the emitted SQL showed huge floods of 
> the ORM doing several different batch inserts one insert at a time.  Hacking 
> those code segments to do batch inserts with the expression language in a way 
> similar to my joe/bob example cut the query down to < 0.5s, even with 
> inefficient identity-map refreshing.  Fiddling with some eager loading config 
> will carve that further down due to some remaining selects that could be 
> merged, but the main point was the resolution of the "whoah that is a heck of 
> a lot of unnecessary inserts" problem.

so to do this, you'd need:

1. a way to generate primary key ids quickly and up front, i.e. a sequence, or 
not using autoincrement if you're on MySQL, etc. 
2. a function that will copy those inserted rows into ORM objects, where you 
use set_committed_state() for those attributes you'd like loaded (must include 
at least the primary key identifying attributes) and state.key = 
instance_key(obj) (paraphrased) to set up the state, then use session.add().   
At some point SQLA could grow a make_detached(obj) function.
2a:  Fine, here's make_detached (a rough sketch pulling these pieces together 
follows after item 3):

from sqlalchemy.orm.attributes import instance_state
def make_detached(obj):
    """Given a transient object with state, turn it into 'detached', that is, 
has an instance key
       based on its primary key identity.  obj must have primary key attributes 
populated."""
    state = instance_state(obj)
    state.key = state.mapper._identity_key_from_state(state)

   
3. really good practices to package this optimization cleanly and tidily away 
from everything else!
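
and to be explicit, pulling 1, 2 and 2a together would look roughly like this - 
a sketch only, where Address, address_table, raw_rows, the "address_id_seq" 
name and the connection/session are all stand-ins, the backend is assumed to be 
sequence-capable (e.g. Postgresql), and set_committed_value() is the public 
helper in sqlalchemy.orm.attributes (what I referred to as 
set_committed_state() above):

    from sqlalchemy import Sequence
    from sqlalchemy.orm.attributes import set_committed_value

    seq = Sequence('address_id_seq')

    # 1. run the sequence up front for the full range of ids
    #    (one round trip per id here; could also be batched in one SELECT)
    ids = [conn.execute(seq) for _ in raw_rows]

    # bulk INSERT with the expression language, ids included
    conn.execute(address_table.insert(),
                 [dict(id=id_, **row) for id_, row in zip(ids, raw_rows)])

    # 2. copy the inserted rows into detached ORM objects and add them
    for id_, row in zip(ids, raw_rows):
        obj = Address()
        set_committed_value(obj, 'id', id_)
        for key, value in row.items():
            set_committed_value(obj, key, value)
        make_detached(obj)
        session.add(obj)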

