Thanks very much for the response... lots to chew on here.

> Well, pretty much being savvy about expiration is the primary approach.  The 
> rows you affect via an execute(), if they've been loaded in the session, 
> they'd need to be expired from memory.


I understand this somewhat and had done that, but am still confused by one 
thing: in my pure ORM case (for "joe"), how did the identity map get hold 
of the auto-increment ids when there was no apparent emitted SQL?  It seems 
no extra SELECT was issued, although I suppose one could have been done 
behind the scenes and not logged?  Either that, or the INSERT statements are 
returning the ids somehow... although I didn't see anything indicating that. 
Whatever method the ORM uses to get the ids from the INSERT statements it 
emits is where I was wondering if I could get the same data to stuff the 
identity map, without needing to expire anything and/or re-query.
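
For what it's worth, here's a minimal sketch of what I was hoping for at 
the Core level (assuming a bare-bones users table; on SQLite the generated 
key comes back via cursor.lastrowid, on other backends via RETURNING or 
similar, but either way it surfaces on the result):

    from sqlalchemy import (create_engine, MetaData, Table, Column,
                            Integer, String)

    engine = create_engine("sqlite://", echo=True)
    metadata = MetaData()
    users = Table("users", metadata,
                  Column("id", Integer, primary_key=True),
                  Column("name", String(50)))
    metadata.create_all(engine)

    conn = engine.connect()
    # single-row insert: the generated PK is available on the result,
    # which is exactly the data I'd love to hand to the identity map
    result = conn.execute(users.insert(), {"name": "joe"})
    print(result.inserted_primary_key)   # e.g. a one-element row: (1,)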

> Unless you can be coarse-grained about it and expire everything (as occurs 
> normally after a commit()), or large chunks of things, you'd need to work 
> out specific situations as needed.  Of course the best scenario is when you 
> aren't intermingling ORM state with SQL-emitted UPDATE statements so much. 
>  Query.update() and Query.delete() were introduced to help with this as they 
> can apply the same filter() criteria to the objects within the Session, 
> expiring those that match - it can evaluate simple SQL criterion in memory 
> for this purpose.


I had been looking at Query.update(), and it looked promising, so I dug 
around for an insert equivalent; there wasn't one, so I ended up dropping 
down to the SQL expression language.
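
For the record, the in-session synchronization is the part of 
Query.update() that looked so promising; something like this sketch 
(assuming the usual mapped User class and an open session):

    # 'evaluate' applies the filter criteria to matching objects
    # already in the Session, updating them in memory as well
    session.query(User).\
        filter(User.name == 'joe').\
        update({'fullname': 'Joe Smith'},
               synchronize_session='evaluate')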

> For example, with the ORM you can't really do a batch/bulk insert (that I 
> know of), but you *can* mix in some SQL expression language to achieve it. 
>  Here is a complete example that shows this for the standard Users/Addresses 
> example:
> http://pastebin.com/BArU6hci
>
> so there, your options are:
>
> 1. expire bob.addresses (see below, it appears you weren't aware of this)
> 2. use lazy="dynamic" on bob.addresses, so that SQL is emitted each time 
> the relationship is accessed.
>

I'm not sure how those apply to doing a bulk insert with the ORM... I must 
be missing something?  "dynamic" isn't an option for me since I use the 
relation frequently... I just need to sync up after a few selected insert 
locations.  As for expiring bob.addresses, I don't know what to say... I 
had some sort of weird/idiotic mental block as I read and re-read the docs 
on expire and its docstring, and still didn't get it for some reason.  I 
feel shame.  Thanks for the pointer to the obvious second argument. :(
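
(For anyone else who missed it like I did, the second argument is the list 
of attribute names, so something like:)

    # expire just the 'addresses' collection on bob; the next access
    # will emit SQL and re-load it fresh
    session.expire(bob, ['addresses'])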

> Although the latter behaviour isn't really surprising, can the query be 
> avoided somehow?  Is there a way to manually update the session/identity-map 
> with the info?
>
> You can put things straight into the identity map using Session.add(), 
> assuming the object has a database identity.  Session.merge(, load=False) 
> does it too and is normally used to merge in state from a cache into the 
> Session, where the target identities may already be present.
>

Thanks - I'm not sure I 100% get it.  I think it is simply that I can make 
an ORM object instance not associated with a session (I think you call this 
"detached"), directly assign the PK (if I know it), and then when I call 
Session.add() the ORM will trust me and use the provided PK to integrate 
the object into the identity map.  Is that right?
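
(To restate the merge() half in code as I understand it; 'cached' here is a 
hypothetical object loaded by some earlier session, so it is detached but 
still carries its identity key:)

    # no SELECT is emitted; the returned copy is installed in this
    # Session's identity map under the existing key
    obj = session.merge(cached, load=False)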

If this is the case, I still have the unfortunate problem of not knowing the 
autoincrement PK values without a fresh query.  And before it comes up: yes, 
I do actually need autoincrement PKs rather than a more "natural" candidate 
key in almost all cases.  The reasons include space/time-efficient joining 
to many separate tables, as well as allowing the "natural" candidate keys to 
be renamed without changing record identity (or the many FKs).  I'm 
continually evaluating whether I can ditch autoinc sequence PKs, and it does 
not make sense in most cases.

> But to get identity on the object and have it be "clean" as far as pending 
> changes you need to tinker.  The identity key part needs to be via 
> instance_state(obj).key = 
> object_mapper(obj).identity_key_from_instance(obj), so you can see there 
> you're starting to dive into private APIs.   The "fixed" attributes with no 
> history you can assign via set_committed_value: 
> http://www.sqlalchemy.org/docs/orm/session.html#sqlalchemy.orm.attributes.set_committed_value,
>  this is more of a public API as it's there to help people write custom 
> loader logic for special cases.   There would be nothing controversial about 
> making a more "public" api to assign identity to an object but we don't 
> usually add things without a use case we see fit; otherwise people don't 
> know why such a method is there and if/when they should be using it.   The 
> use cases also drive how the API would look, as there are several ways a 
> "put an object in the identity map" API could look here.
>

> You can see this approach is starting to tinker with things; it will work 
> completely fine and the patterns there aren't changing anytime soon, but 
> this is not the usual level that people work at.  The APIs in this regard 
> are not as well suited (or widely tested) towards the use case of 
> manipulating the Session directly to that fine a degree.  That said we 
> already have "make_transient()" which is the opposite of 
> "set_my_identity()", so, not too controversial as long as we don't confuse 
> people with it.
>

set_committed_value looks very promising... I'll tinker with it.  I want to 
stay away from the deeper APIs for fear of them being brittle, as you say 
(great adjective).
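
For reference, here's my reading of the recipe, pieced together from the 
names you gave (a sketch, not gospel; the PK value is made up):

    from sqlalchemy.orm import object_mapper
    from sqlalchemy.orm.attributes import (instance_state,
                                           set_committed_value)

    u = User(name='joe')
    u.id = 42   # hypothetical, already-known PK value

    # give the object a database identity (private API territory)
    instance_state(u).key = \
        object_mapper(u).identity_key_from_instance(u)

    # assign "fixed" attribute values with no change history
    set_committed_value(u, 'fullname', 'Joe Smith')

    # with identity in place, add() puts it in the identity map
    session.add(u)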

> In general, is there a better way to do what I'm doing?  The example is 
> obviously a simplified one... my actual use case is batch inserting 
> thousands of records this way and I'd like to be able to keep the ORM and 
> avoid the extra query if I could.
>
> Wondering though why I've never had this use case.  Batch install thousands 
> of records with straight INSERT, great.  But then you need all of those 
> thousands in memory via an ORM state all at once?  Why's that?  Are you 
> doing more "batch" work with all N-thousand items in the ORM sense? 
> Usually if I write an installer, then it's done.  The ORM use case comes 
> later, loads in just a handful of those records, then works with them.  If I 
> need to batch insert and work with the full set of objects in an ORM sense 
> simultaneously, I work with the ORM all the way through.  In modern 
> SQLAlchemy this can be nearly as fast - just assign the primary key value 
> to your objects ahead of time so that no "post fetch of primary key" is 
> needed, and the ORM will batch the INSERT statements just as you're doing 
> with your example.  Tons and tons of work has gone into getting the ORM to 
> be better at batch inserts, since I use it in this way myself quite a lot.
>
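
(Illustrating that last suggestion as I understand it, with made-up PK 
values; if I could know them up front, the unit of work would have no 
post-fetch to do and could batch the INSERTs:)

    users = [User(id=i, name='user %d' % i) for i in range(1, 1001)]
    session.add_all(users)
    session.flush()   # batched INSERT (executemany), no PK post-fetch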

Those darn autoincrements foil me again here... I can't assign PKs ahead of 
time.

Regarding the use case and other comments on why I need to do it, I'll hum a 
few bars.  Basically I'm working on a relatively large application that has 
been built from the ground up to use SQLAlchemy and only SQLAlchemy to 
define and work with the database (although schema definition, ORM 
definitions, and business logic have been kept as separate as possible... 
while still using declarative, which may make you cringe).  The use of the 
ORM has made some particularly tricky areas a heck of a lot easier to 
develop than they would have been with direct SQL.  e.g. when I have a 
complicated query that includes a complex (but nicely organized) spiderweb 
of a set of 10 tables, I can do it really easily with the ORM.  When I'm 
cruising the emitted SQL for optimizations I almost weep with joy that I 
didn't have to fabricate the queries in there by hand.

As to why there is a mix of ORM and emitted SQL, it is really only an 
optimization thing, and not a small one.  The large batch inserts are not an 
initial data setup, but are a result of the application requirements, and 
they occur frequently.  All business logic is initially coded using the ORM, 
since it is *by far* the easiest way to get it coded up (recall the 
weeping?).  The SQL expression language is only hacked in to address issues 
found during profiling, and so far this has *only* been a requirement for 
bulk inserts.  For example, a recent case had a particular function 
configuration taking ~8.5s to complete the operation.  Checking the emitted 
SQL showed huge floods of the ORM doing several different batch inserts one 
INSERT at a time.  Hacking those code segments to do batch inserts with the 
expression language, in a way similar to my joe/bob example, cut the 
operation down to < 0.5s, even with inefficient identity-map refreshing. 
Fiddling with some eager loading config will carve that down further due to 
some remaining selects that could be merged, but the main point was 
resolving the "whoa, that is a heck of a lot of unnecessary inserts" 
problem.
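
For the archives, the shape of the hack looks roughly like this (table and 
data names are made up, per the joe/bob example):

    # one executemany at the Core level instead of thousands of
    # per-object ORM inserts
    session.execute(
        addresses_table.insert(),
        [{'user_id': uid, 'email': email}
         for uid, email in pending_rows]
    )
    # then expire the affected collections so the ORM re-loads them
    session.expire(bob, ['addresses'])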
