Thanks very much for the response... lots to chew on here.

> Well, pretty much being savvy about expiration is the primary approach. The
> rows you affect via an execute(), if they've been loaded in the session,
> they'd need to be expired from memory.
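To make sure I follow, here's a minimal sketch of that "expire what you touched" approach, using an assumed one-table User mapping (names here are illustrative, not from the thread). After a Core-level UPDATE, the already-loaded ORM object is stale; expiring it forces a fresh SELECT on next attribute access:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

session = Session(engine)
session.add(User(name="joe"))
session.commit()

joe = session.query(User).filter_by(name="joe").one()

# Bypass the ORM with a Core-level UPDATE; the in-memory 'joe' is now stale.
users = User.__table__
session.execute(users.update().where(users.c.name == "joe").values(name="joseph"))

session.expire(joe)              # mark stale; reloaded on next access
assert joe.name == "joseph"      # attribute access emits a fresh SELECT
```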
I understand this somewhat and had done that, but am still confused by one thing: in my pure ORM case (for "joe"), how did the identity map get hold of the auto-increment ids when there was no apparent emitted SQL? It seems there wasn't one done, although I suppose it could have been done behind the scenes and not logged? Either that or the INSERT statements are returning the ids somehow... although I didn't see anything indicating that. Whatever method the ORM uses to get the ids from the INSERT statements it emits is where I was wondering if I could get the data to stuff the identity map without needing to expire anything and/or re-query.

> Unless you can be coarse-grained about it and expire everything (as occurs
> normally after a commit()), or large chunks of things, you'd need to work
> out specific situations as needed. Of course the best scenario is when you
> aren't intermingling ORM state with SQL-emitted UPDATE statements so much.
> Query.update() and Query.delete() were introduced to help with this as they
> can apply the same filter() criteria to the objects within the Session,
> expiring those that match - it can evaluate simple SQL criterion in memory
> for this purpose.

I had been looking at Query.update() and it looked promising, and I dug around for an insert equivalent, but there wasn't one, so I ended up dropping down to the SQL expression language.

> For example, with the ORM you can't really do a batch/bulk insert (that I
> know of), but you *can* mix in some SQL expression language to achieve it.
> Here is a complete example that shows this for the standard Users/Addresses
> example:
>
> http://pastebin.com/BArU6hci
>
> so there, your options are:
>
> 1. expire bob.addresses (see below, it appears you weren't aware of this)
> 2. use lazy="dynamic" on bob.addresses, so that SQL is emitted each time
> the relationship is accessed.

I'm not sure how those apply to doing a bulk insert with the ORM... I must be missing something?
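For anyone following along, here's a sketch of the mixed approach being discussed, on an assumed Users/Addresses schema (not the actual pastebin code). The single ORM INSERT gets its autoincrement id back from the DBAPI itself - cursor.lastrowid on SQLite/MySQL, or RETURNING on PostgreSQL - which is why no separate SELECT shows up in the SQL log; the bulk child rows then go through one executemany-style Core insert():

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, insert
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    addresses = relationship("Address")

class Address(Base):
    __tablename__ = "addresses"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey("users.id"))
    email = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)

u = User(name="joe")
session.add(u)
session.flush()            # INSERT emitted; u.id arrives via cursor.lastrowid
assert u.id is not None    # no separate SELECT was needed for this

# Bulk insert the child rows with a single executemany - fast, but the
# Session knows nothing about these rows.
session.execute(
    insert(Address.__table__),
    [{"user_id": u.id, "email": "e%d@example.com" % i} for i in range(1000)],
)
session.commit()
```

Note that executemany-style inserts don't hand back the generated ids row by row, which is exactly the gap in question here.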
"dynamic" isn't an option for me as I use the relation frequently... I just need to sync up after a few selected insert locations. As for expiring bob.addresses, I don't know what to say... I had some sort of weird/idiotic mental block as I read and re-read the docs on expire and the docstring, and still didn't get it for some reason. I feel shame. Thanks for the pointer to the obvious second argument. :(

> Although the latter behaviour isn't really surprising, can the query be
> avoided somehow? Is there a way to manually update the session/identity-map
> with the info?
>
> You can put things straight into the identity map using Session.add(),
> assuming the object has a database identity. Session.merge(, load=False)
> does it too and is normally used to merge in state from a cache into the
> Session where the target identities may already be present.

Thanks - I'm not sure I 100% get it. I think it is simply that I can make an ORM object instance not associated with a session (I think you call this "detached"), directly assign the PK (if I know it), and then when I call Session.add() the ORM will trust me and use the provided PK when integrating the object into the identity map. Is that right?

If this is the case, I still have the unfortunate problem of not knowing the autoincrement PK values without a fresh query. And before it comes up, I do actually need autoincrement PKs versus a more "natural" candidate key in almost all cases. Reasons here include space/time-efficient joining to many separate tables, as well as allowing renaming of the "natural" candidate keys without changing record identity (or the many FKs). I'm continually evaluating whether I can ditch autoinc sequence PKs and it does not make sense in most cases.

> But to get identity on the object and have it be "clean" as far as pending
> changes you need to tinker.
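For the record, that "second argument" usage looks like this (same assumed User/Address mapping as my example; only the named collection gets expired, so the rest of bob's loaded state stays put):

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, insert
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    addresses = relationship("Address")

class Address(Base):
    __tablename__ = "addresses"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey("users.id"))
    email = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)

bob = User(name="bob")
session.add(bob)
session.commit()

assert bob.addresses == []     # collection loaded (and now cached) as empty

# Rows added behind the Session's back via the expression language:
session.execute(insert(Address.__table__), [{"user_id": bob.id, "email": "x@y.z"}])

session.expire(bob, ["addresses"])   # the "obvious second argument"
assert len(bob.addresses) == 1       # re-SELECTed on next access
```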
> The identity key part needs to be set via
>
>     instance_state(obj).key = object_mapper(obj).identity_key_from_instance(obj)
>
> so you can see there you're starting to dive into private APIs. The "fixed"
> attributes with no history you can assign via set_committed_value:
>
> http://www.sqlalchemy.org/docs/orm/session.html#sqlalchemy.orm.attributes.set_committed_value
>
> this is more of a public API as it's there to help people write custom
> loader logic for special cases. There would be nothing controversial about
> making a more "public" API to assign identity to an object, but we don't
> usually add things without a use case we see fit; otherwise people don't
> know why such a method is there and if/when they should be using it. The
> use cases also drive how the API would look, as there are several ways a
> "put an object in the identity map" API could look here.
>
> You can see this approach is starting to tinker with things; it will work
> completely fine and the patterns there aren't changing anytime soon, but
> this is not the usual level that people work at. The APIs in this regard
> are not as well suited (or widely tested) towards the use case of
> manipulating the Session directly to that fine a degree. That said, we
> already have "make_transient()" which is the opposite of
> "set_my_identity()", so, not too controversial as long as we don't confuse
> people with it.

set_committed_value looks very promising... I'll tinker with it. I want to stay away from deeper APIs for fear of them being brittle, as you say (great adjective).

> In general, is there a better way to do what I'm doing? The example is
> obviously a simplified one... my actual use case is batch inserting
> thousands of records this way and I'd like to be able to keep the ORM and
> avoid the extra query if I could.
>
> Wondering though why I've never had this use case. Batch insert thousands
> of records with straight INSERT, great.
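Here's my understanding of set_committed_value as a sketch, on the same assumed User mapping - the assignment goes in "as if loaded from the database", so no pending change is recorded and the object stays clean:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.orm.attributes import set_committed_value

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)

session.add(User(name="joe"))
session.commit()

u = session.query(User).one()
u.name = "ordinary assignment"
assert u in session.dirty            # a normal set records pending history

session.rollback()

u = session.query(User).one()
set_committed_value(u, "name", "loaded value")
assert u not in session.dirty        # no history: treated as loaded state
```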
> But then you need all of those thousands in memory via an ORM state all at
> once? Why's that? Are you doing more "batch" work with all N-thousand items
> in the ORM sense? Usually if I write an installer, then it's done. The ORM
> use case comes later, loads in just a handful of those records, then works
> with them. If I need to batch insert and work with the full set of objects
> in an ORM sense simultaneously, I work with the ORM all the way through. In
> modern SQLAlchemy this can be nearly as fast - just assign the primary key
> value to your objects ahead of time so that no "post fetch of primary key"
> is needed, and the ORM will batch the INSERT statements just as you're
> doing with your example. Tons and tons of work has gone into getting the
> ORM to be better at batch inserts, since I use it in this way myself quite
> a lot.

Those darn autoincrements foil me again here... I can't assign PKs ahead of time.

Regarding the use case and other comments on why I need to do it, I'll hum a few bars. Basically I'm working on a relatively large application that has been built from the ground up to use SQLAlchemy and only SQLAlchemy to define and work with the database (although schema definition, ORM definitions, and business logic have been kept as separate as possible... while still using declarative, which may make you cringe). The use of the ORM has made some particularly tricky areas a heck of a lot easier to develop than they would have been with direct SQL. E.g.: when I have a complicated query to do that involves a complex (but nicely organized) spiderweb of a set of 10 tables, I can do it really easily with the ORM. When I'm cruising the emitted SQL for optimizations I almost weep with joy that I didn't have to fabricate those queries by hand.

As to why there is a mix of ORM and emitted SQL, it is really only an optimization thing, and not a small one.
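If I were able to pre-assign, my reading of the "assign the primary key ahead of time" suggestion is the following sketch (assumed User mapping; the max(id)+1 scheme is purely illustrative and not concurrency-safe - a real setup would use a database sequence or an id-allocation table):

```python
from sqlalchemy import Column, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)

# Illustrative only: grab a starting id; racy under concurrent writers.
start = (session.query(func.max(User.id)).scalar() or 0) + 1

# With PKs pre-assigned there is no post-INSERT fetch of the id, so the
# ORM can batch these into executemany-style INSERTs at flush time.
session.add_all(User(id=start + i, name="u%d" % i) for i in range(1000))
session.commit()

assert session.query(func.count(User.id)).scalar() == 1000
```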
The large batch inserts are not an initial data setup, but are a result of the application requirements, and they occur frequently. All business logic is initially coded using the ORM, since it is *by far* the easiest way to get it coded up (recall the weeping?). The SQL expression language is only hacked in to address issues found during profiling, and so far this has *only* been a requirement for bulk inserts.

For example, a recent case had a particular function configuration taking ~8.5s to complete the operation. Checking the emitted SQL showed huge floods of the ORM doing several different batch inserts one INSERT at a time. Hacking those code segments to do batch inserts with the expression language, in a way similar to my joe/bob example, cut the operation down to < 0.5s, even with inefficient identity-map refreshing. Fiddling with some eager-loading config will carve that down further due to some remaining selects that could be merged, but the main point was the resolution of the "whoa, that is a heck of a lot of unnecessary inserts" problem.

--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To view this discussion on the web visit https://groups.google.com/d/msg/sqlalchemy/-/2_MPlhEaXNIJ.