On Sep 14, 2010, at 12:22 PM, Jon Siddle wrote: > Thank you for such a full elaboration. I still think the end result is > something a little unintuitive (albeit only for those using detached > objects, who will come across it); but I can't argue against your > decision here. Based on the information you've given, keeping > things the way they are is undoubtedly the decision that will benefit > most users. I've worked with ORM internals before, and I know only > too well that architectural decisions made for sound reasons can > nevertheless make some things difficult, which feel like they should > be simple. As always, the best you can do is maximise the value to > the most people. > > Perhaps just a note in the eagerload docs to say "...will not load > backreferences eagerly. If you are using detached objects, try the > join...contains_eager pattern." or words to that effect?
yeah definitely. I am psyched for the "immediateload" option, I think this option was not so easy to implement originally but the current architecture allows it to go in nicely with very little effort. > > Thanks > > Jon > > > > On 14/09/10 16:27, Michael Bayer wrote: >> >> On Sep 14, 6:03 am, Jon Siddle<j...@corefiling.co.uk> wrote: >>> I would agree with all of this if I understood why a) it takes an >>> appreciable number of function calls or b) "automatic decisionmaking" is >>> necessary. I don't think there's any ambiguity here, but again; perhaps >>> I'm missing something fundamental. >> Theres a "backref" with an attribute handler that receives "append" >> events in userland, and emits a corresponding "set" event on the other >> side. The Collection attribute implementation could receive extra >> code that would cause it to do something similar to the backref >> handler when the collection is populated. This is what you are >> looking for here. >> >> The append of the collection ultimately occurs in mapper.py line 2298 >> in the current tip. The function calling it, a closure called >> _instance() that is generated inside of _instance_process(), is always >> the top hit on any profiling run against a load of objects. >> Removing just three function calls from this method is a win. Months >> of work have gone into the workings of this method so that our load >> speeds remain competitive with, and sometimes even faster than, other >> ORMs that do not use identity maps or units of work, ORMs that are now >> written almost entirely in C (I leave the identification of these >> products as an exercise for the reader). >> >> So the implementation here would probably replace the list being >> appended onto with one that is additionally instrumented to catch >> these events and populate in the other direction. We already have the >> GenericBackrefExtension which does this. GBE is designed to work in >> userland, not during a load but when you populate things manually. >> GBE is handy, but is a primary place for bugs to occur, as it is an >> event handler that issues more events. I just fixed one the other day >> involving some rare endless loop where the events could not decide >> when to stop back-populating in each direction. I was pretty amazed >> that there were still some of those cases left after all these years. >> >> So GBE as it is wouldn't work here, we'd need to write something just >> like GBE but tailored towards loads, when there are no events firing >> off. It would be simpler than GBE, maybe ten lines total all >> inlined. But it would be a little bit redundant (now we have two back >> populators ! one for loads, one for userland events). If we OTOH >> tried to make this new reverse-populator and GBE somehow work on the >> same system, that's probably more complicated. At the very least it >> would add method overhead as GBE would now be delegating down a deeper >> chain. So this is already complexity that really would have to be >> worth it in order to dig into. >> >> We don't just now have two ways of backref population, we also have >> two different ways that Child.parent may be populated. It might be >> populated by the LoaderStrategy associated with Child.parent, or it >> might be populated by the one associated with Parent.children. If >> the query spans across both relationships, now both LoaderStrategy >> objects are there, and unless we do something about it, *both* will >> populate Child.parent - totally redundant effort, sloppy. Do we want >> to prevent that ? Sure, now we need more messages and flags running >> around trying to prevent that case - the internals become that more >> intricate. Why are they loading this attribute in two different ways >> when the first way is totally fine ? Who wrote this crap ? >> >> I know you don't believe me but I've been writing this thing for five >> years - everything gets much more hairy than you'd like. Even the >> immediateloader patch I made you the other day, it immediately failed >> when i started testing it further. I had to hack into that same >> _instance_process() method I told you about and modify it so that it >> could run certain attribute loaders after all the others. I cannot >> recall ever adding any feature, however small and innocuous, that did >> not have some unforeseen side effects that required further testing, >> further checks and decisions that we didn't realize would be >> needed. >> >> So with the effort of adding new code and possibly refactoring how >> backref events work, all the new tests needed to ensure it works >> acceptably, do we get increased performance ? No - performance now >> decreases for all users, unconditionally. All collection loads now >> get additional overhead whether the user ever needed to load the other >> side or not. >> >> Do we gain ease of use ? For the vast majority of users who work with >> the Session opened as is recommended, no. They continue get >> Child.parent for free no matter how the Parent and Child happened to >> be loaded, if they are both in memory. The new feature adds >> absolutely no enhancement to usage for them. >> >> Why did we do it then ? Well, some people want to work with detached >> objects, because they have special needs like serialization, caching, >> or they have an issue with templates working with proxy objects, and >> they want the ORM to support their use case without any further >> intervention or specification. Even though detached objects are by >> design more tedious to work with - they are not associated with a >> database connection, other attributes that didn't come in the load >> aren't available, the attributes attached to their loaded child >> objects aren't available, and in the real world, a Child object that >> wants to go into a cache or something would likely need many of its >> attributes pre-loaded, not just the one that happens to be associated >> with its owning collection. The feature then makes it one less step, >> *if* the one attribute they care about having pre-loaded is that >> one. Otherwise, even their experience is not enhanced at all by the >> new feature. >> >> How could I possibly justify such a design decision ? >> > > > -- > Jon Siddle, CoreFiling Limited > Software Tools Developer > http://www.corefiling.com > Phone: +44-1865-203192 > > -- > You received this message because you are subscribed to the Google Groups > "sqlalchemy" group. > To post to this group, send email to sqlalch...@googlegroups.com. > To unsubscribe from this group, send email to > sqlalchemy+unsubscr...@googlegroups.com. > For more options, visit this group at > http://groups.google.com/group/sqlalchemy?hl=en. > -- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To post to this group, send email to sqlalch...@googlegroups.com. To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.