On Sep 14, 2010, at 12:22 PM, Jon Siddle wrote:

> Thank you for such a full elaboration. I still think the end result is
> something a little unintuitive (albeit only for those using detached
> objects, who will come across it); but I can't argue against your
> decision here. Based on the information you've given, keeping
> things the way they are is undoubtedly the decision that will benefit
> most users. I've worked with ORM internals before, and I know only
> too well that architectural decisions made for sound reasons can
> nevertheless make some things difficult, which feel like they should
> be simple. As always, the best you can do is maximise the value to
> the most people.
> 
> Perhaps just a note in the eagerload docs to say "...will not load
> backreferences eagerly. If you are using detached objects, try the
> join...contains_eager pattern." or words to that effect?

yeah definitely.   I am psyched for the "immediateload" option, I think this 
option was not so easy to implement originally but the current architecture 
allows it to go in nicely with very little effort.



> 
> Thanks
> 
> Jon
> 
> 
> 
> On 14/09/10 16:27, Michael Bayer wrote:
>> 
>> On Sep 14, 6:03 am, Jon Siddle<j...@corefiling.co.uk>  wrote:
>>> I would agree with all of this if I understood why a) it takes an
>>> appreciable number of function calls or b) "automatic decisionmaking" is
>>> necessary. I don't think there's any ambiguity here, but again; perhaps
>>> I'm missing something fundamental.
>> Theres a "backref" with an attribute handler that receives "append"
>> events in userland, and emits a corresponding "set" event on the other
>> side.     The Collection attribute implementation could receive extra
>> code that would cause it to do something similar to the backref
>> handler when the collection is populated.    This is what you are
>> looking for here.
>> 
>> The append of the collection ultimately occurs in mapper.py line 2298
>> in the current tip.   The function calling it, a closure called
>> _instance() that is generated inside of _instance_process(), is always
>> the top hit on any profiling run against a load of objects.
>> Removing just three function calls from this method is a win.   Months
>> of work have gone into the workings of this method so that our load
>> speeds remain competitive with, and sometimes even faster than, other
>> ORMs that do not use identity maps or units of work, ORMs that are now
>> written almost entirely in C (I leave the identification of these
>> products as an exercise for the reader).
>> 
>> So the implementation here would probably replace the list being
>> appended onto with one that is additionally instrumented to catch
>> these events and populate in the other direction.  We already have the
>> GenericBackrefExtension which does this.  GBE is designed to work in
>> userland, not during a load but when you populate things manually.
>> GBE is handy, but is a primary place for bugs to occur, as it is an
>> event handler that issues more events.  I just fixed one the other day
>> involving some rare endless loop where the events could not decide
>> when to stop back-populating in each direction.   I was pretty amazed
>> that there were still some of those cases left after all these years.
>> 
>> So GBE as it is wouldn't work here, we'd need to write something just
>> like GBE but tailored towards loads, when there are no events firing
>> off.  It would be simpler than GBE, maybe ten lines total all
>> inlined.  But it would be a little bit redundant (now we have two back
>> populators ! one for loads, one for userland events).    If we OTOH
>> tried to make this new reverse-populator and GBE somehow work on the
>> same system, that's probably more complicated.   At the very least it
>> would add method overhead as GBE would now be delegating down a deeper
>> chain.   So this is already complexity that really would have to be
>> worth it in order to dig into.
>> 
>> We don't just now have two ways of backref population, we also have
>> two different ways that Child.parent may be populated.  It might be
>> populated by the LoaderStrategy associated with Child.parent, or it
>> might be populated by the one associated with Parent.children.   If
>> the query spans across both relationships, now both LoaderStrategy
>> objects are there, and unless we do something about it, *both* will
>> populate Child.parent - totally redundant effort, sloppy.   Do we want
>> to prevent that ?  Sure, now we need more messages and flags running
>> around trying to prevent that case - the internals become that more
>> intricate.   Why are they loading this attribute in two different ways
>> when the first way is totally fine ?   Who wrote this crap ?
>> 
>> I know you don't believe me but I've been writing this thing for five
>> years - everything gets much more hairy than you'd like.  Even the
>> immediateloader patch I made you the other day, it immediately failed
>> when i started testing it further.  I had to hack into that same
>> _instance_process() method I told you about and modify it so that it
>> could run certain attribute loaders after all the others.   I cannot
>> recall ever adding any feature, however small and innocuous, that did
>> not have some unforeseen side effects that required further testing,
>> further checks and decisions that we didn't realize would be
>> needed.
>> 
>> So with the effort of adding new code and possibly refactoring how
>> backref events work, all the new tests needed to ensure it works
>> acceptably, do we get increased performance ?  No - performance now
>> decreases for all users, unconditionally.  All collection loads now
>> get additional overhead whether the user ever needed to load the other
>> side or not.
>> 
>> Do we gain ease of use ?  For the vast majority of users who work with
>> the Session opened as is recommended, no.   They continue get
>> Child.parent for free no matter how the Parent and Child happened to
>> be loaded, if they are both in memory.   The new feature adds
>> absolutely no enhancement to usage for them.
>> 
>> Why did we do it then ?   Well, some people want to work with detached
>> objects, because they have special needs like serialization, caching,
>> or they have an issue with templates working with proxy objects, and
>> they want the ORM to support their use case without any further
>> intervention or specification.  Even though detached objects are by
>> design more tedious to work with - they are not associated with a
>> database connection, other attributes that didn't come in the load
>> aren't available, the attributes attached to their loaded child
>> objects aren't available, and in the real world, a Child object that
>> wants to go into a cache or something would likely need many of its
>> attributes pre-loaded, not just the one that happens to be associated
>> with its owning collection.   The feature then makes it one less step,
>> *if* the one attribute they care about having pre-loaded is that
>> one.   Otherwise, even their experience is not enhanced at all by the
>> new feature.
>> 
>> How could I possibly justify such a design decision ?
>> 
> 
> 
> -- 
> Jon Siddle, CoreFiling Limited
> Software Tools Developer
> http://www.corefiling.com
> Phone: +44-1865-203192
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To post to this group, send email to sqlalch...@googlegroups.com.
> To unsubscribe from this group, send email to 
> sqlalchemy+unsubscr...@googlegroups.com.
> For more options, visit this group at 
> http://groups.google.com/group/sqlalchemy?hl=en.
> 

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalch...@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Reply via email to