Re: [sqlalchemy] Re: (Hopefully) simple problem with backrefs not being loaded when eagerloading.

Jon Siddle Tue, 14 Sep 2010 09:22:50 -0700

 Thank you for such a full elaboration. I still think the end result is
something a little unintuitive (albeit only for those using detached
objects, who will come across it); but I can't argue against your
decision here. Based on the information you've given, keeping
things the way they are is undoubtedly the decision that will benefit
most users. I've worked with ORM internals before, and I know only
too well that architectural decisions made for sound reasons can
nevertheless make some things difficult, which feel like they should
be simple. As always, the best you can do is maximise the value to
the most people.


Perhaps just a note in the eagerload docs to say "...will not load
backreferences eagerly. If you are using detached objects, try the
join...contains_eager pattern." or words to that effect?

Thanks

Jon



On 14/09/10 16:27, Michael Bayer wrote:


On Sep 14, 6:03 am, Jon Siddle<j...@corefiling.co.uk>  wrote:

I would agree with all of this if I understood why a) it takes an
appreciable number of function calls or b) "automatic decisionmaking" is
necessary. I don't think there's any ambiguity here, but again; perhaps
I'm missing something fundamental.

Theres a "backref" with an attribute handler that receives "append"
events in userland, and emits a corresponding "set" event on the other
side.     The Collection attribute implementation could receive extra
code that would cause it to do something similar to the backref
handler when the collection is populated.    This is what you are
looking for here.

The append of the collection ultimately occurs in mapper.py line 2298
in the current tip.   The function calling it, a closure called
_instance() that is generated inside of _instance_process(), is always
the top hit on any profiling run against a load of objects.
Removing just three function calls from this method is a win.   Months
of work have gone into the workings of this method so that our load
speeds remain competitive with, and sometimes even faster than, other
ORMs that do not use identity maps or units of work, ORMs that are now
written almost entirely in C (I leave the identification of these
products as an exercise for the reader).

So the implementation here would probably replace the list being
appended onto with one that is additionally instrumented to catch
these events and populate in the other direction.  We already have the
GenericBackrefExtension which does this.  GBE is designed to work in
userland, not during a load but when you populate things manually.
GBE is handy, but is a primary place for bugs to occur, as it is an
event handler that issues more events.  I just fixed one the other day
involving some rare endless loop where the events could not decide
when to stop back-populating in each direction.   I was pretty amazed
that there were still some of those cases left after all these years.

So GBE as it is wouldn't work here, we'd need to write something just
like GBE but tailored towards loads, when there are no events firing
off.  It would be simpler than GBE, maybe ten lines total all
inlined.  But it would be a little bit redundant (now we have two back
populators ! one for loads, one for userland events).    If we OTOH
tried to make this new reverse-populator and GBE somehow work on the
same system, that's probably more complicated.   At the very least it
would add method overhead as GBE would now be delegating down a deeper
chain.   So this is already complexity that really would have to be
worth it in order to dig into.

We don't just now have two ways of backref population, we also have
two different ways that Child.parent may be populated.  It might be
populated by the LoaderStrategy associated with Child.parent, or it
might be populated by the one associated with Parent.children.   If
the query spans across both relationships, now both LoaderStrategy
objects are there, and unless we do something about it, *both* will
populate Child.parent - totally redundant effort, sloppy.   Do we want
to prevent that ?  Sure, now we need more messages and flags running
around trying to prevent that case - the internals become that more
intricate.   Why are they loading this attribute in two different ways
when the first way is totally fine ?   Who wrote this crap ?

I know you don't believe me but I've been writing this thing for five
years - everything gets much more hairy than you'd like.  Even the
immediateloader patch I made you the other day, it immediately failed
when i started testing it further.  I had to hack into that same
_instance_process() method I told you about and modify it so that it
could run certain attribute loaders after all the others.   I cannot
recall ever adding any feature, however small and innocuous, that did
not have some unforeseen side effects that required further testing,
further checks and decisions that we didn't realize would be
needed.

So with the effort of adding new code and possibly refactoring how
backref events work, all the new tests needed to ensure it works
acceptably, do we get increased performance ?  No - performance now
decreases for all users, unconditionally.  All collection loads now
get additional overhead whether the user ever needed to load the other
side or not.

Do we gain ease of use ?  For the vast majority of users who work with
the Session opened as is recommended, no.   They continue get
Child.parent for free no matter how the Parent and Child happened to
be loaded, if they are both in memory.   The new feature adds
absolutely no enhancement to usage for them.

Why did we do it then ?   Well, some people want to work with detached
objects, because they have special needs like serialization, caching,
or they have an issue with templates working with proxy objects, and
they want the ORM to support their use case without any further
intervention or specification.  Even though detached objects are by
design more tedious to work with - they are not associated with a
database connection, other attributes that didn't come in the load
aren't available, the attributes attached to their loaded child
objects aren't available, and in the real world, a Child object that
wants to go into a cache or something would likely need many of its
attributes pre-loaded, not just the one that happens to be associated
with its owning collection.   The feature then makes it one less step,
*if* the one attribute they care about having pre-loaded is that
one.   Otherwise, even their experience is not enhanced at all by the
new feature.

How could I possibly justify such a design decision ?



--
Jon Siddle, CoreFiling Limited
Software Tools Developer
http://www.corefiling.com
Phone: +44-1865-203192

--
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalch...@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Re: [sqlalchemy] Re: (Hopefully) simple problem with backrefs not being loaded when eagerloading.

Reply via email to