Re: [sqlalchemy] (Hopefully) simple problem with backrefs not being loaded when eagerloading.

Jon Siddle Tue, 14 Sep 2010 03:03:41 -0700

 On 13/09/10 18:21, Michael Bayer wrote:

On Sep 13, 2010, at 12:26 PM, Jon Siddle wrote:

This relationship is satisfied as you request it, and it works by looking in the current 
Session's identity map for the primary key stored by the many-to-one.   The operation 
falls under the realm of "lazyloading" even though no SQL is emitted.   If you 
consider that Child may have many related many-to-ones, all of which may already be in 
the Session, it would be quite wasteful for the ORM to assume that you're going to be 
working with the object in a detached state and that you need all of them.
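
As a rough illustration of the lookup described above, here is a toy model of resolving a many-to-one from an identity map on access. It is only a sketch, not SQLAlchemy's actual internals: FakeSession, identity_map, sql_count and the parent property are names invented for this example.

```python
# Toy model only: FakeSession and everything on it are hypothetical names,
# not SQLAlchemy classes or attributes.
class FakeSession:
    def __init__(self):
        self.identity_map = {}  # (class name, primary key) -> object
        self.sql_count = 0      # how many times we would have emitted SQL

    def add(self, obj):
        self.identity_map[(type(obj).__name__, obj.id)] = obj

    def get(self, cls_name, pk):
        key = (cls_name, pk)
        if key in self.identity_map:
            # Already present: satisfied without any SQL.
            return self.identity_map[key]
        # A real ORM would emit a SELECT here; the toy just counts it.
        self.sql_count += 1
        return None


class Parent:
    def __init__(self, id):
        self.id = id


class Child:
    def __init__(self, id, parent_id, session):
        self.id = id
        self.parent_id = parent_id
        self._session = session  # None models a detached object

    @property
    def parent(self):
        # Resolved only when the attribute is accessed ("lazy"),
        # using the foreign key value stored on the child.
        if self._session is None:
            raise RuntimeError("detached: cannot resolve parent")
        return self._session.get("Parent", self.parent_id)


session = FakeSession()
p = Parent(1)
session.add(p)
child = Child(10, parent_id=1, session=session)
resolved = child.parent  # found in the identity map, no SQL counted
```

In this model the attached child finds its parent without touching the database, while a child built with session=None fails on access, which mirrors the detached-object problem discussed in this thread.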

I'm not sure I see what you're saying here. I've explicitly asked for "all children 
relating to parent" and these are correctly queried and loaded. While they are being 
added to the parent.children list, why not also set each child.parent since this is known?

because you didn't specify it, and it takes a palpable amount of additional 
overhead to do so

I don't see why it's more overhead than an assignment child.parent = ... at the same time as the list append parent.children.append(...). There's obviously something more complex going on behind the scenes.
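
A minimal sketch of the behaviour being proposed here, in plain Python. It only models the idea of setting both sides at load time. It does not show how SQLAlchemy's loaders work, and load_children and set_backref are hypothetical names.

```python
# Plain-Python model of the proposal; not SQLAlchemy code.
class Parent:
    def __init__(self, id):
        self.id = id
        self.children = []


class Child:
    def __init__(self, id):
        self.id = id
        self.parent = None  # stays None unless the loader sets it


def load_children(parent, child_ids, set_backref):
    """Append a Child per id; optionally populate child.parent as well."""
    for cid in child_ids:
        child = Child(cid)
        parent.children.append(child)
        if set_backref:
            # The extra step asked for in this message.
            child.parent = parent
    return parent


with_backref = load_children(Parent(1), [10, 11, 12], set_backref=True)
without_backref = load_children(Parent(2), [20, 21], set_backref=False)
```

In this toy the extra step is a single assignment per child. The replies in this thread argue that the real cost and complexity in the ORM are higher than that.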

  as well as a palpable amount of complexity to decide if it should do so based 
on the logic you'd apply here, when in 99% of the cases it is not needed.

I just don't see the complexity of the logic here. I've specified I want to join parent to each child, and it's already doing so in one direction. I realise this is only a problem for detached objects, but it leads to quite confusing behaviour, I think.

I don't see how this is wasteful, but I may be missing something.

Child may have parent, foo, bar, bat attached to it, all many-to-ones.   Which 
ones should it assume the user wants to load ?

parent. Because I have explicitly asked it to using "joinedload" or "eagerload".

   If you are loading 10000 rows, and each Child object has three many-to-ones 
on it, and suppose it takes 120 function calls to look at a relationship, 
determine the values to send to query._get(), look in the identity map, etc., 
that is 3 x 10000 x 120 = 3.6 million function calls

But you don't have to look in the identity map at all, since you've just set the parent-child association in the other direction and thus have both entities to hand, right?

, by default, almost never needed since normally they are all just there in the 
current session, without the user asking to do so.    There is nothing special 
about Child.parent just because Parent.children is present up the chain.    
While Hibernate may have decided that the complexity and overhead of adding 
this decisionmaking was worth it, they have many millions more function calls 
to burn with the Java VM in any case than we do in Python, and they also have a 
much higher bar to implement lazyloading since their native class 
instrumentation is itself a huge beast.   In our case it is simply unnecessary. 
 Any such automatic decisionmaking you can be sure quickly leads to many 
uncomfortable edge cases and thorny situations, causing needless surprise 
issues for users who don't really need such a behavior in the first place.

I would agree with all of this if I understood why a) it takes an appreciable number of function calls or b) "automatic decisionmaking" is necessary. I don't think there's any ambiguity here, but again; perhaps I'm missing something fundamental.

As I've mentioned, you will have an option to tell it which many-to-ones you'd like it to 
spend time pre-populating using the upcoming "immediateload" option.

I still think this can be done with negligible overhead if it's done at the same time as the other side of the relation (parent->child). Perhaps I'll have to dig around in the code to see why this is such a problem.

The Session's default assumption is that you're going to be leaving it around while you 
work with the objects contained, and in that way you interact with the database for as 
long as you deal with its objects, which represent "proxies" to the underlying 
transaction.   When objects are detached, for reasons like caching and serialization to 
other places, normally you'd merge() them back when you want to use them again.   So if 
it were me I'd normally be looking to not be closing the session.

I'm closing the session before I forward the objects to the view template in a 
web application. The template has no business doing database operations,

I disagree with this interpretation of "abstraction".   That's like saying that 
pushing the button in an elevator means you're now in charge of starting up the elevator 
motors and instructing them how many feet to travel.

Huh? I didn't use the word "abstraction".

The template is not "doing" database operations, it is working with high level 
objects that you've sent it, and knows nothing of a database.   That these objects may be 
doing database calls behind the scenes to lazily fetch additional data is known as the 
proxy pattern.  It is one of the most fundamental patterns in object oriented software 
design.     Separation of concerns is about what kinds of source code and awareness of 
systems live in various places - it has nothing to do with operational timing or initiation.

These operations may be "magic", but they're not transparent. They can fail and/or have a potentially huge overhead. This is often intolerable inside a view template. It's so much nicer to have a dumb and fast web template with no nasty surprises like n+1 selects. This is orthogonal to SoC.
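
To make the "n+1 selects" concern concrete, here is a small self-contained sketch of a lazily loading proxy being walked by a template-like function. FakeDatabase, LazyChild and render are hypothetical stand-ins invented for this example, and the query counter only models round trips. It is not how SQLAlchemy counts or issues queries.

```python
# Hypothetical stand-ins; no real database or ORM is involved.
class FakeDatabase:
    def __init__(self, parents_by_child):
        self.parents_by_child = parents_by_child
        self.queries = 0  # number of simulated round trips

    def fetch_parent(self, child_id):
        self.queries += 1
        return self.parents_by_child[child_id]


class LazyChild:
    """Proxy-style object: loads its parent the first time it is accessed."""

    def __init__(self, id, db):
        self.id = id
        self._db = db
        self._parent = None
        self._loaded = False

    @property
    def parent(self):
        if not self._loaded:
            # The caller just reads an attribute; the fetch happens here.
            self._parent = self._db.fetch_parent(self.id)
            self._loaded = True
        return self._parent


def render(children):
    # Stands in for a view template looping over objects it was handed.
    return [f"{c.id}:{c.parent}" for c in children]


db = FakeDatabase({i: f"parent-{i % 2}" for i in range(5)})
children = [LazyChild(i, db) for i in range(5)]
lines = render(children)  # one simulated query per child
```

The render function knows nothing about the database, yet rendering five children triggers five simulated queries. A second call to render adds none, because each proxy keeps its loaded value.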

The "pre-load" scenario is certainly valid if you're trying to render from an object 
graph that loads from a cache and doesn't want to do any additional database calls.  But this is 
strictly an issue of optimization, not "correct" software design.

I'm not arguing that it's the only "correct" way, but I do think it has benefits.

If you can point out what I'm missing that makes this so difficult, I'd be interested; or I may get a chance to look through the code at some point. Regardless, you've provided a practical solution to my problem (contains_eager) that seems to be working well.


Thanks

Jon

--
Jon Siddle, CoreFiling Limited
Software Tools Developer
http://www.corefiling.com
Phone: +44-1865-203192

