Re: [sqlalchemy] Eager loading hates single table inheritance

Michael Bayer Mon, 09 Jan 2012 08:19:18 -0800

On Jan 9, 2012, at 7:34 AM, Yuen Ho Wong wrote:

> Here's a list of pathological test cases that confuses the hell out of SA 
> while trying to eager load more than 1 collections which are mapped to the 
> same table using single table inheritance. In short, only joinedload*() 
> appears to work out of all the eager loading methods. This pretty much means 
> that supplying eager load options to a Query object doesn't mean you will 
> always get back the same result.


A common theme in SQLAlchemy is that, despite all the flak we get for being 
"complicated", SQLAlchemy is actually very simple.   It has a handful of 
constructs which seek to do exactly the same thing in exactly the same way, as 
consistently as possible.   So we sometimes get requests for "SQLAlchemy should 
figure out XYZ" and it's like, well not really, that would be really 
complicated.

Here we have a mixture of that, as well as some missing features that are 
planned, as well as a few API mis-usages, as well as I wouldn't really do 
things the way you're doing them at least for now.

The first thing I note here is, if I were doing a model like this, I'd either 
use two different association tables between Product->Origin and Product->Food, 
or I'd make one relationship(), and handle the filtering in Python (which is 
essentially what you're wishing SQLAlchemy did here).   The ORM wants to know 
about your table relationships which here is just A->assoc->B.    All of the 
issues here, some of which I consider to be SQLA bugs anyway, would go away if 
you did it in that manner, subqueryloading would be efficient, etc.

Another nitpick, joinedload_all(), subqueryload_all() is meant to load along 
chains: A->B->C->D, not siblings, A->B, A->C, i.e.:

    p = session.query(Product)\
            .options(joinedload_all(Product.origin, Product.foods))\  ---> 
incorrect
            .filter(Product.id == 2).one()

you'd use two joinedload() twice for that.     So all of the examples where 
you're distinguishing xyz_all() from xyz(), all the same thing.   

It's silently ignored now so http://www.sqlalchemy.org/trac/ticket/2370 is 
added.  If I can get to it in 0.7 it will emit a warning, will raise an error 
in 0.8.

In the subqueryload_all()/subqueryload() example, you'll note the second one 
with two separate subqueryload() calls does in fact correctly do the 
"subqueryload" twice.   There's no way SQLA would ever "figure out" 
automatically that they're against the same table and then join the two queries 
together, decisionmaking like would be enormously complicated as well as the 
mechanics of how to get a single loader to coordinate among two relationships.  
 If you use just one relationship() to SearchOption then provide filtered 
accessors, then you get exactly the optimization you're looking for.

In the joinedload() example, that's sort of a bug or sort of a missing feature. 
   I can't fix that immediately, because we still have a policy whereby 
eagerloading doesn't parenthesize the joins - it always flattens them out.   A 
discussion of this issue is at http://www.sqlalchemy.org/trac/ticket/2120 and 
at this point it's basically SQLite preventing us from doing it, as it chokes 
on a statement like : select * from a left outer join (b join c on b.id=c.bid) 
on a.id=b.aid; .    The example is added in 
http://www.sqlalchemy.org/trac/ticket/2369 .

Ticket 2120 calls for at least an option "nested_joins=True", specifying that 
the more correct/efficient system of nesting joins should be used.   This is 
all 0.8 stuff, as the joined eager loading code would be destabilized by this - 
if it turns out to be an isolated option it could move to 0.7.   Your set of 
tests here makes me more interested in the issue though so perhaps we'll see if 
I have time to try some things out.

Finally the contains_eager version.   SQLAlchemy again expects two distinct 
sets of columns for each relationship, so you must join to the table twice:

    sa1 = aliased(OriginOption)
    sa2 = aliased(FoodOption)

    p = session.query(Product)\
            .join(sa1, Product.origin)\
            .join(sa2, Product.foods)\
            .options(contains_eager(Product.origin, alias=sa1),
                     contains_eager(Product.foods, alias=sa2))\
            .filter(Product.id == 2).one()


Same theme here, you're hoping SQLAlchemy can "figure out" something in Python, 
i.e. that only "certain" rows being routed to Product.origin/Product.foods 
should be used for each, but it's not that complicated.     It relies upon SQL 
to present it with the correct data geometry and assumes it is correct.  
Second-guessing what it gets back from SQL to check, "oh is this some special 
1% edge case where I might have to filter extra types that I'm not supposed to 
receive here?" wouldn't be efficient or consistent with how everything else 
works.

So overall, a few bugs we'll fix at some point, but if you want filtering in 
Python, build that onto your Product object.   I would note how even though 
SQLAlchemy is not doing what you hoped for here, it is doing what it does very 
consistently.




-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Re: [sqlalchemy] Eager loading hates single table inheritance

Reply via email to