[sqlalchemy] Re: Optimizing joined entity loads

millerdev Thu, 18 Jun 2009 11:19:50 -0700

> I dont really understand the case here.

My first example wasn't very good. In an attempt to keep it simple I
actually made it too simple. Here's another example:


Order (has items)
Item (has attributes, has tags)
Attribute
Tag

If I set both Item.attributes and Item.tags to eager-load, then the
joined result set has len(attributes) * len(tags) rows for each item,
which is where the result set becomes HUGE. With lazy loading instead,
these are the queries issued before the optimization:

select orders (1 query)
select items (1 query)
select attributes (1 query per item)
select tags (1 query per item)

I'd like to combine all attribute queries into a single query.
Likewise for tags. So instead of having 2 + len(items) * 2 queries
(assuming 10 items, that's 22 queries), I'd have exactly 4 queries.
Like this:

select orders ... where order_id = ?   (1 query)
select items ... where order_id = ?   (1 query)
select attributes ... join items where order_id = ?   (1 query)
select tags ... join items where order_id = ?   (1 query)
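
To make the arithmetic concrete, here is a small sketch of the two
query counts. The function names are made up for illustration; only
the formulas come from the description above.

```python
def lazy_query_count(num_items, num_relations=2):
    # 1 query for the order, 1 for its items,
    # then 1 query per item for each lazily loaded relation.
    return 2 + num_items * num_relations


def batched_query_count(num_relations=2):
    # 1 query for the order, 1 for its items,
    # then 1 query per relation regardless of the number of items.
    return 2 + num_relations


for n in (1, 10, 100):
    print(n, lazy_query_count(n), batched_query_count())
```

The lazy count grows linearly with the number of items, while the
batched count depends only on the number of relations.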

This would be done by the loader strategy (probably a variant of
LazyLoader), which would issue a single query per relation covering
every item on the order. The result of the attributes query, for
example, would be used to populate the attributes collection of each
item on the order.
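
A rough sketch of the four-query plan, using the standard-library
sqlite3 module rather than SQLAlchemy itself. The table names, column
names, sample data and the load_order helper are all assumptions made
up for this example; it only shows the idea of fetching each relation
once and distributing the rows to their parent items, not how a real
loader strategy would be written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data, invented for this illustration.
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY);
    CREATE TABLE items (item_id INTEGER PRIMARY KEY, order_id INTEGER, name TEXT);
    CREATE TABLE attributes (attr_id INTEGER PRIMARY KEY, item_id INTEGER, value TEXT);
    CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, item_id INTEGER, label TEXT);
    INSERT INTO orders VALUES (1);
    INSERT INTO items VALUES (10, 1, 'widget'), (11, 1, 'gadget');
    INSERT INTO attributes VALUES (100, 10, 'red'), (101, 10, 'large'), (102, 11, 'blue');
    INSERT INTO tags VALUES (200, 10, 'sale'), (201, 11, 'new'), (202, 11, 'fragile');
""")


def load_order(conn, order_id):
    # Query 1: the order itself.
    order_row = conn.execute(
        "SELECT order_id FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()

    # Query 2: all items on the order, keyed by primary key.
    items = {}
    for item_id, name in conn.execute(
        "SELECT item_id, name FROM items WHERE order_id = ? ORDER BY item_id",
        (order_id,),
    ):
        items[item_id] = {"name": name, "attributes": [], "tags": []}

    # Query 3: every attribute for every item on the order, in one query.
    for item_id, value in conn.execute(
        "SELECT a.item_id, a.value FROM attributes a "
        "JOIN items i ON i.item_id = a.item_id "
        "WHERE i.order_id = ? ORDER BY a.attr_id",
        (order_id,),
    ):
        items[item_id]["attributes"].append(value)

    # Query 4: same idea for tags.
    for item_id, label in conn.execute(
        "SELECT t.item_id, t.label FROM tags t "
        "JOIN items i ON i.item_id = t.item_id "
        "WHERE i.order_id = ? ORDER BY t.tag_id",
        (order_id,),
    ):
        items[item_id]["tags"].append(label)

    return {"order_id": order_row[0], "items": items}


order = load_order(conn, 1)
for item_id, item in order["items"].items():
    print(item_id, item["name"], item["attributes"], item["tags"])
```

Each child row carries its item_id, so the rows can be routed to the
right collection without repeating any item columns.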

> ...  So i dont  
> see how the result set is "HUGE" in one case and not the other  
> (assuming HUGE means, number of rows.  if number of columns, SQLA  
> ignores columns for entities which it already has during a load).

I think my new example above should clear up the confusion. However,
the old example (using eager loading) would return duplicate copies of
the item data for each attribute. If there are a lot of columns in the
items table, the size of the result set can get quite large using this
type of eager load, and it's pretty inefficient since it's returning a
duplicate copy of the item with each attribute. The strategy I'm
looking for eliminates all that duplicate data at the expense of a
single extra query.

In the case of having multiple relations (e.g. attributes and tags)
the eager-load result set grows multiplicatively (each added relation
multiplies the row count per item by the size of its collection),
while the strategy I'm looking for only requires a single query per
relation but loads no duplicate data. Theoretically this is the most
efficient solution possible assuming that all data must be loaded
(i.e. every item, attribute and tag).
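
A small illustration of the row-count difference, again with the
standard-library sqlite3 module and a made-up schema (one item with 3
attributes and 4 tags). The joined query stands in for what an eager
load of both collections would select; it is not the exact SQL that
SQLAlchemy emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema, invented for this illustration.
conn.executescript("""
    CREATE TABLE items (item_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE attributes (item_id INTEGER, value TEXT);
    CREATE TABLE tags (item_id INTEGER, label TEXT);
    INSERT INTO items VALUES (1, 'widget');
""")
conn.executemany(
    "INSERT INTO attributes VALUES (1, ?)", [("a%d" % i,) for i in range(3)]
)
conn.executemany(
    "INSERT INTO tags VALUES (1, ?)", [("t%d" % i,) for i in range(4)]
)

# Joining both collections in one statement pairs every attribute
# with every tag, and repeats the item columns on every row.
joined = conn.execute("""
    SELECT i.item_id, i.name, a.value, t.label
    FROM items i
    LEFT JOIN attributes a ON a.item_id = i.item_id
    LEFT JOIN tags t ON t.item_id = i.item_id
""").fetchall()

# One query per table returns each row exactly once.
separate = [
    conn.execute("SELECT item_id, name FROM items").fetchall(),
    conn.execute("SELECT item_id, value FROM attributes").fetchall(),
    conn.execute("SELECT item_id, label FROM tags").fetchall(),
]

joined_rows = len(joined)
separate_rows = sum(len(rows) for rows in separate)
item_copies = sum(1 for row in joined if row[1] == "widget")
print(joined_rows, separate_rows, item_copies)
```

Here the joined query returns 3 * 4 = 12 rows, each carrying a copy of
the item columns, while the separate queries return 1 + 3 + 4 = 8 rows
in total.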

> Normally, if you wanted the attributes to eagerload off the related  
> items, but not from the order, you would specify eagerloading on only  
> those attributes which you want eagerloaded.

Yes, I understand that. It's not what I'm asking for though.

Thanks.

~ Daniel