Re: [sqlalchemy] subqueryload for a ColumnProperty?

Sumeet Agarwal Sun, 11 Sep 2011 04:06:02 -0700

Mapped selects, nice! I wish I'd seen that part in the documentation
earlier. Do you think it might be worth a reference near Alternatives to
column_property()<http://www.sqlalchemy.org/docs/orm/mapper_config.html#alternatives-to-column-property>where
you also mention object_session()?


So there's one more issue. Once there's a lot of customers and orders, this
query performs pretty poorly in MySQL (tried 5.5.14 Source and 5.0.92-50-log
Percona). To simplify, I'll use joinedload() in my example instead of
subqueryload().

>>> f = 
>>> s.query(Customer).filter_by(id=46).options(joinedload("_num_orders")).all()2011-09-11
>>>  03:04:33,598 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)2011-09-11 
>>> 03:04:33,599 INFO sqlalchemy.engine.base.Engine SELECT customer.id AS 
>>> customer_id, anon_1.customer_id AS anon_1_customer_id, anon_1.count AS 
>>> anon_1_count
FROM customer LEFT OUTER JOIN (SELECT `order`.customer_id AS
customer_id, count(`order`.id) AS count
FROM `order` GROUP BY `order`.customer_id) AS anon_1 ON customer.id =
anon_1.customer_id
WHERE customer.id = %(id_1)s2011-09-11 03:04:33,599 INFO
sqlalchemy.engine.base.Engine {'id_1': 46}


The EXPLAIN plan:

mysql> explain SELECT customer.id AS customer_id, anon_1.customer_id
AS anon_1_customer_id, anon_1.count AS anon_1_count  FROM customer
LEFT OUTER JOIN (SELECT `order`.customer_id AS customer_id,
count(`order`.id) AS count  FROM `order` GROUP BY `order`.customer_id)
AS anon_1 ON customer.id = anon_1.customer_id  WHERE customer.id = 46
\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: customer
         type: const
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 1
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 11200
        Extra:
*************************** 3. row ***************************
           id: 2
  select_type: DERIVED
        table: order
         type: index
possible_keys: NULL
          key: customer_id
      key_len: 5
          ref: NULL
         rows: 28725
        Extra: Using index3 rows in set (0.04 sec)


It results in a huge derived table and a table scan over the order table.
Out of curiosity, I ran the same statement in PostgresSQL (9.0.4) and
noticed it works much better:

sumeet=# explain analyze SELECT customer.id AS customer_id,
anon_1.customer_id AS anon_1_customer_id, anon_1.count AS anon_1_count
FROM customer LEFT OUTER JOIN (SELECT "order".customer_id AS
customer_id, count("order".id) AS count FROM "order" GROUP BY
"order".customer_id) AS anon_1 ON customer.id = anon_1.customer_id
WHERE customer.id = 46;
                                                          QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..508.06 rows=1 width=16) (actual
time=4.069..4.070 rows=1 loops=1)
   Join Filter: (customer.id = "order".customer_id)
   ->  Index Scan using customer_pkey on customer  (cost=0.00..8.27
rows=1 width=4) (actual time=0.017..0.018 rows=1 loops=1)
         Index Cond: (id = 46)
   ->  GroupAggregate  (cost=0.00..499.77 rows=1 width=8) (actual
time=4.046..4.046 rows=1 loops=1)
         ->  Seq Scan on "order"  (cost=0.00..499.75 rows=2 width=8)
(actual time=0.021..4.042 rows=3 loops=1)
               Filter: (customer_id = 46)
 Total runtime: 4.127 ms
(8 rows)


Not that it helps me any :) I'm on MySQL. I can dramatically improve the
query by adding a WHERE clause inside the subquery. I bolded where the
change is:

mysql> explain SELECT customer.id AS customer_id, anon_1.customer_id
AS anon_1_customer_id, anon_1.count AS anon_1_count  FROM customer
LEFT OUTER JOIN (SELECT `order`.customer_id AS customer_id,
count(`order`.id) AS count  FROM `order` *WHERE customer_id = 46*
GROUP BY `order`.customer_id) AS anon_1 ON customer.id =
anon_1.customer_id  WHERE customer.id = 46 \G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: customer
         type: const
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 1
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: system
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
        Extra:
*************************** 3. row ***************************
           id: 2
  select_type: DERIVED
        table: order
         type: ref
possible_keys: customer_id
          key: customer_id
      key_len: 5
          ref:
         rows: 3
        Extra: Using where; Using index3 rows in set (0.00 sec)


I tried adding `anon_1.customer_id = 46` to the outer query's WHERE clause
since I'm pretty sure that's easily expressed with SQLAlchemy, but it didn't
help at all. Would it be easy to keep OrderCount defined as a relationship,
but somehow add a filter to the mapped select at runtime? Seems like then,
I'd be all set.

Thank you for the help.

On Wed, Sep 7, 2011 at 2:39 PM, Michael Bayer <mike...@zzzcomputing.com>wrote:

>
> On Sep 7, 2011, at 5:09 PM, Sumeet Agarwal wrote:
>
> Still getting the hang of SQLAlchemy terminology.
>
> I didn't mean a simple deferred() column, but a deferred column_property()
> which defines a scalar select(). Like in the example here
> http://www.sqlalchemy.org/docs/orm/mapper_config.html#sql-expressions-as-mapped-attributes
> .
>
> My goal is to, instead of including a correlated subquery in the SELECT.
> The column_property looks like
>
>
> "My goal is to...", missing a verb there :).     If it were me, I'd think
> you're asking to unwrap the correlated subquery into a plain JOIN.   Since
> that works more efficiently.
>
>
> I'd rather do a separate query to load num_orders, rather than getting it
> lazily or by using undefer(). It almost feels like I might want to define
> num_orders as a relationship somehow? I dunno if what I would like to do
> maps cleanly to a SQLAlchemy pattern.
>
>
> Using relationship() is kind of the old school approach, but yup I think
> that actually allows you to get exactly subqueryload() for an attribute, so
> good idea !  a rare moment that SQLAlchemy amazes even me.   here's that
> since its faster for me to just illustrate
>
> from sqlalchemy import *
> from sqlalchemy.orm import *
> from sqlalchemy.ext.declarative import declarative_base
>
> Base= declarative_base()
>
> class Customer(Base):
>     __tablename__ = 'customer'
>     id = Column(Integer, primary_key=True)
>     orders = relationship("Order")
>
>     @property
>     def num_orders(self):
>         return self._num_orders.count if self._num_orders else 0
>
> class Order(Base):
>     __tablename__ = 'order'
>     id = Column(Integer, primary_key=True)
>     customer_id = Column(Integer, ForeignKey('customer.id'))
>
> class OrderCount(object):
>     pass
> order_counts = select([Order.customer_id,
> func.count(Order.id).label('count')]).\
>         group_by(Order.customer_id).alias()
>
> mapper(
>     OrderCount,
>     order_counts,
>     primary_key=order_counts.c.customer_id
> )
>
> Customer._num_orders = relationship(OrderCount, uselist=False)
>
> e = create_engine('sqlite://', echo=True)
>
> Base.metadata.create_all(e)
>
> s = Session(e)
> s.add_all([
>     Customer(orders=[Order(), Order(), Order()]),
>     Customer(orders=[Order(), Order()]),
>     Customer(orders=[]),
> ])
> s.commit()
>
> for c in s.query(Customer).options(subqueryload("_num_orders")):
>     print c.id, c.num_orders
>
>
>
>
>
>
>
>
>
>
>
> On Wed, Sep 7, 2011 at 7:51 AM, Michael Bayer <mike...@zzzcomputing.com>wrote:
>
>> For a deferred() itself, we don't have an option that does this.    Though
>> this is an unusual request.   If you definitely want the deferreds to load,
>> what's the issue having them render inline into the original query ?    The
>> advantage to "subqueryload" is primarily in that it loads multiple rows per
>> parent object efficiently, without re-fetching the full parent row many
>> times,  or needing to use a less efficient OUTER JOIN.   A deferred is
>> always one row per parent - and no join is needed anyway.
>>
>>
>>
>>
>> On Sep 6, 2011, at 6:49 PM, Sumeet Agarwal wrote:
>>
>> > I have a collection of deferred `ColumnProperty`s that I'd like to
>> > start loading in subsequent queries. I know you can get this to happen
>> > on the first access of a `ColumnProperty` that is deferred, but I
>> > wonder if there's any way to specify this in the query.
>> >
>> > For relationships, `joinedload[_all()]` has a counterpart
>> > `subqueryload[_all()]`. Is there nothing similar we can do for
>> > `ColumnProperty`?
>> >
>> > --
>> > You received this message because you are subscribed to the Google
>> Groups "sqlalchemy" group.
>> > To post to this group, send email to sqlalchemy@googlegroups.com.
>> > To unsubscribe from this group, send email to
>> sqlalchemy+unsubscr...@googlegroups.com.
>> > For more options, visit this group at
>> http://groups.google.com/group/sqlalchemy?hl=en.
>> >
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "sqlalchemy" group.
>> To post to this group, send email to sqlalchemy@googlegroups.com.
>> To unsubscribe from this group, send email to
>> sqlalchemy+unsubscr...@googlegroups.com.
>> For more options, visit this group at
>> http://groups.google.com/group/sqlalchemy?hl=en.
>>
>>
>
> --
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> To unsubscribe from this group, send email to
> sqlalchemy+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/sqlalchemy?hl=en.
>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> To unsubscribe from this group, send email to
> sqlalchemy+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/sqlalchemy?hl=en.
>

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Re: [sqlalchemy] subqueryload for a ColumnProperty?

Reply via email to