On 05/01/2017 08:05 PM, Isaac Martin wrote:

I am building an api which can return children of resources if the user requests it. For example, |user| has |messages|. I want the query to be able to limit the number of |message| objects that are returned.

I found a useful tip about limiting the number of objects in child collections here <http://stackoverflow.com/questions/9148316/how-to-limit-offset-sqlalchemy-orm-relations-result?noredirect=1&lq=1>. Basically, it indicates the following flow:

|classUser(...):# ...messages =relationship('Messages',order_by='desc(Messages.date)',lazy='dynamic')user =User.query.one()users.messages.limit(10)|

My use case involves returning sometimes large numbers of users.

If I were to follow the advice in that link and used |.limit()| then I would need to iterate over the entire collection of users calling |.limit()| on each one. This is much less efficient then, say, using |LIMIT|in the original sql expression which created the collection.

My question is whether it is possible using declarative to efficiently(N+0) load a large collection of objects while limiting the number of children in their child collections using sqlalchemy?


To be clear, the below is what I am trying to /avoid/.

|users =User.query.all()messages ={}foruser inusers:messages[user.id]=user.messages.limit(10).all()|

I want to do something more like:

|users =User.query.option(User.messages.limit(10)).all()|

so you should ignore whether or not it uses "declarative", which has nothing to do with querying, and in fact at first ignore Query too, because first and foremost this is a SQL problem. You want one SQL statement that does this. What query in SQL would load lots of rows from the primary table, joined to the first ten rows of the secondary table for each primary?

LIMIT is tricky because it's not actually part of the usual "relational algebra" calculation. It's outside of that because it's an artificial limit on rows. For example, my first thought on how to do this was wrong:

select * from users left outer join (select * from messages limit 10) as anon_1 on users.id = anon_1.user_id

This is wrong because it only gets the first ten messages in the aggregate, disregarding user. We want to get the first ten messages for each user, which means we need to do this "select from messages limit 10" individually for each user. That is, we need to correlate somehow. A correlated subquery though is not usually allowed as a FROM element, and is only allowed as a SQL expression, it can only return a single column and a single row; we can't normally JOIN to a correlated subquery in plain vanilla SQL. We can however, correlate inside the ON clause of the JOIN to make this possible in vanilla SQL.

But first, if we are on a modern Postgresql version, we *can* break that usual rule of correlation and use a keyword called LATERAL, which allows correlation in a FROM clause. LATERAL is only supported by modern Postgresql versions, and it makes this easy:

        select * from users left outer join lateral
(select * from message where message.user_id = users.id order by messages.date desc limit 10) as anon1 on users.id = anon_1.user_id

we support the LATERAL keyword.   The query above looks like this:


    subq = s.query(Messages).\
        filter(Messages.user_id == User.id).\
        order_by(Messages.date.desc()).limit(10).subquery().lateral()

    q = s.query(User).outerjoin(subq).\
         options(contains_eager(User.messages, alias=subq))

Note that above, in order to SELECT both users and messages and produce them into the User.messages collection, the "contains_eager()" option must be used and for that the "dynamic" has to go away. This is not the only option, you can for example build a second relationship for User.messages that doesn't have the "dynamic" or you can just load from query(User, Message) separately and organize the result tuples as needed.

if you aren't using Postgresql, or a version of Postgresql that doesn't support LATERAL, the correlation has to be worked into the ON clause of the join instead. The SQL looks like:

        select * from users left outer join messages on
users.id = messages.user_id and messages.date > (select date from messages where messages.user_id = users.id order by date desc limit 1 offset 10)

Here, in order to jam the LIMIT in there, we are actually stepping through the first 10 rows with OFFSET and then doing LIMIT 1 to get the date that represents the lower bound date we want for each user. Then we have to join while comparing on that date, which can be expensive if this column isn't indexed and also can be inaccurate if there are duplicate dates.

This query looks like:

subq = s.query(Messages.date).\
    filter(Messages.user_id == User.id).\
    order_by(Messages.date.desc()).\
    limit(1).offset(10).correlate(User).as_scalar()

q = s.query(User).join(
    Messages,
    and_(User.id == Messages.user_id, Messages.date >= subq)
).options(contains_eager(User.messages))


These kinds of queries are the kind that I don't trust without a good test, so POC below includes both versions including a sanity check.


from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base
import datetime

Base = declarative_base()


class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    messages = relationship(
        'Messages', order_by='desc(Messages.date)')

class Messages(Base):
    __tablename__ = 'message'
    id = Column(Integer, primary_key=True)
    user_id = Column(ForeignKey('user.id'))
    date = Column(Date)

e = create_engine("postgresql://scott:tiger@localhost/test", echo=True)
Base.metadata.drop_all(e)
Base.metadata.create_all(e)

s = Session(e)

s.add_all([
    User(id=i, messages=[
        Messages(id=(i * 20) + j, date=datetime.date(2017, 3, j))
        for j in range(1, 20)
    ]) for i in range(1, 51)
])

s.commit()

top_ten_dates = set(datetime.date(2017, 3, j) for j in range(10, 20))


def run_test(q):
    all_u = q.all()
    assert len(all_u) == 50
    for u in all_u:

        messages = u.messages
        assert len(messages) == 10

        for m in messages:
            assert m.user_id == u.id

        received = set(m.date for m in messages)

        assert received == top_ten_dates

# version 1.   no LATERAL

s.close()

subq = s.query(Messages.date).\
    filter(Messages.user_id == User.id).\
    order_by(Messages.date.desc()).\
    limit(1).offset(10).correlate(User).as_scalar()

q = s.query(User).join(
    Messages,
    and_(User.id == Messages.user_id, Messages.date > subq)
).options(contains_eager(User.messages))

run_test(q)

# version 2.  LATERAL

s.close()

subq = s.query(Messages).\
    filter(Messages.user_id == User.id).\
    order_by(Messages.date.desc()).limit(10).subquery().lateral()

q = s.query(User).outerjoin(subq).\
    options(contains_eager(User.messages, alias=subq))

run_test(q)






SO post here: http://stackoverflow.com/questions/43727268/limit-child-collections-in-initial-query-sqlalchemy/43727705?noredirect=1#comment74499641_43727705

--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com <mailto:sqlalchemy+unsubscr...@googlegroups.com>. To post to this group, send email to sqlalchemy@googlegroups.com <mailto:sqlalchemy@googlegroups.com>.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

--
SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Reply via email to