On 05/01/2017 08:05 PM, Isaac Martin wrote:
I am building an api which can return children of resources if the user
requests it. For example, |user| has |messages|. I want the query to be
able to limit the number of |message| objects that are returned.
I found a useful tip about limiting the number of objects in child
collections here
<http://stackoverflow.com/questions/9148316/how-to-limit-offset-sqlalchemy-orm-relations-result?noredirect=1&lq=1>.
Basically, it indicates the following flow:
|classUser(...):# ...messages
=relationship('Messages',order_by='desc(Messages.date)',lazy='dynamic')user
=User.query.one()users.messages.limit(10)|
My use case involves returning sometimes large numbers of users.
If I were to follow the advice in that link and used |.limit()| then I
would need to iterate over the entire collection of users calling
|.limit()| on each one. This is much less efficient then, say, using
|LIMIT|in the original sql expression which created the collection.
My question is whether it is possible using declarative to
efficiently(N+0) load a large collection of objects while limiting the
number of children in their child collections using sqlalchemy?
To be clear, the below is what I am trying to /avoid/.
|users =User.query.all()messages ={}foruser
inusers:messages[user.id]=user.messages.limit(10).all()|
I want to do something more like:
|users =User.query.option(User.messages.limit(10)).all()|
so you should ignore whether or not it uses "declarative", which has
nothing to do with querying, and in fact at first ignore Query too,
because first and foremost this is a SQL problem. You want one SQL
statement that does this. What query in SQL would load lots of rows
from the primary table, joined to the first ten rows of the secondary
table for each primary?
LIMIT is tricky because it's not actually part of the usual "relational
algebra" calculation. It's outside of that because it's an artificial
limit on rows. For example, my first thought on how to do this was wrong:
select * from users left outer join (select * from messages limit 10)
as anon_1 on users.id = anon_1.user_id
This is wrong because it only gets the first ten messages in the
aggregate, disregarding user. We want to get the first ten messages for
each user, which means we need to do this "select from messages limit
10" individually for each user. That is, we need to correlate somehow.
A correlated subquery though is not usually allowed as a FROM element,
and is only allowed as a SQL expression, it can only return a single
column and a single row; we can't normally JOIN to a correlated subquery
in plain vanilla SQL. We can however, correlate inside the ON clause
of the JOIN to make this possible in vanilla SQL.
But first, if we are on a modern Postgresql version, we *can* break that
usual rule of correlation and use a keyword called LATERAL, which allows
correlation in a FROM clause. LATERAL is only supported by modern
Postgresql versions, and it makes this easy:
select * from users left outer join lateral
(select * from message where message.user_id = users.id order
by messages.date desc limit 10) as anon1 on users.id = anon_1.user_id
we support the LATERAL keyword. The query above looks like this:
subq = s.query(Messages).\
filter(Messages.user_id == User.id).\
order_by(Messages.date.desc()).limit(10).subquery().lateral()
q = s.query(User).outerjoin(subq).\
options(contains_eager(User.messages, alias=subq))
Note that above, in order to SELECT both users and messages and produce
them into the User.messages collection, the "contains_eager()" option
must be used and for that the "dynamic" has to go away. This is not the
only option, you can for example build a second relationship for
User.messages that doesn't have the "dynamic" or you can just load from
query(User, Message) separately and organize the result tuples as needed.
if you aren't using Postgresql, or a version of Postgresql that doesn't
support LATERAL, the correlation has to be worked into the ON clause of
the join instead. The SQL looks like:
select * from users left outer join messages on
users.id = messages.user_id and messages.date > (select date
from messages where messages.user_id = users.id order by date desc limit
1 offset 10)
Here, in order to jam the LIMIT in there, we are actually stepping
through the first 10 rows with OFFSET and then doing LIMIT 1 to get the
date that represents the lower bound date we want for each user. Then
we have to join while comparing on that date, which can be expensive if
this column isn't indexed and also can be inaccurate if there are
duplicate dates.
This query looks like:
subq = s.query(Messages.date).\
filter(Messages.user_id == User.id).\
order_by(Messages.date.desc()).\
limit(1).offset(10).correlate(User).as_scalar()
q = s.query(User).join(
Messages,
and_(User.id == Messages.user_id, Messages.date >= subq)
).options(contains_eager(User.messages))
These kinds of queries are the kind that I don't trust without a good
test, so POC below includes both versions including a sanity check.
from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base
import datetime
Base = declarative_base()
class User(Base):
__tablename__ = 'user'
id = Column(Integer, primary_key=True)
messages = relationship(
'Messages', order_by='desc(Messages.date)')
class Messages(Base):
__tablename__ = 'message'
id = Column(Integer, primary_key=True)
user_id = Column(ForeignKey('user.id'))
date = Column(Date)
e = create_engine("postgresql://scott:tiger@localhost/test", echo=True)
Base.metadata.drop_all(e)
Base.metadata.create_all(e)
s = Session(e)
s.add_all([
User(id=i, messages=[
Messages(id=(i * 20) + j, date=datetime.date(2017, 3, j))
for j in range(1, 20)
]) for i in range(1, 51)
])
s.commit()
top_ten_dates = set(datetime.date(2017, 3, j) for j in range(10, 20))
def run_test(q):
all_u = q.all()
assert len(all_u) == 50
for u in all_u:
messages = u.messages
assert len(messages) == 10
for m in messages:
assert m.user_id == u.id
received = set(m.date for m in messages)
assert received == top_ten_dates
# version 1. no LATERAL
s.close()
subq = s.query(Messages.date).\
filter(Messages.user_id == User.id).\
order_by(Messages.date.desc()).\
limit(1).offset(10).correlate(User).as_scalar()
q = s.query(User).join(
Messages,
and_(User.id == Messages.user_id, Messages.date > subq)
).options(contains_eager(User.messages))
run_test(q)
# version 2. LATERAL
s.close()
subq = s.query(Messages).\
filter(Messages.user_id == User.id).\
order_by(Messages.date.desc()).limit(10).subquery().lateral()
q = s.query(User).outerjoin(subq).\
options(contains_eager(User.messages, alias=subq))
run_test(q)
SO post here:
http://stackoverflow.com/questions/43727268/limit-child-collections-in-initial-query-sqlalchemy/43727705?noredirect=1#comment74499641_43727705
--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper
http://www.sqlalchemy.org/
To post example code, please provide an MCVE: Minimal, Complete, and
Verifiable Example. See http://stackoverflow.com/help/mcve for a full
description.
---
You received this message because you are subscribed to the Google
Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to sqlalchemy+unsubscr...@googlegroups.com
<mailto:sqlalchemy+unsubscr...@googlegroups.com>.
To post to this group, send email to sqlalchemy@googlegroups.com
<mailto:sqlalchemy@googlegroups.com>.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.
--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper
http://www.sqlalchemy.org/
To post example code, please provide an MCVE: Minimal, Complete, and Verifiable
Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.