Well that was a very silly mistake. End of the day oversight is what happened here. I didn't notice that "offset" was set to 10 and my unit test only had 2 values. When I set offset to 0 it worked great.
Sincerest thanks for this! I hadn't seen as_scalar used before. I'm looking forward to trying subqueries like this myself in the future. On Mon, May 1, 2017 at 9:38 PM, Isaac Martin <melchoi...@gmail.com> wrote: > Thank you very much for your response. This solution isn't working for me, > and I'm 99% sure I'm not translating what you've written into something > that works for my use case. My situation is somewhat more complicated than > I initially wrote. My first question was in the vain hope that there was > some easy way using declarative to tell anything loaded through a > relationship() to load with a LIMIT. Obviously that isn't the case. > > In my situation, I have a relationship which I have defined by hand. There > are not foreign keys between the messages and the objects being loaded > (Connections). When I try to use your lateral solution I get an error > thrown that there are no foreign key relationships, which is true. > > Here is my model edited down to what I believe are the minimal relevant > fields. I'm trying to return Connections with a limited number of objects > in the Connection.messages field: > > *class Connection(Base):* > * __tablename__ = 'Connection'* > > * user_1_id = Column(Integer, ForeignKey('User.uid'))* > * user_2_id = Column(Integer, ForeignKey('User.uid'))* > * messages = relationship('Message',* > * secondary="join(MessageRecipient, Message, > MessageRecipient.message_id == Message.uid)",* > * primaryjoin= 'or_(Connection.user_1_id == > MessageRecipient.recipient_id,'* > * 'Connection.user_2_id == > MessageRecipient.recipient_id)',* > * order_by="Message.created.desc()")* > > *class MessageRecipient(Base):* > * __tablename__ = 'MessageRecipient'* > > * recipient_id = Column(Integer, ForeignKey('User.uid'))* > > *class Message(Base):* > * __tablename__ = 'Message'* > > * created = Column(DateTime, default=func.current_timestamp())* > > * body = Column(String(2000))* > > > Now, I'm using Postgres, so I did attempt to translate your lateral > example: > > * subq = Message.query.\* > * filter(and_(Message.uid == MessageRecipient.message_id, > or_(MessageRecipient.uid == Connection.user_1_id,* > * MessageRecipient.uid == > Connection.user_2_id))).\* > * > order_by(Message.created.desc()).limit(1).subquery().lateral()* > > * q = Connection.query.outerjoin(subq). \* > * options(contains_eager(Connection.messages, alias=subq))* > > The above throws an error: > Can't find any foreign key relationships between 'Connection' and > '%(140122530861688 anon)s'. > > I also tried your non-lateral example > * subq = self.db.session.query(Message.created). \* > * filter(and_(Message.uid == MessageRecipient.message_id, > or_(MessageRecipient.recipient_id == Connection.user_1_id,* > * MessageRecipient.recipient_id == > Connection.user_2_id))).\* > * order_by(Message.created.desc()). \* > * limit(1).offset(10).correlate(Connection).as_scalar()* > > * q = self.db.session.query(Connection).join(* > * MessageRecipient,* > * or_(MessageRecipient.recipient_id == Connection.user_1_id,* > * MessageRecipient.recipient_id == Connection.user_2_id)).\* > * join( Message,* > * and_(MessageRecipient.message_id == Message.uid, > Message.created > subq)* > * ).options(contains_eager(Connection.messages)).all()* > > The above does not throw an error, but it also does not return any > entities. I'm going to keep working on the non-lateral example because I > don't understand your example well enough to confidently say I'm being > faithful to it. If you have other advice based on this new information then > I'd really appreciate it. If I figure out what I was doing wrong I'll post > here for posterity. > > On Mon, May 1, 2017 at 7:34 PM, mike bayer <mike...@zzzcomputing.com> > wrote: > >> >> >> On 05/01/2017 08:05 PM, Isaac Martin wrote: >> >>> >>> I am building an api which can return children of resources if the user >>> requests it. For example, |user| has |messages|. I want the query to be >>> able to limit the number of |message| objects that are returned. >>> >>> I found a useful tip about limiting the number of objects in child >>> collections here <http://stackoverflow.com/ques >>> tions/9148316/how-to-limit-offset-sqlalchemy-orm-relations- >>> result?noredirect=1&lq=1>. Basically, it indicates the following flow: >>> >>> |classUser(...):# ...messages =relationship('Messages',order >>> _by='desc(Messages.date)',lazy='dynamic')user >>> =User.query.one()users.messages.limit(10)| >>> >>> My use case involves returning sometimes large numbers of users. >>> >>> If I were to follow the advice in that link and used |.limit()| then I >>> would need to iterate over the entire collection of users calling >>> |.limit()| on each one. This is much less efficient then, say, using >>> |LIMIT|in the original sql expression which created the collection. >>> >>> My question is whether it is possible using declarative to >>> efficiently(N+0) load a large collection of objects while limiting the >>> number of children in their child collections using sqlalchemy? >>> >>> >>> To be clear, the below is what I am trying to /avoid/. >>> >>> |users =User.query.all()messages ={}foruser inusers:messages[user.id >>> ]=user.messages.limit(10).all()| >>> >>> I want to do something more like: >>> >>> |users =User.query.option(User.messages.limit(10)).all()| >>> >> >> so you should ignore whether or not it uses "declarative", which has >> nothing to do with querying, and in fact at first ignore Query too, because >> first and foremost this is a SQL problem. You want one SQL statement that >> does this. What query in SQL would load lots of rows from the primary >> table, joined to the first ten rows of the secondary table for each primary? >> >> LIMIT is tricky because it's not actually part of the usual "relational >> algebra" calculation. It's outside of that because it's an artificial >> limit on rows. For example, my first thought on how to do this was wrong: >> >> select * from users left outer join (select * from messages limit >> 10) as anon_1 on users.id = anon_1.user_id >> >> This is wrong because it only gets the first ten messages in the >> aggregate, disregarding user. We want to get the first ten messages for >> each user, which means we need to do this "select from messages limit 10" >> individually for each user. That is, we need to correlate somehow. A >> correlated subquery though is not usually allowed as a FROM element, and is >> only allowed as a SQL expression, it can only return a single column and a >> single row; we can't normally JOIN to a correlated subquery in plain >> vanilla SQL. We can however, correlate inside the ON clause of the JOIN >> to make this possible in vanilla SQL. >> >> But first, if we are on a modern Postgresql version, we *can* break that >> usual rule of correlation and use a keyword called LATERAL, which allows >> correlation in a FROM clause. LATERAL is only supported by modern >> Postgresql versions, and it makes this easy: >> >> select * from users left outer join lateral >> (select * from message where message.user_id = users.id order by >> messages.date desc limit 10) as anon1 on users.id = anon_1.user_id >> >> we support the LATERAL keyword. The query above looks like this: >> >> >> subq = s.query(Messages).\ >> filter(Messages.user_id == User.id).\ >> order_by(Messages.date.desc()).limit(10).subquery().lateral() >> >> q = s.query(User).outerjoin(subq).\ >> options(contains_eager(User.messages, alias=subq)) >> >> Note that above, in order to SELECT both users and messages and produce >> them into the User.messages collection, the "contains_eager()" option must >> be used and for that the "dynamic" has to go away. This is not the only >> option, you can for example build a second relationship for User.messages >> that doesn't have the "dynamic" or you can just load from query(User, >> Message) separately and organize the result tuples as needed. >> >> if you aren't using Postgresql, or a version of Postgresql that doesn't >> support LATERAL, the correlation has to be worked into the ON clause of the >> join instead. The SQL looks like: >> >> select * from users left outer join messages on >> users.id = messages.user_id and messages.date > (select date >> from messages where messages.user_id = users.id order by date desc limit >> 1 offset 10) >> >> Here, in order to jam the LIMIT in there, we are actually stepping >> through the first 10 rows with OFFSET and then doing LIMIT 1 to get the >> date that represents the lower bound date we want for each user. Then we >> have to join while comparing on that date, which can be expensive if this >> column isn't indexed and also can be inaccurate if there are duplicate >> dates. >> >> This query looks like: >> >> subq = s.query(Messages.date).\ >> filter(Messages.user_id == User.id).\ >> order_by(Messages.date.desc()).\ >> limit(1).offset(10).correlate(User).as_scalar() >> >> q = s.query(User).join( >> Messages, >> and_(User.id == Messages.user_id, Messages.date >= subq) >> ).options(contains_eager(User.messages)) >> >> >> These kinds of queries are the kind that I don't trust without a good >> test, so POC below includes both versions including a sanity check. >> >> >> from sqlalchemy import * >> from sqlalchemy.orm import * >> from sqlalchemy.ext.declarative import declarative_base >> import datetime >> >> Base = declarative_base() >> >> >> class User(Base): >> __tablename__ = 'user' >> id = Column(Integer, primary_key=True) >> messages = relationship( >> 'Messages', order_by='desc(Messages.date)') >> >> class Messages(Base): >> __tablename__ = 'message' >> id = Column(Integer, primary_key=True) >> user_id = Column(ForeignKey('user.id')) >> date = Column(Date) >> >> e = create_engine("postgresql://scott:tiger@localhost/test", echo=True) >> Base.metadata.drop_all(e) >> Base.metadata.create_all(e) >> >> s = Session(e) >> >> s.add_all([ >> User(id=i, messages=[ >> Messages(id=(i * 20) + j, date=datetime.date(2017, 3, j)) >> for j in range(1, 20) >> ]) for i in range(1, 51) >> ]) >> >> s.commit() >> >> top_ten_dates = set(datetime.date(2017, 3, j) for j in range(10, 20)) >> >> >> def run_test(q): >> all_u = q.all() >> assert len(all_u) == 50 >> for u in all_u: >> >> messages = u.messages >> assert len(messages) == 10 >> >> for m in messages: >> assert m.user_id == u.id >> >> received = set(m.date for m in messages) >> >> assert received == top_ten_dates >> >> # version 1. no LATERAL >> >> s.close() >> >> subq = s.query(Messages.date).\ >> filter(Messages.user_id == User.id).\ >> order_by(Messages.date.desc()).\ >> limit(1).offset(10).correlate(User).as_scalar() >> >> q = s.query(User).join( >> Messages, >> and_(User.id == Messages.user_id, Messages.date > subq) >> ).options(contains_eager(User.messages)) >> >> run_test(q) >> >> # version 2. LATERAL >> >> s.close() >> >> subq = s.query(Messages).\ >> filter(Messages.user_id == User.id).\ >> order_by(Messages.date.desc()).limit(10).subquery().lateral() >> >> q = s.query(User).outerjoin(subq).\ >> options(contains_eager(User.messages, alias=subq)) >> >> run_test(q) >> >> >> >> >> >>> >>> SO post here: http://stackoverflow.com/quest >>> ions/43727268/limit-child-collections-in-initial-query-sqlal >>> chemy/43727705?noredirect=1#comment74499641_43727705 >>> >>> -- >>> SQLAlchemy - >>> The Python SQL Toolkit and Object Relational Mapper >>> >>> http://www.sqlalchemy.org/ >>> >>> To post example code, please provide an MCVE: Minimal, Complete, and >>> Verifiable Example. See http://stackoverflow.com/help/mcve for a full >>> description. >>> --- >>> You received this message because you are subscribed to the Google >>> Groups "sqlalchemy" group. >>> To unsubscribe from this group and stop receiving emails from it, send >>> an email to sqlalchemy+unsubscr...@googlegroups.com <mailto: >>> sqlalchemy+unsubscr...@googlegroups.com>. >>> To post to this group, send email to sqlalchemy@googlegroups.com >>> <mailto:sqlalchemy@googlegroups.com>. >>> Visit this group at https://groups.google.com/group/sqlalchemy. >>> For more options, visit https://groups.google.com/d/optout. >>> >> >> -- >> SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper >> >> http://www.sqlalchemy.org/ >> >> To post example code, please provide an MCVE: Minimal, Complete, and >> Verifiable Example. See http://stackoverflow.com/help/mcve for a full >> description. >> --- You received this message because you are subscribed to a topic in >> the Google Groups "sqlalchemy" group. >> To unsubscribe from this topic, visit https://groups.google.com/d/to >> pic/sqlalchemy/lOedjqu61G4/unsubscribe. >> To unsubscribe from this group and all its topics, send an email to >> sqlalchemy+unsubscr...@googlegroups.com. >> To post to this group, send email to sqlalchemy@googlegroups.com. >> Visit this group at https://groups.google.com/group/sqlalchemy. >> For more options, visit https://groups.google.com/d/optout. >> > > -- SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper http://www.sqlalchemy.org/ To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description. --- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at https://groups.google.com/group/sqlalchemy. For more options, visit https://groups.google.com/d/optout.