I would first figure out how you would do this in SQL, and then translate that to SQLAlchemy. In this case, the EXISTS operator might work:
SELECT * FROM department WHERE EXISTS ( SELECT 1 FROM employee WHERE employee.department_id = department.id AND employee.name IN (...) ) LIMIT 2 The docs contain some examples of how to use EXISTS: https://docs.sqlalchemy.org/en/13/orm/tutorial.html#using-exists session.query(Department).filter(Department.employees.any(Employee.name.in_(...))) I've no idea about the performance of this though. It seems like it's going to have to evaluate the subquery for every row of the parent table. If the parent table is huge, and you're only looking at a small number of employees, that might be wasted effort. Under those conditions, it would be better to find the employees first, then select the matching departments: SELECT * FROM department WHERE id in ( SELECT department_id FROM employee WHERE name IN (...) ) LIMIT 2 employee_subquery = session.query(Employee.department_id).filter(Employee.name.in_(...)).subquery() departments = session.query(Department).filter(Department.id.in_(employee_subquery)).limit(2) Simon On Thu, Jul 4, 2019 at 10:27 AM Victor Olex <victor.o...@vtenterprise.com> wrote: > > Using ORM querying what is the best practice for limiting the output to a > given number of resulting entities? > > Consider this model: > > from sqlalchemy import Column, DateTime, String, Integer, ForeignKey, func > from sqlalchemy.orm import relationship, backref > from sqlalchemy.ext.declarative import declarative_base > from sqlalchemy import create_engine > from sqlalchemy.orm import sessionmaker > > Base = declarative_base() > > class Department(Base): > __tablename__ = 'department' > id = Column(Integer, primary_key=True) > name = Column(String) > > def __repr__(self): > return f'Department({self.id}, {self.name})' > > > class Employee(Base): > __tablename__ = 'employee' > id = Column(Integer, primary_key=True) > name = Column(String) > department_id = Column(Integer, ForeignKey('department.id')) > # Use cascade='delete,all' to propagate the deletion of a Department onto > its Employees > department = relationship( > Department, > backref=backref('employees', uselist=True, > cascade='delete,all')) > engine = create_engine('sqlite://', echo=True) > Base.metadata.create_all(engine) > > Session = sessionmaker(bind=engine) > session = Session() > session.add(Department(id=1, name='IT')) > session.add(Department(id=2, name='Finance')) > session.add(Department(id=3, name='Sales')) > session.add(Employee(id=1, name='Victor', department_id=1)) > session.add(Employee(id=2, name='Michal', department_id=1)) > session.add(Employee(id=3, name='Kinga', department_id=2)) > session.add(Employee(id=4, name='Andy', department_id=3)) > session.commit() > > Now, let's query for the list of Departments given some criteria on the > Employee: > session.query(Department).join(Employee).filter(Employee.name.in_(['Andy', > 'Kinga', 'Victor', 'Michal'])).all() > > As expected we get: > > [Department(1, IT), Department(2, Finance), Department(3, Sales)] > > Now suppose our intent is to limit the number of results, and we would prefer > to use a LIMIT clause to do the filtering on the database side: > > session.query(Department).join(Employee).filter(Employee.name.in_(['Andy', > 'Kinga', 'Victor', 'Michal'])).limit(2).all() > > This (in my case) resulted in: > > [Department(1, IT)] > > The reason for this is that the limit gets applied to the resulting rows from > the joined tables, which happen to begin with the two employees from IT (this > is non-deterministic unless order by is also used). Since both represent the > same entity, only one instance is returned. > > Other approaches (and their shortcomings) are: > > Using a DISTINCT clause prior to LIMIT - won't work if any field is include > non-comparable types like IMAGE, BLOB > Using a subquery on the Department with LIMIT - this may filter out > departments, which would otherwise match > > What works is wrapping the entire query in a subquery, but only selecting a > DISTINCT Department.id field, and using that in the IN clause. It seems quite > convoluted and some databases might not do great to optimize this away. > > sq = > session.query(Department.id).join(Employee).filter(Employee.name.in_(['Andy', > 'Kinga', 'Victor', 'Michal'])).distinct().limit(2).subquery() > session.query(Department).filter(Department.id.in_(sq)).all() > > > -- > SQLAlchemy - > The Python SQL Toolkit and Object Relational Mapper > > http://www.sqlalchemy.org/ > > To post example code, please provide an MCVE: Minimal, Complete, and > Verifiable Example. See http://stackoverflow.com/help/mcve for a full > description. > --- > You received this message because you are subscribed to the Google Groups > "sqlalchemy" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to sqlalchemy+unsubscr...@googlegroups.com. > To post to this group, send email to sqlalchemy@googlegroups.com. > Visit this group at https://groups.google.com/group/sqlalchemy. > To view this discussion on the web visit > https://groups.google.com/d/msgid/sqlalchemy/d4e19b87-3442-4b5e-801a-3005fc4c433e%40googlegroups.com. > For more options, visit https://groups.google.com/d/optout. -- SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper http://www.sqlalchemy.org/ To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description. --- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at https://groups.google.com/group/sqlalchemy. To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/CAFHwexeW9-ApvF2uPEzLbQhwUPLBzNwKvLDqK7OBb7tM4mZX5Q%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.