Using ORM querying what is the best practice for limiting the output to a given number of resulting *entities*?
Consider this model: from sqlalchemy import Column, DateTime, String, Integer, ForeignKey, func from sqlalchemy.orm import relationship, backref from sqlalchemy.ext.declarative import declarative_base from sqlalchemy import create_engine from sqlalchemy.orm import sessionmaker Base = declarative_base() class Department(Base): __tablename__ = 'department' id = Column(Integer, primary_key=True) name = Column(String) def __repr__(self): return f'Department({self.id}, {self.name})' class Employee(Base): __tablename__ = 'employee' id = Column(Integer, primary_key=True) name = Column(String) department_id = Column(Integer, ForeignKey('department.id')) # Use cascade='delete,all' to propagate the deletion of a Department onto its Employees department = relationship( Department, backref=backref('employees', uselist=True, cascade='delete,all')) engine = create_engine('sqlite://', echo=True) Base.metadata.create_all(engine) Session = sessionmaker(bind=engine) session = Session() session.add(Department(id=1, name='IT')) session.add(Department(id=2, name='Finance')) session.add(Department(id=3, name='Sales')) session.add(Employee(id=1, name='Victor', department_id=1)) session.add(Employee(id=2, name='Michal', department_id=1)) session.add(Employee(id=3, name='Kinga', department_id=2)) session.add(Employee(id=4, name='Andy', department_id=3)) session.commit() Now, let's query for the list of Departments given some criteria on the Employee: session.query(Department).join(Employee).filter(Employee.name.in_(['Andy', 'Kinga', 'Victor', 'Michal'])).all() As expected we get: [Department(1, IT), Department(2, Finance), Department(3, Sales)] Now suppose our intent is to limit the number of results, and we would prefer to use a LIMIT clause to do the filtering on the database side: session.query(Department).join(Employee).filter(Employee.name.in_(['Andy', 'Kinga', 'Victor', 'Michal'])).limit(2).all() This (in my case) resulted in: [Department(1, IT)] The reason for this is that the limit gets applied to the resulting *rows* from the joined tables, which happen to begin with the two employees from IT (this is non-deterministic unless order by is also used). Since both represent the same entity, only one instance is returned. Other approaches (and their shortcomings) are: 1. Using a DISTINCT clause prior to LIMIT - won't work if any field is include non-comparable types like IMAGE, BLOB 2. Using a subquery on the Department with LIMIT - this may filter out departments, which would otherwise match What works is wrapping the entire query in a subquery, but only selecting a DISTINCT Department.id field, and using that in the IN clause. It seems quite convoluted and some databases might not do great to optimize this away. sq = session.query(Department.id).join(Employee).filter(Employee.name.in_([ 'Andy', 'Kinga', 'Victor', 'Michal'])).distinct().limit(2).subquery() session.query(Department).filter(Department.id.in_(sq)).all() -- SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper http://www.sqlalchemy.org/ To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description. --- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at https://groups.google.com/group/sqlalchemy. To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/d4e19b87-3442-4b5e-801a-3005fc4c433e%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.