Using ORM querying what is the best practice for limiting the output to a 
given number of resulting *entities*?

Consider this model:

from sqlalchemy import Column, DateTime, String, Integer, ForeignKey, func
from sqlalchemy.orm import relationship, backref
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
  
Base = declarative_base()
 
class Department(Base):
    __tablename__ = 'department'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    
    def __repr__(self):
        return f'Department({self.id}, {self.name})'

 
class Employee(Base):
    __tablename__ = 'employee'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    department_id = Column(Integer, ForeignKey('department.id'))
    # Use cascade='delete,all' to propagate the deletion of a Department 
onto its Employees
    department = relationship(
        Department,
        backref=backref('employees', uselist=True,
                         cascade='delete,all'))
engine = create_engine('sqlite://', echo=True)
Base.metadata.create_all(engine)

Session = sessionmaker(bind=engine)
session = Session()
session.add(Department(id=1, name='IT'))
session.add(Department(id=2, name='Finance'))
session.add(Department(id=3, name='Sales'))
session.add(Employee(id=1, name='Victor', department_id=1))
session.add(Employee(id=2, name='Michal', department_id=1))
session.add(Employee(id=3, name='Kinga', department_id=2))
session.add(Employee(id=4, name='Andy', department_id=3))
session.commit()

Now, let's query for the list of Departments given some criteria on the 
Employee: 
session.query(Department).join(Employee).filter(Employee.name.in_(['Andy', 
'Kinga', 'Victor', 'Michal'])).all()

As expected we get:

[Department(1, IT), Department(2, Finance), Department(3, Sales)]

Now suppose our intent is to limit the number of results, and we would 
prefer to use a LIMIT clause to do the filtering on the database side:

session.query(Department).join(Employee).filter(Employee.name.in_(['Andy', 
'Kinga', 'Victor', 'Michal'])).limit(2).all()

This (in my case) resulted in:

[Department(1, IT)]

The reason for this is that the limit gets applied to the resulting *rows* from 
the joined tables, which happen to begin with the two employees from IT 
(this is non-deterministic unless order by is also used). Since both 
represent the same entity, only one instance is returned.

Other approaches (and their shortcomings) are:

   1. Using a DISTINCT clause prior to LIMIT - won't work if any field is 
   include non-comparable types like IMAGE, BLOB
   2. Using a subquery on the Department with LIMIT - this may filter out 
   departments, which would otherwise match

What works is wrapping the entire query in a subquery, but only selecting a 
DISTINCT Department.id field, and using that in the IN clause. It seems 
quite convoluted and some databases might not do great to optimize this 
away.

sq = session.query(Department.id).join(Employee).filter(Employee.name.in_([
'Andy', 'Kinga', 'Victor', 'Michal'])).distinct().limit(2).subquery()
session.query(Department).filter(Department.id.in_(sq)).all()


-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sqlalchemy/d4e19b87-3442-4b5e-801a-3005fc4c433e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to