Re: [sqlalchemy] limit in context of entities not rows

Simon King Thu, 04 Jul 2019 03:20:40 -0700

I would first figure out how you would do this in SQL, and then
translate that to SQLAlchemy. In this case, the EXISTS operator might
work:


SELECT *
FROM department
WHERE EXISTS (
    SELECT 1
    FROM employee
    WHERE employee.department_id = department.id
    AND employee.name IN (...)
)
LIMIT 2

The docs contain some examples of how to use EXISTS:

https://docs.sqlalchemy.org/en/13/orm/tutorial.html#using-exists

session.query(Department).filter(Department.employees.any(Employee.name.in_(...)))

I've no idea about the performance of this though. It seems like it's
going to have to evaluate the subquery for every row of the parent
table. If the parent table is huge, and you're only looking at a small
number of employees, that might be wasted effort. Under those
conditions, it would be better to find the employees first, then
select the matching departments:

SELECT *
FROM department
WHERE id in (
    SELECT department_id
    FROM employee
    WHERE name IN (...)
)
LIMIT 2

employee_subquery =
session.query(Employee.department_id).filter(Employee.name.in_(...)).subquery()
departments = 
session.query(Department).filter(Department.id.in_(employee_subquery)).limit(2)

Simon

On Thu, Jul 4, 2019 at 10:27 AM Victor Olex
<victor.o...@vtenterprise.com> wrote:
>
> Using ORM querying what is the best practice for limiting the output to a 
> given number of resulting entities?
>
> Consider this model:
>
> from sqlalchemy import Column, DateTime, String, Integer, ForeignKey, func
> from sqlalchemy.orm import relationship, backref
> from sqlalchemy.ext.declarative import declarative_base
> from sqlalchemy import create_engine
> from sqlalchemy.orm import sessionmaker
>
> Base = declarative_base()
>
> class Department(Base):
>     __tablename__ = 'department'
>     id = Column(Integer, primary_key=True)
>     name = Column(String)
>
>     def __repr__(self):
>         return f'Department({self.id}, {self.name})'
>
>
> class Employee(Base):
>     __tablename__ = 'employee'
>     id = Column(Integer, primary_key=True)
>     name = Column(String)
>     department_id = Column(Integer, ForeignKey('department.id'))
>     # Use cascade='delete,all' to propagate the deletion of a Department onto 
> its Employees
>     department = relationship(
>         Department,
>         backref=backref('employees', uselist=True,
>                          cascade='delete,all'))
> engine = create_engine('sqlite://', echo=True)
> Base.metadata.create_all(engine)
>
> Session = sessionmaker(bind=engine)
> session = Session()
> session.add(Department(id=1, name='IT'))
> session.add(Department(id=2, name='Finance'))
> session.add(Department(id=3, name='Sales'))
> session.add(Employee(id=1, name='Victor', department_id=1))
> session.add(Employee(id=2, name='Michal', department_id=1))
> session.add(Employee(id=3, name='Kinga', department_id=2))
> session.add(Employee(id=4, name='Andy', department_id=3))
> session.commit()
>
> Now, let's query for the list of Departments given some criteria on the 
> Employee:
> session.query(Department).join(Employee).filter(Employee.name.in_(['Andy', 
> 'Kinga', 'Victor', 'Michal'])).all()
>
> As expected we get:
>
> [Department(1, IT), Department(2, Finance), Department(3, Sales)]
>
> Now suppose our intent is to limit the number of results, and we would prefer 
> to use a LIMIT clause to do the filtering on the database side:
>
> session.query(Department).join(Employee).filter(Employee.name.in_(['Andy', 
> 'Kinga', 'Victor', 'Michal'])).limit(2).all()
>
> This (in my case) resulted in:
>
> [Department(1, IT)]
>
> The reason for this is that the limit gets applied to the resulting rows from 
> the joined tables, which happen to begin with the two employees from IT (this 
> is non-deterministic unless order by is also used). Since both represent the 
> same entity, only one instance is returned.
>
> Other approaches (and their shortcomings) are:
>
> Using a DISTINCT clause prior to LIMIT - won't work if any field is include 
> non-comparable types like IMAGE, BLOB
> Using a subquery on the Department with LIMIT - this may filter out 
> departments, which would otherwise match
>
> What works is wrapping the entire query in a subquery, but only selecting a 
> DISTINCT Department.id field, and using that in the IN clause. It seems quite 
> convoluted and some databases might not do great to optimize this away.
>
> sq = 
> session.query(Department.id).join(Employee).filter(Employee.name.in_(['Andy', 
> 'Kinga', 'Victor', 'Michal'])).distinct().limit(2).subquery()
> session.query(Department).filter(Department.id.in_(sq)).all()
>
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description.
> ---
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to sqlalchemy+unsubscr...@googlegroups.com.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/sqlalchemy/d4e19b87-3442-4b5e-801a-3005fc4c433e%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sqlalchemy/CAFHwexeW9-ApvF2uPEzLbQhwUPLBzNwKvLDqK7OBb7tM4mZX5Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] limit in context of entities not rows

Reply via email to