On Tuesday, 23 February 2016 23:16:25 UTC+8, Mike Bayer wrote:
>
>
> On 02/23/2016 04:00 AM, Daniel Kraus wrote: 
> > Here is a simple script to demonstrate the error: 
> > 
> > https://gist.github.com/dakra/0424086f5837d722bc58 
>
> the joinedload() case "works", as long as you don't use LIMIT or OFFSET, 
> as there's no subquery: 
>
> SELECT SQL_CALC_FOUND_ROWS users.id AS users_id, users.name AS 
> users_name, addresses_1.id AS addresses_1_id, addresses_1.email_address 
> AS addresses_1_email_address, addresses_1.user_id AS addresses_1_user_id 
> FROM users LEFT OUTER JOIN addresses AS addresses_1 ON users.id = 
> addresses_1.user_id 
>
> but looking at the docs for the purpose of found_rows, it would only be 
> used with a LIMIT.   Therefore it's not really valid to use this 
> function with joined eager loading of a collection because it only works 
> at the top level of the query and a joined eager load is going to return 
> more rows than there are actual entities.   If OTOH you are only 
> retrieving a many-to-one via joined eager load, this should all work 
> totally fine, and even in the case of LIMIT I don't think a subquery is 
> applied for simple many-to-one relationships. 
>
> So subqueryload is the only practical option when you need collection 
> eager loading plus the found rows feature with limit.   In this case you 
> definitely don't want this emitted in the subquery because even if it 
> were accepted it would mess up your found_rows().   Longer term solution 
> here would be to provide flags to the query.prefix_with() method to 
> indicate prefixes that should always move to the outside of the query as 
> well as prefixes that should not be passed along to subqueryloaders and 
> other transformations. 
>
> Here's a found_rows modifier that will anticipate a subqueryload and 
> erase any _prefixes() nested: 
>
> from sqlalchemy.orm.interfaces import MapperOption 
> from sqlalchemy.sql import visitors 
>
>
> class FoundRows(MapperOption): 
>      def process_query(self, query): 
>          query._prefixes = "SQL_CALC_FOUND_ROWS", 
>
>      def process_query_conditionally(self, query): 
>          # when subqueryload calls upon loader options, it is passing 
>          # the fully constructed query w/ the original query already 
>          # embedded as a core select() object.  So we will modify the 
>          # select() after the fact. 
>
>          def visit_select(select): 
>              select._prefixes = () 
>
>          # this can be more hardcoded, but here we're searching throughout 
>          # all select() objects and erasing their _prefixes 
>          for from_ in query._from_obj: 
>              visitors.traverse( 
>                  from_, {}, {"select": visit_select}) 
>
>
> users = s.query(User).options(FoundRows(), 
> subqueryload(User.addresses)).limit(3).all() 
> row_count = s.execute('SELECT FOUND_ROWS()').scalar() 
>
> print(users, row_count) 
>
>

Thanks, this does remove the 'SQL_CALC_FOUND_ROWS' from the subquery,
but when I query as you do above with the subqueryload option, SQLAlchemy
fires _two_ queries: the first one has SQL_CALC_FOUND_ROWS and the second
one does not, so s.execute('SELECT FOUND_ROWS()') only returns the found
rows for that second query:

--- cut ---
[...INSERTs...]
2016-02-25 17:20:02,625 INFO sqlalchemy.engine.base.Engine SELECT 
SQL_CALC_FOUND_ROWS users.id AS users_id, users.name AS users_name 
FROM users 
 LIMIT %(param_1)s
2016-02-25 17:20:02,625 INFO sqlalchemy.engine.base.Engine {'param_1': 1}
2016-02-25 17:20:02,627 INFO sqlalchemy.engine.base.Engine SELECT 
addresses.id AS addresses_id, addresses.email_address AS 
addresses_email_address, addresses.user_id AS addresses_user_id, 
anon_1.users_id AS anon_1_users_id 
FROM (SELECT users.id AS users_id 
FROM users 
 LIMIT %(param_1)s) AS anon_1 INNER JOIN addresses ON anon_1.users_id = 
addresses.user_id ORDER BY anon_1.users_id
2016-02-25 17:20:02,627 INFO sqlalchemy.engine.base.Engine {'param_1': 1}
2016-02-25 17:20:02,628 INFO sqlalchemy.engine.base.Engine SELECT 
FOUND_ROWS()
2016-02-25 17:20:02,628 INFO sqlalchemy.engine.base.Engine {}
[<__main__.User object at 0x7f66e19a61d0>] 2
--- cut ---


But looking into it further, SQL_CALC_FOUND_ROWS seems to be slower than a
separate COUNT query in most cases anyway. See:
https://www.percona.com/blog/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/
https://mariadb.atlassian.net/browse/MDEV-4592 (the related issue has been
open since 2009).


So now I generate the query as before, and to get the row count I do:
--- cut ---
    if model_query.statement._group_by_clause.clauses:
        # if there's a GROUP BY we count the slow way:
        # SELECT count(*) FROM (SELECT ... FROM Model ... )
        row_count = model_query.limit(None).offset(None).order_by(None).count()
    else:
        # Remove limit, offset and order by from the query and
        # SELECT count(DISTINCT Model.id) FROM Model ...
        count_query = model_query.statement.with_only_columns([func.count(distinct(Model.id))])
        row_count = count_query.limit(None).offset(None).order_by(None).scalar()
--- cut ---
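
In case it's useful, here is a self-contained sketch of how I picture wrapping
that counting logic into a helper. The get_row_count name, passing the session
explicitly, and calling session.execute() instead of the bound .scalar() are
just my assumptions for illustration, not part of the snippet above:

--- cut ---
from sqlalchemy import distinct, func


def get_row_count(session, model_query, model):
    """Count the rows model_query would match, ignoring LIMIT/OFFSET/ORDER BY."""
    if model_query.statement._group_by_clause.clauses:
        # with a GROUP BY we have to count the slow way:
        # SELECT count(*) FROM (SELECT ... FROM model ...)
        return model_query.limit(None).offset(None).order_by(None).count()
    # otherwise strip limit/offset/order by and count distinct primary keys:
    # SELECT count(DISTINCT model.id) FROM model ...
    count_stmt = (model_query.statement
                  .with_only_columns([func.count(distinct(model.id))])
                  .limit(None).offset(None).order_by(None))
    return session.execute(count_stmt).scalar()


# e.g. with the example script from the gist:
# row_count = get_row_count(s, s.query(User).limit(3), User)
--- cut ---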


Thanks for your quick answer,
  Daniel



> > --- cut --- 
> > from sqlalchemy import create_engine, Column, Integer, String, ForeignKey 
> > from sqlalchemy.orm import Session, relationship, subqueryload, joinedload 
> > from sqlalchemy.ext.declarative import declarative_base 
> > 
> > 
> > Base = declarative_base() 
> > 
> > e = create_engine("mysql+mysqlconnector://scott:tiger@localhost/test", 
> > echo=True) 
> > 
> > class Address(Base): 
> >      __tablename__ = 'addresses' 
> >      id = Column(Integer, primary_key=True) 
> >      email_address = Column(String(64)) 
> >      user_id = Column(Integer, ForeignKey('users.id')) 
> > 
> > 
> > class User(Base): 
> >      __tablename__ = 'users' 
> > 
> >      id = Column(Integer, primary_key=True) 
> >      name = Column(String(64)) 
> > 
> >      addresses = relationship(Address, backref="user") 
> > 
> > 
> > Base.metadata.drop_all(e) 
> > Base.metadata.create_all(e) 
> > 
> > s = Session(e) 
> > 
> > 
> > u = User(name='test') 
> > s.add_all([u, Address(email_address='email1', user=u), 
> > Address(email_address='email2', user=u)]) 
> > s.commit() 
> > 
> > # this works like expected 
> > users = s.query(User).prefix_with('SQL_CALC_FOUND_ROWS').all() 
> > row_count = s.execute('SELECT FOUND_ROWS()').scalar() 
> > 
> > print(users, row_count) 
> > 
> > # with eager loading (subqueryload or joinedload) it fails 
> > users = s.query(User).prefix_with('SQL_CALC_FOUND_ROWS').options(subqueryload(User.addresses)).all() 
>
> > row_count = s.execute('SELECT FOUND_ROWS()').scalar() 
> > 
> > print(users, row_count) 
> > --- cut --- 
> > 
> > 
> > If I execute, the relevant error message is: 
> > sqlalchemy.exc.ProgrammingError: 
> > (mysql.connector.errors.ProgrammingError) 1234 (42000): Incorrect 
> > usage/placement of 'SQL_CALC_FOUND_ROWS' [SQL: 'SELECT addresses.id AS 
> > addresses_id, addresses.email_address AS addresses_email_address, 
> > addresses.user_id AS addresses_user_id, anon_1.users_id AS 
> > anon_1_users_id \nFROM (SELECT SQL_CALC_FOUND_ROWS users.id AS users_id 
> > \nFROM users) AS anon_1 INNER JOIN addresses ON anon_1.users_id = 
> > addresses.user_id ORDER BY anon_1.users_id'] 
> > 
> > Thanks, 
> >    Daniel 
>
>
