On Tuesday, 23 February 2016 23:16:25 UTC+8, Mike Bayer wrote: > > > On 02/23/2016 04:00 AM, Daniel Kraus wrote: > > Here is a simple script to demonstrate the error: > > > > https://gist.github.com/dakra/0424086f5837d722bc58 > > the joinedload() case "works", as long as you don't use LIMIT or OFFSET, > as there's no subquery: > > SELECT SQL_CALC_FOUND_ROWS users.id AS users_id, users.name AS > users_name, addresses_1.id AS addresses_1_id, addresses_1.email_address > AS addresses_1_email_address, addresses_1.user_id AS addresses_1_user_id > FROM users LEFT OUTER JOIN addresses AS addresses_1 ON users.id = > addresses_1.user_id > > but looking at the docs for the purpose of found_rows, it would only be > used with a LIMIT. Therefore it's not really valid to use this > function with joined eager loading of a collection because it only works > at the top level of the query and a joined eager load is going to return > more rows than there are actual entities. If OTOH you are only > retrieving a many-to-one via joined eager load, this should all work > totally fine, and even in the case of LIMIT I don't think a subquery is > applied for simple many-to-one relationships. > > So subqueryload is the only practical option when you need collection > eager loading plus the found rows feature with limit. In this case you > definitely don't want this emitted in the subquery because even if it > were accepted it would mess up your found_rows(). Longer term solution > here would be to provide flags to the query.prefix_with() method to > indicate prefixes that should always move to the outside of the query as > well as prefixes that should not be passed along to subqueryloaders and > other transformations. > > Here's a found_rows modifier that will anticipate a subqueryload and > erase any _prefixes() nested: > > from sqlalchemy.orm.interfaces import MapperOption > from sqlalchemy.sql import visitors > > > class FoundRows(MapperOption): > def process_query(self, query): > query._prefixes = "SQL_CALC_FOUND_ROWS", > > def process_query_conditionally(self, query): > # when subqueryload calls upon loader options, it is passing > # the fully contructed query w/ the original query already > # embedded as a core select() object. So we will modify the > # select() after the fact. > > def visit_select(select): > select._prefixes = () > > # this can be more hardcoded, but here we're searching throughout > # all select() objects and erasing their _prefixes > for from_ in query._from_obj: > visitors.traverse( > from_, {}, {"select": visit_select}) > > > users = s.query(User).options(FoundRows(), > subqueryload(User.addresses)).limit(3).all() > row_count = s.execute('SELECT FOUND_ROWS()').scalar() > > print(users, row_count) > >
Thanks, this does remove the 'SQL_CALC_FOUND_ROWS' from the subquery, but when I query like you above with option subqueryload, sqlalchemy fires _2_ queries, the first one having SQL_CALC_FOUND_ROWS and the second one not, so s.execute('SELECT FOUND_ROWS()') only returns the found rows for that second query: --- cut --- [...INSERTs...] 2016-02-25 17:20:02,625 INFO sqlalchemy.engine.base.Engine SELECT SQL_CALC_FOUND_ROWS users.id AS users_id, users.name AS users_name FROM users LIMIT %(param_1)s 2016-02-25 17:20:02,625 INFO sqlalchemy.engine.base.Engine {'param_1': 1} 2016-02-25 17:20:02,627 INFO sqlalchemy.engine.base.Engine SELECT addresses.id AS addresses_id, addresses.email_address AS addresses_email_address, addresses.user_id AS addresses_user_id, anon_1.users_id AS anon_1_users_id FROM (SELECT users.id AS users_id FROM users LIMIT %(param_1)s) AS anon_1 INNER JOIN addresses ON anon_1.users_id = addresses.user_id ORDER BY anon_1.users_id 2016-02-25 17:20:02,627 INFO sqlalchemy.engine.base.Engine {'param_1': 1} 2016-02-25 17:20:02,628 INFO sqlalchemy.engine.base.Engine SELECT FOUND_ROWS() 2016-02-25 17:20:02,628 INFO sqlalchemy.engine.base.Engine {} [<__main__.User object at 0x7f66e19a61d0>] 2 --- cut --- But looking more into it, SQL_CALC_FOUND_ROWS seems to be slower in most cases anyway. See: https://www.percona.com/blog/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/ https://mariadb.atlassian.net/browse/MDEV-4592 (The related issue is open since 2009). So now generate the query and to get the row count I have: --- cut --- if model_query.statement._group_by_clause.clauses: # if there's a GROUP BY we count the slow way: # SELECT count(*) FROM (SELECT ... FROM Model ... ) row_count = model_query.limit(None).offset(None).order_by(None).count() else: # Remove limit, offset and order by from query and # SELECT count(DISTINCT Model.id) FROM Model ... count_query = model_query.statement.with_only_columns([func.count(distinct(Model.id))]) row_count = count_query.limit(None).offset(None).order_by(None).scalar() --- cut --- Thanks for your fast answer, Daniel > > --- cut --- > > from sqlalchemy import create_engine, Column, Integer, String, > ForeignKey > > from sqlalchemy.orm import Session, relationship, subqueryload, > joinedload > > from sqlalchemy.ext.declarative import declarative_base > > > > > > Base = declarative_base() > > > > e = create_engine("mysql+mysqlconnector://scott:tiger@localhost/test", > > echo=True) > > > > class Address(Base): > > __tablename__ = 'addresses' > > id = Column(Integer, primary_key=True) > > email_address = Column(String(64)) > > user_id = Column(Integer, ForeignKey('users.id')) > > > > > > class User(Base): > > __tablename__ = 'users' > > > > id = Column(Integer, primary_key=True) > > name = Column(String(64)) > > > > addresses = relationship(Address, backref="user") > > > > > > Base.metadata.drop_all(e) > > Base.metadata.create_all(e) > > > > s = Session(e) > > > > > > u = User(name='test') > > s.add_all([u, Address(email_address='email1', user=u), > > Address(email_address='email2', user=u)]) > > s.commit() > > > > # this works like expected > > users = s.query(User).prefix_with('SQL_CALC_FOUND_ROWS').all() > > row_count = s.execute('SELECT FOUND_ROWS()').scalar() > > > > print(users, row_count) > > > > # with eager loading (subqueryload or joinedload) it fails > > users = > > > s.query(User).prefix_with('SQL_CALC_FOUND_ROWS').options(subqueryload(User.addresses)).all() > > > > row_count = s.execute('SELECT FOUND_ROWS()').scalar() > > > > print(users, row_count) > > --- cut --- > > > > > > If I execute, the relevant error message is: > > sqlalchemy.exc.ProgrammingError: > > (mysql.connector.errors.ProgrammingError) 1234 (42000): Incorrect > > usage/placement of 'SQL_CALC_FOUND_ROWS' [SQL: 'SELECT addresses.id AS > > addresses_id, addresses.email_address AS addresses_email_address, > > addresses.user_id AS addresses_user_id, anon_1.users_id AS > > anon_1_users_id \nFROM (SELECT SQL_CALC_FOUND_ROWS users.id AS users_id > > \nFROM users) AS anon_1 INNER JOIN addresses ON anon_1.users_id = > > addresses.user_id ORDER BY anon_1.users_id'] > > > > Thanks, > > Daniel > > -- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at https://groups.google.com/group/sqlalchemy. For more options, visit https://groups.google.com/d/optout.