On Nov 22, 2010, at 5:17 PM, Yang Zhang wrote:

> When reading objects like so (notice no `.all()`):
>
>     for obj in Session().query(User):
>         ...
>
> what's the batch size with which sqlalchemy fetches rows from the DB?
> (If the engine matters: for Postgresql, MySQL, and sqlite?)
postgresql: all rows are fetched by the DBAPI, unless the "server side cursors" mode is used, typically via execution_options(stream_results=True).

mysql: all rows are fetched by the MySQLdb DBAPI; none are buffered by OurSQL; MySQL-connector-python has an option to control the behavior, though SQLA has no fine-grained hooks into it at the moment (you may be able to set it manually as a connect_args option to create_engine()).

pysqlite: not sure if it buffers all rows ahead of time; check their docs/mailing list.

Then, with:

    for obj in query:

all rows are loaded into a result list of objects which is fully buffered before results are yielded, unless the yield_per() modifier method is used. So particularly with PG, if you use query.execution_options(stream_results=True).yield_per(100), you'll get streaming rows in chunks of 100, but with the other DBs things are more questionable (consult the API docs for details on those methods). Joined eager loading is not safe to use with yield_per(), and subquery eager loading will defeat the purpose of yield_per().

What you should take from this is that you're better off using LIMIT in conjunction with criteria that can retrieve "windows" of objects. Using OFFSET is the most blunt way to do it, but it performs poorly for large tables. For a web page that's paginating, I'll ensure there's always a deterministic ordering and save the last displayed value; the subsequent page then uses that previous value in the WHERE clause as its starting point. For a batch job, if possible I might use a window function with PG and MS-SQL to return windows of identifiers across the table (the window function has to go in a subquery, since it can't be referenced directly in WHERE):

    select id from (
        select id, row_number() over (order by id) as rownum
        from table
        where <criteria>
    ) as windowed
    where rownum % 100 = 1

that gives you a list of "id" values that are 100 rows apart.
You can then use that to select groups (note the half-open ranges — an inclusive BETWEEN would fetch each boundary row twice):

    for index, start_id in enumerate(ids):
        if index < len(ids) - 1:
            end_id = ids[index + 1]
            select * from table where id >= start_id and id < end_id
        else:
            select * from table where id >= start_id
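The whole windowed approach can be sketched end to end with the stdlib sqlite3 module (a stand-in chosen here for runnability — the row_number() window function needs SQLite 3.25+; on PG the SQL is the same). The table name `t` and a window size of 10 are illustrative:

```python
# Sketch: fetch a large table in fixed-size chunks using window-function
# "marker" ids (assumed table name "t", window size 10 for illustration).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key, val text)")
conn.executemany("insert into t (val) values (?)", [("v",)] * 25)

window = 10

# Step 1: ids that are `window` rows apart (rownum 1, 11, 21, ...).
ids = [row[0] for row in conn.execute(
    "select id from ("
    "    select id, row_number() over (order by id) as rownum from t"
    ") where rownum % ? = 1",
    (window,),
)]

# Step 2: fetch each chunk with half-open ranges so no row appears twice.
chunks = []
for index, start_id in enumerate(ids):
    if index < len(ids) - 1:
        rows = conn.execute(
            "select * from t where id >= ? and id < ?",
            (start_id, ids[index + 1]),
        ).fetchall()
    else:
        rows = conn.execute(
            "select * from t where id >= ?", (start_id,)
        ).fetchall()
    chunks.append(rows)
```

Each chunk is bounded by id, so the per-query work stays constant no matter how deep into the table you are.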
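The seek-style paging described above for web pages (deterministic ordering, last displayed value carried into the next page's WHERE clause) can be sketched the same way — again with stdlib sqlite3 and a hypothetical `users` table standing in:

```python
# Sketch: keyset ("seek") pagination -- remember the last id shown and
# start the next page from it, instead of using OFFSET.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table users (id integer primary key, name text)")
conn.executemany("insert into users (name) values (?)",
                 [("u%d" % i,) for i in range(10)])

def fetch_page(conn, last_id, page_size):
    # "id > ?" seeks straight to the next page; unlike OFFSET, the
    # database never has to scan past the rows already displayed.
    return conn.execute(
        "select id, name from users where id > ? order by id limit ?",
        (last_id, page_size),
    ).fetchall()

pages = []
last_id = 0
while True:
    rows = fetch_page(conn, last_id, 3)
    if not rows:
        break
    pages.append(rows)
    last_id = rows[-1][0]  # carry the last displayed value forward
```

The ordering column just has to be unique (or made unique with a tiebreaker) so "the last value displayed" identifies a single resume point.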
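For completeness, the stream_results + yield_per combination mentioned earlier looks roughly like this — a minimal sketch assuming SQLAlchemy 1.4+ and a throwaway User model (SQLite stands in here; stream_results only actually streams on drivers with server-side cursor support, such as psycopg2, and is a no-op otherwise):

```python
# Sketch: iterate a large ORM result in batches of 100 rather than
# buffering the full list (assumes SQLAlchemy 1.4+; User is illustrative).
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite://")  # stand-in; streaming matters on PG
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([User(name="u%d" % i) for i in range(250)])
    session.commit()

    count = 0
    query = session.query(User).execution_options(
        stream_results=True   # ask the DBAPI for a server-side cursor
    ).yield_per(100)          # ORM yields objects in chunks of 100
    for obj in query:
        count += 1
```

Keep the eager-loading caveats in mind: this only helps for plain column/entity loads.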