On Nov 22, 2010, at 5:17 PM, Yang Zhang wrote:

> When reading objects like so (notice no `.all()`):
> 
>  for obj in Session().query(User):
>    ...
> 
> what's the batch size with which sqlalchemy fetches rows from the DB?
> (If the engine matters: for Postgresql, MySQL, and sqlite?)

postgresql:  all rows are fetched by the DBAPI unless the "server side 
cursors" mode is used, typically via execution_options(stream_results=True)
mysql:  all rows are fetched by the MySQLdb DBAPI; none are buffered by 
OurSQL; MySQL-connector-python has an option to control the behavior, though 
SQLA has no fine-grained hooks into it at the moment (you may be able to set 
it manually as a connect_args option to create_engine())
pysqlite:  not sure if it buffers all rows ahead of time; check its 
docs/mailing list
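
For MySQL-connector-python, the connect_args route mentioned above might look 
like this (a sketch only: "buffered" is the driver's own connect() flag, not a 
SQLAlchemy option, so verify the keyword against your driver version's docs):

```python
from sqlalchemy import create_engine

# pass the driver's own cursor-buffering flag straight through to
# mysql.connector.connect(); SQLA itself has no hook for this
# (hypothetical DSN; "buffered" keyword per the driver's docs)
engine = create_engine(
    "mysql+mysqlconnector://scott:tiger@localhost/test",
    connect_args={"buffered": False},
)
```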

Then:

        for obj in query:

all rows are loaded into a fully buffered list of objects before any results 
are yielded, unless the yield_per() modifier method is used.    So 
particularly with PG, if you use 
query.execution_options(stream_results=True).yield_per(100), you'll get 
streaming rows in chunks of 100; with the other DBs things are more 
questionable (consult the API docs for details on those methods).   Joined 
eager loading is not safe to use with yield_per(), and subquery eager loading 
will defeat the purpose of yield_per().
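
As a concrete sketch of that chaining (here against an in-memory sqlite 
database, where stream_results is a no-op but the API is identical):

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([User() for _ in range(250)])
session.commit()

# stream_results is what requests a server-side cursor under
# psycopg2; under sqlite it has no effect, but the chaining is
# the same.  yield_per(100) buffers ORM results 100 at a time.
query = (session.query(User)
         .execution_options(stream_results=True)
         .yield_per(100))

count = sum(1 for _ in query)
print(count)
```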

What you should be getting from this is that you are better off using LIMIT in 
conjunction with criteria that can retrieve "windows" of objects. Using OFFSET 
is the bluntest way to do it, but it performs poorly for large tables.   For a 
web page that's paginating, I'll ensure there's always a deterministic 
ordering and save the last displayed value; the subsequent page then uses that 
previous value in the WHERE clause as its starting point.   For a batch job, 
if possible I might use a window function with PG and MS-SQL, to return 
windows of identifiers across the table (the row_number() alias can't be 
referenced in the same SELECT's WHERE clause, so the window function goes in a 
subquery):

        select id from (
            select id, row_number() over (order by id) as rownum
            from table where <criteria>
        ) as sq
        where rownum % 100 = 1
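
The same every-100th-id trick can be tried with the stdlib sqlite3 module, 
assuming Python's bundled SQLite is 3.25 or newer (when window functions 
arrived); the table and row count here are made up for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key)")
conn.executemany("insert into t (id) values (?)",
                 [(i,) for i in range(1, 251)])

# the rownum alias can't be referenced in the same WHERE clause,
# so the window function is wrapped in a subquery first
ids = [row[0] for row in conn.execute("""
    select id from (
        select id, row_number() over (order by id) as rownum
        from t
    ) where rownum % 100 = 1
""")]
print(ids)  # [1, 101, 201]
```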

that gives you a list of "id" values that are 100 rows apart.  You can then 
use that to select groups, using half-open ranges so that no boundary row 
lands in two windows:

        for index, prev_id in enumerate(ids):
                if index < len(ids) - 1:
                        next_id = ids[index + 1]
                        select * from table where id >= prev_id and id < next_id
                else:
                        select * from table where id >= prev_id
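
A runnable version of that loop, again with stdlib sqlite3, using half-open 
ranges so interior boundary ids aren't fetched twice (the table and boundary 
ids are made up for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key)")
conn.executemany("insert into t (id) values (?)",
                 [(i,) for i in range(1, 251)])

ids = [1, 101, 201]  # boundary ids, e.g. from the window-function query
total = 0
for index, prev_id in enumerate(ids):
    if index < len(ids) - 1:
        # half-open range: includes prev_id, excludes the next boundary
        rows = conn.execute(
            "select * from t where id >= ? and id < ?",
            (prev_id, ids[index + 1])).fetchall()
    else:
        # last window runs to the end of the table
        rows = conn.execute(
            "select * from t where id >= ?", (prev_id,)).fetchall()
    total += len(rows)
print(total)  # 250
```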

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.