Hello.

> assuming cls.foo is a many-to-one, it will produce the correct result,
> but will be far worse in terms of memory and performance, as the
> subqueryload() call will be invoked for each distinct batch of 50 rows,
> across the *full* result set.  So if your result has 1000 rows, and the
> number of "bars" total is 10000, you will load 10000 additional rows for
> *each* yield per of 50.
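
If I follow the arithmetic, that overhead is substantial. A quick sanity check, using just the numbers from your example:

```python
# Sanity check of the cost described above (1000 rows, 10000 bars total,
# yield_per(50)): subqueryload() re-fetches the full "bars" result for
# every batch.
total_rows = 1000
total_bars = 10000
batch_size = 50

batches = total_rows // batch_size           # 20 subqueryload invocations
extra_rows_loaded = batches * total_bars     # bars rows fetched across all batches

print(batches, extra_rows_loaded)            # 20 batches, 200000 rows vs. 10000 needed
```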

Ok, I think I get it. Is there a way to make this work without the performance
penalty of subqueryload()? For example, what would happen if I replaced it with
joinedload(cls.bars)?
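
For reference, here is a minimal runnable sketch of the part I already believe is
safe, based on the docstring above: joinedload() on the many-to-one works with
yield_per(), since only eagerly loaded *collections* are the problem. The
Foo/Parent/Bar models and the in-memory SQLite setup are invented for
illustration; the collection option is simply dropped here.

```python
from sqlalchemy import create_engine, Column, Integer, ForeignKey
from sqlalchemy.orm import (
    declarative_base, relationship, sessionmaker, joinedload,
)

Base = declarative_base()

class Foo(Base):
    __tablename__ = "foo"
    id = Column(Integer, primary_key=True)

class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    foo_id = Column(Integer, ForeignKey("foo.id"))
    foo = relationship(Foo)        # many-to-one: safe with yield_per()
    bars = relationship("Bar")     # collection: not eagerly loaded here

class Bar(Base):
    __tablename__ = "bar"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parent.id"))

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(engine)()

foo = Foo(id=1)
session.add(foo)
session.add_all([Parent(id=i, foo=foo) for i in range(1, 101)])
session.commit()
session.expunge_all()

# joinedload() of the many-to-one emits a single JOIN per batch;
# each.foo is populated without any per-batch subquery.
q = session.query(Parent).options(joinedload(Parent.foo))
seen = [each.foo.id for each in q.yield_per(50)]
```

This leaves open what to do about the bars collection itself, which is exactly my
question above.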

Ladislav Lenart


On 4.6.2013 16:19, Michael Bayer wrote:
> 
> On Jun 4, 2013, at 8:18 AM, Ladislav Lenart <lenart...@volny.cz> wrote:
> 
>> Hello.
>>
>> I have a hard time to understand the following comment for Query.yield_per():
> 
>>      Yield only ``count`` rows at a time.
>>
>>        WARNING: use this method with caution; if the same instance is present
>>        in more than one batch of rows, end-user changes to attributes will be
>>        overwritten.
>>
>>        In particular, it's usually impossible to use this setting with
>>        eagerly loaded collections (i.e. any lazy='joined' or 'subquery')
>>        since those collections will be cleared for a new load when
>>        encountered in a subsequent result batch.   In the case of 'subquery'
>>        loading, the full result for all rows is fetched which generally
>>        defeats the purpose of :meth:`~sqlalchemy.orm.query.Query.yield_per`.
>>
>>        Also note that many DBAPIs do not "stream" results, pre-buffering
>>        all rows before making them available, including mysql-python and
>>        psycopg2.  :meth:`~sqlalchemy.orm.query.Query.yield_per` will also
>>        set the ``stream_results`` execution
>>        option to ``True``, which currently is only understood by psycopg2
>>        and causes server side cursors to be used.
>>
>> Suppose I have a code like this:
>>
>>    q = session.query(cls).filter(...)
>>    q = q.options(
>>        joinedload(cls.foo),
>>        subqueryload(cls.bars),
>>    )
>>    for each in q.yield_per(50):
>>        # access each.foo or each.bars
>>
>> Does it work? If so, why? If not, why not?
> 
> assuming cls.foo is a many-to-one, it will produce the correct result, but 
> will be far worse in terms of memory and performance, as the subqueryload() 
> call will be invoked for each distinct batch of 50 rows, across the *full* 
> result set.  So if your result has 1000 rows, and the number of "bars" total 
> is 10000, you will load 10000 additional rows for *each* yield per of 50.
> 


-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.