Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()

Ladislav Lenart Tue, 04 Jun 2013 08:41:50 -0700

Hello.

> You will then get the wrong results.   The docstring tries to explain this -
> a joinedload uses a JOIN.  For each "cls" instance, there are many rows, one
> for each "bar".  If you cut off the results in the middle of populating that
> collection, the collection is incomplete, you'll see the wrong collection on
> your cls.bars.  On the next load, cls.bars will be wiped out and populated
> with the remaining "bar" objects.


Ok, I think I understand this too.

I've tried WindowedRangeQuery. It looked promising at first but it is (much)
slower than yield_per() with all its quirks, at least for my usecase. If I
understand the WindowedRangeQuery recipe, it does a full scan of the target
table first to read all the ids and calculate the bounds of all the windows. I
don't want to it like this. I am working with relatively large datasets but it
is still far less than all rows in the table. Something like 10-50000 rows from
a table with 1-2 million rows. The windowed query iterates over many completely
empty windows.

Can I modify the recipe so it preserves the filtering and creates windows only
for the interesting subset of the table?

Thank you,

Ladislav Lenart


On 4.6.2013 17:18, Michael Bayer wrote:
> 
> On Jun 4, 2013, at 10:45 AM, Ladislav Lenart <lenart...@volny.cz> wrote:
> 
>> Hello.
>>
>>> assuming cls.foo is a many-to-one, it will produce the correct result,
>>> but will be far worse in terms of memory and performance, as the
>>> subqueryload() call will be invoked for each distinct batch of 50 rows,
>>> across the *full* result set.  So if your result has 1000 rows, and the
>>> number of "bars" total is 10000, you will load 10000 additional rows for
>>> *each* yield per of 50.
>>
>> Ok, I think I get it. Is there a way to make it all work without the 
>> performance
>> penalty of subqueryload? For example, what will happen if I replace it with
>> joinedload(cls.bars)?
> 
> You will then get the wrong results.   The docstring tries to explain this - 
> a joinedload uses a JOIN.  For each "cls" instance, there are many rows, one 
> for each "bar".  If you cut off the results in the middle of populating that 
> collection, the collection is incomplete, you'll see the wrong collection on 
> your cls.bars.  On the next load, cls.bars will be wiped out and populated 
> with the remaining "bar" objects.
> 
> Don't use yield_per.  Use windowing instead, see 
> http://www.sqlalchemy.org/trac/wiki/UsageRecipes/WindowedRangeQuery.

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.

Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()

Reply via email to