Re: [sqlalchemy] Combining yield_per and eager loading

Martijn van Oosterhout Fri, 15 Jul 2016 14:15:13 -0700

On 15 July 2016 at 18:46, Mike Bayer <mike...@zzzcomputing.com> wrote:



> Here's the problem that cant be solved:
>
> 1. fetch rows 0-1000
>
> 2. start loading Foo objects:
>
>      Foo(1) ->  eagerload Foo.related with 400 members
>      Foo(2) ->  eagerload Foo.related with 500 members
>      Foo(3) -> eagerload Foo.related with the first 100 of 250 members
>
> 3. yield.  That is, stop reading more rows.  Send out Foo(1), Foo(2),
> Foo(3).  The application now can go nuts with these.  It can read them,
> write them, iterate through .related.
>
> Right there, we're broken.  Foo(3) is out in the world with less than half
> of its ".related" collection - they are still sitting on the cursor!
>
>
Ok, I think I see what you're getting at but I don't think that a problem
here. It looks like you have a query which joins all the relationships in
one go. That's not what I'm doing, I'm relying on the fact that I can
interleave the queries. On SQL level it would look like so:

DECLARE my_cursor CURSOR FOR SELECT * FROM bigtable;
FETCH 1000 FROM my_cursor;
SELECT * FROM related_table WHERE bigtable_id IN (1,2,3,4,5,....)
SELECT * FROM related_table2 WHERE bigtable_id IN (1,2,3,4,5,....)
SELECT * FROM related_table JOIN second_order_relationship USING (pkey)
WHERE bigtable_id IN (1,2,3,4,5,....)

FETCH 1000 FROM my_cursor;
SELECT * FROM related_table WHERE bigtable_id IN (11,12,13,14,15,....)
SELECT * FROM related_table2 WHERE bigtable_id IN (11,12,13,14,15,....)
SELECT * FROM related_table JOIN second_order_relationship USING (pkey)
WHERE bigtable_id IN (11,12,13,14,15,....)

<... repeat ...>
(this may be a postgresql specific feature, not sure).

Yes, this may mean that some of the related objects may be fetched multiple
times, but this is not a problem for me (the whole thing is read-only
anyway). What I need however is for the secondary queries to populate the
relationships in the original BigTable objects.

Vastly simpler is to use the recommended window functions to do pagination
> of any scale with no chance of complex failures.
>
>
A bit of googling suggests you are referring to this:
https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/WindowedRangeQuery
which indeed looks very neat and much easier than what I was thinking. It
doesn't say explicitly, but it looks like it will work transparently with
eager loading. It basically does the above, but skips the cursor and
replaces it with queries on ranges of the primary key (which is shorter and
probably more efficient to boot).

Thanks for the tip!
-- 
Martijn van Oosterhout <klep...@gmail.com> http://svana.org/kleptog/

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Combining yield_per and eager loading

Reply via email to