On 07/15/2016 12:08 PM, Martijn van Oosterhout wrote:
Ok, so SQLAlchemy has this nice feature where you can eager load
relationships to significantly reduce the number of queries during
processing.

On the other hand, to reduce memory usage you can use yield_per() (on
Postgres) to significantly reduce the memory usage by not loading the
entire database in memory at once.

For very good reasons mentioned in the documentation you can't use both
of these in the same query, yet that is kind of my goal. What I'd like
to achieve, for a given query which goes over a big table:

while not end of resultset:
   take 1000 results
   eagerload all the relationships
   process them


Here's the problem that cant be solved:

1. fetch rows 0-1000

2. start loading Foo objects:

     Foo(1) ->  eagerload Foo.related with 400 members
     Foo(2) ->  eagerload Foo.related with 500 members
     Foo(3) -> eagerload Foo.related with the first 100 of 250 members

3. yield. That is, stop reading more rows. Send out Foo(1), Foo(2), Foo(3). The application now can go nuts with these. It can read them, write them, iterate through .related.

Right there, we're broken. Foo(3) is out in the world with less than half of its ".related" collection - they are still sitting on the cursor!

So let's try, don't actually yield Foo(3) until we've seen Foo(4), or the result has ended. That is, let's try to be clever and look at the next row to see that we're definitely done with Foo(3). This is completely complicated to do, but I have made it do this when I tried to make this work.

This is more insidious, because it will actually work most of the time. However, it won't work when we don't actually get objects in order like that. If Foo.related is a many-to-many, we could see a Bar in one yield batch, then see it all over again in another batch later. What if the user application did all kinds of things to that Bar() in the first place, now we're seeing it again, but the identity map has already been garbage collected. Now that Bar() is stale *before it ever left its result set*, not to mention if that Bar() has more eagerloaded collections, they're all wrong too! It's a disaster.

Vastly simpler is to use the recommended window functions to do pagination of any scale with no chance of complex failures.







Now, the eager loading part is posing difficulties (or I'm not reading
the documentation carefully enough). I found the
|attributes.set_committed_value()|
<http://docs.sqlalchemy.org/en/latest/orm/session_api.html#sqlalchemy.orm.attributes.set_committed_value>
function which solves half the problem, but I still need to generate the
actual query to return the necessary objects. So perhaps (pseudo-code):

|
defeagerload_for_set(object_list,relationship)
   ids =set(o.get(relationship.left_id)foro inobject_list)
   lookup
=Query(relationship.right_table).filter_by(relationship.right_column.in_(ids)).all()
   foro inobject_list:

 o.set_committed_value(o,relationship.left,lookup[relationship.left_id])
|

Before I go diving into the SQLAlchemy to make the above actually work,
does it seem reasonable? Are there are handy utils somewhere that might
help?

Thanks for any ideas,

Have a nice day,

--
Martijn van Oosterhout



--
You received this message because you are subscribed to the Google
Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to sqlalchemy+unsubscr...@googlegroups.com
<mailto:sqlalchemy+unsubscr...@googlegroups.com>.
To post to this group, send email to sqlalchemy@googlegroups.com
<mailto:sqlalchemy@googlegroups.com>.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Reply via email to