Re: [sqlalchemy] Re: Working with large IN lists

Michael Bayer Thu, 23 Feb 2012 05:51:06 -0800

I'd note that those subqueryloads() render the effectiveness of yield_per() to 
be almost nil.

This is how subqueryload works:

query(Parent).options(subqueryload(Parent.child)).all()

will give you:

"SELECT * FROM parent"

then, the moment a "parent" row is fetched, the first "child" collection is 
referenced to be loaded, then the query emitted is:

"SELECT * FROM child JOIN (SELECT * FROM parent) AS a on a.id=child.id_a"

that is, the second query loads all child rows for all parents in the entire 
result.   

So let's say we do yield_per(), so that SQLAlchemy only processes the first 100 
rows before handing them out.   As soon as you hit either of those two 
subqueryloaded collections, the yield_per() is mostly thrown out the window - 
they will each load the entire list of child objects for the whole result, 
which right there will grow memory to be as big as your entire result.

The next thing I'd look at is that second query for ModelC.   You can also load 
those upfront so that you don't need to do a query each time:

modelcs = dict(
    sess.query(ModelC.id_a, 
ModelC).join(ModelC.a).filter(ModelA.parent_id=<same integer you're using 
against ModelA>)
)

then you have a dictionary of "id_a->ModelC" as you loop through your ModelA 
records.   All the ModelC's are already in the session, so you don't need to 
use merge(), which is also not a super quick operation.  If an "id_a" is not in 
the dictionary then you know to create a new ModelC and use Session.add() to 
put it in.   If you're limiting the ModelA rows using a LIMIT or window recipe 
like I mentioned earlier, you'd apply that same criteria to the loading of the 
"modelcs".

Later on you're doing something with query(ModelC).first() in a loop which is 
also something I hope isn't in the real application - looking there is seems 
like you'd only need to say query(ModelC).delete().

Watching the SQL emitted with echo=True, and in some cases also seeing how 
large the results coming in are using echo='debug', is something I strongly 
recommend when first profiling an application.

On Feb 23, 2012, at 8:18 AM, Vlad K. wrote:

> 
> And yet again the problem is not in SQLAlchemy but in Pyramid.... Sorry, from 
> now on, I'll first try writing test scripts without Pyramid to see where the 
> problem is, I just don't have time for that kind of debugging so I'm wasting 
> yours. :)
> 
> 
> Thanks. I'll write to Pylons Discuss list, but the test script is here 
> nevertheless. Comment out line 266 (and remove imports) to disable Pyramid 
> and see it work okay. Without Pyramid (but even with Transaction and 
> ZopeTransactionExtension), the gcdelta is 0 or negative per batch of 200 
> iterations. With Pyramid bootstrapped, gcdelta is in thousands per batch of 
> 200 iterations.
> 
> https://gist.github.com/d669e958c54869c69831
> 
> 
> .oO V Oo.
> 
> 
> On 02/23/2012 02:41 AM, Michael Bayer wrote:
>> On Feb 22, 2012, at 6:36 PM, Vlad K. wrote:
>> 
>>> Okay, thanks to this article:
>>> 
>>> http://neverfear.org/blog/view/155/Investigating_memory_leaks_in_Python
>>> 
>>> 
>>> I made similar plot of object counts in time, showing top 50 types. The 
>>> resulting PDF is here (you might wish to download it first, Google messes 
>>> it up for me):
>>> 
>>> https://docs.google.com/open?id=0ByLiBlA59qDwYTY1MGIzYWEtYjMxZi00ZDVlLTk0OTEtOGI2ZjA3NDgyM2Y3
>>> 
>>> 
>>> Everything seems to linearly grow in count. Something is keeping all those 
>>> objects reference somewhere. What could possibly be the cause?
>> 
>> can you provide a self-contained, single file test case that illustrates the 
>> memory growth ?
>> 
>> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> To unsubscribe from this group, send email to 
> sqlalchemy+unsubscr...@googlegroups.com.
> For more options, visit this group at 
> http://groups.google.com/group/sqlalchemy?hl=en.
> 

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Re: [sqlalchemy] Re: Working with large IN lists

Reply via email to