When we want to test whether a Python program has a "leak", we do so by checking 
how many uncollected objects are present.   This is done via the gc module:

import gc

# count every object currently tracked by the garbage collector
print "total number of objects:", len(gc.get_objects())

That's the only reliable way to measure whether the memory used by Python objects 
is growing without bound.  Looking at the memory usage in "top" only shows what 
the interpreter process as a whole takes up - more recent CPython releases do 
release memory back to the OS, but only occasionally.   Older versions never do.
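
As a rough sketch of how you might watch for unbounded growth, here's the kind 
of loop I'd run - the work() function is just a placeholder for whatever 
operation you suspect is leaking:

import gc

def work():
    # placeholder for the suspect operation (a query, a batch of ORM work, etc.)
    pass

for i in range(10):
    work()
    gc.collect()   # force a collection so only genuinely live objects remain
    print "iteration %d: %d objects" % (i, len(gc.get_objects()))

If the object count levels off after the first few iterations, things are being 
collected; if it climbs steadily, something is holding references.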

If you're doing an operation that loads thousands of rows, those rows are 
virtually always loaded entirely into memory by the DBAPI, before your program 
or SQLAlchemy is ever given the chance to fetch a single row.   I haven't yet 
looked closely at your case here, but that's often at the core of scripts that 
use much more memory than expected.
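
You can see this at the DBAPI level even without SQLAlchemy involved.  A minimal 
psycopg2 sketch (the connection string and table name are made up here):

import psycopg2

conn = psycopg2.connect("dbname=mydb")

# default client-side cursor: psycopg2 pulls the *entire* result set into
# memory during execute(), before a single fetchone()/fetchmany() call
cur = conn.cursor()
cur.execute("SELECT * FROM big_table")
row = cur.fetchone()   # all rows are already in memory at this point

# named (server-side) cursor: rows are streamed from the server as needed
scur = conn.cursor(name="stream_me")
scur.execute("SELECT * FROM big_table")
row = scur.fetchone()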

There are ways to get *some* DBAPIs not to do this (particularly psycopg2, if 
you're using Postgresql; see 
http://docs.sqlalchemy.org/en/latest/orm/query.html?highlight=yield_per#sqlalchemy.orm.query.Query.yield_per
 and 
http://docs.sqlalchemy.org/en/latest/core/connections.html?highlight=stream_results#sqlalchemy.engine.base.Connection.execution_options),
though the better solution is usually to load records in chunks at a time (one 
recipe I use for this is here: 
http://www.sqlalchemy.org/trac/wiki/UsageRecipes/WindowedRangeQuery).  Or better 
yet, consider whether the problem can be solved entirely on the SQL side (that 
depends entirely on what you're trying to do with the data in question).
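
For the yield_per / stream_results approach, a rough sketch would look like the 
following (the engine URL, the User class, and process() are illustrative only - 
substitute your own):

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine("postgresql://scott:tiger@localhost/mydb")
Session = sessionmaker(bind=engine)
session = Session()

# stream_results asks psycopg2 to use a server-side cursor so the full
# result set isn't buffered client-side; yield_per then materializes ORM
# objects in batches of 1000 rather than all at once.
q = (session.query(User)
         .execution_options(stream_results=True)
         .yield_per(1000))

for user in q:
    process(user)   # placeholder for your per-row work

Keep in mind yield_per doesn't play well with collection eager loading, since a 
collection can end up split across batches.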


On Feb 22, 2012, at 9:46 AM, Vlad K. wrote:

> 
> Okay, after several test cases - various join combinations with or without 
> relationships, with or without cherry-picking only the columns actually used 
> from the joined models - I've come to the conclusion that the only problem I'm 
> having here is that there is no garbage collection. Python memory use just 
> keeps growing, at a rate that of course depends on the size of the models used 
> and the data queried, regardless of releasing/deleting instances or isolating 
> each row's processing in its own committed transaction.
> 
> I also found this:
> 
> http://permalink.gmane.org/gmane.comp.python.sqlalchemy.user/30087
> 
> 
> So it appears I'm having the same problem.
> 
> 
> Am I understanding correctly that because of this, SQLAlchemy ORM is in my 
> case useless if I have to process thousands of rows, because the memory used 
> to process each row (along with corresponding joined models etc...) will not 
> be released? So basically I'd have to use SQLA without the ORM, for this 
> particular use case?
> 
> Or is this some memory leak bug?
> 
> If so, any suggestions or examples on how to switch from ORM to non-ORM use 
> while retaining the named tuples returned by queries and avoiding a rewrite of 
> half the app?
> 
> 
> Thanks.
> 
> 
> .oO V Oo.
> 
> 

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.
