Hi,

thanks for your reply. I haven't yet tested this with a profiler to see exactly what is happening, but the bottom line is that overall memory use grows with each iteration (each transaction processed), to the point of grinding the server to a halt, and top shows the Python process alone consuming all the memory.

I've already modified the code to read one row at a time: first I build a list of the IDs to be affected, then I go through that list and select + update/insert one transaction at a time.
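
In code, the loop now looks roughly like this (a sketch only; Transaction, process() and the session setup are stand-ins for the real thing):

# build the full list of IDs up front, then touch one row per iteration
ids = [row.id for row in session.query(Transaction.id)]
for txn_id in ids:
    txn = session.query(Transaction).get(txn_id)  # select one row
    process(txn)                                  # update/insert as needed
    session.commit()                              # write out this transaction
    session.expunge_all()                         # let old objects be collected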

I suppose I could solve the problem entirely on the SQL side with a stored function, but that's a maintenance overhead I'd like to avoid if possible.

Meanwhile I've gotten rid of the "convenience" relationships and in some places switched to lazy="select" instead of "subquery" or "joined". That has brought total memory use down; the entire process can now finish within the RAM available on the server, but memory still grows linearly from the start of the process to the end.
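
For reference, the change amounts to this (a minimal sketch; Parent and Child stand in for my actual models, Base is the usual declarative base):

from sqlalchemy import Column, Integer
from sqlalchemy.orm import relationship

class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    # was lazy="subquery" / lazy="joined"; "select" defers loading the
    # children until the attribute is first accessed
    children = relationship("Child", lazy="select")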

.oO V Oo.


On 02/22/2012 07:23 PM, Michael Bayer wrote:
When we want to test whether a Python program has a "leak", we check how many 
uncollected objects are present.   This is done via gc:

import gc

# count all objects currently tracked by the garbage collector
print "total number of objects:", len(gc.get_objects())

That's the only real way to measure whether the memory used by Python objects is 
growing unbounded.  Looking at the memory usage in "top" shows what the whole 
interpreter takes up; more modern CPython releases do release memory back to the 
OS, but only occasionally.   Older versions don't.

If you're doing an operation that loads thousands of rows, those rows are 
virtually always loaded entirely into memory by the DBAPI, before your program 
or SQLAlchemy is ever given the chance to fetch a single row.   I haven't yet 
looked closely at your case here, but that's often at the core of scripts that 
use much more memory than expected.

There are ways to get *some* DBAPIs to not do this (particularly psycopg2, if 
you're using Postgresql; see 
http://docs.sqlalchemy.org/en/latest/orm/query.html?highlight=yield_per#sqlalchemy.orm.query.Query.yield_per
 and 
http://docs.sqlalchemy.org/en/latest/core/connections.html?highlight=stream_results#sqlalchemy.engine.base.Connection.execution_options),
  though the better solution is usually to load chunks of records at a time (one 
such recipe that I use for this is here: 
http://www.sqlalchemy.org/trac/wiki/UsageRecipes/WindowedRangeQuery).  Or 
better yet, consider whether the problem can be solved entirely on the SQL side 
(that depends entirely on what you're trying to do with the data in question).
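
For what it's worth, the yield_per() usage boils down to something like this (a sketch only; Transaction and process() are placeholders, and whether rows actually stream rather than being buffered depends on the DBAPI, psycopg2 being the usual case):

# fetch rows in batches of 1000 instead of loading the full result at once
for txn in session.query(Transaction).yield_per(1000):
    process(txn)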
