Hi,

thanks for your reply. I haven't yet tested this with a profiler to see exactly what is happening, but the bottom line is that overall memory use grows with each iteration (each transaction processed), to the point of grinding the server to a halt, and top shows the Python process alone consuming all the memory.

I've already modified the code to read one row at a time: first I build a list of the IDs to be affected, then I go through that list and select + update/insert one transaction at a time.
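
In code, the loop now looks roughly like this (a sketch only; Transaction, process() and the session setup are stand-ins for the real thing):

# build the full list of IDs up front, then touch one row per iteration
ids = [row.id for row in session.query(Transaction.id)]
for txn_id in ids:
    txn = session.query(Transaction).get(txn_id)  # select one row
    process(txn)                                  # update/insert as needed
    session.commit()                              # write out this transaction
    session.expunge_all()                         # let old objects be collected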

I suppose I could solve the problem entirely on the SQL side with a stored function, but that's a maintenance overhead I'd like to avoid if possible.

Meanwhile I've gotten rid of the "convenience" relationships and in some places switched to lazy="select" instead of "subquery" or "joined". That has brought total memory use down; the entire process can now finish within the RAM available on the server, but memory still grows linearly from the start of the process to the end.
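
For reference, the change amounts to this (a minimal sketch; Parent and Child stand in for my actual models, Base is the usual declarative base):

from sqlalchemy import Column, Integer
from sqlalchemy.orm import relationship

class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    # was lazy="subquery" / lazy="joined"; "select" defers loading the
    # children until the attribute is first accessed
    children = relationship("Child", lazy="select")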

.oO V Oo.


On 02/22/2012 07:23 PM, Michael Bayer wrote:
When we want to test whether a Python program has a "leak", we check how many 
uncollected objects are present.   This is done via gc:

import gc

# count all objects currently tracked by the garbage collector
print "total number of objects:", len(gc.get_objects())

That's the only real way to measure whether the memory used by Python objects is 
growing unbounded.  Looking at the memory usage in "top" shows what the whole 
interpreter takes up; more modern CPython releases do release memory back to the 
OS, but only occasionally.   Older versions don't.

If you're doing an operation that loads thousands of rows, those rows are 
virtually always loaded entirely into memory by the DBAPI, before your program 
or SQLAlchemy is ever given the chance to fetch a single row.   I haven't yet 
looked closely at your case here, but that's often at the core of scripts that 
use much more memory than expected.

There are ways to get *some* DBAPIs to not do this (particularly psycopg2, if 
you're using Postgresql; see 
http://docs.sqlalchemy.org/en/latest/orm/query.html?highlight=yield_per#sqlalchemy.orm.query.Query.yield_per
 and 
http://docs.sqlalchemy.org/en/latest/core/connections.html?highlight=stream_results#sqlalchemy.engine.base.Connection.execution_options),
  though the better solution is usually to load chunks of records at a time (one 
such recipe that I use for this is here: 
http://www.sqlalchemy.org/trac/wiki/UsageRecipes/WindowedRangeQuery).  Or 
better yet, consider whether the problem can be solved entirely on the SQL side 
(that depends entirely on what you're trying to do with the data in question).
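
For what it's worth, the yield_per() usage boils down to something like this (a sketch only; Transaction and process() are placeholders, and whether rows actually stream rather than being buffered depends on the DBAPI, psycopg2 being the usual case):

# fetch rows in batches of 1000 instead of loading the full result at once
for txn in session.query(Transaction).yield_per(1000):
    process(txn)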
