On Feb 22, 2012, at 2:46 PM, Claudio Freire wrote:

> On Wed, Feb 22, 2012 at 4:29 PM, Michael Bayer <mike...@zzzcomputing.com> 
> wrote:
>>> thanks for your reply. I haven't yet tested this with a profiler to see 
>>> what exactly is happening, but the bottom line is that the overall 
>>> memory use grows with each iteration (or transaction processed), to the 
>>> point of grinding the server to a halt, and top shows only the Python 
>>> process involved consuming all the memory.
>> 
>> yeah, like I said, that tells you almost nothing until you start looking at 
>> gc.get_objects().  If the size of gc.get_objects() grows continuously for 50 
>> iterations or more, never decreasing even when gc.collect() is called, then 
>> it's a leak.  Otherwise it's just too much data being loaded at once.
> 
> I've noticed compiling queries (either explicitly or implicitly) tends
> to *fragment* memory. There seem to be long-lived caches in the PG
> compiler at least. I can't remember exactly where, but I could take
> another look.
> 
> I'm talking about rather old versions of SQLA, 0.3 and 0.5.


0.3's code has been entirely gone for years.  I wouldn't even know what silly 
things it was doing.

In 0.5 and beyond, there's a cache of identifiers used for quoting purposes.   
If you are creating perhaps thousands of tables with hundreds of columns, all 
names being unique, then this cache might start to become a blip on the radar.   
For the expected use case of a schema with at most several hundred tables, it 
should not grow to a significant size.
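To make that growth pattern concrete, here is a generic illustration of a 
memoizing identifier cache (not SQLAlchemy's actual implementation, just a 
sketch of the idea): the cache only grows with the number of *distinct* names 
it sees, which is why an ordinary schema keeps it small.

    # Not SQLAlchemy's actual code, just an illustration: a memoizing
    # cache keyed on identifier names grows only with the number of
    # distinct names it has ever seen.
    _quote_cache = {}

    def quoted(ident):
        if ident not in _quote_cache:
            _quote_cache[ident] = '"%s"' % ident.replace('"', '""')
        return _quote_cache[ident]

    quoted("user_account")   # cache holds 1 entry
    quoted("user_account")   # still 1; repeated names cost nothing
    quoted("email_address")  # 2 entries; growth tracks unique names only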

I don't know exactly what it means for a Python script to "fragment" memory, 
and I don't think there's a set of Python programming practices that 
deterministically determines whether or not a script fragments heavily.  Alex 
Martelli talks about it here: 
http://stackoverflow.com/questions/1316767/how-can-i-explicitly-free-memory-in-python
 .    The suggestion there is that if you truly need to load tons of data into 
memory, doing the work in a subprocess is the only way to guarantee that the 
memory is freed back to the OS.

As it stands, there are no known memory leaks in SQLAlchemy itself, and if you 
look at our tests under aaa_profiling/test_memusage.py you can see we 
exhaustively ensure that the size of gc.get_objects() does not grow unbounded 
across all sorts of awkward situations.    To demonstrate a potential new 
memory leak, we need a succinct test case that shows a simple, steadily 
ascending growth in memory usage.
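For reference, a bare-bones version of that kind of check might look like the 
following; it's a sketch of the measurement technique described above, not our 
actual test harness, and do_work is a placeholder for the suspect operation:

    import gc

    def do_work():
        pass  # replace with the transaction/query logic under suspicion

    counts = []
    for _ in range(50):
        do_work()
        gc.collect()
        counts.append(len(gc.get_objects()))

    # a real leak shows a steadily ascending series even after collection;
    # a flat or plateauing series means big data, not a leak
    print(counts)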



