On Wed, Feb 22, 2012 at 5:18 PM, Michael Bayer <mike...@zzzcomputing.com> wrote:
>
> On Feb 22, 2012, at 2:46 PM, Claudio Freire wrote:
>
>> On Wed, Feb 22, 2012 at 4:29 PM, Michael Bayer <mike...@zzzcomputing.com> 
>> wrote:
>>>> thanks for your reply. I haven't yet tested this with a profiler to see 
>>>> exactly what is happening, but the bottom line is that the overall 
>>>> memory use grows with each iteration (or transaction processed), to the 
>>>> point of grinding the server to a halt, and top shows only the Python 
>>>> process involved consuming all the memory.
>>>
>>> yeah like I said that tells you almost nothing until you start looking at 
>>> gc.get_objects().  If the size of gc.get_objects() grows continuously for 
>>> 50 iterations or more, never decreasing even when gc.collect() is called, 
>>> then it's a leak.  Otherwise it's just too much data being loaded at once.
>>
>> I've noticed compiling queries (either explicitly or implicitly) tends
>> to *fragment* memory. There seem to be long-lived caches in the PG
>> compiler at least. I can't remember exactly where, but I could take
>> another look.
>>
>> I'm talking of rather old versions of SQLA, 0.3 and 0.5.
>
>
> 0.3's code is entirely gone, years ago.  I wouldn't even know what silly 
> things it was doing.

Like I said, I'd have to take another look at the matter to
validate this against 0.7.

> In 0.5 and beyond, there's a cache of identifiers for quoting purposes.   If 
> you are creating perhaps thousands of tables with hundreds of columns, all 
> names being unique, then this cache might start to become a blip on the 
> radar.   For the expected use case of a schema with at most several hundred 
> tables this should not be a significant size.

Fixed schema, but the code did create lots of aliases for dynamic queries.
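Something along these lines (a rough sketch; the table and the naming
scheme are made up, but the point is that every query produced freshly
named aliases, so the set of identifiers the compiler saw kept growing):

    from sqlalchemy import Table, Column, Integer, String, MetaData, select

    metadata = MetaData()
    items = Table('items', metadata,
                  Column('id', Integer, primary_key=True),
                  Column('parent_id', Integer),
                  Column('name', String(50)))

    def build_query(request_id):
        # one or more uniquely named aliases per request
        child = items.alias('child_%d' % request_id)
        parent = items.alias('parent_%d' % request_id)
        return select([child.c.name, parent.c.name]).where(
            child.c.parent_id == parent.c.id)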

> I don't know much about what it means for a Python script to "fragment" 
> memory, and I don't really think there's some kind of set of Python 
> programming practices that deterministically link to whether or not a script 
> fragments a lot.  Alex Martelli talks about it here: 
> http://stackoverflow.com/questions/1316767/how-can-i-explicitly-free-memory-in-python
> The suggestion there is that if you truly need to load tons of data into 
> memory, doing it in a subprocess is the only way to guarantee that memory is 
> freed back to the OS.

I wrote a bit about that[0].

The issue is with long-lived objects, big or small, many or few.
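For completeness, the subprocess approach from that answer looks
roughly like this (just a sketch; load_big_batch and summarize are
placeholders for whatever the heavy work actually is):

    import multiprocessing

    def process_chunk(chunk_id):
        # everything allocated in the child process is handed back to
        # the OS when the child exits, unlike in the parent
        rows = load_big_batch(chunk_id)   # the memory-hungry part
        return summarize(rows)            # only the small result comes back

    # maxtasksperchild=1 recycles the worker after each task, so its
    # memory never accumulates in a long-lived process
    pool = multiprocessing.Pool(processes=1, maxtasksperchild=1)
    results = pool.map(process_chunk, range(10))
    pool.close()
    pool.join()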

> As it stands, there are no known memory leaks in SQLAlchemy itself

I can attest to that. I have a backend running 24/7 and, barring some
external force, it can keep up for months with no noticeable leaks.

> and if you look at our tests under aaa_profiling/test_memusage.py you can see 
> we are exhaustively ensuring that the size of gc.get_objects() does not grow 
> unbounded for all sorts of awkward situations.    To illustrate potential new 
> memory leaks we need succinct test cases that show a simple ascending 
> growth in memory usage.

Like I said, it's not so much a leak situation as a fragmentation
situation, where long-lived objects at high memory addresses can
prevent the process's heap from shrinking.
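To make the distinction concrete, the kind of check I'd run looks
roughly like this (a sketch; do_one_transaction stands in for the
real workload):

    import gc

    def leak_check(iterations=50):
        for i in range(iterations):
            do_one_transaction()   # placeholder for the real loop body
            gc.collect()
            print("iteration %d: %d live objects"
                  % (i, len(gc.get_objects())))
        # a real leak shows the object count climbing steadily; with
        # fragmentation the count levels off even though RSS in top stays high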

[0] http://revista.python.org.ar/2/en/html/memory-fragmentation.html
