There's a few different parts to what you're asking.

The first is that you're comparing Python's use of OS memory (I'm assuming this 
is the 200+ MB) to Python's actual amount of objects present.   This is a 
common mistake.  Python up through version 2.6 does not release memory back to 
the OS once taken - this was improved in 2.7.   There's an old article about 
this here:  
http://effbot.org/pyfaq/why-doesnt-python-release-the-memory-when-i-delete-a-large-object.htm
 as well as Alex Martelli's answer: http://stackoverflow.com/a/1316799/34549 .

Second is, what exactly is the large object you're creating here ?   Answer - 
first, psycopg2 by default buffers the result set fully before returning it to 
SQLAlchemy - so it is first a list of 5000 tuples.  Second, the ORM itself also 
by default buffers the full set of rows from the result set in the form of 
mapped objects, so 5000 objects plus their related objects.    A way to modify 
this behavior is to use the yield_per() option of Query, which will also in the 
case of psycopg2 tell psycopg2 to use its "server side cursors" feature which 
does not buffer.

However, "yield_per()" is not compatible with eager loading as eager loading 
involves being able to load collections across the full set of original 
objects.   Typically the better way to deal with large numbers of rows is to 
paginate, using either LIMIT/OFFSET or using window functions (see 
http://www.sqlalchemy.org/trac/wiki/UsageRecipes/WindowedRangeQuery ).

Thirdly, there is a modest growth in memory when a series of mappings are used 
for the first time, including the configuration of mappers, initialization of 
TypeEngine value processors, and such.   But the initial large resultset is the 
main thing causing the higher initial memory footprint.    You'll notice this 
isn't a "leak" at all, as it doesn't grow.


On May 17, 2012, at 7:33 AM, Vlad K. wrote:

> 
> Hello.
> 
> I have a problem when processing relatively large number of rows. For 
> example, when selecting 5000 main rows, each having a number of many-to-one 
> relationships, memory usage shown by top skyrockets into 200+ MB range (RES), 
> while heapy shows cca 20MB of Python heap. PostgreSQL backend via psycopg2.
> 
> I've made a minimum example case based on the problem I'm noticing in my 
> Pyramid app, so the session.commit() at line 130 is there to simulate commit 
> done by Transaction used in Pyramid at the end of each request. If I'm 
> understanding things correctly, committing would expire all objects involved 
> in the session, and I even tried manual session.expunge(row), but there is no 
> difference in memory usage.
> 
> The following is source of an example case. Requires SQLAlchemy (tested with 
> 0.7.5 and 0.7.7), guppy, psycopg2 (tested with 2.4.2 and 2.4.4). Happens both 
> on Fedora 15 64-bit and CentOS 6.2 32-bit, though of course the 32-bit shows 
> some 30% lower RES in top.
> 
> http://pastebin.com/UFgduWVw
> 
> 
> Usage: setup a test database, update line 25 config. Prepopulate database 
> with -p flag, then run again without any flags.
> 
> I don't see where and how would any objects remain in memory, and heapy 
> showing much lower memory use suggests something is retained in the involved 
> C extensions?  I also tried with pympler, diff before and after selecting 
> rows, shows nothing near reported by top. I guess there is no "leak" in 
> traditional sense of the word because repeating the task does not yield 
> growing memory consumption. It stabilizes at certain value and stays there.
> 
> Heapy before selecting rows:
> 
> Partition of a set of 102014 objects. Total size = 13160672 bytes.
>  Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
>      0  45901  45  4395296  33   4395296  33 str
>      1  26041  26  2186184  17   6581480  50 tuple
>      2   7039   7   900992   7   7482472  57 types.CodeType
>      3   6836   7   820320   6   8302792  63 function
>      4    235   0   761608   6   9064400  69 dict of module
>      5    608   1   689792   5   9754192  74 dict (no owner)
>      6    676   1   648544   5  10402736  79 dict of type
>      7    676   1   608344   5  11011080  84 type
>      8    199   0   206248   2  11217328  85 dict of class
>      9    185   0   167320   1  11384648  87 
> sqlalchemy.sql.visitors.VisitableType
> <334 more rows. Type e.g. '_.more' to view.>
> 
> Heapy after 5000 rows have been selected:
> 
> Partition of a set of 102587 objects. Total size = 16455168 bytes.
>  Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
>      0  45923  45  4397632  27   4397632  27 str
>      1      1   0  3146024  19   7543656  46 
> sqlalchemy.orm.identity.WeakInstanceDict
>      2  26090  25  2189480  13   9733136  59 tuple
>      3   7039   7   900992   5  10634128  65 types.CodeType
>      4   6859   7   823080   5  11457208  70 function
>      5    235   0   761608   5  12218816  74 dict of module
>      6    657   1   705048   4  12923864  79 dict (no owner)
>      7    676   1   650464   4  13574328  82 dict of type
>      8    676   1   608344   4  14182672  86 type
>      9    199   0   206248   1  14388920  87 dict of class
> <372 more rows. Type e.g. '_.more' to view.>
> 
> 
> 
> 
> What am I doing wrong? I'm hoping something trivial and blatantly obvious 
> that I'm oblivious to. :)
> 
> 
> Thanks.
> 
> -- 
> 
> .oO V Oo.
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> To unsubscribe from this group, send email to 
> sqlalchemy+unsubscr...@googlegroups.com.
> For more options, visit this group at 
> http://groups.google.com/group/sqlalchemy?hl=en.

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Reply via email to