Re: [sqlalchemy] Understanding memory usage under SQLA

Vlad K. Thu, 17 May 2012 07:41:30 -0700


Hi, thanks for your reply.

Yes, I know the Python memory management model and that while it mayfree internally, it does not to OS. I've read somewhere that it has beenfixed in 2.7 as well, but my testing on Fedora's 2.7.1 still shows theissue. That's why I thought perhaps there are dangling references in theC extension parts that are not visible to heapy (is that even possible?).

I tried with yield_per() and you once told me how that can't work if Iuse subqueryload, so I tried without subqueries. The problem is that thescript then shoots from 5 seconds to over a minute to process samedataset which is unacceptable to me for other reasons (which is expectedas there are two additional queries per each of the 5000 rows, makingthe app do 10001 queries + ORM overhead on each). However, withyield_per() the memory consumption stays as low as before the queryingbegins.

I've got three possible solutions here. One is repeated querying withlimited result set AND subqueryloading which works like yield_per,except it requires additional sorting and offset. I just tried that andit indeed consumes much less memory. With sets 500 rows at once (andwith full subqueryloads) the memory consumption is 1/10 of loading allrows at once which figures, 500 is 1/10 of 5000. This is acceptable.

Another is (materialized) views on the DB end with triggers and entirenew model to select data from.

And yet another solution is to drop ORM and construct queries manually,returning relational data as subselects in arrays, and add a thin"ORM"-like layer that just converts row columns to named tuples so thatthe consumers of this data can use same model interface. But I'mguessing this is no different than the (materialized) views approachexcept the combining is done in the DB and not in the Python app. Istill need separate model class or named tuples.





.oO V Oo.


On 05/17/2012 03:21 PM, Michael Bayer wrote:

There's a few different parts to what you're asking.
The first is that you're comparing Python's use of OS memory (I'massuming this is the 200+ MB) to Python's actual amount of objectspresent. This is a common mistake. Python up through version 2.6does not release memory back to the OS once taken - this was improvedin 2.7. There's an old article about this here:http://effbot.org/pyfaq/why-doesnt-python-release-the-memory-when-i-delete-a-large-object.htmas well as Alex Martelli's answer:http://stackoverflow.com/a/1316799/34549 .
Second is, what exactly is the large object you're creating here ?Answer - first, psycopg2 by default buffers the result set fullybefore returning it to SQLAlchemy - so it is first a list of 5000tuples. Second, the ORM itself also by default buffers the full setof rows from the result set in the form of mapped objects, so 5000objects plus their related objects. A way to modify this behavioris to use the yield_per() option of Query, which will also in the caseof psycopg2 tell psycopg2 to use its "server side cursors" featurewhich does not buffer.
However, "yield_per()" is not compatible with eager loading as eagerloading involves being able to load collections across the full set oforiginal objects. Typically the better way to deal with largenumbers of rows is to paginate, using either LIMIT/OFFSET or usingwindow functions (seehttp://www.sqlalchemy.org/trac/wiki/UsageRecipes/WindowedRangeQuery ).
Thirdly, there is a modest growth in memory when a series of mappingsare used for the first time, including the configuration of mappers,initialization of TypeEngine value processors, and such. But theinitial large resultset is the main thing causing the higher initialmemory footprint. You'll notice this isn't a "leak" at all, as itdoesn't grow.
On May 17, 2012, at 7:33 AM, Vlad K. wrote:
Hello.
I have a problem when processing relatively large number of rows. Forexample, when selecting 5000 main rows, each having a number ofmany-to-one relationships, memory usage shown by top skyrockets into200+ MB range (RES), while heapy shows cca 20MB of Python heap.PostgreSQL backend via psycopg2.
I've made a minimum example case based on the problem I'm noticing inmy Pyramid app, so the session.commit() at line 130 is there tosimulate commit done by Transaction used in Pyramid at the end ofeach request. If I'm understanding things correctly, committing wouldexpire all objects involved in the session, and I even tried manualsession.expunge(row), but there is no difference in memory usage.
The following is source of an example case. Requires SQLAlchemy(tested with 0.7.5 and 0.7.7), guppy, psycopg2 (tested with 2.4.2 and2.4.4). Happens both on Fedora 15 64-bit and CentOS 6.2 32-bit,though of course the 32-bit shows some 30% lower RES in top.
http://pastebin.com/UFgduWVw
Usage: setup a test database, update line 25 config. Prepopulatedatabase with -p flag, then run again without any flags.
I don't see where and how would any objects remain in memory, andheapy showing much lower memory use suggests something is retained inthe involved C extensions? I also tried with pympler, diff beforeand after selecting rows, shows nothing near reported by top. I guessthere is no "leak" in traditional sense of the word because repeatingthe task does not yield growing memory consumption. It stabilizes atcertain value and stays there.
Heapy before selecting rows:

Partition of a set of 102014 objects. Total size = 13160672 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  45901  45  4395296  33   4395296  33 str
     1  26041  26  2186184  17   6581480  50 tuple
     2   7039   7   900992   7   7482472  57 types.CodeType
     3   6836   7   820320   6   8302792  63 function
     4    235   0   761608   6   9064400  69 dict of module
     5    608   1   689792   5   9754192  74 dict (no owner)
     6    676   1   648544   5  10402736  79 dict of type
     7    676   1   608344   5  11011080  84 type
     8    199   0   206248   2  11217328  85 dict of class
9 185 0 167320 1 11384648 87sqlalchemy.sql.visitors.VisitableType
<334 more rows. Type e.g. '_.more' to view.>

Heapy after 5000 rows have been selected:

Partition of a set of 102587 objects. Total size = 16455168 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  45923  45  4397632  27   4397632  27 str
1 1 0 3146024 19 7543656 46sqlalchemy.orm.identity.WeakInstanceDict
     2  26090  25  2189480  13   9733136  59 tuple
     3   7039   7   900992   5  10634128  65 types.CodeType
     4   6859   7   823080   5  11457208  70 function
     5    235   0   761608   5  12218816  74 dict of module
     6    657   1   705048   4  12923864  79 dict (no owner)
     7    676   1   650464   4  13574328  82 dict of type
     8    676   1   608344   4  14182672  86 type
     9    199   0   206248   1  14388920  87 dict of class
<372 more rows. Type e.g. '_.more' to view.>
What am I doing wrong? I'm hoping something trivial and blatantlyobvious that I'm oblivious to. :)
Thanks.

--

.oO V Oo.

--
You received this message because you are subscribed to the GoogleGroups "sqlalchemy" group.To post to this group, send email to sqlalchemy@googlegroups.com<mailto:sqlalchemy@googlegroups.com>.To unsubscribe from this group, send email tosqlalchemy+unsubscr...@googlegroups.com<mailto:sqlalchemy+unsubscr...@googlegroups.com>.For more options, visit this group athttp://groups.google.com/group/sqlalchemy?hl=en.
--
You received this message because you are subscribed to the GoogleGroups "sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email tosqlalchemy+unsubscr...@googlegroups.com.For more options, visit this group athttp://groups.google.com/group/sqlalchemy?hl=en.


--
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Re: [sqlalchemy] Understanding memory usage under SQLA

Reply via email to