On Jun 26, 2011, at 4:31 AM, Warwick Prince wrote:

> Hi Michael
> 
> I'm having an issue with memory usage that I would appreciate some insight on.
> 
> I have a fairly straightforward process that works perfectly as far as 
> delivering the desired updates to the DB etc.; however, it accumulates memory 
> (just like a leak) and I cannot find a way to release it.
> 
> I have a multi-threaded process.  Each thread creates a scoped session from 
> the standard global Session = scoped_session(sessionmaker()) construct.  Each 
> thread does some work using mapped objects, updating some rows and adding some 
> rows to the (MySQL on Windows) DB.  All this works perfectly and as expected.   
> For various reasons, I flush/commit after each row is updated/inserted.   
> After the "batch" of updates is complete, I come back and call session.remove() 
> (in an attempt to dump the session), then wait for a while and do the 
> entire thing again.  At the start of each run, I create a new 
> session = Session(), do the updates, then return and call session.remove().
> 
> To me, I would assume that the memory would be the session's cache of objects 
> being managed - which I can understand.  What I can't understand is 
> why, when I delete *everything*, e.g. del engine, del meta, del session, and 
> even stop the thread, the memory is still consumed.    I must stop the entire 
> process before the memory is returned to the system.   After around 10 hours 
> of running, I've used 2GB+ of memory and everything crashes.

By "the memory is still consumed", if you're talking about the memory of your 
process, that's Python's behavior - once the size of memory usage grows to X, 
it stays at X no matter what you dereference within the process.  So the key is 
to manage how large a collection ever gets filled up in the first place.    The 
only true measure of "python objects being leaked" is the size of 
gc.get_objects().  If that size is managed, that's as far as Python code can go 
towards managing memory.
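
As a rough sketch of how to check that (run_batch() here is a stand-in for your 
own update/insert cycle, not a real function):

    import gc

    def object_count():
        # force a collection first so unreachable cycles don't inflate the number
        gc.collect()
        return len(gc.get_objects())

    baseline = object_count()
    for i in range(10):
        run_batch()   # placeholder for one of your update/insert runs
        print("after batch %d: %d objects (baseline %d)" % (
            i, object_count(), baseline))

If that count climbs steadily across batches with gc enabled, you have a 
genuine Python-level leak.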

So I'm assuming you just mean the size of the process.     If you're dealing 
with large numbers of rows being loaded into memory, you'd need to cut down on 
the maximum number of objects loaded at once.

The Session does not strongly reference anything, except for that which is 
present in the ".new" and ".dirty" collections.   If those are empty, it is not 
strongly referencing anything, and as long as gc is enabled, the number of 
objects in memory will be managed.    Older versions of the Session in 0.4 and 
0.5 were not as good at this, but in 0.6 and 0.7 it's quite solid; there is a 
whole suite of unit tests that ensure SQLAlchemy components like Engine, 
Session, schema, etc. do not leak memory under a variety of setup/teardown 
situations.      But it seems like you don't yet know whether you're 
experiencing a problem at the Python object level.
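
To illustrate, here's roughly the pattern your threads describe, with a check 
on those collections (translate() and rows are placeholders, 'engine' is 
assumed to be configured elsewhere; just a sketch, not your code):

    import gc
    from sqlalchemy.orm import scoped_session, sessionmaker

    Session = scoped_session(sessionmaker(bind=engine))

    def run_batch(rows):
        session = Session()
        for row in rows:
            obj = translate(row)     # placeholder: build or update a mapped object
            session.add(obj)
            session.commit()         # commit per row, as in your process
        # after the final commit, the session should hold nothing strongly
        assert not session.new and not session.dirty
        Session.remove()             # throw the thread's session away entirely
        gc.collect()                 # remaining instances are only weakly referenced

If objects still accumulate after the remove()/collect(), then something 
outside the Session (your own lists, caches, thread-local state) is keeping 
references to them.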



> i.e. it loops over the other (non-SQL) database, reading all the data that I 
> WOULD use to update the SQL database using SA.  When SA is not involved, 
> nothing happens with the memory.

Note that DBAPIs, particularly older versions of MySQLdb, may have memory 
leaks of their own.  Also, most DBAPIs, when asked to fetch a result, will load 
the full set of rows into memory before fetchone() is ever called, causing a 
large, unconditional increase in memory usage if you are fetching very large 
result sets.
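
If you do need to push a very large result set through the ORM without holding 
it all at once, one approach is to window the query yourself so that neither 
the DBAPI nor the Session ever sees more than a chunk at a time (a sketch; 
Thing and Thing.id are placeholders for a mapped class and an indexed, unique, 
ascending key):

    def windowed_query(query, column, window_size=1000):
        # simple keyset pagination: fetch window_size rows at a time,
        # always resuming after the last key value seen
        last_value = None
        while True:
            q = query.order_by(column)
            if last_value is not None:
                q = q.filter(column > last_value)
            chunk = q.limit(window_size).all()
            if not chunk:
                return
            for obj in chunk:
                yield obj
            last_value = getattr(chunk[-1], column.key)

    for obj in windowed_query(session.query(Thing), Thing.id):
        process(obj)     # placeholder for your per-row work

Query.yield_per() can also reduce ORM-side buffering, but it doesn't help if 
the DBAPI has already buffered the full result, which is the default behavior 
for MySQLdb.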

> 
> Any hint on how I can a) see what is being held and b) dump it!

gc.get_objects()
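
To see what's actually being held, a small helper like this (just a sketch) 
groups everything gc is tracking by type; run it before and after a batch and 
compare the counts:

    import gc
    from collections import Counter

    def top_types(limit=20):
        counts = Counter(type(o).__name__ for o in gc.get_objects())
        for name, count in counts.most_common(limit):
            print("%8d  %s" % (count, name))

    top_types()

Whatever type's count keeps growing between runs is the candidate for what's 
being leaked.  There's no single call to "dump" it; you find what's still 
referencing those objects (gc.get_referrers() can help) and break that 
reference.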

