On Oct 26, 2008, at 10:42 AM, Henk wrote:

>
> Hi,
>
> In what way is sqlalchemy dependent on Python's cyclic gc?
> Is there any effort to make it work even if gc is off?
>
> I tried to turn gc off in a server-side program I am developing, but
> this resulted in
> a lot of 'leaked' cycles/memory as soon as sqlalchemy is used to do
> some queries...
>
> I traced one cycle down to the relation between Session and
> SessionTransaction. They both
> hold a reference to each other, and this cycle is not broken in the
> session's close method, nor is one
> of them a weakref...
>
> Turning on gc leak detection also shows a lot of IdentityManagedState
> objects being involved in cycles (i.e. not collected by refcount). The
> number of these objects just keeps growing with each query. Closing
> the Session has no influence on this...
>
> Depending on cyclic gc is detrimental to the performance of server-side
> apps serving many simultaneous clients. The cyclic gc will
> halt the whole Python process for up to seconds (if there is a large
> number of object instances in the process). All clients will thus
> experience quite some lag when the cyclic gc kicks in.
>
> Unfortunately, with sqlalchemy, tuning the gc to run less often is also
> not good, because of the very large numbers of instances being
> created. A simple query for 2000 rows in my experience already creates
> about 10 MB worth of sqlalchemy objects involved in cycles that need to
> be gc'd. So putting off the gc would mean memory would quickly be
> exhausted....
>
> (I am using sqlalchemy trunk revision 5200)

In general we don't worry about cyclic GC unless an issue has been  
demonstrated.  There are some areas where we are careful not to create  
cycles, in cases where weakly referenced structures need to fall out  
of scope automatically or where the timing of gc.collect() tends to be  
troublesome.  Specifically within the area of fetched rows and
fetched ORM objects which you mention, we are careful to not introduce  
any cyclic references within the rows themselves or on your mapped  
instances, so that zero refcount will in fact garbage collect your  
instances without a cyclic run, even though the associated  
IdentityManagedState may have cycles as you are mentioning.   The  
connection pool is also very cycle-aware.
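
As a rough illustration of the distinction above (a sketch only, assuming CPython's reference counting; Row and State are hypothetical stand-ins, not SQLAlchemy classes), an object with no cycles is freed as soon as its last reference goes away, even with the cyclic collector disabled, while an object that takes part in a cycle lingers until a cyclic run:

```python
import gc
import weakref


class Row:
    """Hypothetical stand-in for a fetched instance with no cycles."""


class State:
    """Hypothetical stand-in for a state object that is part of a cycle."""

    def __init__(self):
        self.me = self  # deliberate reference cycle


def collected_without_gc(factory):
    """Return True if an object from factory() is freed by refcounting alone."""
    obj = factory()
    ref = weakref.ref(obj)  # lets us observe the object without keeping it alive
    del obj
    return ref() is None


gc.disable()
try:
    row_freed = collected_without_gc(Row)
    state_freed = collected_without_gc(State)
finally:
    gc.enable()

gc.collect()  # clean up the cyclic State instance left behind
print("Row freed by refcount alone:", row_freed)
print("State freed by refcount alone:", state_freed)
```

On CPython the Row is expected to be freed immediately and the State is not; other interpreters may behave differently.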

The key issue here is if you're requesting a completely "pure"
non-cyclical application, or if we're just talking about cycles within
objects that are created on a large scale.   In the latter case I  
would think we're probably only talking about IdentityManagedState.   
RowProxy, its "SQL expression" analogue, doesn't have any cycles.   I  
wouldn't think SessionTransaction puts that much of a burden on cyclic  
gc since you typically have only one or maybe two of those per  
request, assuming you are using autocommit=False with a request-scoped  
session.  There's also probably a cycle between Connection and
Transaction, which is the lower-level analogue of
Session->SessionTransaction.  There are also at least a few cycles within
the Query object, and I don't think there are any within ClauseElement
structures, but I am not 100% sure.

In the case that you are seeking to remove all cycles from SQLAlchemy  
as a whole, that doesn't strike me as particularly practical.  There  
are many areas where traversal among internal structures in both  
directions is required, especially within Table and mapper  
structures.  To achieve this without cycles, weakrefs would have to be  
used.  Weakrefs introduce a significant performance penalty of their
own on every access, since they turn very fast attribute access into a
function call, and function calls are very expensive in Python.  It's
not clear to me that the performance saved by disabling cyclic gc would
be greater than the cost introduced by widespread usage of weakrefs,
and at the least it would impose a significant performance penalty on
the vast majority of users who leave Python's GC settings unchanged.
It would also be a great burden on future development and testing to
ensure that all new code does not introduce cycles, and to maintain all
those strong/weak references without unexpected reference loss.  SQLA
also relies upon DBAPI implementations which may use cycles, and we
may someday have other dependencies which require cycles.
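
To make the weakref tradeoff concrete, here is a minimal sketch (hypothetical Parent, StrongChild and WeakChild classes, not SQLAlchemy code; timings will vary by machine and Python version) comparing a plain back-reference with one routed through weakref.ref, and showing the kind of "unexpected reference loss" that becomes possible once nothing else holds the parent:

```python
import timeit
import weakref


class Parent:
    """Hypothetical parent structure."""


class StrongChild:
    def __init__(self, parent):
        self.parent = parent  # plain attribute; forms a cycle if parent refers back


class WeakChild:
    def __init__(self, parent):
        self._parent = weakref.ref(parent)  # no cycle, but every access is a call

    @property
    def parent(self):
        return self._parent()  # None once the parent has been freed


parent = Parent()
strong = StrongChild(parent)
weak = WeakChild(parent)

strong_time = timeit.timeit(lambda: strong.parent, number=200_000)
weak_time = timeit.timeit(lambda: weak.parent, number=200_000)
print(f"strong: {strong_time:.4f}s  weak: {weak_time:.4f}s")

# Reference loss: nothing else holds this Parent, so on CPython it is
# freed right away and the weak back-reference goes dead.
orphan = WeakChild(Parent())
print("orphan.parent:", orphan.parent)
```

Whether the measured difference matters for a real workload is exactly the kind of thing that needs profiling rather than guessing.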

So I think removing cycles from row-corresponding objects like  
IdentityManagedState is worth it, only marginally so for objects like  
Query, Session, SessionTrans, Connection and Transaction, and not  
worth it at all for application-scoped things like mappers and  
tables.  For IMS (IdentityManagedState) we'd also need to add some
tests; our "memory cleanup" testing methodology is somewhat lacking,
since I've found that peeking into gc.get_objects() and similar, looking
for specific patterns, is more difficult than it might seem.
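
For reference, the general shape of a gc.get_objects()-based check might look like the sketch below. This is not SQLAlchemy's actual test code; Node and live_count are hypothetical names, and the approach only sees objects that are tracked by the collector:

```python
import gc


class Node:
    """Hypothetical object that always ends up in a cycle."""

    def __init__(self):
        self.me = self


def live_count(cls, collect=True):
    """Count gc-tracked instances whose exact type is cls."""
    if collect:
        gc.collect()
    return sum(1 for o in gc.get_objects() if type(o) is cls)


gc.disable()  # keep automatic collection from interfering with the counts
try:
    before = live_count(Node)
    for _ in range(5):
        Node()  # dropped immediately, but kept alive by its own cycle
    leaked = live_count(Node, collect=False) - before
    remaining = live_count(Node) - before  # after an explicit cyclic run
finally:
    gc.enable()

print("alive before a cyclic run:", leaked)
print("alive after a cyclic run:", remaining)
```

A test asserting that a count like `leaked` stays at zero would fail for any class that depends on the cyclic collector, which is roughly what a cycle-free guarantee for row-level objects would need.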

Beyond the case of IdentityManagedState, I wouldn't be convinced that  
gc has a significant impact without seeing some benchmarks (I'm
actually skeptical of the IMS case too, but removing cycles there
shouldn't be too hard).  In Python I've often seen a great disparity
between the theoretical and the actual with respect to performance, and
it's not worth going down any avenue without benchmarks and profile
results to start.  I wonder if sites like reddit and YouTube run with gc
turned off.  reddit.com uses Mako templates, and Mako definitely has
circular references (and definitely requires them) within its Context object,
so I doubt they've given attention to this - they just do what  
everyone else does and scale horizontally, something that is  
ultimately needed whether you're running a pure-C application server  
or Python with its GIL and interpreter overhead.
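
As a starting point for the kind of benchmark meant here, a minimal sketch (hypothetical Node class and time_full_collection helper; the numbers depend heavily on the machine, the Python version, and how many other objects are alive in the process) that times one full cyclic collection over a batch of cyclic garbage:

```python
import gc
import time


class Node:
    """Hypothetical object that is only reclaimable by the cyclic collector."""

    def __init__(self):
        self.me = self


def time_full_collection(n):
    """Create n cyclic objects, then time a single gc.collect() over them."""
    was_enabled = gc.isenabled()
    gc.disable()  # make sure nothing is collected before we measure
    try:
        garbage = [Node() for _ in range(n)]
        del garbage
        start = time.perf_counter()
        found = gc.collect()  # number of unreachable objects found
        elapsed = time.perf_counter() - start
    finally:
        if was_enabled:
            gc.enable()
    return found, elapsed


found, elapsed = time_full_collection(10_000)
print(f"collected {found} objects in {elapsed * 1000:.2f} ms")
```

Running this inside a process that already holds a realistic number of live objects would give a better idea of the pauses described in the original message.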



