On Oct 26, 2008, at 10:42 AM, Henk wrote:
>
> Hi,
>
> In what way is sqlalchemy dependent on Python's cyclic gc?
> Is there any effort to make it work even if gc is off?
>
> I tried to turn gc off in a server side program I am developing, but
> this resulted in a lot of 'leaked' cycles/memory as soon as sqlalchemy
> is used to do some queries...
>
> I traced one cycle down to the relation between Session and
> SessionTransaction. They both hold a reference to each other, and this
> cycle is not broken in the session's close method, nor is one of them
> a weakref...
>
> Turning on gc leak detection also shows a lot of IdentityManagedState
> objects being involved in cycles (i.e. not collected by refcount). The
> number of these objects just keeps growing with each query. Closing
> the Session has no influence on this...
>
> Depending on cyclic gc is detrimental to the performance of server
> side apps serving many simultaneous clients. The cyclic gc will halt
> the whole Python process for up to seconds (if there is a large number
> of object instances in the process). All clients will thus experience
> quite some lag when the cyclic gc kicks in.
>
> Unfortunately, with sqlalchemy, tuning the gc to not run so often is
> also not good, because of the very large numbers of instances being
> created. A simple query for 2000 rows in my experience already creates
> about 10mb worth of sqlalchemy objects involved in cycles that need to
> be gc'd. So putting off the gc would mean memory would quickly be
> exhausted....
>
> (I am using sqlalchemy trunk revision 5200)

In general we don't worry about cyclic GC unless an issue has been demonstrated. There are some areas where we are careful not to create cycles, in cases where weakly referenced structures need to fall out of scope automatically or where the timing of gc.collect() tends to be troublesome.
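To make the kind of cycle described in the quoted message concrete, here is a minimal sketch. FakeSession and FakeTransaction are hypothetical stand-ins, not SQLAlchemy's actual Session and SessionTransaction; they only mimic two objects that hold strong references to each other. The behavior shown assumes CPython's reference counting.

```python
import gc
import weakref


class FakeTransaction:
    """Hypothetical stand-in: holds a strong reference back to its session."""

    def __init__(self, session):
        self.session = session


class FakeSession:
    """Hypothetical stand-in: holds a strong reference to its transaction."""

    def __init__(self):
        self.transaction = FakeTransaction(self)


def survives_without_cyclic_gc():
    """Return (alive_after_del, alive_after_collect) for one FakeSession."""
    was_enabled = gc.isenabled()
    gc.disable()  # mimic a process running with the cyclic collector off
    try:
        session = FakeSession()
        probe = weakref.ref(session)  # lets us observe the object without keeping it alive
        del session
        # The transaction still points at the session, so refcounting alone cannot free it.
        alive_after_del = probe() is not None
        gc.collect()  # an explicit collection still works while automatic gc is disabled
        alive_after_collect = probe() is not None
        return alive_after_del, alive_after_collect
    finally:
        if was_enabled:
            gc.enable()
```

Calling survives_without_cyclic_gc() should show the object surviving the del and disappearing only after the explicit gc.collect().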
Specifically within the area of fetched rows and fetched ORM objects which you mention, we are careful not to introduce any cyclic references within the rows themselves or on your mapped instances, so that a zero refcount will in fact garbage collect your instances without a cyclic run, even though the associated IdentityManagedState may have cycles as you are mentioning. The connection pool is also very cycle-aware.

The key issue here is whether you're requesting a completely "pure" non-cyclical application, or whether we're just talking about cycles within objects that are created on a large scale. In the latter case I would think we're probably only talking about IdentityManagedState. RowProxy, its "SQL expression" analogue, doesn't have any cycles. I wouldn't think SessionTransaction puts that much of a burden on cyclic gc, since you typically have only one or maybe two of those per request, assuming you are using autocommit=False with a request-scoped session. There's also probably a cycle between Connection and Transaction, which is the lower level analogue of Session -> SessionTransaction. There are also at least a few cycles within the Query object, and I don't think there are any within ClauseElement structures, but I am not 100% sure.

In the case that you are seeking to remove all cycles from SQLAlchemy as a whole, that doesn't strike me as particularly practical. There are many areas where traversal among internal structures in both directions is required, especially within Table and mapper structures. To achieve this without cycles, weakrefs would have to be used. Weakrefs introduce a significant performance penalty of their own on every access, since each one turns a very fast attribute access into a function call, and function calls are very expensive in Python.
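As a rough illustration of that tradeoff, here is a small sketch using only the standard library. Parent, Child and Holder are hypothetical classes, not SQLAlchemy structures. The first function shows a weak back-reference letting refcounting alone free the pair; the second times a plain attribute read against dereferencing a weakref. The numbers will vary by machine and Python version, so no result is claimed here.

```python
import gc
import timeit
import weakref


class Child:
    """Hypothetical child that refers back to its parent only weakly."""

    def __init__(self, parent):
        self._parent_ref = weakref.ref(parent)

    @property
    def parent(self):
        # Dereferencing costs a call; returns None once the parent is gone.
        return self._parent_ref()


class Parent:
    """Hypothetical parent holding a strong reference to its child."""

    def __init__(self):
        self.child = Child(self)


def freed_by_refcount_alone():
    """True if a Parent is freed on `del` with the cyclic collector off."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        parent = Parent()
        probe = weakref.ref(parent)
        del parent  # no strong cycle exists, so the refcount drops to zero here
        return probe() is None
    finally:
        if was_enabled:
            gc.enable()


class Holder:
    """Hypothetical plain object used only for the timing comparison."""


def compare_access_cost(number=200_000):
    """Return (strong_seconds, weak_seconds); both include lambda call overhead."""
    target = Holder()
    strong = Holder()
    strong.target = target
    weak = weakref.ref(target)
    strong_time = timeit.timeit(lambda: strong.target, number=number)
    weak_time = timeit.timeit(lambda: weak(), number=number)
    return strong_time, weak_time
```

Running compare_access_cost() on your own interpreter is the only way to know how large the difference is in practice.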
It's not clear to me that the performance saved by disabling cyclic gc would be greater than the penalty introduced by widespread usage of weakrefs, and at the very least it would impose a significant performance penalty on the vast majority of users who leave Python's GC settings unchanged. It would also be a great burden on future development and testing to ensure that all new code added does not introduce cycles in any case, and to maintain all those strong/weak references without unexpected reference loss. SQLA also relies upon DBAPI implementations which may use cycles, and we may also someday have other dependencies which require cycles.

So I think removing cycles from row-corresponding objects like IdentityManagedState is worth it, only marginally so for objects like Query, Session, SessionTransaction, Connection and Transaction, and not worth it at all for application-scoped things like mappers and tables. For IdentityManagedState (IMS) we'd also need to add some tests. We have a "memory cleanup" testing methodology that is somewhat lacking, since I've found that peeking into gc.get_objects() and similar, looking for specific patterns, is more difficult than it might seem.

Beyond the case of IdentityManagedState, I wouldn't be convinced that gc has a significant impact without seeing some benchmarks (I'm actually skeptical of the IMS case too, but removing cycles there shouldn't be too hard). In Python I've often seen a great disparity between the theoretical and the actual with regard to performance, and it's not worth going down any avenue without benchmarks and profile results to start.

I wonder if sites like reddit and Youtube run with gc turned off.
reddit.com uses Mako templates, which definitely have circular references (and definitely require them) within the Context object, so I doubt they've given attention to this. They just do what everyone else does and scale horizontally, something that is ultimately needed whether you're running a pure-C application server or Python with its GIL and interpreter overhead.
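For anyone who wants to produce the kind of numbers asked for above, here is a minimal sketch of counting cyclic garbage with gc.get_objects() and timing a full collection. Node is a hypothetical self-referencing class used only to generate cycles; it does not model IdentityManagedState or any other SQLAlchemy object, and the function names are my own.

```python
import gc
import time


class Node:
    """Hypothetical object that references itself, forming a one-object cycle."""

    def __init__(self):
        self.me = self


def count_instances(cls):
    """Count gc-tracked objects whose exact type is `cls`."""
    return sum(1 for obj in gc.get_objects() if type(obj) is cls)


def measure_cyclic_garbage(n=10_000):
    """Create `n` unreachable cycles with automatic gc off.

    Returns (leaked, remaining, pause_seconds): instances left behind by
    refcounting, instances left after an explicit collection, and how long
    that collection took.
    """
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        gc.collect()  # start from a clean slate
        before = count_instances(Node)
        for _ in range(n):
            Node()  # unreachable immediately, but kept alive by its own cycle
        leaked = count_instances(Node) - before
        start = time.perf_counter()
        gc.collect()
        pause = time.perf_counter() - start
        remaining = count_instances(Node) - before
        return leaked, remaining, pause
    finally:
        if was_enabled:
            gc.enable()
```

The pause reported for a toy class like this says little about a real application; the same measurement around an actual query workload would be the benchmark worth looking at.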