[Trac-dev] Re: t.e.o upgraded to 0.11 beta1

Christian Boos Fri, 08 Feb 2008 02:54:42 -0800

Christopher Lenz wrote:
> On 04.02.2008, at 20:06, Christian Boos wrote:
>   
>> By the way, I'm not sure I found THE leak, but at least an important
>> one. I'm still trying to get my way through the generator chain in
>> Genshi to understand more precisely what the problem is, but things go
>> really bad memory wise once an exception is raised from an expression
>> evaluation (and it doesn't seem to be related to what we do with that
>> exception later at the Trac level). I'll update #6614 when I have new
>> findings.
>>     
>
> Great work closing in on the problem, Christian! I tried to reproduce  
> this with a somewhat smaller example, but failed so far (also,  
> unfortunately have very little time right now).
>


Unfortunately same goes for me (plus the fact that at times I needed to 
take a break from those daunting debugging sessions and work on features 
I needed, like the wiki rename and multirepos support).

Also, I still haven't tried to reproduce the issue on a smaller example, 
but of course the problem might relate to the Trac/Genshi interaction 
rather than to Genshi alone.

> So what can this be aside from stack frame data hanging around in a  
> bad way? The various generators don't have much local data, and global  
> data shouldn't be used at all. I'm lost.
>   

From my very latest findings, it appears that there are two issues, 
probably rooted in the same cause.

1. At the very least the `data` dictionary of every request gets 
accumulated over time.

This can be hard to spot if the data corresponds to "cheap" pages, but 
it can amount to hundreds of MB per request if that data keeps costly 
objects alive. Timeline requests are an example of such expensive 
queries: look 3 years back in the timeline of an environment set up to 
browse a copy of the t.e.o repository, using the direct-svnfs access 
type (i.e. no cache involved). There's no need to list _lots_ of 
changesets; merely having to walk through all the changesets to get 
there accumulates data in pools that are kept alive because the 
changesets themselves are kept alive (that in itself can be considered 
another bug, at the svn_fs level).
But even if we fix the SVN issue or don't get there in the first place, 
there's a kind of leak going on for every request. That use case just 
makes it *very* apparent.

2. If an exception is raised from a code block (Expression or 
Suite), it's even worse. The *whole traceback* gets accumulated, i.e. 
all frames, so all objects referenced by local variables in any of the 
frames up to the evaluate one will be kept alive for the rest of the 
lifespan of the process. In this case, of course, memory usage 
increases even more dramatically than in case 1.

To be clear, when I say "accumulated" above, I really mean that 
those objects are considered live by the GC, not unreachable. I do a 
gc.collect() with the DEBUG_SAVEALL flag set after every request, and 
collect() says we have no unreachable objects. This means that 
apparently we're doing the right thing for our __del__ stuff.
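
A minimal sketch of that kind of check, assuming nothing about the 
actual debugging patch: `Node` and `count_unreachable` are hypothetical 
names. With `gc.DEBUG_SAVEALL` set, unreachable objects are appended to 
`gc.garbage` instead of being freed, so they can be inspected. Objects 
that are still referenced from somewhere never show up there, which is 
why a "live" leak passes this check.

```python
import gc


class Node:
    """Hypothetical object used to build a reference cycle."""

    def __init__(self):
        self.other = None


def count_unreachable():
    """Run a full collection; return (count, saved unreachable objects)."""
    old_flags = gc.get_debug()
    gc.set_debug(gc.DEBUG_SAVEALL)   # unreachable objects go to gc.garbage
    try:
        found = gc.collect()
        saved = list(gc.garbage)
    finally:
        gc.set_debug(old_flags)
        del gc.garbage[:]
    return found, saved


gc.disable()      # keep automatic collections out of the demonstration
gc.collect()      # start from a clean state

# A "live" leak: the object is still referenced, so the GC cannot see it
# as garbage.
leaked = [Node()]
_, saved = count_unreachable()
live_leak_seen = any(isinstance(o, Node) for o in saved)

# A real cycle with no outside references: this is what collect() reports.
a, b = Node(), Node()
a.other, b.other = b, a
del a, b
found, saved = count_unreachable()
cycle_nodes = sum(isinstance(o, Node) for o in saved)

gc.enable()
```

Here `live_leak_seen` stays false while the two cycle members are 
reported, which matches the observation above: zero unreachable objects 
does not rule out data being kept alive by a genuine reference.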


<some-more-gory-details>
Now the weird thing is that when I debug case (1) using a call to 
gc.get_referrers() on the 'data' object, there are 3 referrers to it: 
the local frame (fine), the list of live objects as obtained with 
gc.get_objects() (also fine), plus a very curious one.

The last referrer is most of the time a dict object which looks like the 
session, but is not the session - I've modified the Session so that it's 
no longer a dict itself but an object delegating its dict-like methods 
to a self._dict dictionary. Even that new _dict field shows up as a key 
in this dict-like session, so I'm sure the new code is picked up (btw, I 
did this change because of http://bugs.python.org/issue1469629). But it 
happened a few times that this third object was an unrelated <Request 
"..../trac.css"> object, and sometimes another list with a huge pile of 
crap inside (not the live objects list). Weird. Not to mention that this 
dict doesn't appear to even contain an entry for 'data'...

Ok, those were the very latest findings of yesterday evening (quite 
late!) and I don't know how much I can trust get_referrers(), so it 
might well be (just another) wrong track...
</some-more-gory-details>
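
For reference, the referrer hunt described above can be sketched as 
follows. `interesting_referrers` and `demo` are hypothetical helpers, 
not part of Trac. The sketch filters out the two expected referrers 
(frames and the gc.get_objects() snapshot) so that only the surprising 
ones remain. The Python documentation notes that get_referrers() can 
also list objects that are already dereferenced but not yet collected, 
so calling gc.collect() first is a reasonable precaution.

```python
import gc
import types


def interesting_referrers(obj, ignore=()):
    """Objects the GC sees referring to obj, minus frames and ignored ones."""
    ignored_ids = {id(x) for x in ignore}
    result = []
    for referrer in gc.get_referrers(obj):
        if isinstance(referrer, types.FrameType):
            continue                  # local variables of running code
        if id(referrer) in ignored_ids:
            continue                  # e.g. our own gc.get_objects() list
        result.append(referrer)
    return result


def demo():
    data = {"events": ["changeset"]}      # hypothetical per-request data
    session_like = {"last_data": data}    # an unexpected holder of data
    gc.collect()                          # drop stale, uncollected referrers
    live = gc.get_objects()               # snapshot list, refers to data too
    found = interesting_referrers(data, ignore=[live])
    return data, session_like, live, found


data, session_like, live, found = demo()
for referrer in found:
    # Print only the type and size: the referrer itself may be huge.
    print(type(referrer).__name__, len(referrer))
```

Whether a frame shows up among the raw referrers depends on the Python 
version, which is one reason the helper skips frames explicitly instead 
of assuming a fixed number of referrers.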

> If you can reliably reproduce this problem with a local data set (do  
> you have some instructions maybe?), could you maybe try whether it  
> also happens with Genshi 0.4.4? I'd like to know whether this problem  
> is new in 0.5dev.
>   

Great advice, I'll try with 0.4.4.
Also, I'll post some of the patches I used for debugging case (1) and 
for triggering case (2) on #6614 later today.

-- Christian
