Francois Deppierraz wrote:
> Hi,
>
> I have been recently experiencing memory leaks in Tahoe-LAFS during
> operations such as 'tahoe deep-check --add-lease --repair' or while
> using the FTP/SFTP frontend¹.
>
> While I was researching different ways to debug memory leaks in Python,
> I learned about a tool named Heapy which provides a heap analysis
> toolset which is part of the Guppy² Programming Environment.

This is very cool and useful, thanks.

> It looks like cache entries which are not correctly released.

That is consistent with some experiments I did using the objgraph module
(http://mg.pov.lt/objgraph/) and gc.get_objects(). I was unsure of those
results at the time because the graph didn't give enough information to
show why the cache entries were being retained.

Brian wrote:
> My suspicion is that the ResponseCache object is living a lot longer
> than expected, and it's accumulating cached responses from lots and lots
> of generations ("versions") of the mutable file that contains a
> directory which is being modified heavily. When you say your test
> "deleted about 1000 very small files from a single directory", you're
> really making 1000 sequential changes to the same directory, right? So
> there will be 1000 mutable file writes to the same file? If my suspicion
> is right, there will be 1000 different 'verinfos' values (and N times as
> many keys in self.cache, each of which may have multiple strings in the
> set, resulting in a large number of strings left around).
>
> When I first wrote ResponseCache back during the original Big Mutable
> File Rewrite (in april-2008), I expected that instances would only stick
> around for the duration of a mutable-file operation and then be gc'ed,
> so I didn't worry about ever removing old versions from its cache.

The SFTP frontend is intended to keep a reference to a dirnode object
only as long as there exists at least one open handle to that directory
on some SFTP connection. However, the SFTP client can keep each such
handle open for as long as it wants, and I think that sshfs caches
handles. If the lifetimes of a sequence of SFTP handles pointing to the
same directory overlap, I believe they will all end up sharing the same
dirnode object.

This is not obviously the wrong thing to do; perhaps the mutable
ResponseCache should expire entries more aggressively in order to
tolerate that usage.
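
To make "expire entries more aggressively" concrete, here is a rough sketch of one possible policy: keep cached responses only for the most recently touched few versions and drop the rest. This is not the actual ResponseCache code; the class name BoundedResponseCache, its methods, the max_versions parameter and the (shnum, offset) keying are all hypothetical, chosen only for illustration.

```python
from collections import OrderedDict


class BoundedResponseCache:
    """Hypothetical sketch, not Tahoe-LAFS code: caches responses per
    version ("verinfo") and forgets all but the newest few versions."""

    def __init__(self, max_versions=2):
        self.max_versions = max_versions
        # verinfo -> {(shnum, offset): data}, ordered oldest-touched first
        self._versions = OrderedDict()

    def add(self, verinfo, shnum, offset, data):
        entries = self._versions.setdefault(verinfo, {})
        self._versions.move_to_end(verinfo)  # mark as most recently touched
        entries[(shnum, offset)] = data
        # Expire whole versions once we hold more than max_versions.
        while len(self._versions) > self.max_versions:
            self._versions.popitem(last=False)

    def read(self, verinfo, shnum, offset, length):
        entries = self._versions.get(verinfo)
        if entries is None:
            return None  # version unknown or already expired
        data = entries.get((shnum, offset))
        if data is None or len(data) < length:
            return None
        return data[:length]

    def version_count(self):
        return len(self._versions)


# Simulate 1000 sequential changes to one directory, each producing a
# new version, against a cache object that is never garbage-collected.
cache = BoundedResponseCache(max_versions=2)
for seqnum in range(1000):
    cache.add(("verinfo", seqnum), shnum=0, offset=0, data=b"x" * 16)
```

With this policy the long-lived dirnode case described above would hold entries for at most max_versions versions instead of all 1000. Whether a per-version bound, a byte-size bound or time-based expiry is the better fit is a separate question.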
-- David-Sarah Hopwood ⚥ http://davidsarah.livejournal.com
_______________________________________________ tahoe-dev mailing list tahoe-dev@tahoe-lafs.org http://tahoe-lafs.org/cgi-bin/mailman/listinfo/tahoe-dev