All, I have been doing some performance investigation into a large Plone site we have running. The site in question has approx 300,000 items of content. Each piece of content is indexed by ZCatalog.
The main thing I was tracking down was the very large number of objects being loaded by the ZODB, mostly IISet instances. The large number of instances seems to be caused by a particular usage pattern: in various indexes in the Catalog there are a number of IITreeSet instances that are used to map, for instance, time -> UID. As content items are added, you end up adding monotonically increasing values to a set. The result is that you end up 'leaving behind' loads of buckets (or IISets in the case of an IITreeSet) that are half full.

Looking at the BTrees code, I see there is a MAX_BUCKET_SIZE constant that is set for the various BTree/Set types, and in the case of an IISet it is set to 120. This means that, when inserting into an IITreeSet, once an IISet grows beyond 120 items it is split and a new IISet is created. Hence, as above, I see a large number of 60 item IISets due to the pattern in which these data structures are filled.

So, with up to 300,000 items in some of these IITreeSets, iterating over the entire set (during a Catalog query) means loading 5,000 objects over ZEO from the ZODB, which adds up to quite a bit of latency. With quite a number of these data structures about, we can end up with on the order of 50,000 objects in the ZODB cache *just* for these IISets!

So... has anyone tried increasing the size of MAX_BUCKET_SIZE in real life? I understand that a larger bucket/set size will increase the potential for conflicts. In reality, however, this probably can't get worse than it is: the value inserted is a timestamp and is, 99% of the time, greater than the current max value stored, so you always hit the last bucket/set in the tree.

I was going to experiment with increasing the MAX_BUCKET_SIZE on an IISet from 120 to 1200. Doing a quick test, a pickle of an IISet of 60 items is around 336 bytes, and one of 600 items is 1580 bytes... so still very much in the realms of a single disk read / network packet.
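To illustrate the half-full bucket pattern described above, here is a minimal toy model in plain Python. It is only a sketch and does not use the BTrees package: the function name simulate_ascending_inserts is made up for this example, and the split rule (split a bucket in half as soon as it exceeds the maximum size) is an assumption based on the behaviour described in this post, not a copy of the real C code.

```python
def simulate_ascending_inserts(n_items, max_bucket_size):
    """Return the list of bucket sizes after inserting n_items ascending keys.

    Assumed split rule (a simplification, not the real BTrees code): when a
    bucket grows beyond max_bucket_size it is split into two halves, and the
    left half is never touched again because all later keys are larger.
    """
    sizes = [0]  # start with a single empty bucket
    for _ in range(n_items):
        sizes[-1] += 1  # an ascending key always lands in the last bucket
        if sizes[-1] > max_bucket_size:
            full = sizes.pop()
            left = full // 2
            sizes.extend([left, full - left])  # left half stays half full
    return sizes


for max_size in (120, 1200):
    sizes = simulate_ascending_inserts(300_000, max_size)
    print(max_size, "->", len(sizes), "buckets; typical size", sizes[0])
```

Under these assumptions the model ends up with roughly 5,000 buckets of 60 items for a limit of 120, and roughly 500 buckets of 600 items for a limit of 1200, which matches the object counts discussed above. Real trees will differ somewhat, since not every insert is strictly ascending.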
I'm not sure how the current MAX_BUCKET_SIZE values were determined, but it looks like they have been the same since the dawn of time, and I'm guessing they might be due a tune-up? It looks like I can change that constant and recompile the BTrees package, and it will work fine with existing IISets and only take effect on newly created sets (i.e. after clearing and rebuilding the catalog index).

Has anyone played with this before, or does anyone see any major flaws in my cunning plan?

-Matt