Jim Fulton <jim <at> zope.com> writes:

> On Wed, Jan 26, 2011 at 3:15 PM, Matt Hamilton <matth <at> netsight.co.uk> wrote:
> > So, with up to 300,000 items in some of these IISets, iterating
> > over the entire set (during a Catalog query) means loading
> > 5,000 objects over ZEO from the ZODB, which adds up to quite a bit
> > of latency. With quite a number of these data structures about, we
> > can end up with on the order of 50,000 objects in the ZODB cache
> > *just* for these IISets!
>
> Hopefully, you're not iterating over the entire tree, but still. :)

Alas we are. Or rather, alas, ZCatalog does ;) It would be great if it
didn't, but it's just the way it is. If I have 300,000 items in my site,
and every one of them is visible to someone with the 'Reader' role, then
the allowedRolesAndUsers index will have an IITreeSet with 300,000
elements in it. Yes, we could try to optimize out that specific case,
but there are others like it too. If all of my items have no effective
or expires date, then the same happens with the effective range index
(DateRangeIndex 'always' set).

> > So... has anyone tried increasing the size of MAX_BUCKET_SIZE in real
> > life?
>
> We have, mainly to reduce the number of conflicts.
>
> > I understand that this will increase the potential for conflicts
> > if the bucket/set size is larger (however, in reality this probably
> > can't get worse than it is, as currently the value inserted is 99%
> > of the time greater than the current max value stored -- it is a
> > timestamp -- so you always hit the last bucket/set in the tree).
>
> Actually, it reduces the number of unresolvable conflicts.
> Most conflicting bucket changes can be resolved, but bucket
> splits can't be, and bigger buckets means fewer splits.
>
> The main tradeoff is record size.

Ahh, interesting, that is good to know. I've not actually checked the
conflict resolution code, but do bucket change conflicts actually get
resolved in some sane way, or does the transaction have to be retried?
Actually... that is a good point, and something I never thought of:
when you get a ConflictError in the logs that was resolved, does that
mean that _p_resolveConflict was called and succeeded, or does it mean
that the transactions were retried and that resolved the conflict?

> > I was going to experiment with increasing the MAX_BUCKET_SIZE on an
> > IISet from 120 to 1200. Doing a quick test, a pickle of an IISet of
> > 60 items is around 336 bytes, and one of 600 items is 1580 bytes...
> > so still very much in the realm of a single disk read / network
> > packet.
>
> And imagine if you use zc.zlibstorage to compress records! :)

This is Plone 3, which is Zope 2.10.11. Does zc.zlibstorage work on
that, or does it need a newer ZODB? Also, unless I can sort out that
large number of small pickles being loaded, I'd imagine compression
would actually slow things down.

> > I'm not sure how the current MAX_BUCKET_SIZE values were determined,
> > but it looks like they have been the same since the dawn of time,
> > and I'm guessing they might be due a tune?
>
> Probably.
>
> > It looks like I can change that constant and recompile the BTrees
> > package, and it will work fine with existing IISets and just take
> > effect on new sets created (i.e. clear and rebuild the catalog
> > index).
> >
> > Anyone played with this before, or see any major flaws in my cunning
> > plan?
>
> We have. My long-term goal is to arrange things so that you can
> specify/change limits by sub-classing the BTree classes.
> Unfortunately, that's been a long-term priority for too long.
> This could be a great narrow project for someone who's willing
> to grok the Python C APIs.

I remember you introduced me to the C API for things like this waaaay
back in Reading at the first non-US Zope 3 sprint... I was trying to
create compressed list data structures for catalogs... I never could
quite get rid of the memory leaks I was getting! ;) Maybe I'll be brave
and take another look.

> Changing the default sizes for the II and LL BTrees is pretty
> straightforward.
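[Editor's note: to make the _p_resolveConflict question above concrete, here is an illustrative, pure-Python sketch of the kind of three-way merge that conflict resolution performs on a bucket's contents. It is not the real BTrees C code; the function name and conflict rule are simplified for illustration. Changes from two concurrent transactions can be merged as long as they touch disjoint keys; a bucket split rewrites the tree structure itself, which is why splits can't be resolved this way.]

```python
def resolve_set_conflict(old, committed, new):
    """Three-way merge of two set states both derived from `old`.

    Mirrors the spirit of ZODB's _p_resolveConflict: compute each
    transaction's changes relative to the saved state, and merge them
    if they are disjoint; otherwise give up (the caller would then
    raise ConflictError and the transaction would be retried).
    """
    added_c, removed_c = committed - old, old - committed
    added_n, removed_n = new - old, old - new
    # If both transactions touched any of the same keys, bail out.
    if (added_c | removed_c) & (added_n | removed_n):
        raise ValueError("unresolvable: both transactions touched the same keys")
    return (old | added_c | added_n) - removed_c - removed_n

old = {1, 2, 3}
# One transaction added 4, the other removed 1: disjoint, so mergeable.
print(resolve_set_conflict(old, {1, 2, 3, 4}, {2, 3}))  # {2, 3, 4}
```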
> We were more interested in LO (and similar) BTrees. For those,
> it's much harder to guess sizes, because you don't generally know
> how big the objects will be, which is why I'd like to make it
> tunable at the application level.

Yeah, I guess that is the issue. I wonder if it would be easy for the
code to work out the total size of the bucket in bytes and then split
based upon that. Or something like 120 items or 500kB, whichever comes
first.

Just looking at the cache on the site at the moment, we have a total of
978,355 objects in cache, of which:

  312,523 IOBucket
  274,025 IISet
  116,136 OOBucket
  114,626 IIBucket

So 83% of my cache is just those four object types.

-Matt

_______________________________________________
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/
ZODB-Dev mailing list - ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev
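[Editor's note: the size-aware split rule Matt floats above ("120 items or 500kB, whichever comes first") could be sketched as below. The names and limits are hypothetical, not part of the BTrees API, and the pickled size of a real bucket would depend on its key and value types.]

```python
import pickle

# Hypothetical limits, per the suggestion in the thread.
MAX_ITEMS = 120
MAX_BYTES = 500 * 1024  # 500 kB

def should_split(bucket_items):
    """Return True once a bucket exceeds either the item or byte limit."""
    if len(bucket_items) > MAX_ITEMS:
        return True
    # Fall back to serialized size, for buckets holding large objects
    # (the LO case, where item count alone says little about record size).
    return len(pickle.dumps(bucket_items)) > MAX_BYTES

print(should_split(list(range(60))))   # False: small in both respects
print(should_split(list(range(200))))  # True: too many items
```

A real implementation would live in the C bucket code and would want to avoid re-pickling the whole bucket on every insert, but the predicate itself is this simple.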