Hanno Schlichting <hanno <at> hannosch.eu> writes:

> You are using queryplan in the site, right? The most typical catalog
> query for Plone consists of something like ('allowedRolesAndUsers',
> 'effectiveRange', 'path', 'sort_on'). Without queryplan you indeed
> load the entire tree (or trees inside allowedRolesAndUsers) for each
> of these indexes.
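Without a query plan, the behavior described above amounts to materializing every index's full result set and then intersecting them all. A rough pure-Python sketch of that shape (plain dicts and sets stand in for the catalog's BTree indexes; the function and its names are illustrative, not taken from the actual catalog code):

```python
# Sketch only: dicts/sets stand in for the catalog's BTree indexes.
# Without a query plan, each index's complete result set is loaded
# first, and only then are the sets intersected, however large they are.

def naive_query(indexes, query):
    """indexes: {index_name: {value: set_of_docids}}
    query:   {index_name: value}"""
    result = None
    for name, value in query.items():
        docids = indexes[name].get(value, set())  # whole set loaded up front
        result = set(docids) if result is None else result & docids
    return result if result is not None else set()
```

The cost here is dominated by loading every index's full set, which is exactly what queryplan avoids, as described below.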
Yes, we are using queryplan. Without it the site becomes pretty much
unusable.

> With queryplan it knows from prior execution that the set returned by
> the path index is the smallest. So it first calculates this. Then it
> uses this small set (usually 10-100 items per folder) to look inside
> the other indexes. It then only needs to do an intersection of the
> small path set with each of the trees. If the path set has less than
> 1000 items, it won't even use the normal intersection function from
> the BTrees module, but will use the optimized Cython-based version from
> queryplan, which essentially does a for-in loop over the path set.
> Depending on the size ratio between the sets this is up to 20 times
> faster with in-memory data, and even more so if it avoids database
> loads. In the worst case you would load buckets equal to the length of
> the path set; usually you should load a lot less.

There still seem to be instances in which the entire set is loaded.
This could be an artifact of the fact that I am clearing the ZODB cache
before each test, which I think also clears the query plan.

Speaking of which, I saw in the queryplan code some hook to load a
pre-defined query plan, but I can't see exactly how you supply this
plan or what format it takes. Do you use this feature?

> We have large Plone sites in the same range of multiple 100.000 items
> and with queryplan and blobs we can run them with ZODB cache sizes of
> less than 100.000 items and memory usage of 500mb per single-threaded
> process.
>
> Of course it would still be really good to optimize the underlying
> data structures, but queryplan should help make this less urgent.

Well, I think we are already at that point ;) There are also, I think,
other times at which the full set is loaded.

> > Ahh interesting, that is good to know.
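The small-set trick quoted above (a for-in loop over the path set instead of a full BTree intersection) can be sketched with plain Python sets; builtins stand in for the BTrees structures and the Cython implementation, so this only illustrates the shape of the optimization:

```python
# Sketch of queryplan's small-set intersection, with plain Python sets
# standing in for BTrees: iterate over the small path set and probe the
# larger index structures by membership, so only the buckets holding
# those few keys would ever need to be loaded from the database.

def planned_intersection(small_set, other_indexes):
    result = set(small_set)
    for big in other_indexes:
        # for-in loop over the (shrinking) small set, as described above
        result = {docid for docid in result if docid in big}
    return result
```

With one small set and several large ones, each membership probe touches only the large structure's buckets around the probed keys, which is where the claimed speedup over a full intersection comes from.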
> > I've not actually checked the
> > conflict resolution code, but do bucket change conflicts actually get
> > resolved in some sane way, or does the transaction have to be
> > retried?
>
> Conflicts inside the same bucket can be resolved and you won't get to
> see any log message for them. If you get a ConflictError in the logs,
> it's one where the request is being retried.

Great, that is what I always thought, but I just wanted to check. So in
that case, what does it mean if I see a ConflictError for an IISet? Can
they not resolve conflicts internally?

> >> And imagine if you use zc.zlibstorage to compress records! :)
> >
> > This is Plone 3, which is Zope 2.10.11. Does zc.zlibstorage work on
> > that, or does it need a newer ZODB?
>
> zc.zlibstorage needs a newer ZODB version. 3.10 and up, to be exact.
>
> > Also, unless I can sort out that
> > large number of small pickles being loaded, I'd imagine this would
> > actually slow things down.
>
> The Data.fs would be smaller, making it more likely to fit into the OS
> disk cache. The overhead of uncompressing the data is small compared
> to the cost of a disk read instead of a memory read. But it's hard to
> say what exactly happens with the cache ratio in practice.

Yeah, if we could use it I certainly would :) I guess what I mean above
is that larger pickles would compress better, so with lots of small
pickles the compression would be less effective.

-Matt

_______________________________________________
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/
ZODB-Dev mailing list - ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev