On Sun, 2008-10-26 at 14:07 -0400, Tres Seaver wrote:
> Roché Compaan wrote:
> > On Sat, 2008-10-25 at 09:20 +0200, Hedley Roos wrote:
> >>> Have you measured the time needed for some "standard" ZCatalog queries
> >>> used with a Plone site with the communication overhead of memcached?
> >>> Generally speaking, I think the ZCatalog is fast. Queries are known to
> >>> be more expensive when they use a fulltext index, or when you have to
> >>> deal with large result sets or complex queries.
> >>>
> >> No I haven't. Roche Compaan has done extensive benchmarking using
> >> funkload testing plain catalog vs module level cache vs memcached, but
> >> the tests are more about page serving than catalog query time. I'll
> >> ask him to comment more on that.
> >
> > I actually did some profiling as well and catalog searches were just too
> > damn slow. The average execution time for searchResults was 100
> > milliseconds and this is why I told Hedley we should do some caching at
> > query level in the first place. I experimented with this idea a couple
> > of years back but wasn't successful due to inexperience. I was trying to
> > cache brains which obviously leads to persistency bugs. This time around
> > it was obvious to me that we should cache the IISet result sets.
> >
> > I suspect specific indexes are just performing suboptimally and need to
> > be improved. ExtendedPathIndex in Plone seems to be one of them.
> >
> > The effect on performance is really awesome, now we just need to fine
> > tune the implementation.
>
> Before (or while) we work on caching, can we try to improve the
> underlying indexes, and the way that applications use them? I'm pretty
> sure that there is a lot of room for improvement:
>
> - Plone uses too many indexes, and in particular, uses multiple text
>   indexes. Having extra indexes around "just in case" is a sure loss
>   at write time, and may even be expensive at query time (depending on
>   the query).
>
> - Particular indexes have performance characteristics based on their
>   designed purpose: for instance, the stock FieldIndex implementation
>   assumes that the number of documents indexed will be >> the number of
>   discrete indexable values. Using such an index in an application
>   domain with a very large set of indexable values probably loses, and
>   in ways which don't show up in early / small-scale testing.
>
> - I'm pretty sure that we haven't yet found the best data structure for
>   "hierarchy indexes" (e.g., the Plone EPI index, or the stock Zope2
>   PathIndex, etc.). Something like a 'trie' might be optimal for
>   pure prefix searching of hierarchies.
>
> - I am confident that the TopicIndex is underutilized: it does *all*
>   the work for a given query at write time, and can thus be blindingly
>   fast at query time.
>
> - Other special-purpose indexes (e.g., a "recent items" index) would
>   be worth a look, especially for applications with large volumes of
>   content.
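To make the trie suggestion for hierarchy indexes a bit more concrete, here is a minimal sketch in plain Python. It is not how ExtendedPathIndex or the stock PathIndex work today; the PathTrie class and its index/search methods are hypothetical names invented for illustration, and a plain Python set stands in for an IISet of document ids.

```python
class PathTrie:
    """Toy hierarchy index (hypothetical): one trie node per path segment."""

    def __init__(self):
        self.children = {}    # path segment -> child PathTrie node
        self.docids = set()   # ids of documents stored exactly at this node

    @staticmethod
    def _segments(path):
        # "/plone/news/" -> ["plone", "news"]
        return [seg for seg in path.split("/") if seg]

    def index(self, docid, path):
        """Record docid under path, creating intermediate nodes as needed."""
        node = self
        for seg in self._segments(path):
            node = node.children.setdefault(seg, PathTrie())
        node.docids.add(docid)

    def search(self, prefix):
        """Return the ids of all documents at or below prefix."""
        node = self
        for seg in self._segments(prefix):
            node = node.children.get(seg)
            if node is None:
                return set()
        # Walk the subtree iteratively and union the ids found on the way.
        result = set()
        stack = [node]
        while stack:
            current = stack.pop()
            result |= current.docids
            stack.extend(current.children.values())
        return result


trie = PathTrie()
trie.index(1, "/plone/news/item-a")
trie.index(2, "/plone/news/item-b")
trie.index(3, "/plone/events/party")
print(sorted(trie.search("/plone/news")))   # [1, 2]
```

Finding the starting node costs one dictionary lookup per path segment; the remaining cost is proportional to the size of the matching subtree. A real index would presumably also need unindexing and persistence, which this sketch leaves out.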
I agree that one should look at improving performance without caching as
well. But this is a lot harder and takes significantly more development
and debugging time than introducing some form of caching. So I'm not
convinced that it needs to happen in a certain order. If caching gives
you lots of performance with little effort now, then why shouldn't you
use it?

-- 
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za

_______________________________________________
Zope-Dev maillist  -  Zope-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zope-dev
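P.S. For readers who want a picture of what "caching at query level" means in this thread, here is a rough sketch. It is an assumption-laden illustration, not the implementation Hedley and I are working on: QueryCache, slow_search and DOCS are hypothetical names, an in-process dict stands in for memcached, and a frozenset stands in for the IISet result set. The point is only that the cache stores sets of document ids keyed on the query, never brains.

```python
class QueryCache:
    """Hypothetical query-level cache: query -> immutable set of doc ids."""

    def __init__(self, search):
        self._search = search    # callable taking a query dict, returning ids
        self._results = {}       # cache key -> frozenset of doc ids
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(query):
        # Sort the items so that argument order does not change the key.
        return tuple(sorted((name, repr(value)) for name, value in query.items()))

    def search(self, **query):
        key = self._key(query)
        if key in self._results:
            self.hits += 1
        else:
            self.misses += 1
            self._results[key] = frozenset(self._search(query))
        return self._results[key]

    def invalidate(self):
        """Drop everything; call this whenever content is (re)indexed."""
        self._results.clear()


# Stand-in for an expensive catalog search (not the ZCatalog API).
DOCS = {
    1: {"portal_type": "News Item", "review_state": "published"},
    2: {"portal_type": "News Item", "review_state": "private"},
    3: {"portal_type": "Event", "review_state": "published"},
}


def slow_search(query):
    return [docid for docid, attrs in DOCS.items()
            if all(attrs.get(name) == value for name, value in query.items())]


cache = QueryCache(slow_search)
first = cache.search(portal_type="News Item")
second = cache.search(portal_type="News Item")   # served from the cache
print(sorted(first), cache.hits, cache.misses)   # [1, 2] 1 1
```

Clearing the whole cache on every write is the bluntest possible invalidation strategy; it is used here only to keep the sketch short.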