Thanks for the info :)

P.S.: This info should be included in the Wiki.
On Thu, Mar 4, 2010 at 2:30 PM, Michael Dürig <[email protected]> wrote:
>> I am interested in these parameters to improve Jackrabbit performance. I
>> have an installation with more than 2 million documents and performance
>> is currently poor :(
>
> On the current trunk there are 3 parameters which can be used to tweak
> performance for jcr2spi/spi2davex. These are the size of the item info
> cache, the size of the item cache and the depth of batch read operations.
>
>
> Some Background:
> The item cache contains JCR items (i.e. nodes and properties). The item info
> cache contains item infos. An item info is an entity representing nodes or
> properties on the SPI layer. The jcr2spi module receives item infos from an
> SPI implementation (i.e. spi2davex) and uses them to build up a hierarchy of
> JCR items.
>
> When an item is requested from the JCR API, jcr2spi first checks whether the
> item is in the item cache. If so, that item is returned. If not, the request
> is passed down to the SPI. But before actually calling the SPI, the item info
> cache is checked first. If this cache contains the requested item info, the
> relevant part of the JCR hierarchy is built and the corresponding JCR item
> is placed into the item cache. Only when the item info cache does not
> contain the requested item info is a call made to the SPI. Here the
> batch read depth comes into play. Since calls to the SPI cause some latency
> (i.e. network round trips), the SPI may, in addition to the requested item
> info, return additional item infos. The batch read depth parameter
> specifies the depth down to which item infos of the children of the
> requested item info are returned.
>
> Overall, the size of the item info cache and the batch read depth should be
> used to optimize for the requirements of the back-end (i.e. network and
> server). In general, the item info cache should be large enough to *easily*
> hold all items from multiple batches.
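As a rough illustration of what "multiple batches" can mean for the item info cache size, here is a back-of-the-envelope sketch. It assumes a hypothetical uniform tree in which every node has the same number of child items (counting nodes and properties alike as the fan-out), so a batch of depth d returns 1 + b + b^2 + ... + b^d item infos. `BatchSizing` and its methods are made-up helpers, not part of Jackrabbit, and real content trees are rarely this regular.

```java
// Back-of-the-envelope sizing, assuming a uniform tree in which every node has
// the same number of child items (fanOut). Hypothetical helper, not a Jackrabbit API.
class BatchSizing {

    /** Item infos in one batch: 1 + b + b^2 + ... + b^d for fan-out b and depth d. */
    static long infosPerBatch(int fanOut, int depth) {
        long total = 0;
        long levelSize = 1; // number of items at the current level, b^i
        for (int i = 0; i <= depth; i++) {
            total += levelSize;
            levelSize *= fanOut;
        }
        return total;
    }

    /** Item info cache entries needed to hold the given number of full batches. */
    static long itemInfoCacheSizeFor(int fanOut, int depth, int batches) {
        return infosPerBatch(fanOut, depth) * batches;
    }

    public static void main(String[] args) {
        for (int depth = 0; depth <= 3; depth++) {
            System.out.printf("fan-out 10, depth %d: %d infos/batch, %d entries for 5 batches%n",
                    depth, infosPerBatch(10, depth), itemInfoCacheSizeFor(10, depth, 5));
        }
    }
}
```

Under this assumption, raising the batch read depth by one multiplies the cache space a batch needs by roughly the fan-out, while it only saves round trips if the client actually visits those descendants.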
> The batch read depth should be a trade-off
> between network latency and item info cache overhead. Finally, the item
> cache should be used to optimize for the requirements of the front-end (i.e.
> the JCR API client). It should be able to hold the items in the current
> working set of the API consumer.
>
> Some pointers:
>
> Batch reading:
> org.apache.jackrabbit.spi.RepositoryService#getItemInfos()
> org.apache.jackrabbit.spi2davex.Spi2davexRepositoryServiceFactory#PARAM_BATCHREAD_CONFIG
>
> Item info cache size:
> org.apache.jackrabbit.spi2davex.Spi2davexRepositoryServiceFactory#PARAM_ITEMINFO_CACHE_SIZE
>
> Item cache size:
> org.apache.jackrabbit.jcr2spi.Jcr2spiRepositoryFactory#PARAM_ITEM_CACHE_SIZE
>
> Related JIRA issues:
> JCR-2497: Improve jcr2spi read performance
> JCR-2498: Implement caching mechanism for ItemInfo batches
> JCR-2461: Item retrieval inefficient after refresh
> JCR-2499: Add simple benchmarking tools for jcr2spi read perform
>
> Michael
>
> On 2/28/10 9:21 PM, Paco Avila wrote:
>>
>> On 28/02/2010 15:50, "Michael Dürig" <[email protected]> wrote:
>>
>> François,
>>
>> I spent some time on improving performance lately. See
>> https://issues.apache.org/jira/browse/JCR-2497 and related issues.
>>
>> I was able to improve performance for our use case with these fixes.
>> Getting the parameters right (i.e. item cache size, item info cache size
>> and batch read depth) is still quite tricky though and requires careful
>> profiling.
>>
>> I can provide more specific information on these parameters if required.
>>
>> Michael
>>
>> François Cassistat wrote:
>>>
>>> Ok, I've studied a little what was going on with a packet analyze...
>
--
OpenKM
http://www.openkm.com
http://www.guia-ubuntu.org
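The lookup order Michael describes (item cache first, then item info cache, and only then an SPI call that may return a whole batch) can be modeled roughly as below. This is a hypothetical, self-contained sketch: `TwoLevelLookup`, `FakeSpi` and the use of path strings as "items" and "item infos" are made up for illustration and are not the jcr2spi or spi2davex API. It only shows why a hit in either cache avoids a network round trip and how the batch read depth decides which later requests are served from the item info cache.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the jcr2spi read path; none of these classes exist in Jackrabbit.
// Paths stand in for both JCR items and SPI item infos.
class TwoLevelLookup {

    /** Minimal LRU cache: LinkedHashMap in access order, evicting the eldest entry. */
    static class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxSize;
        LruCache(int maxSize) {
            super(16, 0.75f, true); // accessOrder = true gives LRU behaviour
            this.maxSize = maxSize;
        }
        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxSize;
        }
    }

    /** Stand-in for an SPI: returns the requested info plus descendants down to batchDepth. */
    static class FakeSpi {
        private final Map<String, List<String>> children; // path -> child paths
        int roundTrips = 0;
        FakeSpi(Map<String, List<String>> children) {
            this.children = children;
        }
        List<String> getItemInfos(String path, int batchDepth) {
            roundTrips++; // each call models one network round trip
            List<String> batch = new ArrayList<>();
            collect(path, batchDepth, batch);
            return batch; // the requested info is always the first element
        }
        private void collect(String path, int depth, List<String> out) {
            out.add(path);
            if (depth == 0) {
                return;
            }
            for (String child : children.getOrDefault(path, List.of())) {
                collect(child, depth - 1, out);
            }
        }
    }

    private final LruCache<String, String> itemCache;     // models the JCR item cache
    private final LruCache<String, String> itemInfoCache; // models the item info cache
    private final FakeSpi spi;
    private final int batchDepth;

    TwoLevelLookup(FakeSpi spi, int itemCacheSize, int itemInfoCacheSize, int batchDepth) {
        this.spi = spi;
        this.itemCache = new LruCache<>(itemCacheSize);
        this.itemInfoCache = new LruCache<>(itemInfoCacheSize);
        this.batchDepth = batchDepth;
    }

    String getItem(String path) {
        // Step 1: item cache hit, no SPI involvement at all
        String item = itemCache.get(path);
        if (item != null) {
            return item;
        }
        // Step 2: check the item info cache before calling the SPI
        String info = itemInfoCache.get(path);
        if (info == null) {
            // Step 3: miss in both caches, one SPI call that also returns descendants
            List<String> batch = spi.getItemInfos(path, batchDepth);
            info = batch.get(0);
            for (String i : batch) {
                itemInfoCache.put(i, i);
            }
        }
        // Build the "JCR item" from its item info and cache it
        item = "item:" + info;
        itemCache.put(path, item);
        return item;
    }

    public static void main(String[] args) {
        Map<String, List<String>> tree = Map.of("/", List.of("/a", "/b"), "/a", List.of("/a/x"));
        FakeSpi spi = new FakeSpi(tree);
        TwoLevelLookup lookup = new TwoLevelLookup(spi, 100, 100, 1);
        lookup.getItem("/");    // SPI call; also caches the infos for /a and /b
        lookup.getItem("/a");   // served from the item info cache
        lookup.getItem("/a/x"); // deeper than batch depth 1 below "/": another SPI call
        lookup.getItem("/");    // served from the item cache
        System.out.println("SPI round trips: " + spi.roundTrips); // prints "SPI round trips: 2"
    }
}
```

With a batch read depth of 2 in this toy tree, the request for `/a/x` would also be served from the item info cache, at the cost of holding more infos per batch.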
