Re: jcr2spi NodeIterator.getNode() performances

Paco Avila Thu, 04 Mar 2010 07:55:41 -0800

Thanks for the info :)

PS: This info should be included in the Wiki.



On Thu, Mar 4, 2010 at 2:30 PM, Michael Dürig <[email protected]> wrote:
>> I am interested in these parameters to improve Jackrabbit performance. I
>> have an installation with more than 2 million documents and performance
>> is currently poor :(
>
> On the current trunk there are 3 parameters which can be used to tweak
> performance for jcr2spi/spi2davex. These are the size of the item info
> cache, the size of the item cache and the depth of batch read operations.
>
>
> Some Background:
> The item cache contains JCR items (i.e. nodes and properties). The item info
> cache contains item infos. An item info is an entity representing nodes or
> properties on the SPI layer. The jcr2spi module receives item infos from an
> SPI implementation (e.g. spi2davex) and uses them to build up a hierarchy of
> JCR items.
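To make the item / item info distinction concrete, here is a toy Python model. It is not jcr2spi code: `ItemInfo`, `Node` and `build_hierarchy()` are hypothetical simplifications. Flat, path-identified item infos (roughly what an SPI hands back) are linked into a tree of node objects, and property infos are attached to their parent node.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for an SPI item info: a flat record keyed by path.
# Real SPI ItemInfo objects carry much more (ids, types, values, ...).
@dataclass(frozen=True)
class ItemInfo:
    path: str
    is_node: bool

# Hypothetical stand-in for a JCR node built by jcr2spi.
@dataclass
class Node:
    name: str
    children: dict = field(default_factory=dict)   # child name -> Node
    properties: set = field(default_factory=set)   # property names

def build_hierarchy(infos):
    """Link flat item infos into a tree rooted at '/'.

    Missing intermediate nodes are created on demand, so the order of
    `infos` does not matter.
    """
    root = Node("")
    for info in infos:
        segments = [s for s in info.path.split("/") if s]
        # For a property, walk to its parent node; for a node, to the node itself.
        node_segments = segments if info.is_node else segments[:-1]
        node = root
        for seg in node_segments:
            node = node.children.setdefault(seg, Node(seg))
        if not info.is_node:
            node.properties.add(segments[-1])
    return root

infos = [
    ItemInfo("/docs/a/jcr:title", is_node=False),
    ItemInfo("/docs", is_node=True),
    ItemInfo("/docs/a", is_node=True),
]
root = build_hierarchy(infos)
print(sorted(root.children))                            # ['docs']
print(root.children["docs"].children["a"].properties)   # {'jcr:title'}
```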
>
> When an item is requested through the JCR API, jcr2spi first checks whether
> the item is in the item cache. If so, that item is returned. If not, the
> request is passed down to the SPI. But before actually calling the SPI, the
> item info cache is checked first. If this cache contains the requested item
> info, the relevant part of the JCR hierarchy is built and the corresponding
> JCR item is placed into the item cache. Only when the item info cache does
> not contain the requested item info is a call made to the SPI. This is where
> the batch read depth comes into play. Since calls to the SPI incur latency
> (e.g. network round trips), the SPI may return additional item infos along
> with the one actually requested. The batch read depth parameter specifies
> the depth down to which the item infos of the requested item's descendants
> are returned.
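The lookup order described above can be sketched as a small Python simulation. Everything here is hypothetical: `LRUCache`, `FakeSpi` and `Session` are illustrative names, not jcr2spi or spi2davex classes, and LRU eviction is an assumption, not the documented cache policy. `FakeSpi.calls` counts simulated round trips, so running the same access pattern with two batch read depths shows what batching saves.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache (assumed policy) standing in for both caches."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict least recently used

class FakeSpi:
    """Hypothetical SPI backend over an in-memory tree (path -> child paths)."""
    def __init__(self, children, batch_depth):
        self.children = children
        self.batch_depth = batch_depth
        self.calls = 0                       # simulated network round trips

    def get_item_infos(self, path):
        """Return the requested info plus descendants down to batch_depth levels."""
        self.calls += 1
        batch, frontier = [path], [path]
        for _ in range(self.batch_depth):
            frontier = [c for p in frontier for c in self.children.get(p, [])]
            batch.extend(frontier)
        return {p: ("info", p) for p in batch}

class Session:
    def __init__(self, spi, item_cache_size, info_cache_size):
        self.spi = spi
        self.items = LRUCache(item_cache_size)
        self.infos = LRUCache(info_cache_size)

    def get_item(self, path):
        item = self.items.get(path)          # 1. item cache
        if item is not None:
            return item
        info = self.infos.get(path)          # 2. item info cache
        if info is None:
            batch = self.spi.get_item_infos(path)  # 3. SPI call, batched
            for p, i in batch.items():
                self.infos.put(p, i)
            info = batch[path]
        item = ("item", info[1])             # "build" the JCR item from its info
        self.items.put(path, item)
        return item

tree = {"/": ["/a", "/b"], "/a": ["/a/x", "/a/y"], "/b": ["/b/z"]}
calls = {}
for depth in (0, 2):
    spi = FakeSpi(tree, batch_depth=depth)
    session = Session(spi, item_cache_size=10, info_cache_size=100)
    for p in ["/", "/a", "/a/x", "/b/z", "/a"]:
        session.get_item(p)
    calls[depth] = spi.calls
print(calls)  # {0: 4, 2: 1}
```

With `batch_depth=0` every item info miss costs a round trip; with depth 2 a single call prefetches this whole small tree. Shrinking `info_cache_size` below the batch size would evict prefetched infos before they are used, which is one reason the item info cache should hold several batches.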
>
> Overall, the size of the item info cache and the batch read depth should be
> used to optimize for the requirements of the back-end (i.e. network and
> server). In general, the item info cache should be large enough to *easily*
> hold all item infos from multiple batches. The batch read depth should be a
> trade-off between network latency and item info cache overhead. Finally, the
> item cache should be used to optimize for the requirements of the front-end
> (i.e. the JCR API client). It should be able to hold the items in the current
> working set of the API consumer.
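As a rough back-of-the-envelope illustration of that trade-off (a toy model, not a Jackrabbit formula): in a uniform tree with fan-out b, a batch read of depth d returns about 1 + b + ... + b^d node infos, so the batch size, and with it an item info cache sized for several batches, grows exponentially with depth. `infos_per_node` (to account for property infos) and the `batches` headroom factor are hypothetical knobs, not Jackrabbit parameters.

```python
def infos_per_batch(fan_out, depth, infos_per_node=1):
    """Item infos returned by one batch read in a uniform tree (toy model).

    Counts the requested node plus its descendants down to `depth` levels,
    each contributing `infos_per_node` infos (1 = node info only).
    """
    nodes = sum(fan_out ** k for k in range(depth + 1))
    return nodes * infos_per_node

def suggested_info_cache_size(fan_out, depth, batches=5, infos_per_node=1):
    """Hypothetical sizing rule: leave room for `batches` full batches."""
    return batches * infos_per_batch(fan_out, depth, infos_per_node)

for depth in range(4):
    print(depth, infos_per_batch(fan_out=10, depth=depth),
          suggested_info_cache_size(fan_out=10, depth=depth))
# 0 1 5
# 1 11 55
# 2 111 555
# 3 1111 5555
```

Each extra level of batch depth can save round trips for deep traversals, but at fan-out 10 it also multiplies the infos fetched and cached by roughly ten; that is the latency versus cache overhead trade-off described in the message.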
>
> Some pointers:
>
> Batch reading: org.apache.jackrabbit.spi.RepositoryService#getItemInfos()
> org.apache.jackrabbit.spi2davex.Spi2davexRepositoryServiceFactory#PARAM_BATCHREAD_CONFIG
>
> Item info cache size:
> org.apache.jackrabbit.spi2davex.Spi2davexRepositoryServiceFactory#PARAM_ITEMINFO_CACHE_SIZE
>
> Item cache size:
> org.apache.jackrabbit.jcr2spi.Jcr2spiRepositoryFactory#PARAM_ITEM_CACHE_SIZE
>
> Related JIRA issues:
> JCR-2497: Improve jcr2spi read performance
> JCR-2498: Implement caching mechanism for ItemInfo batches
> JCR-2461: Item retrieval inefficient after refresh
> JCR-2499: Add simple benchmarking tools for jcr2spi read perform
>
> Michael
>
> On 2/28/10 9:21 PM, Paco Avila wrote:
>>
>> On 28/02/2010 15:50, "Michael Dürig" <[email protected]> wrote:
>>
>> François,
>>
>> I spent some time on improving performance lately. See
>> https://issues.apache.org/jira/browse/JCR-2497 and related issues.
>>
>> I was able to improve performance for our use case with these fixes.
>> Getting the parameters right (i.e. item cache size, item info cache size
>> and batch read depth) is still quite tricky, though, and requires careful
>> profiling.
>>
>> I can provide more specific information on these parameters if required.
>>
>> Michael
>>
>> François Cassistat wrote:
>>>
>>> OK, I've studied a little what was going on with a packet analyzer...
>>
>



-- 
OpenKM
http://www.openkm.com
http://www.guia-ubuntu.org
