On 3/4/10 4:55 PM, Paco Avila wrote:
Thanks for the info :)
PS: This info should be included in the Wiki.
Yes, I'll see what I can do.
Michael
On Thu, Mar 4, 2010 at 2:30 PM, Michael Dürig<[email protected]> wrote:
I am interested in these parameters to improve Jackrabbit performance. I
have an installation with more than 2 million documents and performance
is currently poor :(
On the current trunk there are 3 parameters which can be used to tweak
performance for jcr2spi/spi2davex. These are the size of the item info
cache, the size of the item cache and the depth of batch read operations.
Some Background:
The item cache contains JCR items (i.e. nodes and properties). The item info
cache contains item infos. An item info is an entity representing nodes or
properties on the SPI layer. The jcr2spi module receives item infos from an
SPI implementation (i.e. spi2davex) and uses them to build up a hierarchy of
JCR items.
When an item is requested from the JCR API, jcr2spi first checks whether the
item is in the item cache. If so, that item is returned. If not, the request
is passed down to the SPI. But before actually calling the SPI, the item info
cache is checked first. If this cache contains the requested item info, the
relevant part of the JCR hierarchy is built and the corresponding JCR item
is placed into the item cache. Only when the item info cache does not
contain the requested item info is a call made to the SPI. Here the
batch read depth comes into play. Since calls to the SPI cause some latency
(i.e. network round trips), the SPI may - in addition to the actually
requested item info - return additional item infos. The batch read depth
parameter specifies the depth down to which item infos of the children of
the requested item info are returned.
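The read path described above can be sketched in a few lines. This is a simplified, self-contained illustration only; the class and method names (ItemLookup, readBatchFromSpi, buildJcrItem) are made up for the example and are not the actual jcr2spi classes:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the two-level lookup: item cache first,
// then item info cache, and only on a double miss a (simulated)
// SPI call that returns a whole batch of item infos.
public class ItemLookup {
    // Front-end cache: fully built JCR items, keyed by path.
    private final Map<String, Object> itemCache = new HashMap<String, Object>();
    // Back-end cache: raw item infos received from the SPI layer.
    private final Map<String, Object> itemInfoCache = new HashMap<String, Object>();

    private int spiCalls = 0; // counts round trips to the simulated SPI

    public Object getItem(String path) {
        // 1. Item cache hit: return the JCR item directly.
        Object item = itemCache.get(path);
        if (item != null) {
            return item;
        }
        // 2. Item info cache hit: build the JCR item, no SPI call needed.
        Object info = itemInfoCache.get(path);
        if (info == null) {
            // 3. Double miss: call the SPI. Due to batch reading, the SPI
            // returns the requested info plus infos of descendants down to
            // the batch read depth; all of them go into the info cache.
            for (Map.Entry<String, Object> e : readBatchFromSpi(path).entrySet()) {
                itemInfoCache.put(e.getKey(), e.getValue());
            }
            spiCalls++;
            info = itemInfoCache.get(path);
        }
        item = buildJcrItem(info);
        itemCache.put(path, item);
        return item;
    }

    // Simulated SPI batch read: returns the requested node plus one child level.
    private Map<String, Object> readBatchFromSpi(String path) {
        Map<String, Object> batch = new HashMap<String, Object>();
        batch.put(path, "info:" + path);
        batch.put(path + "/child", "info:" + path + "/child");
        return batch;
    }

    private Object buildJcrItem(Object info) {
        return "item built from " + info;
    }

    public int getSpiCalls() {
        return spiCalls;
    }
}
```

With this sketch, reading /a costs one SPI call, and a subsequent read of /a/child is served from the item info cache without another round trip.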
Overall the size of the item info cache and the batch read depth should be
used to optimize for the requirements of the back-end (i.e. network and
server). In general, the item info cache should be large enough to *easily*
hold all items from multiple batches. The batch read depth should be a
trade-off between network latency and item info cache overhead. Finally, the item
cache should be used to optimize for the requirements of the front-end (i.e.
the JCR API client). It should be able to hold the items in the current
working set of the API consumer.
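To make the sizing advice concrete: with an average node fan-out k and batch read depth d, one batch returns roughly 1 + k + k^2 + ... + k^d item infos. The following is a rough back-of-the-envelope helper, not part of Jackrabbit, and the safety factor is an arbitrary illustrative choice:

```java
// Rough sizing aid (not Jackrabbit code): estimates how many item infos
// a single batch read returns for a given average fan-out and batch read
// depth, and derives a suggested item info cache size from it.
public class CacheSizing {

    // Item infos in one batch: 1 + fanOut + fanOut^2 + ... + fanOut^depth
    public static long batchSize(int fanOut, int depth) {
        long total = 0;
        long levelCount = 1;
        for (int level = 0; level <= depth; level++) {
            total += levelCount;
            levelCount *= fanOut;
        }
        return total;
    }

    // Per the advice above, the item info cache should *easily* hold
    // multiple batches; the factor 4 is an illustrative assumption.
    public static long suggestedItemInfoCacheSize(int fanOut, int depth) {
        return 4 * batchSize(fanOut, depth);
    }
}
```

For example, with fan-out 10 and depth 2, one batch holds 1 + 10 + 100 = 111 item infos, so a cache of a few hundred entries would comfortably hold several batches.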
Some pointers:
Batch reading: org.apache.jackrabbit.spi.RepositoryService#getItemInfos()
org.apache.jackrabbit.spi2davex.Spi2davexRepositoryServiceFactory#PARAM_BATCHREAD_CONFIG
Item info cache size:
org.apache.jackrabbit.spi2davex.Spi2davexRepositoryServiceFactory#PARAM_ITEMINFO_CACHE_SIZE
Item cache size:
org.apache.jackrabbit.jcr2spi.Jcr2spiRepositoryFactory#PARAM_ITEM_CACHE_SIZE
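Putting the pointers together, a configuration might look roughly like the sketch below. Only the three PARAM_* constants are taken from the pointers above; all numeric values are illustrative, the expected value type of the batch read config is version-specific, and the connection parameters are omitted, so check the Javadoc of these constants before relying on any of it:

```java
import java.util.HashMap;
import java.util.Map;

import javax.jcr.Repository;
import javax.jcr.RepositoryFactory;

import org.apache.jackrabbit.jcr2spi.Jcr2spiRepositoryFactory;
import org.apache.jackrabbit.spi2davex.Spi2davexRepositoryServiceFactory;

// Sketch only: wiring the three tuning parameters when obtaining a
// repository through Jcr2spiRepositoryFactory.
public class TunedRepository {

    public static Repository open(Object batchReadConfig) throws Exception {
        Map<String, Object> params = new HashMap<String, Object>();

        // SPI level: item info cache, large enough to easily hold the
        // item infos of multiple batches (value is illustrative).
        params.put(Spi2davexRepositoryServiceFactory.PARAM_ITEMINFO_CACHE_SIZE,
                50000);

        // SPI level: batch read depth; construct the value per the Javadoc
        // of PARAM_BATCHREAD_CONFIG for your Jackrabbit version.
        params.put(Spi2davexRepositoryServiceFactory.PARAM_BATCHREAD_CONFIG,
                batchReadConfig);

        // JCR level: item cache, sized to the API client's working set
        // (value is illustrative).
        params.put(Jcr2spiRepositoryFactory.PARAM_ITEM_CACHE_SIZE, 10000);

        // ...plus the usual connection parameters (repository address,
        // repository service factory), omitted here.

        RepositoryFactory factory = new Jcr2spiRepositoryFactory();
        return factory.getRepository(params);
    }
}
```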
Related JIRA issues:
JCR-2497: Improve jcr2spi read performance
JCR-2498: Implement caching mechanism for ItemInfo batches
JCR-2461: Item retrieval inefficient after refresh
JCR-2499: Add simple benchmarking tools for jcr2spi read performance
Michael
On 2/28/10 9:21 PM, Paco Avila wrote:
On 28/02/2010 15:50, "Michael Dürig" <[email protected]> wrote:
François,
I spent some time on improving performance lately. See
https://issues.apache.org/jira/browse/JCR-2497 and related issues.
I was able to improve performance for our use case with these fixes.
Getting
the parameters right (i.e. item cache size, item info cache size and batch
read depth) is still quite tricky though and requires careful profiling.
I can provide more specific information on these parameters if required.
Michael
François Cassistat wrote:
Ok, I've studied a little what was going on with a packet analyzer...