I think I am hot on the trail. I noticed this morning that the top objects in 
the heap dump are not just Lucene classes; they are classes related to query 
results. Due to a limitation in the Jackrabbit query language (specifically 
the inability to compare two dynamic dates), I am running a query whose 
result set is proportional to the size of the repository (in other words, it 
is unbounded). resultFetchSize is unlimited by default, so I think I am 
getting larger and larger query results until I run out of space.
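
For reference, this is the knob I'm turning - a sketch of the relevant bit of
our workspace.xml (the 1000 is just the cap I picked to test with, not a
recommendation):

  <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="path" value="${wsp.home}/index"/>
    <!-- cap how many results a query fetches eagerly; the default is
         effectively unlimited -->
    <param name="resultFetchSize" value="1000"/>
  </SearchIndex>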

I already changed this parameter yesterday, so I will see what happens with the 
testing today. In the bigger picture I'm working on a better way to mark and 
query the nodes I'm interested in so I don't have to perform an unbounded query.
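
Roughly what I have in mind - just a sketch, with a made-up app:pending
property (our nodes would need a node type or mixin that actually permits it):

  // sketch only - assumes an open javax.jcr.Session named "session"
  // and imports from javax.jcr and javax.jcr.query
  Node node = session.getNode("/content/images/example"); // hypothetical path
  node.setProperty("app:pending", java.util.Calendar.getInstance());
  session.save();

  // later: fetch only the marked nodes, with an explicit bound
  QueryManager qm = session.getWorkspace().getQueryManager();
  Query q = qm.createQuery(
      "select * from [nt:base] where [app:pending] is not null",
      Query.JCR_SQL2);
  q.setLimit(100); // JCR 2.0; keeps each batch bounded
  NodeIterator it = q.execute().getNodes();
  while (it.hasNext()) {
      Node n = it.nextNode();
      // ... process n, then clear the mark so it drops out of the query
      n.getProperty("app:pending").remove();
  }
  session.save();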

Thanks again for the excellent support.

P.S. We build and run a standalone Sling jar - it runs separately from our main 
application.
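
Since it is standalone, we can at least cap the heap explicitly at launch,
something like the following (the jar name and the 512m are placeholders):

  java -Xmx512m -jar our-sling-standalone.jar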


-----Original Message-----
From: Ben Frisoni [mailto:[email protected]] 
Sent: Tuesday, November 24, 2015 11:05 AM
To: [email protected]
Subject: Re: Memory usage

So just as Clay has mentioned above, Jackrabbit does not hold the complete
Lucene index in memory. How it actually works is that there is a VolatileIndex
held in memory: any updates to the Lucene index are first applied there and
then committed to the file system based on threshold parameters. This was
obviously implemented for performance reasons.
http://wiki.apache.org/jackrabbit/Search
Parameters:

1. maxVolatileIndexSize (default: 1048576) - the maximum volatile index size
   in bytes until it is written to disk. The default value is 1MB.

2. volatileIdleTime (default: 3) - idle time in seconds until the volatile
   index part is moved to a persistent index even though minMergeDocs is not
   reached.

Both live on the SearchIndex element in workspace.xml; a sketch follows.
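
Something like this (the values shown are just the defaults from the wiki
page above):

  <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="maxVolatileIndexSize" value="1048576"/>
    <param name="volatileIdleTime" value="3"/>
  </SearchIndex>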

1GB is quite low, but my company has run a production instance of Jackrabbit
with 1GB of memory for over two years and it has not had any issues.
The only times I saw huge spikes in memory consumption were on large
operations such as cloning a node with many descendants or querying a data
set with a 10k+ result size.

You said you have gathered a heap dump; this should point you in the
direction of which objects are consuming the majority of the heap. This would
be a good start to see if it is Jackrabbit causing the issue or your
application.
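If you need another dump, the stock JDK tooling is enough - something along
the lines of the following (live objects only), then open the file in
Eclipse MAT:

  jmap -dump:live,format=b,file=heap.hprof <pid>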
What type of deployment (
http://jackrabbit.apache.org/jcr/deployment-models.html) of Jackrabbit are
you guys running? Is it completely isolated or embedded in your application?

On Mon, Nov 23, 2015 at 10:16 PM, Roll, Kevin <[email protected]> wrote:

> Hi, Ben. I was referring to the following page:
>
> https://jackrabbit.apache.org/jcr/search-implementation.html
>
> "The most recent generation of the search index is held completely in
> memory."
>
> Perhaps I am misreading this, or perhaps it is wrong, but I interpreted
> that to mean that the size of the index in memory would be proportional to
> the repository size. I hope this is not true!
>
> I am currently trying to get information from our QA team about the
> approximate number of nodes in the repository. We are not currently setting
> an explicit heap size - in the dumps I've examined it seems to run out
> around 240MB. I'm pushing to set something explicit, but I'm now hearing
> that older hardware has only 1GB of memory, which gives us practically
> nowhere to go.
>
> The queries that I'm doing are not very fancy... for example: "select *
> from [nt:resource] where [jcr:mimeType] like 'image%'". I'm actually
> rewriting that task so the query will be even simpler.
>
> Thanks for the help!
>
>
> [email protected]
> -----Original Message-----
> From: Ben Frisoni [mailto:[email protected]]
> Sent: Monday, November 23, 2015 5:21 PM
> To: [email protected]
> Subject: Re: Memory usage
>
> It is a good idea to turn off supportHighlighting, especially if you aren't
> using the functionality; it takes up a lot of extra space within the index.
> I am not sure where you heard that the Lucene index is kept in memory, but I
> am pretty certain that is wrong. Can you point me to the documentation that
> says this?
>
> Also, what data set sizes are you querying against (10k nodes? 100k nodes?
> 1 million nodes?)
> What heap size do you have set on the JVM?
> Reducing the resultFetchSize should help reduce the memory footprint of
> queries.
> I am assuming you are using the QueryManager to retrieve nodes. Can you
> give an example query that you are using?
>
> I have developed a patch to improve query performance on large data sets
> with Jackrabbit 2.x. I should be done soon if I can gather together a few
> hours to finish up my work. If you would like, you can give it a try once
> I finish.
>
> Some other repository settings you might want to look at are:
>
>   <PersistenceManager
>       class="org.apache.jackrabbit.core.persistence.pool.DerbyPersistenceManager">
>     <param name="bundleCacheSize" value="256"/>
>   </PersistenceManager>
>
>   <ISMLocking
>       class="org.apache.jackrabbit.core.state.FineGrainedISMLocking"/>
>
>
> Hope this helps.
>
>
