Alright, thanks Erick. As for the question about memory usage of merges,
here is an answer taken from Mike McCandless' blog:

The big thing that stays in RAM is a logical int[] mapping old docIDs to
new docIDs, but in more recent versions of Lucene (4.x) we use a much more
efficient structure than a simple int[] ... see
https://issues.apache.org/jira/browse/LUCENE-2357

How much RAM is required is mostly a function of how many documents (lots
of tiny docs use more RAM than fewer huge docs).
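
To put a rough number on that (my own back-of-the-envelope figure, not from
the blog): a plain int[] over 50 million docs would need about

    50,000,000 docs * 4 bytes/int ~= 200 MB

of heap just for the old->new docID map during the merge, which is the cost
the more compact structure from LUCENE-2357 reduces.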


A related clarification:
As my users are not aware of the fq possibility, I was wondering how I can
make the best use of the filter cache. Would it be efficient to implicitly
transform their queries into filter queries on fields that are boolean
searches (date ranges etc. that do not affect the score of a document)? Is
this a good practice? Is there any query parser plugin that does this?
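
To illustrate what I mean (field names here are made up by me): instead of
the client sending

    q=text:lucene AND date:[2013-01-01T00:00:00Z TO 2013-07-01T00:00:00Z]

the request would be rewritten on the server side to

    q=text:lucene&fq=date:[2013-01-01T00:00:00Z TO 2013-07-01T00:00:00Z]

so the date clause no longer influences scoring and its DocSet can be
reused from the filterCache on subsequent queries.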



>
> Inline
>
> On Thu, Jul 11, 2013 at 8:36 AM, Manuel Le Normand
> <manuel.lenorm...@gmail.com> wrote:
> > Hello,
> > As a result of frequent java OOM exceptions, I am trying to investigate
> > the Solr JVM memory heap usage more closely.
> > Please correct me if I am mistaken; this is my understanding of the heap
> > usages (per replica on a Solr instance):
> > 1. Buffers for indexing - bounded by ramBufferSizeMB
> > 2. Solr caches
> > 3. Segment merges
> > 4. Miscellaneous - buffers for tlogs, servlet overhead, etc.
> >
> > Particularly I'm concerned by Solr caches and segment merges.
> > 1. How much memory (bytes per doc) do filterCaches (BitDocSet) and
> > queryResultCaches (DocList) consume? I understand it is related to the
> > skip spaces between doc ids that match (so it's not saved as a bitmap).
> > But basically, is every id saved as a Java int?
>
> Different beasts. Each filterCache entry consumes, essentially, maxDoc/8
> bytes (you can get the maxDoc number from your Solr admin page), plus some
> overhead for storing the fq text, but that's usually not much. This holds
> for each entry, up to the configured "size".
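
A quick worked example of that, if my math is right: with maxDoc =
40,000,000, each filterCache entry is roughly

    40,000,000 / 8 = 5,000,000 bytes ~= 5 MB

so a filterCache with size=512 could, in the worst case, hold about
512 * 5 MB ~= 2.5 GB of bitsets.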



>
> queryResultCache is usually trivial unless you've configured it
> extravagantly. It's the query string length + queryResultWindowSize
> integers per entry (queryResultWindowSize is set in solrconfig.xml).
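
For reference, these are the solrconfig.xml settings being referred to (the
values below are just the common example-config defaults, not a
recommendation):

    <queryResultCache class="solr.LRUCache"
                      size="512"
                      initialSize="512"
                      autowarmCount="0"/>
    <queryResultWindowSize>20</queryResultWindowSize>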
>
> > 2. QueryResultMaxDocsCached - (for example = 100) means that any query
> > resulting in more than 100 docs will not be cached (at all) in the
> > queryResultCache? Or does it have to do with the documentCache?
> It's just a limit on the queryResultCache entry size as far as I can
> tell. But again
> this cache is relatively small, I'd be surprised if it used
> significant resources.
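
The knob itself, for anyone following along, is this solrconfig.xml element
(200 is the value in the example config, if I remember right):

    <queryResultMaxDocsCached>200</queryResultMaxDocsCached>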
>
> > 3. documentCache - the wiki says it should be greater than
> > max_results * concurrent_queries. Max results is just the number of rows
> > displayed (the rows - start params), right? Not queryResultWindowSize.
>
> Yes. This is a cache (I think) for the _contents_ of the documents you'll
> be returning, to be manipulated by various components during the life
> of the query.
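
The corresponding solrconfig.xml entry looks like this (size here is only
illustrative; per the wiki's rule of thumb it should be at least rows *
max concurrent queries, and documentCache is not autowarmed since internal
docIDs change between commits):

    <documentCache class="solr.LRUCache"
                   size="512"
                   initialSize="512"
                   autowarmCount="0"/>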
>
> > 4. lazyFieldLoading=true - when querying for ids only (fl=id), will this
> > cache be used? (at the expense of evicting docs that were already loaded
> > with stored fields)
>
> Not sure, but I don't think this will contribute much to memory pressure.
> This is about how many fields are loaded to get a single value from a doc
> in the results list, and since one is usually working with 20 or so docs,
> this is usually a small amount of memory.
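
For reference, the flag in question is set in solrconfig.xml:

    <enableLazyFieldLoading>true</enableLazyFieldLoading>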
>
> > 5. How large is the heap used by merges? Assuming we have a merge of 10
> > segments of 500MB each (half inverted files - *.pos, *.doc, etc., half
> > non-inverted files - *.fdt, *.tvd), how much heap should be left unused
> > for this merge?
>
> Again, I don't think this is much of a memory consumer, although I confess
> I don't know the internals. Merging is mostly about I/O.
>
> >
> > Thanks in advance,
> > Manu
>
> But take a look at the admin page; you can see how much memory the various
> caches are using by looking at the plugins/stats section.
>
> Best
> Erick
