Alright, thanks Erick. On the question about memory usage of merges, this is taken from Mike McCandless's blog:

The big thing that stays in RAM is a logical int[] mapping old docIDs to
new docIDs, but in more recent versions of Lucene (4.x) we use a much more
efficient structure than a simple int[] ... see
https://issues.apache.org/jira/browse/LUCENE-2357

How much RAM is required is mostly a function of how many documents (lots
of tiny docs use more RAM than fewer huge docs).

A related clarification:
As my users are not aware of the fq possibility, I was wondering how to
make the best of the filter cache. Would it be efficient to implicitly
transform their query into a filter query on fields that are boolean
searches (date range etc., which do not affect the score of a document)?
Is this good practice? Is there a query parser plugin that does it?

>
> Inline
>
> On Thu, Jul 11, 2013 at 8:36 AM, Manuel Le Normand
> <manuel.lenorm...@gmail.com> wrote:
> > Hello,
> > As a result of frequent java OOM exceptions, I am trying to investigate
> > the solr jvm memory heap usage further.
> > Please correct me if I am mistaken, this is my understanding of usages for
> > the heap (per replica on a solr instance):
> > 1. Buffers for indexing - bounded by ramBufferSize
> > 2. Solr caches
> > 3. Segment merge
> > 4. Miscellaneous - buffers for Tlogs, servlet overhead etc.
> >
> > Particularly I'm concerned by Solr caches and segment merges.
> > 1. How memory consuming (bytes per doc) are FilterCaches (bitDocSet)
> > and queryResultCaches (DocList)? I understand it is related to the skip
> > spaces between doc id's that match (so it's not saved as a bitmap). But
> > basically, is every id saved as a java int?
>
> Different beasts. filterCache consumes, essentially, maxDoc/8 bytes (you
> can get the maxDoc number from your Solr admin page). Plus some overhead
> for storing the fq text, but that's usually not much. This is for each
> entry up to "Size".
>
> queryResultCache is usually trivial unless you've configured it
> extravagantly. It's the query string length + queryResultWindowSize
> integers per entry (queryResultWindowSize is from solrconfig.xml).
>
> > 2. QueryResultMaxDocsCached - (for example = 100) means that any query
> > resulting in more than 100 docs will not be cached (at all) in the
> > queryResultCache? Or does it have to do with the documentCache?
>
> It's just a limit on the queryResultCache entry size as far as I can
> tell. But again this cache is relatively small, I'd be surprised if it
> used significant resources.
>
> > 3. DocumentCache - written on the wiki it should be greater than
> > max_results*concurrent_queries. Max result is just the num of rows
> > displayed (rows-start) param, right? Not the queryResultWindow.
>
> Yes. This is a cache (I think) for the _contents_ of the documents you'll
> be returning to be manipulated by various components during the life
> of the query.
>
> > 4. LazyFieldLoading=true - when querying for id's only (fl=id) will this
> > cache be used? (at the expense of eviction of docs that were already
> > loaded with stored fields)
>
> Not sure, but I don't think this will contribute much to memory pressure.
> This is about how many fields are loaded to get a single value from a doc
> in the results list, and since one is usually working with 20 or so docs
> this is usually a small amount of memory.
>
> > 5. How large is the heap used by merges? Assuming we have a merge of 10
> > segments of 500MB each (half inverted files - *.pos *.doc etc, half non
> > inverted files - *.fdt, *.tvd), how much heap should be left unused for
> > this merge?
>
> Again, I don't think this is much of a memory consumer, although I
> confess I don't know the internals. Merging is mostly about I/O.
>
> >
> > Thanks in advance,
> > Manu
>
> But take a look at the admin page, you can see how much memory various
> caches are using by looking at the plugins/stats section.
>
> Best
> Erick
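
For anyone sizing caches from the rules of thumb quoted above, here is a rough back-of-the-envelope sketch in Python. It only encodes what Erick describes: about maxDoc/8 bytes per filterCache entry plus the fq text, and the query string plus queryResultWindowSize integers per queryResultCache entry. The function names, the 2-bytes-per-char figure for strings and the 4-byte int are my own assumptions, and JVM object overhead is ignored, so treat the output as an order-of-magnitude estimate and not as what Solr actually allocates.

```python
BYTES_PER_INT = 4   # assumption: one Java int per cached doc id
BYTES_PER_CHAR = 2  # assumption: query/fq text held as 2-byte chars


def filter_cache_bytes(max_doc, size, avg_fq_chars=0):
    """Worst-case filterCache estimate: one bit per document per entry.

    max_doc      -- maxDoc as shown on the Solr admin page
    size         -- configured number of filterCache entries
    avg_fq_chars -- hypothetical average length of the cached fq strings
    """
    bitset_bytes = (max_doc + 7) // 8  # maxDoc/8, rounded up to whole bytes
    per_entry = bitset_bytes + avg_fq_chars * BYTES_PER_CHAR
    return per_entry * size


def query_result_cache_bytes(size, window_size, avg_query_chars=0):
    """queryResultCache estimate: query text + window_size ints per entry."""
    per_entry = window_size * BYTES_PER_INT + avg_query_chars * BYTES_PER_CHAR
    return per_entry * size


if __name__ == "__main__":
    # Hypothetical index: 100 million docs, 512 entries in each cache.
    fc = filter_cache_bytes(max_doc=100_000_000, size=512)
    qrc = query_result_cache_bytes(size=512, window_size=20, avg_query_chars=50)
    print(f"filterCache      ~ {fc / 1e9:.2f} GB")
    print(f"queryResultCache ~ {qrc / 1e3:.1f} kB")
```

With those made-up numbers each filterCache entry is 12.5 MB, so a full cache of 512 entries comes to 6.4 GB, while the queryResultCache stays under 100 kB. That matches the point made in the thread: filterCache size times maxDoc is the figure to watch, and the queryResultCache is rarely the problem. Since the original question notes that sparse result sets may not be stored as a bitmap, the filterCache figure is best read as an upper bound.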