4.0 is significantly more memory-efficient, both in overall usage and in
the number of objects allocated. See:

http://searchhub.org/dev/2012/04/06/memory-comparisons-between-solr-3x-and-trunk/

Erick

On Sun, Sep 30, 2012 at 12:25 AM, varun srivastava
<varunmail...@gmail.com> wrote:
> Hi Erick,
>  You mentioned that the memory pattern for 4.0 is much different from 3.x.
> Can you elaborate: is it worse or better? Does 4.0 tend to use more memory
> for a similar index size compared to 3.x?
>
> Thanks
> Varun
>
> On Sat, Sep 29, 2012 at 1:58 PM, Erick Erickson 
> <erickerick...@gmail.com> wrote:
>
>> Well, I haven't had experience with JDK7, so I'll skip that part...
>>
>> But about caches. First, as far as memory is concerned, be
>> sure to read Uwe's blog about MMapDirectory here:
>> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
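>>
>> For reference, a minimal sketch of how that choice shows up in
>> solrconfig.xml (solr.MMapDirectoryFactory is the stock factory class;
>> whether you need to set it explicitly depends on your version's default):
>>
>>   <directoryFactory name="DirectoryFactory"
>>                     class="solr.MMapDirectoryFactory"/>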
>>
>> As to the caches.
>>
>> Be a little careful here. Chasing high hit rates on _all_ your caches
>> is a waste of memory; they don't all pay off equally.
>>
>> filterCache. This is the exception: you want as high a hit ratio as you
>> can get for this one. It's where the results of all the &fq= clauses go,
>> and it's a major factor in speeding up QPS.
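>>
>> For illustration, a typical filterCache entry in solrconfig.xml looks
>> like this (the sizes here are placeholders, not recommendations):
>>
>>   <filterCache class="solr.FastLRUCache"
>>                size="512"
>>                initialSize="512"
>>                autowarmCount="128"/>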
>>
>> queryResultCache. Hmmm, given the lack of updates to your index, this one
>> may actually get more hits than I'd expect. But it's a very cheap cache
>> memory-wise. Think of it as a map where the key is the query and the
>> value is an array of <queryResultWindowSize> internal document IDs. It's
>> really intended mostly for paging. It's also often the case that the
>> chance of the exact same query (except for &start and &rows) being
>> issued again is actually relatively small. As always, YMMV. I usually
>> see hit rates on this cache < 10%. Evictions merely mean an entry has
>> been around a long time; bumping the size of this cache probably won't
>> affect the hit rate unless your app somehow submits just a few distinct
>> queries.
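>>
>> The corresponding solrconfig.xml entries, as a sketch (the values are
>> placeholders; queryResultWindowSize controls how many doc IDs get
>> cached per query, so it's usually set to cover a page or two of results):
>>
>>   <queryResultCache class="solr.LRUCache"
>>                     size="512"
>>                     initialSize="512"
>>                     autowarmCount="32"/>
>>   <queryResultWindowSize>20</queryResultWindowSize>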
>>
>>
>> documentCache. Again, this often doesn't have a great hit ratio. Its main
>> use, as I understand it, is to keep the various components in a query
>> component chain from having to re-read the document from disk. Each
>> component in the chain is completely separate from the others, so if two
>> or more components want values from the doc, having it cached is useful.
>> The usual recommended size is (# docs returned to user) * (expected
>> simultaneous queries), where "# docs returned to user" is really the
>> &rows value.
>>
>> One consequence of allocating huge amounts of memory to the JVM can be
>> really long garbage collections. They happen less frequently, but each
>> one has more work to do when it does.
>>
>> Oh, and when you start using 4.0, the memory patterns are much different...
>>
>> Finally, here's a great post on Solr memory tuning; too bad the image
>> links are broken...
>> http://searchhub.org/dev/2011/03/27/garbage-collection-bootcamp-1-0/
>>
>> Best
>> Erick
>>
>> On Sat, Sep 29, 2012 at 3:08 PM, Aaron Daubman <daub...@gmail.com> wrote:
>> > Greetings,
>> >
>> > I've recently moved to running some of our Solr (3.6.1) instances
>> > using JDK 7u7 with the G1 GC (playing with max pauses in the 20 to
>> > 100ms range). By and large, it has been working well (or, perhaps I
>> > should say that without requiring much tuning it works much better in
>> > general than my haphazard attempts to tune CMS).
>> >
>> > I have two instances in particular, one with a heap size of 14G and
>> > one with a heap size of 60G. I'm attempting to squeeze out additional
>> > performance by increasing Solr's cache sizes (I am still seeing the
>> > hit ratio go up as I increase the max size and decrease the number of
>> > evictions), and am guessing this is the cause of some recent
>> > situations where the 14G instance in particular eventually (12-24 hrs
>> > later, under 100s of queries per minute) climbs to 80%-90% of the
>> > heap and then spirals into major-GC, long-pause territory.
>> >
>> > I am wondering:
>> > 1) if anybody has experience tuning the G1 GC, especially for use with
>> > Solr (what are decent max-pause times to use?)
>> > 2) how to better tune Solr's cache sizes - e.g. how to even tell the
>> > actual amount of memory used by each cache (not # entries as the stats
>> > show, but # bytes)
>> > 3) if there are any guidelines on when increasing a cache's size (even
>> > if it does continue to increase the hit ratio) runs into the law of
>> > diminishing returns or even starts to hurt - e.g. if the document
>> > cache has a current maxSize of 65536 and has seen 4409275 evictions,
>> > and currently has a hit ratio of 0.74, should the max be increased
>> > further? If so, how much ram needs to be added to the heap, and how
>> > much larger should its max size be made?
>> >
>> > I should mention that these solr instances are read-only (so cache is
>> > probably more valuable than in other scenarios - we only invalidate
>> > the searcher every 20-24hrs or so) and are also backed with indexes
>> > (6G and 70G for the 14G and 60G heap sizes) on IODrives, so I'm not as
>> > concerned about leaving RAM for linux to cache the index files (I'd
>> > much rather actually cache the post-transformed values).
>> >
>> > Thanks as always,
>> >      Aaron
>>
