Hi Erick,

You mentioned that the memory pattern for 4.0 is much different from 3.x. Can you elaborate on whether it's worse or better? Does 4.0 tend to use more memory than 3.x for a similar index size?
Thanks,
Varun

On Sat, Sep 29, 2012 at 1:58 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> Well, I haven't had experience with JDK7, so I'll skip that part...
>
> But about caches. First, as far as memory is concerned, be sure to read
> Uwe's blog about MMapDirectory here:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> As to the caches.
>
> Be a little careful here. Getting high hit rates on _all_ your caches
> is a waste.
>
> filterCache. This is the exception: you want as high a hit ratio as you
> can get for this one. It's where the results of all the &fq= clauses go,
> and it's a major factor in speeding up QPS.
>
> queryResultCache. Hmmm, given the lack of updates to your index, this
> one may actually get more hits than I'd expect. But it's a very cheap
> cache memory-wise. Think of it as a map where the key is the query and
> the value is an array of <queryResultWindowSize> longs (document IDs).
> It's really intended mostly for paging. It's also often the case that
> the chance of the exact same query (except for &start and &rows) being
> issued again is actually relatively small. As always, YMMV. I usually
> see hit rates on this cache < 10%. Evictions merely mean it's been
> around a long time; bumping the size of this cache probably won't
> affect the hit rate unless your app somehow submits just a few queries.
>
> documentCache. Again, this often doesn't have a great hit ratio. Its
> main use, as I understand it, is to keep various parts of a query
> component chain from having to re-access the disk. Each element in a
> query component chain is completely separate from the others, so if two
> or more components want values from the doc, having them cached is
> useful. The usual recommendation is (# docs returned to user) *
> (expected simultaneous queries), where "# docs returned to user" is
> really the &rows value.
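[For anyone following along: the three caches described above are configured in solrconfig.xml. A hypothetical 3.x-style snippet is below; the class, size, and autowarmCount values are illustrative placeholders, not recommendations for any particular index.]

```xml
<!-- Illustrative cache settings; tune size/autowarmCount for your own index. -->
<filterCache class="solr.FastLRUCache"
             size="512" initialSize="512" autowarmCount="128"/>

<queryResultCache class="solr.LRUCache"
                  size="512" initialSize="512" autowarmCount="32"/>

<documentCache class="solr.LRUCache"
               size="16384" initialSize="16384" autowarmCount="0"/>

<!-- Number of document IDs cached per queryResultCache entry. -->
<queryResultWindowSize>20</queryResultWindowSize>
```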
> One of the consequences of having huge amounts of memory allocated to
> the JVM can be really long garbage collections. They happen less
> frequently, but have more work to do when they happen.
>
> Oh, and when you start using 4.0, the memory patterns are much different...
>
> Finally, here's a great post on Solr memory tuning; too bad the image
> links are broken...
> http://searchhub.org/dev/2011/03/27/garbage-collection-bootcamp-1-0/
>
> Best,
> Erick
>
> On Sat, Sep 29, 2012 at 3:08 PM, Aaron Daubman <daub...@gmail.com> wrote:
> > Greetings,
> >
> > I've recently moved to running some of our Solr (3.6.1) instances
> > using JDK 7u7 with the G1 GC (playing with max pauses in the 20 to
> > 100ms range). By and large, it has been working well (or, perhaps I
> > should say that without requiring much tuning it works much better in
> > general than my haphazard attempts to tune CMS).
> >
> > I have two instances in particular, one with a heap size of 14G and
> > one with a heap size of 60G. I'm attempting to squeeze out additional
> > performance by increasing Solr's cache sizes (I am still seeing the
> > hit ratio go up as I increase the max size and decrease the number of
> > evictions), and am guessing this is the cause of some recent
> > situations where the 14G instance especially eventually (12-24 hrs
> > later, under 100s of queries per minute) makes it to 80%-90% of the
> > heap and then spirals into major-GC-with-long-pause territory.
> >
> > I am wondering:
> > 1) if anybody has experience tuning the G1 GC, especially for use
> > with Solr (what are decent max-pause times to use?)
> > 2) how to better tune Solr's cache sizes - e.g. how to even tell the
> > actual amount of memory used by each cache (not # entries, as the
> > stats show, but # bytes)
> > 3) if there are any guidelines on when increasing a cache's size
> > (even if it does continue to increase the hit ratio) runs into the
> > law of diminishing returns or even starts to hurt - e.g. if the
> > document cache has a current maxSize of 65536 and has seen 4409275
> > evictions, and currently has a hit ratio of 0.74, should the max be
> > increased further? If so, how much RAM needs to be added to the heap,
> > and how much larger should its max size be made?
> >
> > I should mention that these Solr instances are read-only (so cache is
> > probably more valuable than in other scenarios - we only invalidate
> > the searcher every 20-24 hrs or so) and are also backed with indexes
> > (6G and 70G for the 14G and 60G heap sizes) on IODrives, so I'm not
> > as concerned about leaving RAM for Linux to cache the index files
> > (I'd much rather actually cache the post-transformed values).
> >
> > Thanks as always,
> > Aaron
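[A rough back-of-envelope answer to question 2, using the formulas Erick gave earlier in the thread: queryResultCache entries hold roughly queryResultWindowSize doc IDs each, and the documentCache entry count comes from &rows times concurrency. The function names and the example numbers here are hypothetical illustrations, not measurements from these instances, and this ignores key strings and per-entry overhead.]

```python
# Hypothetical back-of-envelope cache sizing (ignores cache keys and overhead).

def query_result_cache_bytes(max_entries, window_size, bytes_per_id=8):
    """Approximate memory for the cached doc-ID arrays: each entry holds
    about <queryResultWindowSize> longs (8 bytes each)."""
    return max_entries * window_size * bytes_per_id

def document_cache_entries(rows, simultaneous_queries):
    """The usual documentCache sizing recommendation from the thread:
    (# docs returned per request, i.e. &rows) * (expected concurrent queries)."""
    return rows * simultaneous_queries

# Example: 65536 entries with a 20-doc window is only ~10 MB of doc-ID arrays,
# so queryResultCache is rarely the heap problem; documentCache entries hold
# stored fields and can be far larger per entry.
print(query_result_cache_bytes(65536, 20))   # 10485760 bytes (~10 MB)
print(document_cache_entries(10, 100))       # 1000 entries
```

The takeaway is that the doc-ID caches are cheap; the caches holding actual field values (documentCache, and the FieldCache used for sorting/faceting) dominate heap usage.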