Hi Erick,
You mentioned that for 4.0 the memory patterns are much different than in 3.X.
Can you elaborate on whether it's worse or better? Does 4.0 tend to use more
memory for a similar index size compared to 3.X?

Thanks
Varun

On Sat, Sep 29, 2012 at 1:58 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> Well, I haven't had experience with JDK7, so I'll skip that part...
>
> But about caches. First, as far as memory is concerned, be
> sure to read Uwe's blog about MMapDirectory here:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> As to the caches.
>
> Be a little careful here. Trying to get high hit rates on _all_ your caches
> is a waste.
>
> filterCache. This is the exception: you want as high a hit ratio as you can
> get for this one. It's where the results of all the &fq= clauses go, and it
> is a major factor in speeding up QPS.
>
> queryResultCache. Hmmm, given the lack of updates to your index, this one
> may actually get more hits than I'd expect. But it's a very cheap cache
> memory-wise. Think of it as a map where the key is the query and the value
> is an array of <queryResultWindowSize> longs (document IDs). It's really
> intended mostly for paging. It's also often the case that the chances of
> the exact same query (except for &start and &rows) being issued are
> actually relatively small. As always, YMMV. I usually see hit rates on this
> cache < 10%. Evictions merely mean it's been around a long time; bumping
> the size of this cache probably won't affect the hit rate unless your app
> somehow submits just a few queries.
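
[Illustration] A rough back-of-the-envelope sketch of why this cache is cheap, taking the description above at face value (one array of queryResultWindowSize 8-byte IDs per cached query). The function name and the example numbers are hypothetical, and the estimate ignores the size of the query keys and any per-object JVM overhead, so treat it as a lower bound rather than a measurement:

```python
def query_result_cache_bytes(max_entries, window_size, bytes_per_id=8):
    """Approximate payload size of the cached doc-ID arrays only.

    Assumes each entry holds `window_size` IDs of `bytes_per_id` bytes,
    as in the description above; keys and object overhead are ignored.
    """
    return max_entries * window_size * bytes_per_id


# Hypothetical settings: 512 cached queries, window size of 20.
estimate = query_result_cache_bytes(max_entries=512, window_size=20)
print(estimate, "bytes, i.e.", estimate / 1024, "KiB")
```

Even with far more entries, the ID arrays alone stay small next to a multi-gigabyte heap under these assumptions.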
>
>
> documentCache. Again, this often doesn't have a great hit ratio. Its main
> use as I understand it is to keep various parts of a query component chain
> from having to re-access the disk. Each element in a query component is
> completely separate from the others, so if two or more components want
> values from the doc, having them cached is useful. The usual recommendation
> is (#docs returned to user) * (expected simultaneous queries), where
> "# docs returned to user" is really the &rows value.
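
[Illustration] The sizing rule of thumb above is simple arithmetic; a minimal sketch follows. The function name and the sample values (10 rows, 50 simultaneous queries) are made up for the example and are not taken from this thread:

```python
def document_cache_size(rows, simultaneous_queries):
    """Rule of thumb from above: (&rows value) * (expected simultaneous queries)."""
    return rows * simultaneous_queries


# Hypothetical workload: &rows=10 with about 50 queries in flight at once.
suggested = document_cache_size(rows=10, simultaneous_queries=50)
print("suggested documentCache size:", suggested)
```

By this rule a cache of a few hundred to a few thousand entries covers most workloads, which is worth comparing against the maxSize actually configured.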
>
> One of the consequences of having huge amounts of memory allocated to
> the JVM can be really long garbage collections. They happen less frequently
> but have more work to do when they happen.
>
> Oh, and when you start using 4.0, the memory patterns are much different...
>
> Finally, here's a great post on solr memory tuning, too bad the image links
> are broken...
> http://searchhub.org/dev/2011/03/27/garbage-collection-bootcamp-1-0/
>
> Best
> Erick
>
> On Sat, Sep 29, 2012 at 3:08 PM, Aaron Daubman <daub...@gmail.com> wrote:
> > Greetings,
> >
> > I've recently moved to running some of our Solr (3.6.1) instances
> > using JDK 7u7 with the G1 GC (playing with max pauses in the 20 to
> > 100ms range). By and large, it has been working well (or, perhaps I
> > should say that without requiring much tuning it works much better in
> > general than my haphazard attempts to tune CMS).
> >
> > I have two instances in particular, one with a heap size of 14G and
> > one with a heap size of 60G. I'm attempting to squeeze out additional
> > performance by increasing Solr's cache sizes (I am still seeing the
> > hit ratio go up as I increase max size and decrease the number of
> > evictions), and am guessing this is the cause of some recent
> > situations where the 14G instance especially eventually (12-24 hrs
> > later under 100s of queries per minute) makes it to 80%-90% of the
> > heap and then spirals into major GC with long-pause territory.
> >
> > I am wondering:
> > 1) if anybody has experience tuning the G1 GC, especially for use with
> > Solr (what are decent max-pause times to use?)
> > 2) how to better tune Solr's cache sizes - e.g. how to even tell the
> > actual amount of memory used by each cache (not # entries as the stats
> > show, but # bits)
> > 3) if there are any guidelines on when increasing a cache's size (even
> > if it does continue to increase the hit ratio) runs into the law of
> > diminishing returns or even starts to hurt - e.g. if the document
> > cache has a current maxSize of 65536 and has seen 4409275 evictions,
> > and currently has a hit ratio of 0.74, should the max be increased
> > further? If so, how much ram needs to be added to the heap, and how
> > much larger should its max size be made?
> >
> > I should mention that these solr instances are read-only (so cache is
> > probably more valuable than in other scenarios - we only invalidate
> > the searcher every 20-24hrs or so) and are also backed with indexes
> > (6G and 70G for the 14G and 60G heap sizes) on IODrives, so I'm not as
> > concerned about leaving RAM for linux to cache the index files (I'd
> > much rather actually cache the post-transformed values).
> >
> > Thanks as always,
> >      Aaron
>
