4.0 is significantly more efficient memory-wise, both in overall memory usage and in the number of objects allocated. See:
http://searchhub.org/dev/2012/04/06/memory-comparisons-between-solr-3x-and-trunk/

Erick

On Sun, Sep 30, 2012 at 12:25 AM, varun srivastava <varunmail...@gmail.com> wrote:
> Hi Erick,
> You mentioned that the memory pattern for 4.0 is much different than for 3.X. Can you elaborate on whether it's worse or better? Does 4.0 tend to use more memory for a similar index size compared to 3.X?
>
> Thanks
> Varun
>
> On Sat, Sep 29, 2012 at 1:58 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> Well, I haven't had experience with JDK7, so I'll skip that part...
>>
>> But about caches. First, as far as memory is concerned, be sure to read Uwe's blog about MMapDirectory here:
>> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>>
>> As to the caches.
>>
>> Be a little careful here. Trying to get high hit rates on _all_ your caches is a waste.
>>
>> filterCache. This is the exception: you want as high a hit ratio as you can get for this one. It's where the results of all the &fq= clauses go, and it is a major factor in improving QPS.
>>
>> queryResultCache. Hmm, given the lack of updates to your index, this one may actually get more hits than I'd expect. But it's a very cheap cache memory-wise. Think of it as a map where the key is the query and the value is an array of <queryResultWindowSize> longs (document IDs). It's really intended mostly for paging. It's also often the case that the chances of the exact same query (except for &start and &rows) being issued are actually relatively small. As always, YMMV. I usually see hit rates on this cache < 10%. Evictions merely mean the evicted entries have been around a long time; bumping the size of this cache probably won't affect the hit rate unless your app somehow submits just a few queries.
>>
>> documentCache. Again, this often doesn't have a great hit ratio.
>> Its main use, as I understand it, is to keep the various parts of a query component chain from having to re-access the disk. Each component in the chain is completely separate from the others, so if two or more components want values from the doc, having them cached is useful. The usual recommendation is (#docs returned to user) * (expected simultaneous queries), where "#docs returned to user" is really the &rows value.
>>
>> One of the consequences of having huge amounts of memory allocated to the JVM can be really long garbage collections. They happen less frequently but have more work to do when they happen.
>>
>> Oh, and when you start using 4.0, the memory patterns are much different...
>>
>> Finally, here's a great post on Solr memory tuning; too bad the image links are broken...
>> http://searchhub.org/dev/2011/03/27/garbage-collection-bootcamp-1-0/
>>
>> Best
>> Erick
>>
>> On Sat, Sep 29, 2012 at 3:08 PM, Aaron Daubman <daub...@gmail.com> wrote:
>> > Greetings,
>> >
>> > I've recently moved to running some of our Solr (3.6.1) instances using JDK 7u7 with the G1 GC (playing with max pauses in the 20 to 100ms range). By and large, it has been working well (or, perhaps I should say that without requiring much tuning it works much better in general than my haphazard attempts to tune CMS).
>> >
>> > I have two instances in particular, one with a heap size of 14G and one with a heap size of 60G. I'm attempting to squeeze out additional performance by increasing Solr's cache sizes (I am still seeing the hit ratio go up as I increase the max size and decrease the number of evictions), and am guessing this is the cause of some recent situations where the 14G instance especially eventually (12-24 hrs later, under 100s of queries per minute) makes it to 80%-90% of the heap and then spirals into major GC with long-pause territory.
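Erick's two sizing rules from earlier in the thread can be turned into a quick calculation: documentCache size is roughly the &rows value times the expected number of simultaneous queries, and each queryResultCache entry holds about queryResultWindowSize document IDs. The Python sketch below is illustrative only; the function names and the example inputs (10 rows, 50 concurrent queries, 512 entries, a window of 20) are hypothetical. bytes_per_id defaults to 8 only because the thread describes the IDs as longs, and key objects and JVM per-object overhead are ignored, so treat the result as a lower bound rather than a measurement.

```python
# Back-of-envelope sizing for two Solr caches, following the rules of thumb
# in this thread. Function names and example inputs are hypothetical.


def document_cache_size(rows: int, concurrent_queries: int) -> int:
    """Suggested documentCache size: (&rows value) * (expected simultaneous queries)."""
    return rows * concurrent_queries


def query_result_cache_bytes(entries: int, window_size: int, bytes_per_id: int = 8) -> int:
    """Lower-bound payload of a queryResultCache holding `entries` results.

    Each entry is modeled as `window_size` document IDs of `bytes_per_id` bytes
    (8 assumes longs, as described in the thread). Keys and JVM object
    overhead are ignored.
    """
    return entries * window_size * bytes_per_id


if __name__ == "__main__":
    # Hypothetical app: 10 rows per page, about 50 queries in flight at once.
    print("documentCache size:", document_cache_size(rows=10, concurrent_queries=50))
    # Hypothetical queryResultCache: 512 entries, queryResultWindowSize=20.
    mib = query_result_cache_bytes(entries=512, window_size=20) / 2**20
    print(f"queryResultCache payload: {mib:.3f} MiB")
```

With these inputs the queryResultCache payload comes to well under a megabyte, consistent with Erick's point that this cache is cheap memory-wise.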
>> >
>> > I am wondering:
>> > 1) if anybody has experience tuning the G1 GC, especially for use with Solr (what are decent max-pause times to use?)
>> > 2) how to better tune Solr's cache sizes, e.g. how to even tell the actual amount of memory used by each cache (not # entries as the stats show, but # bits)
>> > 3) if there are any guidelines on when increasing a cache's size (even if it does continue to increase the hit ratio) runs into the law of diminishing returns or even starts to hurt. E.g. if the document cache has a current maxSize of 65536, has seen 4409275 evictions, and currently has a hit ratio of 0.74, should the max be increased further? If so, how much RAM needs to be added to the heap, and how much larger should its max size be made?
>> >
>> > I should mention that these Solr instances are read-only (so cache is probably more valuable than in other scenarios; we only invalidate the searcher every 20-24hrs or so) and are also backed by indexes (6G and 70G for the 14G and 60G heap sizes) on IODrives, so I'm not as concerned about leaving RAM for Linux to cache the index files (I'd much rather actually cache the post-transformed values).
>> >
>> > Thanks as always,
>> > Aaron
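Aaron's third question (a document cache at maxSize 65536 with 4409275 evictions and a 0.74 hit ratio) is about diminishing returns. Assuming LRU eviction (Solr's LRUCache class is LRU-based), one way to see the shape of the curve is to replay a request trace through caches of increasing capacity, and a crude heap estimate for raising maxSize is the number of added entries times an average cached-document size. The Python sketch below uses a synthetic Zipf-like trace and a hypothetical 2 KiB average document; neither comes from the thread, and the helper names are made up for illustration, so substitute a trace captured from your own logs and a measured document size before drawing conclusions.

```python
# Sketch: hit ratio vs. capacity under LRU eviction, plus a crude heap estimate
# for raising a documentCache maxSize. The trace and the 2 KiB average document
# size are hypothetical placeholders, not data from this thread.
import random
from collections import OrderedDict


def lru_hit_ratio(trace, capacity):
    """Replay `trace` through an LRU cache of `capacity` entries; return hits / lookups."""
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / len(trace)


def extra_heap_bytes(old_max, new_max, avg_doc_bytes):
    """Crude extra heap for raising maxSize; ignores per-entry JVM overhead."""
    return (new_max - old_max) * avg_doc_bytes


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic Zipf-like workload over 100,000 document IDs: low IDs are hot.
    ids = range(100_000)
    weights = [1 / (i + 1) for i in ids]
    trace = rng.choices(ids, weights=weights, k=200_000)
    for cap in (4_096, 8_192, 16_384, 32_768, 65_536):
        print(f"capacity {cap:>6}: hit ratio {lru_hit_ratio(trace, cap):.3f}")
    # Doubling maxSize from 65536, assuming a hypothetical 2 KiB per cached document.
    print(extra_heap_bytes(65_536, 131_072, 2_048) / 2**20, "MiB")
```

On skewed traces like this one, each doubling of capacity tends to buy roughly the same or a smaller absolute gain in hit ratio, so the gain per additional megabyte of heap keeps shrinking. Because LRU never loses hits on the same trace when capacity grows, any harm from a bigger cache shows up elsewhere, for example in the longer garbage collections Erick mentions.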