Ok. I will try with the "concurrent low pause" collector and let you know the
results.

On Fri, Sep 25, 2009 at 2:23 PM, Walter Underwood <wun...@wunderwood.org> wrote:
> As I said, I was using the IBM JVM, not the Sun JVM. The "concurrent low
> pause" collector is only in the Sun JVM.
>
> I just found this excellent article about the various IBM GC options for a
> Lucene application with a 100GB heap:
>
> http://www.nearinfinity.com/blogs/aaron_mccurry/tuning_the_ibm_jvm_for_large_h.html
>
> wunder
>
> -----Original Message-----
> From: Mark Miller [mailto:markrmil...@gmail.com]
> Sent: Friday, September 25, 2009 10:03 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr and Garbage Collection
>
> Walter Underwood wrote:
> > 30ms is not better or worse than 1s until you look at the service
> > requirements. For many applications, it is worth dedicating 10% of your
> > processing time to GC if that makes the worst-case pause short.
> >
> > On the other hand, my experience with the IBM JVM was that the maximum
> > query rate was 2-3X better with the concurrent generational GC compared to
> > any of their other GC algorithms, so we got the best throughput along with
> > the shortest pauses.
>
> With which collector? Since the very early JVMs, all GC is generational.
> Most of the collectors (other than the Serial Collector) also work
> concurrently. By default, they are concurrent on different generations, but
> you can add concurrency to the "other" generation with each now too.
>
> > Solr garbage generation (for queries) seems to have two major components:
> > per-request garbage and cache evictions. With a generational collector,
> > these two are handled by separate parts of the collector.
>
> Different parts of the collector? It's a different collector depending on
> the generation. The young generation is collected with a copy collector.
> This is because almost all the objects in the young generation are likely
> dead, and a copy collector only needs to visit live objects. So it's very
> efficient. The tenured generation uses something more along the lines of
> mark and sweep or mark and compact.
>
> > Per-request garbage should completely fit in the short-term heap
> > (nursery), so that it can be collected rapidly and returned to use for
> > further requests. If the nursery is too small, the per-request allocations
> > will be made in tenured space and sit there until the next major GC. Cache
> > evictions are almost always in long-term storage (tenured space) because an
> > LRU algorithm guarantees that the garbage will be old.
> >
> > Check the growth rate of tenured space (under constant load, of course)
> > while increasing the size of the nursery. That rate should drop when the
> > nursery gets big enough, then not drop much further as it is increased more.
> >
> > After that, reduce the size of tenured space until major GCs start
> > happening "too often" (a judgment call). A bigger tenured space means longer
> > major GCs and thus longer pauses, so you don't want it oversized by too much.
>
> With the concurrent low pause collector, the goal is to avoid "major"
> collections by collecting *before* the tenured space is filled. If you are
> getting "major" collections, you need to tune your settings - the whole point
> of that collector is to avoid "major" collections, and do almost all of the
> work while your application is not paused. There are still 2 brief pauses
> during the collection, but they should not be significant at all.
>
> > Also check the hit rates of your caches. If the hit rate is low, say 20% or
> > less, make that cache much bigger or set it to zero. Either one will reduce
> > the number of cache evictions. If you have an HTTP cache in front of Solr,
> > zero may be the right choice, since the HTTP cache is cherry-picking the
> > easily cacheable requests.
> >
> > Note that a commit nearly doubles the memory required, because you have two
> > live Searcher objects with all their caches. Make sure you have headroom for
> > a commit.
> >
> > If you want to test the tenured space usage, you must test with real world
> > queries. Those are the only way to get accurate cache eviction rates.
> >
> > wunder
> >
> > -----Original Message-----
> > From: Jonathan Ariel [mailto:ionat...@gmail.com]
> > Sent: Friday, September 25, 2009 9:34 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Solr and Garbage Collection
> >
> > BTW, why will making them equal lower the frequency of GC?
> >
> > On 9/25/09, Fuad Efendi <f...@efendi.ca> wrote:
> >>> Bigger heaps lead to bigger GC pauses in general.
> >>
> >> Opposite viewpoint:
> >> 1sec GC happening once an hour is MUCH BETTER than 30ms GC once-per-second.
> >>
> >> To lower frequency of GC: -Xms4096m -Xmx4096m (make it equal!)
> >>
> >> Use -server option.
> >>
> >> -server option of JVM is 'native CPU code', I remember WebLogic 7 console
> >> with SUN JVM 1.3 not showing any GC (just horizontal line).
> >>
> >> -Fuad
> >> http://www.linkedin.com/in/liferay
>
> --
> - Mark
>
> http://www.lucidimagination.com
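
For anyone following along: Walter's suggestion to check the growth rate of
tenured space while resizing the nursery could be approximated with the
standard java.lang.management API. This is only a rough sketch and not
something posted in the thread. The class name TenuredGrowth, the helper
names, and the 10-second sampling interval are made up for illustration. It
samples the JVM it runs in, so to watch Solr it would have to run inside the
Solr JVM or be adapted to read the same MXBeans over a remote JMX connection.
Pool names differ by JVM vendor and collector, so it prints every heap pool
rather than guessing which one is the tenured space.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, named for illustration only.
public class TenuredGrowth {

    /** Growth in bytes per second between two samples of a pool's used size. */
    static double growthRate(long usedBefore, long usedAfter, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return (usedAfter - usedBefore) * 1000.0 / elapsedMillis;
    }

    /** Used bytes for every heap pool, keyed by the JVM-reported pool name. */
    static Map<String, Long> heapPoolUsage() {
        Map<String, Long> used = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage != null) {
                used.put(pool.getName(), usage.getUsed());
            }
        }
        return used;
    }

    public static void main(String[] args) throws InterruptedException {
        long intervalMillis = 10_000; // assumed sampling interval
        Map<String, Long> before = heapPoolUsage();
        Thread.sleep(intervalMillis);
        Map<String, Long> after = heapPoolUsage();

        // Growth rate per heap pool over the interval.
        for (Map.Entry<String, Long> e : after.entrySet()) {
            Long prev = before.get(e.getKey());
            if (prev != null) {
                System.out.printf("%s: %.0f bytes/s%n",
                        e.getKey(), growthRate(prev, e.getValue(), intervalMillis));
            }
        }

        // Cumulative collection counts and times since JVM start.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

The idea would be to take one reading per nursery size under the same constant
load and compare the bytes/s figure for the old-generation pool. A collection
landing inside the interval will make a single reading negative, so several
samples per setting are probably needed.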