Thanks for the real life examples. You would have to do a LOT of sharding to get that to work better.
Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life, otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php

--- On Fri, 9/10/10, Kent Fitch <kent.fi...@gmail.com> wrote:

> From: Kent Fitch <kent.fi...@gmail.com>
> Subject: Re: Solr and jvm Garbage Collection tuning
> To: solr-user@lucene.apache.org
> Date: Friday, September 10, 2010, 10:45 PM
>
> Hi Tim,
>
> For what it is worth, behind Trove (http://trove.nla.gov.au/) are 3
> SOLR-managed indices and 1 Lucene index. None of ours is as big as one
> of your shards, and one of our SOLR-managed indices is tiny, but your
> experiences with long GC pauses are familiar to us.
>
> One of the most difficult indices to tune is our bibliographic index
> of around 38M mostly metadata records which is around 125GB and 97MB
> tii files.
>
> We need to commit updates and reopen the index every 90 seconds, and
> the facet recalculation (using UnInverted) was taking quite a lot of
> time, and seemed to generate lots of objects to be collected on each
> reopening.
>
> Although we've been through several rounds of tuning which have seemed
> to work, at least temporarily, a few months ago we started getting 12
> sec "full gc" times every 90 secs, which was no good!
>
> We've noticed/did three things:
>
> 1) optimise to 1 segment - we'd got to the stage where 50% of the
> documents had been updated (hence deleted), and the maxdocid was 50%
> bigger than it needed to be, and hence datastructures whose size was
> proportional to maxdocid had increased a lot. Optimising to 1 segment
> greatly reduced full GC frequency and times.
>
> 2) for most of our facets, forcing the facets to be filters rather
> than uninverted happened to work better - but this depends on many
> factors, and certainly isn't a cure-all for all facets - uninverted
> often works much better than filters!
>
> 3) after lots of benchmarking real updates and queries on a dev
> system, we came up with this set of JVM parameters that worked "best"
> for our environment (at the moment!):
>
> -Xmx17000M -XX:NewSize=3500M -XX:SurvivorRatio=3
> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
> -XX:+CMSIncrementalMode
>
> I can't say exactly why, except that with this combination of
> parameters and our data, a much bigger newgen led to less movement of
> objects to oldgen, and non-full-GC collections on oldgen worked much
> better. Currently we are seeing less than 10 Full GC's a day, and
> they almost always take less than 4 seconds.
>
> This index is running on an 8 core X5570 machine with 64GB, sharing it
> with a large/busy mysql instance and the Trove web server.
>
> One of our other indices is only updated once per day, but is larger:
> 33.5M docs representing full text of archived web pages, 246GB, tii
> file is 36MB.
>
> JVM parms are -Xmx10000M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC.
>
> It also does less than 10 Full GC's per day, taking less than 5 sec
> each.
>
> Our other large index, newspapers, is a native Lucene index, about
> 180GB with comparatively large tii of 280MB (probably for the same
> reason your tii is large - the contents of this database is mostly
> OCR'ed text). This index is updated/reopened every 3 minutes (to
> incorporate OCR text corrections and tagging) and we use a bitmap to
> represent all facet values, which typically take 5 secs to rebuild on
> each reopen.
>
> JVM parms: -mx15000M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
>
> Although this JVM usually does fewer than 5 GC's per day, these Full
> GC's often take 20-30 seconds, and we need to test increasing the
> NewSize on this JVM to see if we can reduce these pauses.
>
> The web archive and newspaper index are running on 8 core X5570
> machine with 72GB.
>
> We are also running a separate copy/version of this index behind the
> site http://newspapers.nla.gov.au/ - the main difference is that the
> Trove version uses shingling (inspired by the Hathi Trust results) to
> improve searches containing common words. This other version is
> running on a machine with 32GB and 8 X5460 cores and has JVM parms:
> -mx11500M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
>
> Apart from the old newspapers index, all other SOLR/lucene indices are
> maintained on SSDs (Intel x25m 160GB), which whilst not having
> anything to do with GCs, work very very well - we couldn't cope with
> our current query volumes on rotating disk without spending a great
> deal of money. The old newspaper index is running on a SAN with 24
> fast disks backing it, and we can't support the same query rate on it
> as we can with the other newspaper index on SSDs (even before the
> shingling change).
>
> Kent Fitch
> Trove development team
> National Library of Australia
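
For anyone wanting to compare their own numbers with the ones above, here is a minimal Java sketch. It is not from Kent's setup; the class and method names (GcSketch, survivorMb, edenMb, oldGenMb) are made up for illustration. The arithmetic assumes the usual HotSpot young generation layout (eden plus two survivor spaces, with SurvivorRatio being eden divided by one survivor space), and assumes the young generation stays at NewSize and the heap is at its -Xmx limit. Real sizes will differ slightly because the JVM aligns them. The second half only prints the per-collector counters that the standard java.lang.management API exposes for the running JVM.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Illustrative helper; all names here are hypothetical, not from the thread. */
final class GcSketch {

    /** One survivor space in MB, from young = eden + 2 * survivor and eden = ratio * survivor. */
    static long survivorMb(long youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    /** Eden in MB: whatever is left of the young generation after both survivor spaces. */
    static long edenMb(long youngMb, int survivorRatio) {
        return youngMb - 2 * survivorMb(youngMb, survivorRatio);
    }

    /** Old generation in MB, assuming the heap is at its maximum size. */
    static long oldGenMb(long heapMb, long youngMb) {
        return heapMb - youngMb;
    }

    public static void main(String[] args) {
        // Values taken from -Xmx17000M -XX:NewSize=3500M -XX:SurvivorRatio=3
        long heapMb = 17000;
        long youngMb = 3500;
        int ratio = 3;

        System.out.printf("eden ~%d MB, each survivor ~%d MB, old gen ~%d MB%n",
                edenMb(youngMb, ratio), survivorMb(youngMb, ratio), oldGenMb(heapMb, youngMb));

        // Cumulative collection counts and times for the JVM running this code.
        // Collector names depend on the JVM and the GC flags in use.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

With the bibliographic index settings this works out to roughly 2100 MB of eden, 700 MB per survivor space and 13500 MB of old generation. Polling the same MXBean counters from inside the Solr JVM (or over JMX) at intervals would be one way to track "Full GCs per day" figures like those quoted above.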