Thanks for the real-life examples.

You would have to do a LOT of sharding to make indexes that big behave better.


Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Fri, 9/10/10, Kent Fitch <kent.fi...@gmail.com> wrote:

> From: Kent Fitch <kent.fi...@gmail.com>
> Subject: Re: Solr and jvm Garbage Collection tuning
> To: solr-user@lucene.apache.org
> Date: Friday, September 10, 2010, 10:45 PM
> Hi Tim,
> 
> For what it is worth, behind Trove (http://trove.nla.gov.au/) are 3
> SOLR-managed indices and 1 Lucene index. None of ours is as big as
> one of your shards, and one of our SOLR-managed indices is tiny, but
> your experiences with long GC pauses are familiar to us.
> 
> One of the most difficult indices to tune is our bibliographic index
> of around 38M mostly metadata records, which is around 125GB with a
> 97MB tii file.
> 
> We need to commit updates and reopen the index every 90 seconds, and
> the facet recalculation (using UnInverted) was taking quite a lot of
> time and seemed to generate lots of objects to be collected on each
> reopening.
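> 
> (For anyone wanting to reproduce the cycle: one common way to drive
> a 90-second commit/reopen is autoCommit in solrconfig.xml - not
> necessarily how we do it - e.g.:
> 
>   <autoCommit>
>     <!-- commit, and hence reopen a searcher, at most every 90s -->
>     <maxTime>90000</maxTime>
>   </autoCommit>
> 
> However it is triggered, the GC cost of the reopen is the same.)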
> 
> Although we've been through several rounds of tuning which seemed to
> work, at least temporarily, a few months ago we started getting 12
> sec "full gc" times every 90 secs, which was no good!
> 
> We noticed/did three things:
> 
> 1) optimise to 1 segment - we'd got to the stage where 50% of the
> documents had been updated (hence deleted), and the maxdocid was 50%
> bigger than it needed to be, so data structures whose size was
> proportional to maxdocid had grown a lot. Optimising to 1 segment
> greatly reduced full GC frequency and times (see the example after
> this list).
> 
> 2) for most of our facets, forcing the facets to be filters rather
> than uninverted happened to work better (also shown below) - but
> this depends on many factors, and certainly isn't a cure-all for all
> facets - uninverted often works much better than filters!
> 
> 3) after lots of benchmarking real updates and queries on a dev
> system, we came up with this set of JVM parameters that worked
> "best" for our environment (at the moment!):
> 
> -Xmx17000M -XX:NewSize=3500M -XX:SurvivorRatio=3 \
> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSIncrementalMode
> 
> I can't say exactly why, except that with this combination of
> parameters and our data, a much bigger newgen led to less movement
> of objects to oldgen, and non-full-GC collections on oldgen worked
> much better. Currently we are seeing fewer than 10 Full GCs a day,
> and they almost always take less than 4 seconds.
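> 
> (Re point 1: a standard way to do the optimise - not necessarily
> ours - is to post an optimize message to the update handler, which
> merges down to a single segment by default:
> 
>   curl http://localhost:8983/solr/update \
>        -H 'Content-Type: text/xml' --data-binary '<optimize/>'
> 
> Host and port here are the stock example values, not ours.)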
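> 
> (Re point 2: "filters rather than uninverted" corresponds to Solr's
> facet.method parameter - enum builds a filter per term value and
> counts via the filterCache, while fc uses the field cache/UnInverted
> approach. It can be set per field; "format" below is just a made-up
> field name:
> 
>   &facet=true&facet.field=format&f.format.facet.method=enum
> 
> or it can be set as a default on the request handler.)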
> 
> This index is running on an 8 core X5570 machine with 64GB, sharing
> it with a large/busy mysql instance and the Trove web server.
> 
> One of our other indices is only updated once per day, but is
> larger: 33.5M docs representing full text of archived web pages,
> 246GB, tii file is 36MB.
> 
> JVM parms are -Xmx10000M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC.
> 
> It also does fewer than 10 Full GCs per day, taking less than 5 sec
> each.
> 
> Our other large index, newspapers, is a native Lucene index, about
> 180GB with a comparatively large tii of 280MB (probably for the same
> reason your tii is large - the contents of this database is mostly
> OCR'ed text). This index is updated/reopened every 3 minutes (to
> incorporate OCR text corrections and tagging), and we use a bitmap
> to represent all facet values, which typically takes 5 secs to
> rebuild on each reopen (a sketch of the idea follows).
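> 
> (The bitmap idea, roughly - this is an illustrative sketch against
> the Lucene 2.9/3.x API with a made-up field name, not our production
> code. One bitmap is built per facet value at reopen time, and a
> facet count is then just an intersection with the result set:
> 
>   import java.io.IOException;
>   import org.apache.lucene.index.IndexReader;
>   import org.apache.lucene.index.Term;
>   import org.apache.lucene.index.TermDocs;
>   import org.apache.lucene.util.OpenBitSet;
> 
>   public class FacetBitmaps {
>     // Mark every doc carrying this facet value in one bitmap.
>     public static OpenBitSet bitsFor(IndexReader reader, String field,
>                                      String value) throws IOException {
>       OpenBitSet bits = new OpenBitSet(reader.maxDoc());
>       TermDocs td = reader.termDocs(new Term(field, value));
>       while (td.next()) {
>         bits.set(td.doc());
>       }
>       td.close();
>       return bits;
>     }
> 
>     // Facet count for a query = |facet bits AND result bits|.
>     public static long count(OpenBitSet facetBits, OpenBitSet results) {
>       return OpenBitSet.intersectionCount(facetBits, results);
>     }
>   }
> )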
> 
> JVM parms: -mx15000M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
> 
> Although this JVM usually does fewer than 5 GCs per day, these Full
> GCs often take 20-30 seconds, and we need to test increasing the
> NewSize on this JVM to see if we can reduce these pauses.
> 
> The web archive and newspaper indexes are running on an 8 core X5570
> machine with 72GB.
> 
> We are also running a separate copy/version of this index behind the
> site http://newspapers.nla.gov.au/ - the main difference is that the
> Trove version uses shingling (inspired by the Hathi Trust results)
> to improve searches containing common words. This other version is
> running on a machine with 32GB and 8 X5460 cores and has JVM parms:
> 
> -mx11500M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
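> 
> (Our actual analysis chain isn't spelled out here, but in Solr terms
> shingling means adding something like this to the field type's
> analyzer in schema.xml:
> 
>   <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
>           outputUnigrams="true"/>
> 
> so adjacent word pairs are indexed as single terms, making queries
> full of common words much cheaper to evaluate.)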
> 
> 
> Apart from the old newspapers index, all other SOLR/lucene indices
> are maintained on SSDs (Intel x25m 160GB), which, whilst not having
> anything to do with GCs, work very very well - we couldn't cope with
> our current query volumes on rotating disk without spending a great
> deal of money. The old newspaper index is running on a SAN with 24
> fast disks backing it, and we can't support the same query rate on
> it as we can with the other newspaper index on SSDs (even before the
> shingling change).
> 
> Kent Fitch
> Trove development team
> National Library of Australia
>
