Thanks Shawn. Yeah, regular optimizes might be the route we take if this becomes a recurring issue.
I remember that in our old multicore deployment, CPU used to spike during an optimize and the core almost became nonresponsive.

My guess is that with the SolrCloud architecture, any slack from the leader while it is optimizing is picked up by the replica.
I searched around for information on how optimize behaves in SolrCloud and could not find much.

Does anyone have experience running optimize on SolrCloud in a loaded production environment?
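
For anyone who wants to script a regular optimize, here is a minimal Python sketch that triggers one through the update handler over HTTP. The `optimize` and `maxSegments` request parameters are standard Solr update parameters; the host, port, and collection name are placeholders, and `optimize_url` and `run_optimize` are hypothetical helper names. Per the reply quoted below, SolrCloud will optimize every core in the collection, one at a time, no matter which core receives the request.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def optimize_url(base_url: str, collection: str, max_segments: int = 1) -> str:
    """Build an update-handler URL that asks Solr to optimize (force-merge) the index."""
    # optimize=true triggers the merge; maxSegments caps how many segments remain afterwards.
    params = urlencode({"optimize": "true", "maxSegments": max_segments})
    return f"{base_url.rstrip('/')}/{collection}/update?{params}"


def run_optimize(base_url: str, collection: str, max_segments: int = 1,
                 timeout: float = 3600.0) -> int:
    """Send the optimize request and return the HTTP status code.

    An optimize on a large index can run for a long time, hence the generous timeout.
    """
    with urlopen(optimize_url(base_url, collection, max_segments), timeout=timeout) as resp:
        return resp.status
```

For example, `run_optimize("http://localhost:8983/solr", "mycollection")` would issue the request against a hypothetical local node; running it off-peak is one way to limit the CPU impact described above.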

Thanks,
Rishi.

-----Original Message-----
From: Shawn Heisey <apa...@elyograg.org>
To: solr-user <solr-user@lucene.apache.org>
Sent: Mon, May 4, 2015 9:11 am
Subject: Re: Solr Cloud reclaiming disk space from deleted documents


On 5/4/2015 4:55 AM, Rishi Easwaran wrote:
> Sadly, with the size of our complex, splitting and adding more HW is not a viable long-term solution.
> I guess the options we have are to run optimize regularly and/or become aggressive in our merges proactively, even before SolrCloud gets into this situation.

If you are regularly deleting most of your index, or reindexing large parts of it, which effectively does the same thing, then regular optimizes may be required to keep the index size down, although you must remember that you need enough room for the core to grow in order to actually complete the optimize. If the core is 75-90 percent deleted docs, then you will not need 2x the core size to optimize it, because the new index will be much smaller.
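
As a rough back-of-envelope check of the point above: the extra disk an optimize needs is roughly the size of the merged index, because the old segments stay on disk until the merge finishes. The sketch below assumes index size scales linearly with the live-document fraction (numDocs / maxDoc, both reported in Solr's index statistics), which is only an approximation; `optimize_headroom_bytes` is a hypothetical helper.

```python
def optimize_headroom_bytes(index_bytes: int, max_doc: int, num_docs: int) -> int:
    """Estimate the extra free disk an optimize needs for one core.

    The old segments remain on disk until the merged index is complete, so the
    headroom is roughly the size of the new index. Assumes (a simplification)
    that index size is proportional to the number of live documents.
    """
    if max_doc <= 0:
        return 0
    # num_docs / max_doc is the live fraction; integer math avoids float rounding.
    return index_bytes * num_docs // max_doc
```

For a 100 GB core that is 80 percent deleted docs, this estimates about 20 GB of headroom rather than another 100 GB.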

Currently, SolrCloud will always optimize the entire collection when you ask for an optimize on any core, but it will NOT optimize all the replicas (cores) at the same time. It will go through the cores that make up the collection and optimize each one in sequence. If your index is sharded and replicated enough, hopefully that will make it possible for the optimize to complete even though the amount of disk space available may be low.

We have at least one issue in Jira where users have asked for optimize to honor distrib=false, which would allow the user to be in complete control of all optimizing, but so far that hasn't been implemented. The volunteers that maintain Solr can only accomplish so much in the limited time they have available.

Thanks,
Shawn

