Re: Solr Cloud reclaiming disk space from deleted documents

Shawn Heisey Mon, 04 May 2015 06:12:08 -0700

On 5/4/2015 4:55 AM, Rishi Easwaran wrote:
> Sadly with the size of our complex, splitting and adding more HW is not a
> viable long term solution.
> I guess the options we have are to run optimize regularly and/or become
> aggressive in our merges proactively even before solr cloud gets into this
> situation.


If you are regularly deleting most of your index, or reindexing large
parts of it, which effectively does the same thing, then regular
optimizes may be required to keep the index size down, although you must
remember that you need enough room for the core to grow in order to
actually complete the optimize.  If the core is 75-90 percent deleted
docs, then you will not need 2x the core size to optimize it, because
the new index will be much smaller.
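
As a rough illustration of that arithmetic, here is a minimal sketch in
Python.  It assumes that on-disk size scales linearly with the number of
live documents, which is only an approximation, and the function names
are hypothetical helpers, not anything that ships with Solr.

```python
def estimate_optimize_peak_bytes(core_bytes, deleted_fraction):
    """Rough peak disk usage while optimizing a single core."""
    if not 0.0 <= deleted_fraction <= 1.0:
        raise ValueError("deleted_fraction must be between 0 and 1")
    # Assumption: index size scales linearly with live documents.
    new_index_bytes = core_bytes * (1.0 - deleted_fraction)
    # The old segments stay on disk until the new index is complete,
    # so the peak is the old index plus the new one.
    return core_bytes + new_index_bytes


def extra_room_needed(core_bytes, deleted_fraction):
    """Free space the optimize needs on top of the current core size."""
    return estimate_optimize_peak_bytes(core_bytes, deleted_fraction) - core_bytes
```

Under those assumptions, a 100GB core that is 75 percent deleted docs
would need roughly 25GB of free space to optimize, while a core with no
deleted docs would need roughly another 100GB.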

Currently, SolrCloud will always optimize the entire collection when you
ask for an optimize on any core, but it will NOT optimize all the
replicas (cores) at the same time.  It will go through the cores that
make up the collection and optimize each one in sequence.  If your
index is sharded and replicated enough, hopefully that will make it
possible for the optimize to complete even though the amount of disk
space available may be low.
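
For reference, an optimize is normally requested through the update
handler.  The sketch below only builds the request URL and does not send
anything.  The host and collection name are placeholders, and the helper
function is hypothetical; optimize, maxSegments and waitSearcher are the
usual update handler parameters, but check them against your Solr
version before relying on this.

```python
from urllib.parse import urlencode


def build_optimize_url(base_url, collection, max_segments=1, wait_searcher=True):
    """Build an update-handler URL that asks Solr for an optimize."""
    params = {
        "optimize": "true",
        "maxSegments": str(max_segments),
        "waitSearcher": "true" if wait_searcher else "false",
    }
    # base_url is assumed to look like http://host:8983/solr
    return f"{base_url.rstrip('/')}/{collection}/update?{urlencode(params)}"


# Placeholder host and collection name, for illustration only.
url = build_optimize_url("http://localhost:8983/solr", "mycollection")
```

Because of the behavior described above, sending that one request to any
core in the collection is enough; there is no need to repeat it for each
shard or replica.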

We have at least one issue in Jira where users have asked for optimize
to honor distrib=false, which would allow the user to be in complete
control of all optimizing, but so far that hasn't been implemented.  The
volunteers that maintain Solr can only accomplish so much in the limited
time they have available.

Thanks,
Shawn
