Thanks Shawn. Yeah, regular optimize might be the route we take if this becomes a recurring issue. I remember that in our old multicore deployment, CPU used to spike and the core became almost unresponsive.
My guess is that with the solr cloud architecture, any slack by the leader while optimizing is picked up by the replica. I was searching around for the optimize behaviour of solr cloud and could not find much information.

Does anyone have experience running optimize for solr cloud in a loaded production env?

Thanks,
Rishi.

-----Original Message-----
From: Shawn Heisey <apa...@elyograg.org>
To: solr-user <solr-user@lucene.apache.org>
Sent: Mon, May 4, 2015 9:11 am
Subject: Re: Solr Cloud reclaiming disk space from deleted documents

On 5/4/2015 4:55 AM, Rishi Easwaran wrote:
> Sadly with the size of our complex, splitting and adding more HW is not a viable long term solution.
> I guess the options we have are to run optimize regularly and/or become aggressive in our merges proactively even before solr cloud gets into this situation.

If you are regularly deleting most of your index, or reindexing large parts of it, which effectively does the same thing, then regular optimizes may be required to keep the index size down, although you must remember that you need enough room for the core to grow in order to actually complete the optimize. If the core is 75-90 percent deleted docs, then you will not need 2x the core size to optimize it, because the new index will be much smaller.

Currently, SolrCloud will always optimize the entire collection when you ask for an optimize on any core, but it will NOT optimize all the replicas (cores) at the same time. It will go through the cores that make up the collection and optimize each one in sequence. If your index is sharded and replicated enough, hopefully that will make it possible for the optimize to complete even though the amount of disk space available may be low.

We have at least one issue in Jira where users have asked for optimize to honor distrib=false, which would allow the user to be in complete control of all optimizing, but so far that hasn't been implemented.
The volunteers that maintain Solr can only accomplish so much in the limited time they have available.

Thanks,
Shawn
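
As a rough illustration of the disk-space point above (a core that is mostly deleted docs needs far less than 2x its size to optimize), here is a minimal Python sketch. It is not anything Solr provides: the function names, the safety margin, and the assumption that index size shrinks in proportion to the deleted-doc fraction are all hypothetical simplifications, so treat the numbers as a ballpark only.

```python
def estimate_optimize_headroom(core_size_gb: float, deleted_fraction: float) -> float:
    """Estimate the extra disk space (GB) a core needs while an optimize runs.

    ASSUMPTION (a simplification, not Solr behaviour you can rely on):
    index size scales linearly with live documents, so the rewritten index
    is roughly core_size * (1 - deleted_fraction), and the old segments stay
    on disk until the new ones are fully written.
    """
    if core_size_gb < 0:
        raise ValueError("core_size_gb must be non-negative")
    if not 0.0 <= deleted_fraction <= 1.0:
        raise ValueError("deleted_fraction must be between 0 and 1")
    return core_size_gb * (1.0 - deleted_fraction)


def can_optimize(core_size_gb: float, deleted_fraction: float,
                 free_disk_gb: float, safety_margin: float = 1.2) -> bool:
    """Return True if free disk covers the estimated headroom plus a margin.

    safety_margin is a hypothetical fudge factor for the rough estimate above.
    """
    needed = safety_margin * estimate_optimize_headroom(core_size_gb, deleted_fraction)
    return free_disk_gb >= needed


if __name__ == "__main__":
    # A 100 GB core with 80 percent deleted docs versus one with no deletes,
    # each with 30 GB of free disk.
    print(can_optimize(100.0, 0.8, free_disk_gb=30.0))
    print(can_optimize(100.0, 0.0, free_disk_gb=30.0))
```

Under these assumptions, the mostly-deleted core fits in the available space while the core with no deletes does not, which matches the reasoning in the message: the closer the core is to fully live documents, the closer the requirement gets to a full second copy.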