In this case, optimising makes sense: once the index is generated, you are not updating it.
Upayavira

On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote:
> Our index has almost 100M documents running on SolrCloud with 5 shards, and each shard has an index size of about 170+GB (for the record, we are not using stored fields; our documents are pretty large). We perform a full indexing every weekend, and during the week no updates are made to the index. Most of the queries we run are pretty complex, with hundreds of terms using PhraseQuery, BooleanQuery, SpanQuery, wildcards, boosts etc., and take many minutes to execute. Even a 10-20% difference is a big advantage for us.
>
> We have been optimizing the index after indexing for years and it has worked well for us. Every once in a while we upgrade Solr to the latest version and try running without optimizing, so that we can save the many hours it takes to optimize such a huge index, but we find that the optimized index works better for us.
>
> Erick, I was indexing the documents today and saw the optimize happening in the background.
>
> On Tue, May 26, 2015 at 9:12 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> > No results yet. I finished the test harness last night (not really a unit test, but a stand-alone program that endlessly adds stuff and checks that every commit returns the correct number of docs).
> >
> > 8,000 cycles later, no problems have been reported.
> >
> > Siiigggggh.
> >
> > On Tue, May 26, 2015 at 1:51 AM, Modassar Ather <modather1...@gmail.com> wrote:
> > > Hi,
> > >
> > > Erick, you mentioned a unit test for the optimize running in the background. Kindly share your findings, if any.
> > >
> > > Thanks,
> > > Modassar
> > >
> > > On Mon, May 25, 2015 at 11:47 AM, Modassar Ather <modather1...@gmail.com> wrote:
> > >
> > >> Thanks everybody for your replies.
> > >>
> > >> I have noticed the optimization running in the background every time I indexed. This is a 5-node cluster with solr-5.1.0, and it uses the CloudSolrClient. Kindly share your findings on this issue.
> > >>
> > >> Our index has almost 100M documents running on SolrCloud. We have been optimizing the index after indexing for years and it has worked well for us.
> > >>
> > >> Thanks,
> > >> Modassar
> > >>
> > >> On Fri, May 22, 2015 at 11:55 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> > >>
> > >>> Actually, I've recently seen very similar behavior in Solr 4.10.3, but involving hard commits with openSearcher=true, see: https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't reproduce this at will, siigggghhhh.
> > >>>
> > >>> A unit test should be very simple to write, though; maybe I can get to it today.
> > >>>
> > >>> Erick
> > >>>
> > >>> On Fri, May 22, 2015 at 8:27 AM, Upayavira <u...@odoko.co.uk> wrote:
> > >>> >
> > >>> > On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote:
> > >>> >> On 5/21/2015 6:21 AM, Modassar Ather wrote:
> > >>> >> > I am using Solr-5.1.0. I have an indexer class which invokes cloudSolrClient.optimize(true, true, 1). My indexer exits after the invocation of optimize, and the optimization keeps on running in the background.
> > >>> >> > Kindly let me know whether this is by design and how I can make my indexer wait until the optimization is over. Is there a configuration/parameter I need to set for this?
> > >>> >> >
> > >>> >> > Please note that the same indexer with cloudSolrServer.optimize(true, true, 1) on Solr-4.10 used to wait until the optimize was over before exiting.
> > >>> >>
> > >>> >> This is very odd, because I could not get HttpSolrServer to optimize in the background, even when that was what I wanted.
> > >>> >>
> > >>> >> I wondered if maybe the Cloud object behaves differently with regard to blocking until an optimize is finished ... except that there is no code for optimizing in CloudSolrClient at all ... so I don't know where the different behavior would actually be happening.
> > >>> >
> > >>> > A more important question is, why are you optimising? Generally it isn't recommended anymore, as it reduces the natural distribution of documents amongst segments and makes future merges more costly.
> > >>> >
> > >>> > Upayavira
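The original question was how to make an indexer wait until an optimize finishes. The SolrJ call in the thread, `optimize(true, true, 1)`, takes the arguments `(waitFlush, waitSearcher, maxSegments)`. As a point of comparison, here is a minimal sketch that issues an equivalent request directly over HTTP with the JDK's `java.net.http` client (Java 11+). It assumes the collection's `/update` handler accepts `optimize`, `waitSearcher` and `maxSegments` as request parameters. The base URL, the collection name `collection1`, and the names `OptimizeRequest`, `buildOptimizeUri` and `sendBlocking` are hypothetical placeholders. The client sets no request timeout, so it will not abandon a multi-hour optimize. It only waits for the HTTP response, however. Whether Solr 5.1 holds that response until every shard has finished merging is exactly what the thread is unsure about, so it is worth confirming on the cluster (for example, by checking segment counts afterwards).

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class OptimizeRequest {

    // Builds an update-handler URL asking Solr to optimize (force-merge) down to
    // maxSegments and to reply only once a new searcher is open.
    // Assumption: the handler accepts these three parameters on a plain GET.
    static URI buildOptimizeUri(String baseUrl, String collection, int maxSegments) {
        String base = baseUrl.replaceAll("/+$", ""); // drop any trailing slashes
        String path = base + "/"
                + URLEncoder.encode(collection, StandardCharsets.UTF_8) + "/update";
        String query = "optimize=true"
                + "&waitSearcher=true"
                + "&maxSegments=" + maxSegments;
        return URI.create(path + "?" + query);
    }

    // Sends the request and blocks until Solr answers. No timeout is set on the
    // request, so the client waits as long as the server keeps the call open.
    static int sendBlocking(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder host and collection; adjust for a real cluster.
        URI uri = buildOptimizeUri("http://localhost:8983/solr", "collection1", 1);
        System.out.println(uri);
        // Only contact the server when explicitly asked to.
        if (args.length > 0 && args[0].equals("--send")) {
            System.out.println("HTTP " + sendBlocking(uri));
        }
    }
}
```

Run without arguments, it only prints the URL it would call. Run with `--send`, it issues the request and exits when the server responds. That makes it a quick way to see, independently of SolrJ, whether the server itself returns before the merge is done.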