Our index has almost 100M documents and runs on a SolrCloud of 5 shards;
each shard has an index size of more than 170 GB (for the record, we are
not using stored fields - our documents are pretty large). We perform a full
indexing every weekend, and during the week no updates are made to the
index. Most of the queries that we run are pretty complex, with hundreds of
terms using PhraseQuery, BooleanQuery, SpanQuery, wildcards, boosts etc.,
and take many minutes to execute. Even a 10-20% difference in execution
time is a big advantage for us.

We have been optimizing the index after indexing for years and it has
worked well for us. Every once in a while, we upgrade Solr to the latest
version and try without optimizing so that we can save the many hours it
takes to optimize such a huge index, but each time we find that the
optimized index works well for us.

Erick, I was indexing the documents today and saw the optimize happening in
the background.
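
Until the cause is clear, one possible workaround is to have the indexer poll
for completion itself instead of relying on optimize() to block. Below is a
minimal sketch, not a tested fix. It assumes the indexer can obtain the
current segment count for each shard by some means, which is deliberately left
out here; OptimizeWaiter and the segmentCount supplier are hypothetical names
for illustration and are not part of SolrJ.

```java
import java.util.function.IntSupplier;

// Hypothetical helper, not part of SolrJ: blocks until a caller-supplied
// segment count drops to the target, or until a timeout elapses.
final class OptimizeWaiter {

    private OptimizeWaiter() {}

    /**
     * Polls segmentCount until it reports at most maxSegments.
     *
     * @param segmentCount  assumption: returns the highest segment count
     *                      currently seen across the shards being optimized
     * @param maxSegments   the target passed to optimize(), e.g. 1
     * @param pollMillis    pause between two polls
     * @param timeoutMillis give up after this long
     * @return true if the target was reached, false on timeout or interrupt
     */
    static boolean waitForSegments(IntSupplier segmentCount, int maxSegments,
                                   long pollMillis, long timeoutMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (segmentCount.getAsInt() <= maxSegments) {
                return true;               // merge appears to be finished
            }
            if (System.nanoTime() >= deadline) {
                return false;              // still merging when time ran out
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }
}
```

The indexer would call waitForSegments(...) right after optimize(true, true, 1)
returns and exit only once it reports true. With an index of this size the
poll interval can be generous, on the order of a minute.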

On Tue, May 26, 2015 at 9:12 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> No results yet. I finished the test harness last night (not really a
> unit test, a stand-alone program that endlessly adds stuff and tests
> that every commit returns the correct number of docs).
>
> 8,000 cycles later there aren't any problems reported.
>
> Siiigggggh.
>
>
> On Tue, May 26, 2015 at 1:51 AM, Modassar Ather <modather1...@gmail.com>
> wrote:
> > Hi,
> >
> > Erick, you mentioned a unit test to test the optimize running in the
> > background. Kindly share your findings, if any.
> >
> > Thanks,
> > Modassar
> >
> > On Mon, May 25, 2015 at 11:47 AM, Modassar Ather <modather1...@gmail.com>
> > wrote:
> >
> >> Thanks everybody for your replies.
> >>
> >> I have noticed the optimization running in the background every time I
> >> indexed. This is a 5-node cluster with solr-5.1.0 and uses the
> >> CloudSolrClient. Kindly share your findings on this issue.
> >>
> >> Our index has almost 100M documents running on SolrCloud. We have been
> >> optimizing the index after indexing for years and it has worked well for
> >> us.
> >>
> >> Thanks,
> >> Modassar
> >>
> >> On Fri, May 22, 2015 at 11:55 PM, Erick Erickson <erickerick...@gmail.com>
> >> wrote:
> >>
> >>> Actually, I've recently seen very similar behavior in Solr 4.10.3, but
> >>> involving hard commits with openSearcher=true, see:
> >>> https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't
> >>> reproduce this at will, siigggghhhh.
> >>>
> >>> A unit test should be very simple to write though, maybe I can get to
> >>> it today.
> >>>
> >>> Erick
> >>>
> >>>
> >>>
> >>> On Fri, May 22, 2015 at 8:27 AM, Upayavira <u...@odoko.co.uk> wrote:
> >>> >
> >>> >
> >>> > On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote:
> >>> >> On 5/21/2015 6:21 AM, Modassar Ather wrote:
> >>> >> > I am using Solr-5.1.0. I have an indexer class which invokes
> >>> >> > cloudSolrClient.optimize(true, true, 1). My indexer exits after the
> >>> >> > invocation of optimize and the optimization keeps on running in the
> >>> >> > background.
> >>> >> > Kindly let me know if this is by design and how I can make my
> >>> >> > indexer wait until the optimization is over. Is there a
> >>> >> > configuration/parameter I need to set for this?
> >>> >> >
> >>> >> > Please note that the same indexer with
> >>> >> > cloudSolrServer.optimize(true, true, 1) on Solr-4.10 used to wait
> >>> >> > till the optimize was over before exiting.
> >>> >>
> >>> >> This is very odd, because I could not get HttpSolrServer to optimize
> >>> >> in the background, even when that was what I wanted.
> >>> >>
> >>> >> I wondered if maybe the Cloud object behaves differently with regard
> >>> >> to blocking until an optimize is finished ... except that there is no
> >>> >> code for optimizing in CloudSolrClient at all ... so I don't know
> >>> >> where the different behavior would actually be happening.
> >>> >
> >>> > A more important question is, why are you optimising? Generally it
> >>> > isn't recommended anymore as it reduces the natural distribution of
> >>> > documents amongst segments and makes future merges more costly.
> >>> >
> >>> > Upayavira
> >>>
> >>
> >>
>
