Our index holds almost 100M documents on a SolrCloud of 3 shards, and each
shard has an index size of about 700GB (for the record, we are not using
stored fields - our documents are pretty large). We perform a full
indexing every weekend, and no updates are made to the index during the
week. Most of the queries we run are pretty complex, with hundreds of
terms using PhraseQuery, BooleanQuery, SpanQuery, wildcards, boosts etc.,
and take many minutes to execute. Even a difference of 10-20% is a big
advantage for us.

We have been optimizing the index after indexing for years and it has
worked well for us. Every once in a while, we upgrade Solr to the latest
version and try without optimizing so that we can save the many hours it
takes to optimize such a huge index, but it does not work well.
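
For context, the parallel optimize discussed in the quoted messages below
amounts to sending one optimize request per shard core at the same time.
Here is a minimal sketch using only the JDK HTTP client (Java 11+). The
class name, helper names, hosts and core names are hypothetical. It assumes
that a request sent to a core's own URL with distrib=false stays local to
that core, which this thread has not confirmed. It only shows how to issue
the requests concurrently, not that Solr will merge in parallel.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ParallelOptimize {

    // Builds the optimize URL for one core, using the parameters quoted in
    // this thread. Assumption: distrib=false keeps the request on this core.
    static String optimizeUrl(String coreBaseUrl) {
        String base = coreBaseUrl.endsWith("/")
                ? coreBaseUrl.substring(0, coreBaseUrl.length() - 1)
                : coreBaseUrl;
        return base + "/update?optimize=true&maxSegments=1&distrib=false";
    }

    // Submits one task per core to a thread pool, then waits for all of
    // them. Results come back in the same order as the input list.
    static <R> List<R> runInParallel(List<String> coreBaseUrls,
                                     Function<String, R> sender) throws Exception {
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, coreBaseUrls.size()));
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (String core : coreBaseUrls) {
                String url = optimizeUrl(core);
                Callable<R> task = () -> sender.apply(url);
                futures.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> f : futures) {
                results.add(f.get()); // blocks until that core has answered
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    // Real sender: issues the request and returns the HTTP status code.
    // No request timeout is set, since an optimize can run for hours.
    static Integer httpGetStatus(String url) {
        try {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
            return client.send(request, HttpResponse.BodyHandlers.discarding())
                         .statusCode();
        } catch (Exception e) {
            throw new RuntimeException("optimize request failed: " + url, e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical core URLs, one per shard.
        List<String> cores = List.of(
                "http://host1:8983/solr/collection_shard1_replica1",
                "http://host2:8983/solr/collection_shard2_replica1",
                "http://host3:8983/solr/collection_shard3_replica1");
        // Dry run: only prints the URLs that would be requested.
        runInParallel(cores, url -> url).forEach(System.out::println);
        // To send for real: runInParallel(cores, ParallelOptimize::httpGetStatus);
    }
}
```

The sender is passed in as a function so the concurrency can be checked
with a dry run before anything touches a 700GB shard.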

Kindly provide your suggestion.

Thanks,
Modassar


On Wed, Jul 9, 2014 at 10:47 AM, Walter Underwood <wun...@wunderwood.org>
wrote:

> I seriously doubt that you are required to force merge.
>
> How much improvement? And is the big performance cost also OK?
>
> I have worked on search engines that do automatic merges and offer forced
> merges for over fifteen years. For all that time, forced merges have
> usually caused problems.
>
> Stop doing forced merges.
>
> wunder
>
> On Jul 8, 2014, at 10:09 PM, Modassar Ather <modather1...@gmail.com>
> wrote:
>
> > Thanks Walter for your inputs.
> >
> > Our use case and performance benchmark require us to invoke optimize.
> >
> > Here we see a chance of improvement in performance of optimize() if
> > invoked in parallel.
> > I found that if *distrib=false* is used, the optimization will happen in
> > parallel.
> >
> > But I could not find a way to set it using HttpSolrServer/CloudSolrServer.
> > Also, the parameter settings given in my mail above do not seem to work.
> >
> > Please let me know in what ways I can achieve the parallel optimize on
> > SolrCloud.
> >
> > Thanks,
> > Modassar
> >
> > On Tue, Jul 8, 2014 at 7:53 PM, Walter Underwood <wun...@wunderwood.org>
> > wrote:
> >
> >> You probably do not need to force merge (mistakenly called "optimize")
> >> your index.
> >>
> >> Solr does automatic merges, which work just fine.
> >>
> >> There are only a few situations where a forced merge is even a good
> >> idea. The most common one is a replicated (non-cloud) setup with a full
> >> reindex every night.
> >>
> >> If you need Solr Cloud, I cannot think of a situation where you would
> >> want a forced merge.
> >>
> >> wunder
> >>
> >> On Jul 8, 2014, at 2:01 AM, Modassar Ather <modather1...@gmail.com>
> wrote:
> >>
> >>> Hi,
> >>>
> >>> I need to optimize an index created using the CloudSolrServer APIs
> >>> under a SolrCloud setup of 3 instances on separate machines. Currently
> >>> it optimizes sequentially if I invoke cloudSolrServer.optimize().
> >>>
> >>> To make it parallel I tried making three separate HttpSolrServer
> >>> instances and invoked httpSolrServer.optimize() on them in parallel,
> >>> but it still seems to be doing the optimization sequentially.
> >>>
> >>> I tried invoking optimize directly using HttpPost with the following
> >>> URL and parameters, but it still seems to be sequential.
> >>> *URL* : http://host:port/solr/collection/update
> >>>
> >>> *Parameters*:
> >>> params.add(new BasicNameValuePair("optimize", "true"));
> >>> params.add(new BasicNameValuePair("maxSegments", "1"));
> >>> params.add(new BasicNameValuePair("waitFlush", "true"));
> >>> params.add(new BasicNameValuePair("distrib", "false"));
> >>>
> >>> Kindly provide your suggestion and help.
> >>>
> >>> Regards,
> >>> Modassar
> >>
> >>
> >>
> >>
> >>
>
> --
> Walter Underwood
> wun...@wunderwood.org
>
>
>
>
