Re: Parallel optimize of index on SolrCloud.

Shalin Shekhar Mangar Tue, 08 Jul 2014 23:19:32 -0700

Hi Walter,

I wonder why you think SolrCloud isn't necessary if you're indexing once
per week. Aren't automatic failover and auto-sharding still useful? One
can also do custom sharding with SolrCloud if necessary.
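
On Walter's point below about looking at the 90th or 95th percentile rather
than the average query time, here is a minimal sketch of what that
measurement could look like. It assumes the query times (in milliseconds)
have already been pulled out of the logs into an array; the class name, the
method names and the nearest-rank definition of a percentile are my own
choices for illustration, not anything Solr provides.

```java
import java.util.Arrays;

final class LatencyPercentiles {

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samplesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    // Arithmetic mean, shown only for comparison with the percentiles.
    static double mean(long[] samplesMillis) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long sum = 0;
        for (long s : samplesMillis) {
            sum += s;
        }
        return (double) sum / samplesMillis.length;
    }

    public static void main(String[] args) {
        // Made-up sample data: nine ordinary queries and one very slow one.
        long[] qtimes = {120, 135, 150, 140, 9000, 130, 145, 125, 155, 160};
        System.out.println("mean = " + mean(qtimes));
        System.out.println("p90  = " + percentile(qtimes, 90));
        System.out.println("p95  = " + percentile(qtimes, 95));
    }
}
```

With a single slow outlier in the sample, the mean and the high percentiles
tell quite different stories, which is why replaying real queries from the
logs and comparing percentiles before and after a forced merge is more
informative than comparing averages.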



On Wed, Jul 9, 2014 at 11:38 AM, Walter Underwood <wun...@wunderwood.org>
wrote:

> More memory or faster disks will make a much bigger improvement than a
> forced merge.
>
> What are you measuring? If it is average query time, that is not a good
> measure. Look at 90th or 95th percentile. Test with queries from logs.
>
> No user can see a 10% or 20% difference. If your managers are watching
> that, they are watching the wrong thing.
>
> If you are indexing once per week, you don't really need the complexity of
> Solr Cloud. You can do manual sharding.
>
> wunder
>
> On Jul 8, 2014, at 10:55 PM, Modassar Ather <modather1...@gmail.com>
> wrote:
>
> > Our index has almost 100M documents running on SolrCloud of 3 shards and
> > each shard has an index size of about 700GB (for the record, we are not
> > using stored fields - our documents are pretty large). We perform a full
> > indexing every weekend and during the week there are no updates made to the
> > index. Most of the queries that we run are pretty complex with hundreds of
> > terms using PhraseQuery, BooleanQuery, SpanQuery, Wildcards, boosts etc.
> > and take many minutes to execute. A difference of 10-20% is also a big
> > advantage for us.
> >
> > We have been optimizing the index after indexing for years and it has
> > worked well for us. Every once in a while, we upgrade Solr to the latest
> > version and try without optimizing so that we can save the many hours it
> > takes to optimize such a huge index, but it does not work well.
> >
> > Kindly provide your suggestion.
> >
> > Thanks,
> > Modassar
> >
> >
> > On Wed, Jul 9, 2014 at 10:47 AM, Walter Underwood <wun...@wunderwood.org>
> > wrote:
> >
> >> I seriously doubt that you are required to force merge.
> >>
> >> How much improvement? And is the big performance cost also OK?
> >>
> >> I have worked on search engines that do automatic merges and offer forced
> >> merges for over fifteen years. For all that time, forced merges have
> >> usually caused problems.
> >>
> >> Stop doing forced merges.
> >>
> >> wunder
> >>
> >> On Jul 8, 2014, at 10:09 PM, Modassar Ather <modather1...@gmail.com>
> >> wrote:
> >>
> >>> Thanks Walter for your inputs.
> >>>
> >>> Our use case and performance benchmark require us to invoke optimize.
> >>>
> >>> Here we see a chance of improvement in performance of optimize() if invoked
> >>> in parallel.
> >>> I found that if *distrib=false* is used, the optimization will happen in
> >>> parallel.
> >>>
> >>> But I could not find a way to set it using HttpSolrServer/CloudSolrServer.
> >>> Also, the parameter setting given in my mail above does not seem to
> >>> work.
> >>>
> >>> Please let me know in what ways I can achieve the parallel optimize on
> >>> SolrCloud.
> >>>
> >>> Thanks,
> >>> Modassar
> >>>
> >>> On Tue, Jul 8, 2014 at 7:53 PM, Walter Underwood <wun...@wunderwood.org>
> >>> wrote:
> >>>
> >>>> You probably do not need to force merge (mistakenly called "optimize")
> >>>> your index.
> >>>>
> >>>> Solr does automatic merges, which work just fine.
> >>>>
> >>>> There are only a few situations where a forced merge is even a good idea.
> >>>> The most common one is a replicated (non-cloud) setup with a full reindex
> >>>> every night.
> >>>>
> >>>> If you need Solr Cloud, I cannot think of a situation where you would want
> >>>> a forced merge.
> >>>>
> >>>> wunder
> >>>>
> >>>> On Jul 8, 2014, at 2:01 AM, Modassar Ather <modather1...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> I need to optimize an index created using the CloudSolrServer APIs under
> >>>>> a SolrCloud setup of 3 instances on separate machines. Currently it
> >>>>> optimizes sequentially if I invoke cloudSolrServer.optimize().
> >>>>>
> >>>>> To make it parallel I tried making three separate HttpSolrServer instances
> >>>>> and invoked httpSolrServer.optimize() on them in parallel, but it still
> >>>>> seems to be doing the optimization sequentially.
> >>>>>
> >>>>> I tried invoking optimize directly using HttpPost with the following URL
> >>>>> and parameters, but it still seems to be sequential.
> >>>>> *URL* : http://host:port/solr/collection/update
> >>>>>
> >>>>> *Parameters*:
> >>>>> params.add(new BasicNameValuePair("optimize", "true"));
> >>>>> params.add(new BasicNameValuePair("maxSegments", "1"));
> >>>>> params.add(new BasicNameValuePair("waitFlush", "true"));
> >>>>> params.add(new BasicNameValuePair("distrib", "false"));
> >>>>>
> >>>>> Kindly provide your suggestion and help.
> >>>>>
> >>>>> Regards,
> >>>>> Modassar
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>
> >> --
> >> Walter Underwood
> >> wun...@wunderwood.org
> >>
> >>
> >>
> >>
>
> --
> Walter Underwood
> wun...@wunderwood.org
>
>
>
>
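
For reference, the parallel attempt described in the original mail above
(one optimize request per node, all issued at the same time) can be sketched
with plain JDK classes as below. This only shows the client side. It reuses
the request parameters quoted in the thread (optimize, maxSegments,
waitFlush, distrib), and nothing in this thread confirms that Solr will
actually run the merges concurrently when called this way; the original
poster reports that it still looked sequential. The class and method names,
the use of GET instead of the HttpPost used above, and the idea of passing
one base URL per core are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelOptimizeSketch {

    // Builds the update URL with the same parameters quoted in the thread.
    static String optimizeUrl(String coreBaseUrl) {
        return coreBaseUrl
                + "/update?optimize=true&maxSegments=1&waitFlush=true&distrib=false";
    }

    // Sends one request and returns the HTTP status code.
    // GET is an assumption here; the original mail used HttpPost.
    static int httpStatus(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setRequestMethod("GET");
            try {
                return conn.getResponseCode();
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Issues one optimize request per base URL, all at the same time, and
    // returns the status codes in the same order as the input list.
    // The sender is a parameter so the request logic can be swapped or stubbed.
    static List<Integer> optimizeAll(List<String> coreBaseUrls,
                                     Function<String, Integer> sender) throws Exception {
        List<Integer> statuses = new ArrayList<>();
        if (coreBaseUrls.isEmpty()) {
            return statuses;
        }
        // One thread per node so that no request waits behind another.
        ExecutorService pool = Executors.newFixedThreadPool(coreBaseUrls.size());
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (String base : coreBaseUrls) {
                tasks.add(() -> sender.apply(optimizeUrl(base)));
            }
            // invokeAll blocks until every task has finished.
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                statuses.add(f.get());
            }
            return statuses;
        } finally {
            pool.shutdown();
        }
    }
}
```

A call might look like
optimizeAll(urls, ParallelOptimizeSketch::httpStatus), where urls holds one
base URL per core (for example the hypothetical
http://host1:port/solr/collection). Since a forced merge of a 700GB shard
runs for hours, the HTTP client's read timeout would also need attention in
real use.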


-- 
Regards,
Shalin Shekhar Mangar.
