Hi Walter,

I wonder why you think SolrCloud isn't necessary if you're indexing only once per week. Aren't automatic failover and auto-sharding still useful? Custom sharding is also possible with SolrCloud if necessary.
On Wed, Jul 9, 2014 at 11:38 AM, Walter Underwood <wun...@wunderwood.org> wrote:

> More memory or faster disks will make a much bigger improvement than a
> forced merge.
>
> What are you measuring? If it is average query time, that is not a good
> measure. Look at the 90th or 95th percentile. Test with queries from logs.
>
> No user can see a 10% or 20% difference. If your managers are watching
> that, they are watching the wrong thing.
>
> If you are indexing once per week, you don't really need the complexity of
> Solr Cloud. You can do manual sharding.
>
> wunder
>
> On Jul 8, 2014, at 10:55 PM, Modassar Ather <modather1...@gmail.com> wrote:
>
> > Our index has almost 100M documents running on SolrCloud of 3 shards and
> > each shard has an index size of about 700GB (for the record, we are not
> > using stored fields - our documents are pretty large). We perform a full
> > indexing every weekend and during the week there are no updates made to
> > the index. Most of the queries that we run are pretty complex, with
> > hundreds of terms using PhraseQuery, BooleanQuery, SpanQuery, wildcards,
> > boosts etc., and take many minutes to execute. A difference of 10-20% is
> > also a big advantage for us.
> >
> > We have been optimizing the index after indexing for years and it has
> > worked well for us. Every once in a while, we upgrade Solr to the latest
> > version and try without optimizing so that we can save the many hours it
> > takes to optimize such a huge index, but it does not work well.
> >
> > Kindly provide your suggestion.
> >
> > Thanks,
> > Modassar
> >
> > On Wed, Jul 9, 2014 at 10:47 AM, Walter Underwood <wun...@wunderwood.org>
> > wrote:
> >
> >> I seriously doubt that you are required to force merge.
> >>
> >> How much improvement? And is the big performance cost also OK?
> >>
> >> I have worked on search engines that do automatic merges and offer
> >> forced merges for over fifteen years. For all that time, forced merges
> >> have usually caused problems.
> >>
> >> Stop doing forced merges.
> >>
> >> wunder
> >>
> >> On Jul 8, 2014, at 10:09 PM, Modassar Ather <modather1...@gmail.com>
> >> wrote:
> >>
> >>> Thanks Walter for your inputs.
> >>>
> >>> Our use case and performance benchmark require us to invoke optimize.
> >>>
> >>> Here we see a chance of improvement in performance of optimize() if
> >>> invoked in parallel.
> >>> I found that if *distrib=false* is used, the optimization will happen
> >>> in parallel.
> >>>
> >>> But I could not find a way to set it using
> >>> HttpSolrServer/CloudSolrServer.
> >>> Also, the parameter setting given in my mail above does not seem to
> >>> work.
> >>>
> >>> Please let me know in what ways I can achieve the parallel optimize on
> >>> SolrCloud.
> >>>
> >>> Thanks,
> >>> Modassar
> >>>
> >>> On Tue, Jul 8, 2014 at 7:53 PM, Walter Underwood <wun...@wunderwood.org>
> >>> wrote:
> >>>
> >>>> You probably do not need to force merge (mistakenly called "optimize")
> >>>> your index.
> >>>>
> >>>> Solr does automatic merges, which work just fine.
> >>>>
> >>>> There are only a few situations where a forced merge is even a good
> >>>> idea. The most common one is a replicated (non-cloud) setup with a
> >>>> full reindex every night.
> >>>>
> >>>> If you need Solr Cloud, I cannot think of a situation where you would
> >>>> want a forced merge.
> >>>>
> >>>> wunder
> >>>>
> >>>> On Jul 8, 2014, at 2:01 AM, Modassar Ather <modather1...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> Need to optimize index created using CloudSolrServer APIs under
> >>>>> SolrCloud setup of 3 instances on separate machines. Currently it
> >>>>> optimizes sequentially if I invoke cloudSolrServer.optimize().
> >>>>>
> >>>>> To make it parallel I tried making three separate HttpSolrServer
> >>>>> instances and invoked httpSolrServer.optimize() on them in parallel,
> >>>>> but still it seems to be doing optimization sequentially.
> >>>>>
> >>>>> I tried invoking optimize directly using HttpPost with following url
> >>>>> and parameters but still it seems to be sequential.
> >>>>> *URL* : http://host:port/solr/collection/update
> >>>>>
> >>>>> *Parameters*:
> >>>>> params.add(new BasicNameValuePair("optimize", "true"));
> >>>>> params.add(new BasicNameValuePair("maxSegments", "1"));
> >>>>> params.add(new BasicNameValuePair("waitFlush", "true"));
> >>>>> params.add(new BasicNameValuePair("distrib", "false"));
> >>>>>
> >>>>> Kindly provide your suggestion and help.
> >>>>>
> >>>>> Regards,
> >>>>> Modassar
> >>
> >> --
> >> Walter Underwood
> >> wun...@wunderwood.org
>
> --
> Walter Underwood
> wun...@wunderwood.org

--
Regards,
Shalin Shekhar Mangar.
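
For reference, the parallel attempt described in the first message of this thread could be sketched roughly as below. This is only a sketch under assumptions, not a confirmed fix: the class `ParallelOptimize`, its method names, and the core URLs are hypothetical; the request parameters are copied from that message; and whether Solr actually honors `distrib=false` for an optimize request is the open question here. It uses the JDK's `java.net.http.HttpClient` (Java 11+) instead of SolrJ so that it stays self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.function.ToIntFunction;

final class ParallelOptimize {

    // Parameters copied from the original message. Assumption: each request
    // goes to one core's /update handler and is not forwarded to other nodes.
    static String optimizeUrl(String coreUrl) {
        return coreUrl + "/update?optimize=true&maxSegments=1&waitFlush=true&distrib=false";
    }

    // Real sender: POSTs with an empty body and returns the HTTP status code.
    static int post(String url) {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        try {
            return HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.discarding())
                    .statusCode();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while calling " + url, e);
        }
    }

    // Submits one request per core so that all are in flight at the same
    // time, then waits for each and records the returned status per core URL.
    static Map<String, Integer> optimizeAll(List<String> coreUrls, ToIntFunction<String> sender) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, coreUrls.size()));
        try {
            Map<String, Future<Integer>> pending = new LinkedHashMap<>();
            for (String core : coreUrls) {
                String url = optimizeUrl(core);
                Callable<Integer> task = () -> sender.applyAsInt(url);
                pending.put(core, pool.submit(task));
            }
            Map<String, Integer> statuses = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Integer>> entry : pending.entrySet()) {
                try {
                    statuses.put(entry.getKey(), entry.getValue().get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted waiting for " + entry.getKey(), e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("optimize failed for " + entry.getKey(), e.getCause());
                }
            }
            return statuses;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Hypothetical core URLs, one per shard; replace with real ones.
        List<String> cores = List.of(
                "http://host1:8983/solr/collection_shard1_replica1",
                "http://host2:8983/solr/collection_shard2_replica1",
                "http://host3:8983/solr/collection_shard3_replica1");
        // Stub sender so the sketch runs without a cluster;
        // pass ParallelOptimize::post to send real requests.
        Map<String, Integer> statuses = optimizeAll(cores, url -> 200);
        statuses.forEach((core, status) -> System.out.println(core + " -> " + status));
    }
}
```

The sender is passed in as a function so the dispatch logic can be tried against a stub first. Addressing each core directly, rather than the collection URL, is a design guess intended to keep any one node from coordinating the others.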