Optimizing matters much less for query speed than it did historically; these days it's generally not recommended.
A significant effect of optimize _used_ to be purging obsolete data (i.e. data from deleted docs) from the index, but that is now done on merge. There's no harm in optimizing during off-peak hours, and combined with an appropriate merge policy, that may make indexing a little better (I'm thinking of not having to do as many massive merges).

BTW, in 4.0, there's DocumentsWriterPerThread, which merges in the background and pretty much removes even this as a motivation for optimizing.

All that said, optimizing isn't _bad_, it's just often unnecessary.

Best
Erick

On Wed, May 2, 2012 at 9:29 AM, Prakashganesh, Prabhu
<prabhu.prakashgan...@dowjones.com> wrote:
> Actually we are not thinking of an M/S setup.
> We are planning to have x shards on N servers, with each shard
> handling both indexing and searching.
> The expected query volume is not that high, so we don't think we would need to
> replicate to slaves. We think each shard will be able to handle its share of
> the indexing and searching. If we need to scale query capacity in future,
> yeah, we would probably do it by replicating each shard to its slaves.
>
> I agree autoCommit settings would be good to set up appropriately.
>
> Another question I had is the pros/cons of optimising the index. We would be
> purging old content every week and I am wondering whether to run an index
> optimise over the weekend after purging old data. Because we are going to be
> continuously indexing data, which would be a mix of adds, updates and deletes,
> I'm not sure the benefit of optimising would last long enough to be worth
> doing it. Maybe setting a low mergeFactor would be good enough. Optimising
> makes sense if the index is more static, perhaps? Thoughts?
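To make the "low mergeFactor vs. optimise" trade-off a little more concrete, here is a toy model of log-structured merging, loosely in the spirit of Lucene's LogMergePolicy but not its actual code. It assumes every flush writes one equal-sized segment and that whenever `merge_factor` segments of the same size accumulate they are merged into one segment of the next level. The function name, the equal-size assumption and the "docs rewritten" metric are hypothetical simplifications; real policies (LogByteSizeMergePolicy, TieredMergePolicy) size segments by bytes and pick merges differently.

```python
from typing import NamedTuple


class MergeStats(NamedTuple):
    segments: int        # segments left in the index after the last flush
    merges: int          # number of merges performed
    docs_rewritten: int  # total docs copied by merges (a proxy for merge I/O)


def simulate_log_merges(num_flushes: int, merge_factor: int,
                        docs_per_flush: int = 1) -> MergeStats:
    """Idealized log merging: a segment at level i holds
    docs_per_flush * merge_factor**i docs, and merge_factor segments
    at one level are merged into a single segment one level up."""
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    levels = []  # levels[i] = number of segments currently at level i
    merges = 0
    docs_rewritten = 0
    for _ in range(num_flushes):
        # Each flush writes one new level-0 segment.
        if not levels:
            levels.append(0)
        levels[0] += 1
        level = 0
        # Cascade: a full level is merged into one segment at the next level.
        while levels[level] == merge_factor:
            levels[level] = 0
            if level + 1 == len(levels):
                levels.append(0)
            levels[level + 1] += 1
            merges += 1
            docs_rewritten += docs_per_flush * merge_factor ** (level + 1)
            level += 1
    return MergeStats(sum(levels), merges, docs_rewritten)


for mf in (10, 5):
    print(mf, simulate_log_merges(num_flushes=125, merge_factor=mf))
```

In this model, 125 flushes with merge_factor=10 leave 8 segments after 13 merges that rewrite 220 flushes' worth of docs, while merge_factor=5 ends with a single segment after 31 merges that rewrite 375. So a lower mergeFactor keeps the index closer to an "optimised" shape, paid for with more merge I/O during indexing.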
>
> Thanks
> Prabhu
>
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: 02 May 2012 13:15
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Merge during off peak times
>
> But again, with a master/slave setup merging should
> be relatively benign. And at 200M docs, having an M/S
> setup is probably indicated.
>
> Here's a good write-up of merge policy:
> http://juanggrande.wordpress.com/2011/02/07/merge-policy-internals/
>
> If you're indexing and searching on a single machine, merging
> is much less important than how often you commit. In an M/S
> situation, your polling interval on the slave is what's important.
>
> I'd look at commit frequency long before I worried about merging;
> that's usually where people shoot themselves in the foot, by
> committing too often.
>
> Overall, your mergeFactor is probably less important than other
> parts of how you perform indexing/searching, but it does have
> some effect for sure...
>
> Best
> Erick
>
> On Wed, May 2, 2012 at 7:54 AM, Prakashganesh, Prabhu
> <prabhu.prakashgan...@dowjones.com> wrote:
>> We have a fairly large-scale system: about 200 million docs and fairly high
>> indexing activity, about 300k docs per day with peak ingestion rates of
>> about 20 docs per sec. I want to work out what a good mergeFactor setting
>> would be by testing with different mergeFactor settings. I think the default
>> of 10 might be high; I want to try 5 and compare. Unless I know when a
>> merge starts and finishes, it would be quite difficult to work out the
>> impact of changing mergeFactor. I want to be able to measure how long merges
>> take, run queries during the merge activity, see what the response times
>> are, etc.
>>
>> Thanks
>> Prabhu
>>
>> -----Original Message-----
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: 02 May 2012 12:40
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr Merge during off peak times
>>
>> Why do you care?
>> Merging is generally a background process, or are
>> you doing heavy indexing? In a master/slave setup,
>> it's usually not really relevant except that, with 3.x, massive merges
>> may temporarily stop indexing. Is that the problem?
>>
>> Look at the merge policies; there are configurations that make
>> this less painful.
>>
>> In trunk, DocumentsWriterPerThread makes merges happen in the
>> background, which helps with the long-pause-while-indexing problem.
>>
>> Best
>> Erick
>>
>> On Wed, May 2, 2012 at 7:22 AM, Prakashganesh, Prabhu
>> <prabhu.prakashgan...@dowjones.com> wrote:
>>> OK, thanks Otis.
>>> Another question on merging:
>>> What is the best way to monitor merging?
>>> Is there something in the log file that I can look for?
>>> It seems like I have to monitor the system resources (read/write IOPS
>>> etc.) and work out when a merge happened.
>>> It would be great if I could do it by looking at log files or in the admin
>>> UI. Do you know if this can be done, or if there is some tool for this?
>>>
>>> Thanks
>>> Prabhu
>>>
>>> -----Original Message-----
>>> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
>>> Sent: 01 May 2012 15:12
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Solr Merge during off peak times
>>>
>>> Hi Prabhu,
>>>
>>> I don't think such a merge policy exists, but it would be nice to have this
>>> option, and I imagine it wouldn't be hard to write if you really just base
>>> the merge/no-merge decision on the time of day (and maybe the day of the
>>> week).
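Otis's idea boils down to a time-of-day (and day-of-week) predicate that a custom merge policy would consult before selecting merges. A real implementation would be Java code in Lucene, for example a policy that wraps an existing one and returns no merges outside the window; the Python below only sketches that decision logic. The function name, the default 01:00-06:00 window and the "whole weekend is off-peak" rule are hypothetical.

```python
from datetime import datetime, time


def merges_allowed(now: datetime,
                   window_start: time = time(1, 0),
                   window_end: time = time(6, 0),
                   allow_all_weekend: bool = True) -> bool:
    """Hypothetical off-peak gate: True if merging should be allowed at `now`.

    The window is [window_start, window_end) and may wrap past midnight
    (e.g. 22:00-04:00). Saturday and Sunday can be treated as off-peak all day.
    """
    if allow_all_weekend and now.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return True
    t = now.time()
    if window_start <= window_end:
        return window_start <= t < window_end
    # Window wraps midnight: allowed late in the evening or early in the morning.
    return t >= window_start or t < window_end


print(merges_allowed(datetime(2012, 5, 2, 9, 29)))  # a Wednesday morning
```

One consequence worth weighing: while such a gate says no, newly flushed segments keep piling up, so searches touch more segments and the first off-peak merge is bigger. A practical policy would probably still allow merges once the segment count passes some ceiling.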
>>>
>>> Note that this should go into Lucene, not Solr, so if you decide to
>>> contribute your work, please
>>> see http://wiki.apache.org/lucene-java/HowToContribute
>>>
>>> Otis
>>> ----
>>> Performance Monitoring for Solr - http://sematext.com/spm
>>>
>>>>________________________________
>>>> From: "Prakashganesh, Prabhu" <prabhu.prakashgan...@dowjones.com>
>>>> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
>>>> Sent: Tuesday, May 1, 2012 8:45 AM
>>>> Subject: Solr Merge during off peak times
>>>>
>>>> Hi,
>>>> I would like to know if there is a way to configure the index merge policy in
>>>> Solr so that merging happens during off-peak hours. Can you please let
>>>> me know if such a merge policy configuration exists?
>>>>
>>>> Thanks
>>>> Prabhu
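On Prabhu's question in the thread about telling when a merge has happened without watching IOPS: one rough approach is to sample the number of segments in the index at regular intervals, because a completed merge replaces several segments with one and the count goes down. The sketch assumes you already collect `(timestamp, segment_count)` samples by some means of your choosing; the function name and sample format are hypothetical. It only shows the interval in which a merge *finished* (not when it started), and a flush in the same interval can mask a small drop.

```python
from typing import List, Tuple

# Hypothetical sample format: (unix_timestamp_seconds, segment_count).
Sample = Tuple[float, int]


def detect_merges(samples: List[Sample]) -> List[Tuple[float, float, int]]:
    """Return (interval_start, interval_end, segments_removed) for every
    sampling interval in which the number of segments went down.

    A drop usually means at least one merge finished inside that interval;
    the merge itself may have started earlier.
    """
    events = []
    # Compare each sample with the one that follows it.
    for (t0, n0), (t1, n1) in zip(samples, samples[1:]):
        if n1 < n0:
            events.append((t0, t1, n0 - n1))
    return events


samples = [(0, 10), (60, 11), (120, 2), (180, 3), (240, 3)]
print(detect_merges(samples))
```

Lining these intervals up with query response times from the same period would give a first approximation of the merge impact Prabhu wanted to measure when comparing mergeFactor settings.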