But again, with a master/slave setup merging should be relatively benign. And at 200M docs, having an M/S setup is probably indicated.
Here's a good writeup of mergepolicy:
http://juanggrande.wordpress.com/2011/02/07/merge-policy-internals/

If you're indexing and searching on a single machine, merging is much
less important than how often you commit. In an M/S situation, your
polling interval on the slave is important.

I'd look at commit frequency long before I worried about merging; that's
usually where people shoot themselves in the foot - by committing too often.

Overall, your mergeFactor is probably less important than other parts of
how you perform indexing/searching, but it does have some effect for sure...

Best
Erick

On Wed, May 2, 2012 at 7:54 AM, Prakashganesh, Prabhu
<prabhu.prakashgan...@dowjones.com> wrote:
> We have a fairly large scale system - about 200 million docs and fairly high
> indexing activity - about 300k docs per day with peak ingestion rates of
> about 20 docs per sec. I want to work out what a good mergeFactor setting
> would be by testing with different mergeFactor settings. I think the default
> of 10 might be high, I want to try with 5 and compare. Unless I know when a
> merge starts and finishes, it would be quite difficult to work out the impact
> of changing mergeFactor. I want to be able to measure how long merges take,
> run queries during the merge activity and see what the response times are
> etc..
>
> Thanks
> Prabhu
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: 02 May 2012 12:40
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Merge during off peak times
>
> Why do you care? Merging is generally a background process, or are
> you doing heavy indexing? In a master/slave setup,
> it's usually not really relevant except that (with 3.x), massive merges
> may temporarily stop indexing. Is that the problem?
>
> Look at the merge policies, there are configurations that make
> this less painful.
>
> In trunk, DocumentsWriterPerThread makes merges happen in the
> background, which helps the long-pause-while-indexing problem.
>
> Best
> Erick
>
> On Wed, May 2, 2012 at 7:22 AM, Prakashganesh, Prabhu
> <prabhu.prakashgan...@dowjones.com> wrote:
>> Ok, thanks Otis
>> Another question on merging
>> What is the best way to monitor merging?
>> Is there something in the log file that I can look for?
>> It seems like I have to monitor the system resources - read/write IOPS etc..
>> and work out when a merge happened
>> It would be great if I can do it by looking at log files or in the admin UI.
>> Do you know if this can be done or if there is some tool for this?
>>
>> Thanks
>> Prabhu
>>
>> -----Original Message-----
>> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
>> Sent: 01 May 2012 15:12
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr Merge during off peak times
>>
>> Hi Prabhu,
>>
>> I don't think such a merge policy exists, but it would be nice to have this
>> option and I imagine it wouldn't be hard to write if you really just base
>> the merge or no merge decision on the time of day (and maybe day of the
>> week).
>>
>> Note that this should go into Lucene, not Solr, so if you decide to
>> contribute your work, please
>> see http://wiki.apache.org/lucene-java/HowToContribute
>>
>> Otis
>> ----
>> Performance Monitoring for Solr - http://sematext.com/spm
>>
>>>________________________________
>>> From: "Prakashganesh, Prabhu" <prabhu.prakashgan...@dowjones.com>
>>>To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
>>>Sent: Tuesday, May 1, 2012 8:45 AM
>>>Subject: Solr Merge during off peak times
>>>
>>>Hi,
>>> I would like to know if there is a way to configure index merge policy in
>>>solr so that the merging happens during off peak hours. Can you please let
>>>me know if such a merge policy configuration exists?
>>>
>>>Thanks
>>>Prabhu
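
For the mergeFactor 5 vs 10 comparison raised in this thread, a rough back-of-the-envelope model can help frame what to expect before measuring on a real index. The sketch below is a toy model only, not Lucene's actual merge policy: it assumes every flush produces one equal-sized segment and that whenever merge_factor segments pile up at one level they are merged into a single segment at the next level. The function name simulate and the figure of 1000 flushes are made up for illustration.

```python
from collections import Counter


def simulate(flushes, merge_factor):
    """Toy log-style merge model (an assumption, NOT Lucene's real policy).

    Returns (segments_left, merges_run, flush_units_rewritten).
    """
    levels = Counter()  # level -> number of segments currently at that level
    merges = 0
    rewritten = 0       # total flush-sized units copied by all merges
    for _ in range(flushes):
        levels[0] += 1  # each flush adds one smallest-level segment
        level = 0
        # Cascade: a merge at one level may trigger a merge at the next.
        while levels[level] >= merge_factor:
            levels[level] -= merge_factor
            levels[level + 1] += 1
            merges += 1
            rewritten += merge_factor ** (level + 1)
            level += 1
    return sum(levels.values()), merges, rewritten


# Hypothetical workload: 1000 flushes, compared at the two settings discussed.
for mf in (5, 10):
    segments, merges, rewritten = simulate(1000, mf)
    print(f"mergeFactor={mf}: segments={segments} "
          f"merges={merges} rewritten={rewritten}")
```

In this model the lower mergeFactor runs more merges and rewrites more data in total, which is the usual indexing cost attached to a smaller setting. The snapshot segment count at one particular flush depends on where the cascade happens to be, so it is not a reliable guide to search-time behavior. Real merge durations and query latency during merges still have to be measured on the actual system.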