Ahhh, you're right. Shows what happens when I work from memory.... Thanks. Erick
On Wed, May 2, 2012 at 4:26 PM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:
>> BTW, in 4.0, there's DocumentWriterPerThread that
>> merges in the background
>
> It flushes without pausing, but does not perform merges. Maybe you're
> thinking of ConcurrentMergeScheduler?
>
> On Wed, May 2, 2012 at 7:26 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>> Optimizing is much less important query-speed wise
>> than historically, essentially it's not recommended much
>> any more.
>>
>> A significant effect of optimize _used_ to be purging
>> obsolete data (i.e. that from deleted docs) from the
>> index, but that is now done on merge.
>>
>> There's no harm in optimizing on off-peak hours, and
>> combined with an appropriate merge policy that may make
>> indexing a little better (I'm thinking of not doing
>> as many massive merges here).
>>
>> BTW, in 4.0, there's DocumentWriterPerThread that
>> merges in the background and pretty much removes
>> even this as a motivation for optimizing.
>>
>> All that said, optimizing isn't _bad_, it's just often
>> unnecessary.
>>
>> Best
>> Erick
>>
>> On Wed, May 2, 2012 at 9:29 AM, Prakashganesh, Prabhu
>> <prabhu.prakashgan...@dowjones.com> wrote:
>>> Actually we are not thinking of a M/S setup
>>> We are planning to have x number of shards on N number of servers, each of
>>> the shards handling both indexing and searching
>>> The expected query volume is not that high, so don't think we would need to
>>> replicate to slaves. We think each shard will be able to handle its share
>>> of the indexing and searching. If we need to scale query capacity in
>>> future, yeah probably need to do it by replicating each shard to its slaves
>>>
>>> I agree autoCommit settings would be good to set up appropriately
>>>
>>> Another question I had is pros/cons of optimising the index. We would be
>>> purging old content every week and am thinking whether to run an index
>>> optimise in the weekend after purging old data. Because we are going to be
>>> continuously indexing data which would be mix of adds, updates, deletes,
>>> not sure if the benefit of optimising would last long enough to be worth
>>> doing it. Maybe setting a low mergeFactor would be good enough. Optimising
>>> makes sense if the index is more static, perhaps? Thoughts?
>>>
>>> Thanks
>>> Prabhu
>>>
>>>
>>> -----Original Message-----
>>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>>> Sent: 02 May 2012 13:15
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Solr Merge during off peak times
>>>
>>> But again, with a master/slave setup merging should
>>> be relatively benign. And at 200M docs, having a M/S
>>> setup is probably indicated.
>>>
>>> Here's a good writeup of mergepolicy
>>> http://juanggrande.wordpress.com/2011/02/07/merge-policy-internals/
>>>
>>> If you're indexing and searching on a single machine, merging
>>> is much less important than how often you commit. In a M/S
>>> situation, your polling interval on the slave is important.
>>>
>>> I'd look at commit frequency long before I worried about merging,
>>> that's usually where people shoot themselves in the foot - by
>>> committing too often.
>>>
>>> Overall, your mergeFactor is probably less important than other
>>> parts of how you perform indexing/searching, but it does have
>>> some effect for sure...
>>>
>>> Best
>>> Erick
>>>
>>> On Wed, May 2, 2012 at 7:54 AM, Prakashganesh, Prabhu
>>> <prabhu.prakashgan...@dowjones.com> wrote:
>>>> We have a fairly large scale system - about 200 million docs and fairly
>>>> high indexing activity - about 300k docs per day with peak ingestion rates
>>>> of about 20 docs per sec. I want to work out what a good mergeFactor
>>>> setting would be by testing with different mergeFactor settings. I think
>>>> the default of 10 might be high, I want to try with 5 and compare. Unless
>>>> I know when a merge starts and finishes, it would be quite difficult to
>>>> work out the impact of changing mergeFactor. I want to be able to measure
>>>> how long merges take, run queries during the merge activity and see what
>>>> the response times are etc.
>>>>
>>>> Thanks
>>>> Prabhu
>>>>
>>>> -----Original Message-----
>>>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>>>> Sent: 02 May 2012 12:40
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: Solr Merge during off peak times
>>>>
>>>> Why do you care? Merging is generally a background process, or are
>>>> you doing heavy indexing? In a master/slave setup,
>>>> it's usually not really relevant except that (with 3.x), massive merges
>>>> may temporarily stop indexing. Is that the problem?
>>>>
>>>> Look at the merge policies, there are configurations that make
>>>> this less painful.
>>>>
>>>> In trunk, DocumentWriterPerThread makes merges happen in the
>>>> background, which helps the long-pause-while-indexing problem.
>>>>
>>>> Best
>>>> Erick
>>>>
>>>> On Wed, May 2, 2012 at 7:22 AM, Prakashganesh, Prabhu
>>>> <prabhu.prakashgan...@dowjones.com> wrote:
>>>>> Ok, thanks Otis
>>>>> Another question on merging
>>>>> What is the best way to monitor merging?
>>>>> Is there something in the log file that I can look for?
>>>>> It seems like I have to monitor the system resources - read/write IOPS
>>>>> etc. and work out when a merge happened
>>>>> It would be great if I can do it by looking at log files or in the admin
>>>>> UI. Do you know if this can be done or if there is some tool for this?
>>>>>
>>>>> Thanks
>>>>> Prabhu
>>>>>
>>>>> -----Original Message-----
>>>>> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
>>>>> Sent: 01 May 2012 15:12
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Re: Solr Merge during off peak times
>>>>>
>>>>> Hi Prabhu,
>>>>>
>>>>> I don't think such a merge policy exists, but it would be nice to have
>>>>> this option and I imagine it wouldn't be hard to write if you really just
>>>>> base the merge or no merge decision on the time of day (and maybe day of
>>>>> the week).
>>>>>
>>>>> Note that this should go into Lucene, not Solr, so if you decide to
>>>>> contribute your work, please
>>>>> see http://wiki.apache.org/lucene-java/HowToContribute
>>>>>
>>>>> Otis
>>>>> ----
>>>>> Performance Monitoring for Solr - http://sematext.com/spm
>>>>>
>>>>>
>>>>>>________________________________
>>>>>> From: "Prakashganesh, Prabhu" <prabhu.prakashgan...@dowjones.com>
>>>>>>To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
>>>>>>Sent: Tuesday, May 1, 2012 8:45 AM
>>>>>>Subject: Solr Merge during off peak times
>>>>>>
>>>>>>Hi,
>>>>>> I would like to know if there is a way to configure index merge policy
>>>>>>in solr so that the merging happens during off peak hours. Can you please
>>>>>>let me know if such a merge policy configuration exists?
>>>>>>
>>>>>>Thanks
>>>>>>Prabhu
>>>>>>
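
For anyone who finds this thread later: Otis's idea of basing the merge / no-merge decision on the time of day (and maybe the day of the week) could look roughly like the sketch below. It covers only the decision logic, written in Python for brevity. The function name, the default 01:00 to 05:00 window, and the `days` parameter are all hypothetical, and none of this is an existing Lucene or Solr API. Wiring the check into an actual Lucene merge policy is not shown.

```python
from datetime import datetime, time


def in_off_peak_window(now, start=time(1, 0), end=time(5, 0), days=None):
    """Return True if `now` falls inside the (hypothetical) off-peak window.

    start/end: window bounds; start is inclusive, end is exclusive.
               The window may wrap past midnight (e.g. 22:00 to 04:00).
    days:      optional set of weekday numbers (Monday=0 ... Sunday=6) on
               which merging is allowed, checked against the calendar day
               of `now`. None means every day.
    """
    if days is not None and now.weekday() not in days:
        return False
    t = now.time()
    if start <= end:
        # Ordinary window contained in a single day.
        return start <= t < end
    # Window wraps past midnight: late evening or early morning both count.
    return t >= start or t < end


if __name__ == "__main__":
    # A merge policy wrapper would consult this before selecting merges.
    print(in_off_peak_window(datetime.now()))
```

A wrapper that returns "no merges" outside the window would presumably still need a safety valve (for example, allowing merges anyway once the segment count gets too high), since deferring merges indefinitely during heavy indexing hurts both search and indexing.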