Ahhh, you're right. Shows what happens when I work from memory....

Thanks.
Erick

On Wed, May 2, 2012 at 4:26 PM, Jason Rutherglen
<jason.rutherg...@gmail.com> wrote:
>> BTW, in 4.0, there's DocumentsWriterPerThread that
>> merges in the background
>
> It flushes without pausing, but does not perform merges.  Maybe you're
> thinking of ConcurrentMergeScheduler?
>
> On Wed, May 2, 2012 at 7:26 AM, Erick Erickson <erickerick...@gmail.com> 
> wrote:
>> Optimizing is much less important query-speed-wise
>> than it was historically; essentially, it's not recommended much
>> any more.
>>
>> A significant effect of optimize _used_ to be purging
>> obsolete data (i.e. that from deleted docs) from the
>> index, but that is now done on merge.
>>
>> There's no harm in optimizing on off-peak hours, and
>> combined with an appropriate merge policy that may make
>> indexing a little better (I'm thinking of not doing
>> as many massive merges here).
>>
>> BTW, in 4.0, there's DocumentsWriterPerThread that
>> merges in the background and pretty much removes
>> even this as a motivation for optimizing.
>>
>> All that said, optimizing isn't _bad_, it's just often
>> unnecessary.
>>
>> Best
>> Erick
>>
>> On Wed, May 2, 2012 at 9:29 AM, Prakashganesh, Prabhu
>> <prabhu.prakashgan...@dowjones.com> wrote:
>>> Actually, we are not thinking of an M/S setup.
>>> We are planning to have x shards on N servers, with each shard
>>> handling both indexing and searching.
>>> The expected query volume is not that high, so we don't think we would
>>> need to replicate to slaves. We think each shard will be able to handle
>>> its share of the indexing and searching. If we need to scale query
>>> capacity in future, we would probably do it by replicating each shard
>>> to its slaves.
>>>
>>> I agree autoCommit settings would be good to set up appropriately.
>>>
>>> Another question I had is about the pros and cons of optimising the
>>> index. We would be purging old content every week, and I am wondering
>>> whether to run an index optimise at the weekend after purging old data.
>>> Because we are going to be continuously indexing data, which would be a
>>> mix of adds, updates and deletes, I am not sure the benefit of optimising
>>> would last long enough to be worth doing. Maybe setting a low mergeFactor
>>> would be good enough. Optimising makes sense if the index is more
>>> static, perhaps? Thoughts?
>>>
>>> Thanks
>>> Prabhu
>>>
>>>
>>> -----Original Message-----
>>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>>> Sent: 02 May 2012 13:15
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Solr Merge during off peak times
>>>
>>> But again, with a master/slave setup merging should
>>> be relatively benign. And at 200M docs, having a M/S
>>> setup is probably indicated.
>>>
>>> Here's a good writeup of mergepolicy
>>> http://juanggrande.wordpress.com/2011/02/07/merge-policy-internals/
>>>
>>> If you're indexing and searching on a single machine, merging
>>> is much less important than how often you commit. In an M/S
>>> situation, your polling interval on the slave is important.
>>>
>>> I'd look at commit frequency long before I worried about merging;
>>> that's usually where people shoot themselves in the foot, by
>>> committing too often.
>>>
>>> Overall, your mergeFactor is probably less important than other
>>> parts of how you perform indexing/searching, but it does have
>>> some effect for sure...
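For reference, both knobs discussed here live in solrconfig.xml. The fragment below is a sketch assuming the Solr 3.x file layout; the numbers are illustrative assumptions only (the mergeFactor of 5 is the value Prabhu proposes trying, and the autoCommit thresholds are placeholders, not recommendations).

```xml
<!-- solrconfig.xml, Solr 3.x layout. All values are illustrative. -->
<indexDefaults>
  <!-- Default is 10; 5 is the alternative discussed in this thread. -->
  <mergeFactor>5</mergeFactor>
</indexDefaults>

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- Commit once this many documents are pending... -->
    <maxDocs>10000</maxDocs>
    <!-- ...or once this many milliseconds have passed, whichever comes first. -->
    <maxTime>60000</maxTime>
  </autoCommit>
</updateHandler>
```

Raising the autoCommit thresholds makes commits less frequent, which is the first thing to look at before tuning mergeFactor.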
>>>
>>> Best
>>> Erick
>>>
>>> On Wed, May 2, 2012 at 7:54 AM, Prakashganesh, Prabhu
>>> <prabhu.prakashgan...@dowjones.com> wrote:
>>>> We have a fairly large-scale system: about 200 million docs and fairly
>>>> high indexing activity, about 300k docs per day with peak ingestion
>>>> rates of about 20 docs per sec. I want to work out what a good
>>>> mergeFactor setting would be by testing with different values. I think
>>>> the default of 10 might be high, so I want to try 5 and compare. Unless
>>>> I know when a merge starts and finishes, it would be quite difficult to
>>>> work out the impact of changing mergeFactor. I want to be able to
>>>> measure how long merges take, run queries during the merge activity and
>>>> see what the response times are, etc.
>>>>
>>>> Thanks
>>>> Prabhu
>>>>
>>>> -----Original Message-----
>>>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>>>> Sent: 02 May 2012 12:40
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: Solr Merge during off peak times
>>>>
>>>> Why do you care? Merging is generally a background process, or are
>>>> you doing heavy indexing? In a master/slave setup,
>>>> it's usually not really relevant except that (with 3.x), massive merges
>>>> may temporarily stop indexing. Is that the problem?
>>>>
>>>> Look at the merge policies; there are configurations that make
>>>> this less painful.
>>>>
>>>> In trunk, DocumentsWriterPerThread makes merges happen in the
>>>> background, which helps the long-pause-while-indexing problem.
>>>>
>>>> Best
>>>> Erick
>>>>
>>>> On Wed, May 2, 2012 at 7:22 AM, Prakashganesh, Prabhu
>>>> <prabhu.prakashgan...@dowjones.com> wrote:
>>>>> Ok, thanks Otis.
>>>>> Another question on merging:
>>>>> What is the best way to monitor merging?
>>>>> Is there something in the log file that I can look for?
>>>>> It seems like I have to monitor the system resources (read/write IOPS
>>>>> etc.) and work out when a merge happened.
>>>>> It would be great if I could do it by looking at log files or in the
>>>>> admin UI. Do you know if this can be done or if there is some tool for this?
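On the log-file question: Lucene's IndexWriter can emit a diagnostic infoStream that includes flush and merge activity, and the Solr 3.x example solrconfig.xml exposes a switch for it. The fragment below is a sketch of that layout as best recalled; the file name is only an example, and the exact element placement should be checked against the solrconfig.xml shipped with the version in use.

```xml
<!-- solrconfig.xml, Solr 3.x layout (assumed). -->
<mainIndex>
  <!-- When true, IndexWriter writes verbose debug output, including
       merge activity, to the named file. Off by default. -->
  <infoStream file="INFOSTREAM.txt">true</infoStream>
</mainIndex>
```

The output is verbose, so it is probably better suited to a mergeFactor test run than to leaving on permanently.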
>>>>>
>>>>> Thanks
>>>>> Prabhu
>>>>>
>>>>> -----Original Message-----
>>>>> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
>>>>> Sent: 01 May 2012 15:12
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Re: Solr Merge during off peak times
>>>>>
>>>>> Hi Prabhu,
>>>>>
>>>>> I don't think such a merge policy exists, but it would be nice to have 
>>>>> this option and I imagine it wouldn't be hard to write if you really just 
>>>>> base the merge or no merge decision on the time of day (and maybe day of 
>>>>> the week).
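The time-of-day check Otis describes might look roughly like the sketch below. It is written in Python only to show the decision logic; a real implementation would live in a Lucene merge policy in Java. The function name `merges_allowed`, the 01:00 to 05:00 window and the choice of weekend days are all hypothetical, not anything that exists in Lucene or Solr.

```python
from datetime import datetime, time

# Hypothetical off-peak settings; adjust to the deployment's real traffic.
OFF_PEAK_START = time(1, 0)   # 01:00
OFF_PEAK_END = time(5, 0)     # 05:00, exclusive
OFF_PEAK_DAYS = {5, 6}        # datetime.weekday(): Monday is 0, so Sat and Sun


def merges_allowed(now, start=OFF_PEAK_START, end=OFF_PEAK_END,
                   off_peak_days=OFF_PEAK_DAYS):
    """Return True if `now` falls inside the assumed off-peak window."""
    # Whole days that count as off-peak (e.g. the weekend).
    if now.weekday() in off_peak_days:
        return True
    t = now.time()
    if start <= end:
        # Window within a single day, e.g. 01:00 to 05:00.
        return start <= t < end
    # Window that wraps past midnight, e.g. 22:00 to 04:00.
    return t >= start or t < end
```

Note that a policy which simply refuses to merge outside the window lets the segment count grow during peak hours, so a real policy would probably still need some upper bound at which it merges anyway.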
>>>>>
>>>>> Note that this should go into Lucene, not Solr, so if you decide to 
>>>>> contribute your work, please 
>>>>> see http://wiki.apache.org/lucene-java/HowToContribute
>>>>>
>>>>> Otis
>>>>> ----
>>>>> Performance Monitoring for Solr - http://sematext.com/spm
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>________________________________
>>>>>> From: "Prakashganesh, Prabhu" <prabhu.prakashgan...@dowjones.com>
>>>>>>To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
>>>>>>Sent: Tuesday, May 1, 2012 8:45 AM
>>>>>>Subject: Solr Merge during off peak times
>>>>>>
>>>>>>Hi,
>>>>>>  I would like to know if there is a way to configure the index merge
>>>>>>policy in Solr so that merging happens during off-peak hours. Can you
>>>>>>please let me know if such a merge policy configuration exists?
>>>>>>
>>>>>>Thanks
>>>>>>Prabhu
>>>>>>
>>>>>>
>>>>>>