Actually, we are not planning a master/slave (M/S) setup.
We are planning to have x shards on N servers, with each shard handling both
indexing and searching.
The expected query volume is not that high, so I don't think we will need to
replicate to slaves. We think each shard will be able to handle its share of
the indexing and searching. If we need to scale query capacity in the future,
then yes, we would probably do it by replicating each shard to its own slaves.
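A minimal sketch of one way to spread documents over "x shards on N servers",
assuming the client routes each document itself by hashing its unique key.
The thread doesn't say how routing will be done, and ShardRouter and its
methods are hypothetical names. Hashing the key matters because of the mix of
adds, updates and deletes: an update or delete has to reach the shard that
already holds that document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side routing for "x shards on N servers".
// Documents are routed by hashing their unique key, so later updates and
// deletes for the same document reach the shard that already holds it.
final class ShardRouter {
    private final int numShards;  // "x" in the message
    private final int numServers; // "N" in the message

    ShardRouter(int numShards, int numServers) {
        if (numShards <= 0 || numServers <= 0) {
            throw new IllegalArgumentException("shard and server counts must be positive");
        }
        this.numShards = numShards;
        this.numServers = numServers;
    }

    /** Shard that indexes the document with this unique key. */
    int shardFor(String uniqueKey) {
        // floorMod keeps the result in [0, numShards) even for negative hash codes
        return Math.floorMod(uniqueKey.hashCode(), numShards);
    }

    /** Server hosting a shard, using simple round-robin placement. */
    int serverFor(int shard) {
        return shard % numServers;
    }

    /** All shards placed on one server under the round-robin scheme. */
    List<Integer> shardsOn(int server) {
        List<Integer> shards = new ArrayList<>();
        for (int s = server; s < numShards; s += numServers) {
            shards.add(s);
        }
        return shards;
    }
}
```

String.hashCode is fully specified by Java, so the mapping stays stable across
restarts, but changing the shard count later would move most documents.
Queries still have to fan out to every shard (in Solr, via the shards request
parameter).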

I agree that the autoCommit settings should be set up appropriately.

Another question I had is about the pros and cons of optimising the index. We
will be purging old content every week, and I am wondering whether to run an
index optimise at the weekend after purging the old data. Because we will be
indexing continuously, with a mix of adds, updates and deletes, I am not sure
the benefit of optimising would last long enough to be worth it. Maybe setting
a low mergeFactor would be good enough. Perhaps optimising only makes sense if
the index is more static? Thoughts?
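To get a feel for the optimise-versus-low-mergeFactor trade-off, here is a
simplified model of log-structured merging. It is loosely based on how
Lucene's LogMergePolicy uses mergeFactor, not a reproduction of it: it assumes
equal-sized flushes and ignores deletes, maxMergeDocs and size-based levels.
MergeSimulation and simulate are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of log-structured merging, in the spirit of Lucene's
// LogMergePolicy but NOT its exact algorithm: each flush writes one small
// segment, and as soon as mergeFactor segments sit at the same level they are
// merged into a single segment one level up. Deletes, maxMergeDocs and
// size-based level calculation are ignored.
final class MergeSimulation {
    static final class Result {
        final int segments;     // segments left after the last flush
        final long merges;      // number of merges performed
        final long unitsCopied; // merge I/O, in units of one freshly flushed segment

        Result(int segments, long merges, long unitsCopied) {
            this.segments = segments;
            this.merges = merges;
            this.unitsCopied = unitsCopied;
        }
    }

    static Result simulate(int mergeFactor, int flushes) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        List<Integer> perLevel = new ArrayList<>(); // perLevel.get(i) = segments at level i
        long merges = 0;
        long unitsCopied = 0;
        for (int f = 0; f < flushes; f++) {
            increment(perLevel, 0);
            // Cascade: a merge at one level may fill up the next level.
            for (int level = 0; perLevel.get(level) == mergeFactor; level++) {
                perLevel.set(level, 0);
                increment(perLevel, level + 1);
                merges++;
                unitsCopied += pow(mergeFactor, level + 1); // size of the merged segment
            }
        }
        int segments = 0;
        for (int count : perLevel) {
            segments += count;
        }
        return new Result(segments, merges, unitsCopied);
    }

    private static void increment(List<Integer> perLevel, int level) {
        while (perLevel.size() <= level) {
            perLevel.add(0);
        }
        perLevel.set(level, perLevel.get(level) + 1);
    }

    private static long pow(long base, int exp) {
        long result = 1;
        for (int i = 0; i < exp; i++) {
            result *= base;
        }
        return result;
    }
}
```

For example, simulate(10, 25) ends with 7 segments after 2 merges that copied
20 units, while simulate(5, 25) ends with a single segment after 6 merges that
copied 50 units: fewer segments to search, paid for with more merge I/O. The
model says nothing about the deleted documents an optimise also reclaims, so
it only covers half of the question above.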

Thanks
Prabhu 


-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: 02 May 2012 13:15
To: solr-user@lucene.apache.org
Subject: Re: Solr Merge during off peak times

But again, with a master/slave setup, merging should
be relatively benign. And at 200M docs, an M/S
setup is probably indicated.

Here's a good write-up of merge policy internals:
http://juanggrande.wordpress.com/2011/02/07/merge-policy-internals/

If you're indexing and searching on a single machine, merging
is much less important than how often you commit. In an M/S
situation, your polling interval on the slave is what matters.

I'd look at commit frequency long before I worried about merging;
that's usually where people shoot themselves in the foot, by
committing too often.
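The point about commit frequency can be made concrete with a sketch of roughly
the decision Solr's autoCommit makes: commit after maxDocs pending changes, or
once maxTime has elapsed since the oldest uncommitted change. In Solr itself
this is configured with <autoCommit> in solrconfig.xml; the class below is a
hypothetical client-side equivalent (CommitBatcher, onChange and tick are
made-up names, and the actual commit call is left to a Runnable).

```java
// Hypothetical client-side commit batching that mirrors the idea behind
// Solr's <autoCommit> maxDocs / maxTime settings: commit once maxDocs
// changes are pending, or once maxTimeMs has passed since the oldest
// uncommitted change, whichever comes first. Times are passed in explicitly.
final class CommitBatcher {
    private final int maxDocs;
    private final long maxTimeMs;
    private final Runnable commitAction; // whatever actually issues the commit (assumption)
    private int pending = 0;
    private long oldestPendingMs = -1;
    private int commits = 0;

    CommitBatcher(int maxDocs, long maxTimeMs, Runnable commitAction) {
        if (maxDocs <= 0 || maxTimeMs <= 0) {
            throw new IllegalArgumentException("limits must be positive");
        }
        this.maxDocs = maxDocs;
        this.maxTimeMs = maxTimeMs;
        this.commitAction = commitAction;
    }

    /** Record one add, update or delete sent to the index at time nowMs. */
    void onChange(long nowMs) {
        if (pending == 0) {
            oldestPendingMs = nowMs;
        }
        pending++;
        maybeCommit(nowMs);
    }

    /** Call periodically (e.g. from a timer) so quiet periods still get committed. */
    void tick(long nowMs) {
        maybeCommit(nowMs);
    }

    private void maybeCommit(long nowMs) {
        if (pending == 0) {
            return;
        }
        if (pending >= maxDocs || nowMs - oldestPendingMs >= maxTimeMs) {
            commitAction.run();
            commits++;
            pending = 0;
            oldestPendingMs = -1;
        }
    }

    int commits() { return commits; }

    int pending() { return pending; }
}
```

At the quoted peak of 20 docs/sec, for example, 1,200 changes arriving over
one minute with maxDocs = 1000 and maxTimeMs = 60000 lead to one commit at the
1,000th change and one timed commit for the remaining 200, rather than a
commit per document.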

Overall, your mergeFactor is probably less important than other
aspects of how you index and search, but it certainly has
some effect...

Best
Erick

On Wed, May 2, 2012 at 7:54 AM, Prakashganesh, Prabhu
<prabhu.prakashgan...@dowjones.com> wrote:
> We have a fairly large-scale system: about 200 million docs and fairly high
> indexing activity, about 300k docs per day with peak ingestion rates of
> about 20 docs per sec. I want to work out a good mergeFactor setting by
> testing different values. I think the default of 10 might be high; I want
> to try 5 and compare. Unless I know when a merge starts and finishes, it
> will be quite difficult to work out the impact of changing mergeFactor. I
> want to be able to measure how long merges take, run queries during merge
> activity, see what the response times are, etc.
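A sketch of the bookkeeping for the experiment described above: record when
each merge starts and ends, then split query latencies into "during a merge"
and "not during a merge". The onMergeStart/onMergeEnd hooks are hypothetical,
not Solr or Lucene callbacks; they would have to be fed from whatever merge
events you can actually observe. Merges are tracked by id because Lucene's
ConcurrentMergeScheduler can run more than one at a time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical bookkeeping for merge timing experiments. onMergeStart and
// onMergeEnd are NOT Solr or Lucene callbacks; they must be fed from whatever
// merge events you can observe. Merges are keyed by id since they can overlap.
final class MergeTimer {
    private final Map<String, Long> running = new HashMap<>();
    private final List<long[]> finished = new ArrayList<>(); // {startMs, endMs}

    void onMergeStart(String mergeId, long nowMs) {
        running.put(mergeId, nowMs);
    }

    void onMergeEnd(String mergeId, long nowMs) {
        Long start = running.remove(mergeId);
        if (start == null) {
            throw new IllegalStateException("no start recorded for " + mergeId);
        }
        finished.add(new long[] {start, nowMs});
    }

    /** Durations of completed merges, in ms, in completion order. */
    List<Long> durationsMs() {
        List<Long> out = new ArrayList<>();
        for (long[] m : finished) {
            out.add(m[1] - m[0]);
        }
        return out;
    }

    /** True if any completed merge was running at time t (inclusive bounds). */
    boolean duringMerge(long t) {
        for (long[] m : finished) {
            if (m[0] <= t && t <= m[1]) {
                return true;
            }
        }
        return false;
    }

    /**
     * Mean query latency {during merges, outside merges}; each query is
     * {startMs, latencyMs}. NaN means no queries fell into that bucket.
     */
    double[] meanLatency(List<long[]> queries) {
        double[] sum = new double[2];
        int[] count = new int[2];
        for (long[] q : queries) {
            int bucket = duringMerge(q[0]) ? 0 : 1;
            sum[bucket] += q[1];
            count[bucket]++;
        }
        return new double[] {sum[0] / count[0], sum[1] / count[1]};
    }
}
```

Running the same query set with mergeFactor 10 and then 5, and comparing the
two pairs of means along with durationsMs(), gives a like-for-like comparison;
a real study would want percentiles rather than means.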
>
> Thanks
> Prabhu
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: 02 May 2012 12:40
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Merge during off peak times
>
> Why do you care? Merging is generally a background process. Or are
> you doing heavy indexing? In a master/slave setup it's usually not
> very relevant, except that with 3.x massive merges may temporarily
> stop indexing. Is that the problem?
>
> Look at the merge policies; there are configurations that make
> this less painful.
>
> In trunk, DocumentsWriterPerThread makes merges happen in the
> background, which helps with the long-pause-while-indexing problem.
>
> Best
> Erick
>
> On Wed, May 2, 2012 at 7:22 AM, Prakashganesh, Prabhu
> <prabhu.prakashgan...@dowjones.com> wrote:
>> OK, thanks Otis.
>> Another question on merging: what is the best way to monitor merging?
>> Is there something in the log file that I can look for?
>> It seems like I have to monitor system resources (read/write IOPS, etc.)
>> and work out when a merge happened.
>> It would be great if I could do it by looking at log files or in the admin
>> UI. Do you know if this can be done, or if there is a tool for this?
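If merge events are not available directly, the IOPS approach can at least be
automated. A sketch, assuming you already sample write volume at a fixed
interval (how you collect it is outside the sketch): report each run of at
least minLength consecutive samples at or above a threshold as a probable
merge window. WriteBurstDetector is an illustrative name, and the threshold is
something you would have to calibrate for your hardware.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical heuristic for the "watch IOPS and infer merges" approach:
// report every run of at least minLength consecutive samples whose write
// volume is >= threshold. A sustained write burst is only a *probable* merge;
// flushes, optimises or unrelated disk activity look similar.
final class WriteBurstDetector {
    /**
     * @param samples   write volume per sampling interval (collection method is up to you)
     * @param threshold minimum value counted as "high"
     * @param minLength minimum run length, in samples
     * @return inclusive {startIndex, endIndex} pairs, in order
     */
    static List<int[]> bursts(long[] samples, long threshold, int minLength) {
        List<int[]> result = new ArrayList<>();
        int runStart = -1;
        // Iterate one past the end so a run that reaches the last sample is closed.
        for (int i = 0; i <= samples.length; i++) {
            boolean high = i < samples.length && samples[i] >= threshold;
            if (high && runStart < 0) {
                runStart = i;
            } else if (!high && runStart >= 0) {
                if (i - runStart >= minLength) {
                    result.add(new int[] {runStart, i - 1});
                }
                runStart = -1;
            }
        }
        return result;
    }
}
```

The windows are only candidates: flushes, optimises and other disk activity
produce similar bursts, so they are best cross-checked against what the
indexer was doing at the time.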
>>
>> Thanks
>> Prabhu
>>
>> -----Original Message-----
>> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
>> Sent: 01 May 2012 15:12
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr Merge during off peak times
>>
>> Hi Prabhu,
>>
>> I don't think such a merge policy exists, but it would be nice to have this
>> option, and I imagine it wouldn't be hard to write if you really just base
>> the merge/no-merge decision on the time of day (and maybe the day of the
>> week).
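The time-of-day decision Otis describes can be sketched on its own.
OffPeakWindow is a hypothetical helper; a custom merge policy would consult it
before returning large merges, but that wiring is left out here because the
MergePolicy.findMerges signature differs between Lucene versions. The window
boundaries, the weekend rule and the size limit are all assumptions.

```java
import java.time.DayOfWeek;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical time-of-day gate that a custom Lucene merge policy could consult.
// The window [start, end) may wrap midnight; days in allDay are off-peak all day.
final class OffPeakWindow {
    private final LocalTime start;
    private final LocalTime end;
    private final Set<DayOfWeek> allDay;

    OffPeakWindow(LocalTime start, LocalTime end, Set<DayOfWeek> allDay) {
        if (start.equals(end)) {
            throw new IllegalArgumentException("start and end must differ");
        }
        this.start = start;
        this.end = end;
        // EnumSet.copyOf rejects an empty non-EnumSet collection, hence the check
        this.allDay = allDay.isEmpty() ? EnumSet.noneOf(DayOfWeek.class) : EnumSet.copyOf(allDay);
    }

    boolean isOffPeak(LocalDateTime when) {
        if (allDay.contains(when.getDayOfWeek())) {
            return true;
        }
        LocalTime t = when.toLocalTime();
        if (start.isBefore(end)) {
            return !t.isBefore(start) && t.isBefore(end); // e.g. 01:00-05:00
        }
        return !t.isBefore(start) || t.isBefore(end);     // wraps midnight, e.g. 22:00-06:00
    }

    /** Small merges always run; merges above peakLimitBytes wait for off-peak. */
    boolean allowMerge(long mergeBytes, long peakLimitBytes, LocalDateTime when) {
        return mergeBytes <= peakLimitBytes || isOffPeak(when);
    }
}
```

Letting small merges through at peak times is a design choice, not something
from the thread: deferring every merge until night would let the segment count
grow all day, and the small merges are the cheap ones.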
>>
>> Note that this should go into Lucene, not Solr, so if you decide to 
>> contribute your work, please 
>> see http://wiki.apache.org/lucene-java/HowToContribute
>>
>> Otis
>> ----
>> Performance Monitoring for Solr - http://sematext.com/spm
>>
>>>________________________________
>>> From: "Prakashganesh, Prabhu" <prabhu.prakashgan...@dowjones.com>
>>>To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
>>>Sent: Tuesday, May 1, 2012 8:45 AM
>>>Subject: Solr Merge during off peak times
>>>
>>>Hi,
>>>  I would like to know if there is a way to configure the index merge policy
>>>in Solr so that merging happens during off-peak hours. Can you please let me
>>>know if such a merge policy configuration exists?
>>>
>>>Thanks
>>>Prabhu
>>>
