Optimizing matters much less for query speed than it
did historically; in fact, it's generally no longer
recommended.

A significant effect of optimizing _used_ to be purging
obsolete data (i.e., data from deleted docs) from the
index, but that now happens on merge.
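
To illustrate why merging takes care of the purge, here is a toy Python model. It is not Lucene code: `Segment`, `delete` and `merge` are hypothetical names. A delete only marks a doc, and a merge copies just the live docs into the new segment.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy stand-in for an index segment (hypothetical, not Lucene's class)."""
    docs: dict[str, str]                              # doc id -> stored content
    deleted: set[str] = field(default_factory=set)    # ids marked as deleted

    def live_count(self) -> int:
        return len(self.docs) - len(self.deleted)


def delete(segment: Segment, doc_id: str) -> None:
    # A delete only marks the doc; its data still occupies the segment.
    if doc_id in segment.docs:
        segment.deleted.add(doc_id)


def merge(segments: list[Segment]) -> Segment:
    # The merge copies only live docs into the new segment,
    # so data from deleted docs is dropped as a side effect.
    merged: dict[str, str] = {}
    for seg in segments:
        for doc_id, content in seg.docs.items():
            if doc_id not in seg.deleted:
                merged[doc_id] = content
    return Segment(docs=merged)
```

For example, deleting doc "2" from a two-doc segment and merging it with a one-doc segment yields a segment holding only docs "1" and "3", with nothing left marked deleted. An optimize is simply a merge of everything down to a single segment, which is why it used to be the main way to reclaim that space.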

There's no harm in optimizing during off-peak hours, and
combined with an appropriate merge policy, that may make
indexing a little better (I'm thinking of not doing
as many massive merges here).
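
A minimal Python sketch of gating an optimize on an off-peak window. The window (01:00 to 05:00 on weekdays, all day at weekends), the `deleted_ratio` input, the 0.2 threshold and the function names are all assumptions for illustration; actually sending the optimize to Solr (e.g. from a cron job) is left out.

```python
from datetime import datetime, time

# Hypothetical off-peak window; adjust to your own traffic pattern.
OFF_PEAK_START = time(1, 0)   # 01:00
OFF_PEAK_END = time(5, 0)     # 05:00
WEEKEND = {5, 6}              # datetime.weekday(): Saturday=5, Sunday=6


def is_off_peak(now: datetime) -> bool:
    """Return True if `now` falls inside the assumed off-peak window."""
    if now.weekday() in WEEKEND:
        return True
    return OFF_PEAK_START <= now.time() < OFF_PEAK_END


def should_optimize(now: datetime, deleted_ratio: float,
                    threshold: float = 0.2) -> bool:
    # Only optimize off-peak, and only when enough deleted docs have
    # accumulated (assumed threshold) to make rewriting the index worthwhile.
    return is_off_peak(now) and deleted_ratio >= threshold
```

A scheduler could call `should_optimize(datetime.now(), ratio)` periodically and only then issue the optimize. Checking the deleted-doc ratio as well avoids rewriting an index that normal merging has already kept reasonably clean.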

BTW, in 4.0, there's DocumentsWriterPerThread, which
merges in the background and pretty much removes
even this as a motivation for optimizing.

All that said, optimizing isn't _bad_, it's just often
unnecessary.

Best
Erick

On Wed, May 2, 2012 at 9:29 AM, Prakashganesh, Prabhu
<prabhu.prakashgan...@dowjones.com> wrote:
> Actually, we are not thinking of an M/S setup.
> We are planning to have x number of shards on N number of servers, with each
> shard handling both indexing and searching.
> The expected query volume is not that high, so we don't think we would need to
> replicate to slaves. We think each shard will be able to handle its share of
> the indexing and searching. If we need to scale query capacity in the future,
> we would probably need to do it by replicating each shard to its slaves.
>
> I agree it would be good to set up the autoCommit settings appropriately.
>
> Another question I had is about the pros/cons of optimising the index. We will be
> purging old content every week, and I am wondering whether to run an index
> optimise at the weekend after purging old data. Because we are going to be
> continuously indexing a mix of adds, updates and deletes, I'm not
> sure the benefit of optimising would last long enough to be worth doing
> it. Maybe setting a low mergeFactor would be good enough. Perhaps optimising
> makes more sense if the index is more static? Thoughts?
>
> Thanks
> Prabhu
>
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: 02 May 2012 13:15
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Merge during off peak times
>
> But again, with a master/slave setup merging should
> be relatively benign. And at 200M docs, having a M/S
> setup is probably indicated.
>
> Here's a good write-up of merge policy internals:
> http://juanggrande.wordpress.com/2011/02/07/merge-policy-internals/
>
> If you're indexing and searching on a single machine, merging
> is much less important than how often you commit. In an M/S
> situation, your polling interval on the slave is important.
>
> I'd look at commit frequency long before I worried about merging,
> that's usually where people shoot themselves in the foot - by
> committing too often.
>
> Overall, your mergeFactor is probably less important than other
> parts of how you perform indexing/searching, but it does have
> some effect for sure...
>
> Best
> Erick
>
> On Wed, May 2, 2012 at 7:54 AM, Prakashganesh, Prabhu
> <prabhu.prakashgan...@dowjones.com> wrote:
>> We have a fairly large-scale system: about 200 million docs and fairly high
>> indexing activity, about 300k docs per day with peak ingestion rates of
>> about 20 docs per sec. I want to work out what a good mergeFactor setting
>> would be by testing with different mergeFactor settings. I think the default
>> of 10 might be high, so I want to try 5 and compare. Unless I know when a
>> merge starts and finishes, it would be quite difficult to work out the
>> impact of changing mergeFactor. I want to be able to measure how long merges
>> take, run queries during the merge activity, see what the response times
>> are, etc.
>>
>> Thanks
>> Prabhu
>>
>> -----Original Message-----
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: 02 May 2012 12:40
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr Merge during off peak times
>>
>> Why do you care? Merging is generally a background process, or are
>> you doing heavy indexing? In a master/slave setup,
>> it's usually not really relevant except that (with 3.x), massive merges
>> may temporarily stop indexing. Is that the problem?
>>
>> Look at the merge policies; there are configurations that make
>> this less painful.
>>
>> In trunk, DocumentsWriterPerThread makes merges happen in the
>> background, which helps with the long-pause-while-indexing problem.
>>
>> Best
>> Erick
>>
>> On Wed, May 2, 2012 at 7:22 AM, Prakashganesh, Prabhu
>> <prabhu.prakashgan...@dowjones.com> wrote:
>>> Ok, thanks Otis.
>>> Another question on merging:
>>> What is the best way to monitor merging?
>>> Is there something in the log file that I can look for?
>>> It seems like I have to monitor the system resources (read/write IOPS
>>> etc.) and work out when a merge happened.
>>> It would be great if I could do it by looking at log files or in the admin
>>> UI. Do you know if this can be done or if there is some tool for this?
>>>
>>> Thanks
>>> Prabhu
>>>
>>> -----Original Message-----
>>> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
>>> Sent: 01 May 2012 15:12
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Solr Merge during off peak times
>>>
>>> Hi Prabhu,
>>>
>>> I don't think such a merge policy exists, but it would be nice to have this 
>>> option and I imagine it wouldn't be hard to write if you really just base 
>>> the merge or no merge decision on the time of day (and maybe day of the 
>>> week).
>>>
>>> Note that this should go into Lucene, not Solr, so if you decide to 
>>> contribute your work, please 
>>> see http://wiki.apache.org/lucene-java/HowToContribute
>>>
>>> Otis
>>> ----
>>> Performance Monitoring for Solr - http://sematext.com/spm
>>>
>>>>________________________________
>>>> From: "Prakashganesh, Prabhu" <prabhu.prakashgan...@dowjones.com>
>>>>To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
>>>>Sent: Tuesday, May 1, 2012 8:45 AM
>>>>Subject: Solr Merge during off peak times
>>>>
>>>>Hi,
>>>>  I would like to know if there is a way to configure the index merge policy in
>>>>Solr so that merging happens during off-peak hours. Can you please let
>>>>me know if such a merge policy configuration exists?
>>>>
>>>>Thanks
>>>>Prabhu
>>>>