Hi Toke!

Thanks for answering. That's right: I mention index corruption only as a
precaution, not because I have actually seen it. In some past tests I
found that a mergeFactor of 2 improves search speed noticeably compared
with more common merge factors such as 10. Of course indexing speed is
penalized, but my production architecture is based on task queues and
workers that index into Solr. I've also developed a custom SolrCluster
module: a black box that looks like a single Solr server from the
outside, but internally balances across N Solr master servers, deciding
where to index, checking server status (alive, dead), executing sharded
search queries, etc. So that point is under control: if I need more
indexing speed I can add new Solr masters and/or new worker modules to
dequeue, process and execute index operations. My main concern was to
optimize search speed as much as possible through index optimization,
mergeFactor tuning, cache setup, etc.
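
To make the balancing idea concrete, here is a minimal sketch of that kind
of front end. It is not the actual SolrCluster module: the class name, the
method names, the host names and the injected is_alive check are all
hypothetical, and round-robin is just one assumed way of deciding where to
index. The only Solr-specific detail it relies on is that a distributed
query takes a comma-separated "shards" parameter.

```python
from itertools import count
from typing import Callable, List


class SolrClusterSketch:
    """Hypothetical front for N Solr masters (illustration only)."""

    def __init__(self, masters: List[str], is_alive: Callable[[str], bool]):
        self.masters = list(masters)
        self.is_alive = is_alive   # injected health check, e.g. a ping request
        self._counter = count()    # drives round-robin selection

    def alive_masters(self) -> List[str]:
        # Keep only the masters that currently pass the health check.
        return [m for m in self.masters if self.is_alive(m)]

    def pick_master_for_indexing(self) -> str:
        # Round-robin over the masters that are alive right now.
        alive = self.alive_masters()
        if not alive:
            raise RuntimeError("no Solr master is alive")
        return alive[next(self._counter) % len(alive)]

    def shards_param(self) -> str:
        # Value for the "shards" parameter of a distributed search query.
        return ",".join(self.alive_masters())


# Usage with made-up hosts; solr2 is simulated as dead.
down = {"solr2:8983/solr"}
cluster = SolrClusterSketch(
    ["solr1:8983/solr", "solr2:8983/solr", "solr3:8983/solr"],
    is_alive=lambda m: m not in down,
)
targets = [cluster.pick_master_for_indexing() for _ in range(4)]
shards = cluster.shards_param()
```

With this shape, adding indexing capacity only means appending another
master to the list; the workers keep calling the same entry point.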

Thanks a lot!


2014-02-06 Toke Eskildsen <t...@statsbiblioteket.dk>:

> On Thu, 2014-02-06 at 10:22 +0100, Luis Cappa Banda wrote:
> > I knew some performance tips to improve search and I configured a very
> > low merge factor (<mergeFactor>2</mergeFactor>) to favor search
> > operations over indexing ones.
>
> That would give you a small search speed increase and a huge penalty on
> indexing speed (as it will perform large merges all the time) and
> replication speed (as all file data will be updated frequently instead
> of just a subset of them). Unless you are absolutely sure that you need
> the small search speed increase, you should set this to a higher number.
>
> > I haven't got a deep knowledge of internal Lucene behavior in this
> > case, but I thought that somehow an optimization operation may rebuild
> > the index checking and fixing corrupted segments,
>
> To my knowledge, there are no attempts to repair corrupted segments
> during merge. I hope you speak of corruption as a precaution and not
> because it is something that happens to your setup. If you have
> corrupted indexes at any time, you should investigate how that happens,
> instead of trying to repair them.
>
> > One last question: do you think that this kind of scenario where I
> > continuously index and replicate data will corrupt the index?
>
> Lucene is used in a lot of places with massive updates. Aside from
> JVM-related bugs, it has proven to be very stable under these
> conditions. So no, the indexing will not corrupt anything.
>
> - Toke Eskildsen, State and University Library, Denmark
>
>


-- 
- Luis Cappa
