On Mon, Jun 20, 2011 at 4:00 PM, Shawn Heisey <s...@elyograg.org> wrote:
> On 6/20/2011 12:31 PM, Michael McCandless wrote:
>>
>> Actually, TieredMP has two different params (different from the
>> previous default LogMP):
>>
>>   * segmentsPerTier controls how many segments you can tolerate in the
>> index (bigger number means more segments)
>>
>>   * maxMergeAtOnce says how many segments can be merged at a time for
>> "normal" (not optimize) merging
>>
>> For back-compat, mergeFactor maps to both of these, but it's better to
>> set them directly eg:
>>
>>     <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>>       <int name="maxMergeAtOnce">10</int>
>>       <int name="segmentsPerTier">20</int>
>>     </mergePolicy>
>>
>> (and then remove your mergeFactor setting under indexDefaults)
>>
>> You should always have maxMergeAtOnce <= segmentsPerTier, else too much
>> merging will happen.
>>
>> If you set segmentsPerTier to 35 then the index can easily exceed 70
>> segments, so your optimize will again need more than one merge.  Note
>> that if you make the maxMergeAtOnce/Explicit too large then 1) you
>> risk running out of file handles (if you don't use compound file), and
>> 2) merge performance likely gets worse as the OS is forced to splinter
>> its IO cache across more files (I suspect) and so more seeking will
>> happen.
>
> Thanks much for the information!
>
> I've set my server up so that the user running the index has a soft limit of
> 4096 files and a hard limit of 6144 files, and /proc/sys/fs/file-max is
> 48409, so I should be OK on file handles.  The index is almost twice as big
> as available memory, so I'm not really worried about the I/O cache.  I've
> sized my mergeFactor and ramBufferSizeMB so that the individual merges during
> indexing happen entirely from the I/O cache, which is the point where I
> really care about it.  There's nothing I can do about the optimize without
> spending a LOT of money.
>
> I will remove mergeFactor, set maxMergeAtOnce and segmentsPerTier to 35, and
> maxMergeAtOnceExplicit to 70.  If I ever run into a situation where it gets
> beyond 70 segments at any one time, I've probably got bigger problems than
> the number of passes my optimize takes, so I'll think about it then. :)
>  Does that sound reasonable?

With segmentsPerTier at 35 you will easily cross 70 segs in the index...

If you want optimize to run in a single merge, I would lower
segmentsPerTier and maxMergeAtOnce (maybe back to the 10 default), and
set your maxMergeAtOnceExplicit to 70 or higher...
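
Roughly something like this, as an untested sketch: the values are only
a starting point, and I'm assuming maxMergeAtOnceExplicit can be set in
solrconfig.xml the same way as the other two params:

    <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
      <!-- normal (non-optimize) merging, back at the default of 10 -->
      <int name="maxMergeAtOnce">10</int>
      <int name="segmentsPerTier">10</int>
      <!-- only used for optimize; 70 is the figure from your mail -->
      <int name="maxMergeAtOnceExplicit">70</int>
    </mergePolicy>

Optimize only completes in one merge if the index holds no more than
maxMergeAtOnceExplicit segments when it starts, so raise that value if
you see more than 70.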

Lower maxMergeAtOnce means merges run more frequently but for a shorter
time, and your searching should be faster (than with 35/35) since there
are fewer segments to visit.

Mike McCandless

http://blog.mikemccandless.com
