Hi All,

Here’s a further update on where I am with this. I enabled infoStream logging and quickly figured out that I needed to get rid of maxBufferedDocs, so Erick, you were absolutely right on that. I increased ramBufferSizeMB to 100 and reduced both maxMergeAtOnce and segmentsPerTier to 3. My config now looks like this:
<indexConfig>
  <useCompoundFile>false</useCompoundFile>
  <ramBufferSizeMB>100</ramBufferSizeMB>
  <!--<maxMergeSizeForForcedMerge>9223372036854775807</maxMergeSizeForForcedMerge>-->
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">3</int>
    <int name="segmentsPerTier">3</int>
  </mergePolicy>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
  <infoStream file="/tmp/INFOSTREAM.txt">true</infoStream>
</indexConfig>

I am attaching a sample infoStream log file. In the infoStream logs you can see how the segments keep accumulating; for example it shows:

  allowedSegmentCount=10 vs count=9 (eligible count=9) tooBigCount=0

I looked at TieredMergePolicy.java to see how allowedSegmentCount is calculated:

  // Compute max allowed segs in the index
  long levelSize = minSegmentBytes;
  long bytesLeft = totIndexBytes;
  double allowedSegCount = 0;
  while(true) {
    final double segCountLevel = bytesLeft / (double) levelSize;
    if (segCountLevel < segsPerTier) {
      allowedSegCount += Math.ceil(segCountLevel);
      break;
    }
    allowedSegCount += segsPerTier;
    bytesLeft -= segsPerTier * levelSize;
    levelSize *= maxMergeAtOnce;
  }
  int allowedSegCountInt = (int) allowedSegCount;

and minSegmentBytes is calculated as follows:

  // Compute total index bytes & print details about the index
  long totIndexBytes = 0;
  long minSegmentBytes = Long.MAX_VALUE;
  for(SegmentInfoPerCommit info : infosSorted) {
    final long segBytes = size(info);
    if (verbose()) {
      String extra = merging.contains(info) ? " [merging]" : "";
      if (segBytes >= maxMergedSegmentBytes/2.0) {
        extra += " [skip: too large]";
      } else if (segBytes < floorSegmentBytes) {
        extra += " [floored]";
      }
      message(" seg=" + writer.get().segString(info) + " size=" + String.format(Locale.ROOT, "%.3f", segBytes/1024/1024.) + " MB" + extra);
    }
    minSegmentBytes = Math.min(segBytes, minSegmentBytes);
    // Accum total byte size
    totIndexBytes += segBytes;
  }

Any input is welcome.
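To make sure I understand that loop, I worked through it by hand with made-up numbers: segsPerTier=3 and maxMergeAtOnce=3 as in my config, plus an assumed smallest segment of 2 MB and an assumed 50 MB total index (those two sizes are hypothetical, not from my actual index). The little standalone class below is just that arithmetic pulled out of TieredMergePolicy so it can be run on its own:

public class AllowedSegCountSketch {
    public static void main(String[] args) {
        // Hypothetical inputs -- only segsPerTier and maxMergeAtOnce come from
        // my actual config; the two sizes are made up for illustration.
        final double segsPerTier = 3;
        final long maxMergeAtOnce = 3;
        final long minSegmentBytes = 2L * 1024 * 1024;   // smallest segment: 2 MB (assumed)
        final long totIndexBytes = 50L * 1024 * 1024;    // total index size: 50 MB (assumed)

        // Same arithmetic as the loop quoted from TieredMergePolicy above.
        long levelSize = minSegmentBytes;
        long bytesLeft = totIndexBytes;
        double allowedSegCount = 0;
        while (true) {
            final double segCountLevel = bytesLeft / (double) levelSize;
            if (segCountLevel < segsPerTier) {
                allowedSegCount += Math.ceil(segCountLevel);
                break;
            }
            allowedSegCount += segsPerTier;
            bytesLeft -= segsPerTier * levelSize;
            levelSize *= maxMergeAtOnce;
        }

        // Level 1: 2 MB segments  -> 3 allowed, 6 MB consumed (44 MB left)
        // Level 2: 6 MB segments  -> 3 allowed, 18 MB consumed (26 MB left)
        // Level 3: 18 MB segments -> 26/18 = 1.44 -> ceil() = 2, then break
        System.out.println("allowedSegCount=" + (int) allowedSegCount);  // prints 8
    }
}

So with those assumed sizes the policy would allow 3 + 3 + 2 = 8 segments before any merge becomes eligible, which matches the kind of "allowedSegmentCount vs count" comparison showing up in my infoStream output.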
thanks,
Summer

> On Mar 5, 2015, at 8:11 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> I would, BTW, either just get rid of the <maxBufferedDocs> altogether or
> make it much higher, i.e. 100000. I don't think this is really your
> problem, but you're creating a lot of segments here.
>
> But I'm kind of at a loss as to what would be different about your setup.
> Is there _any_ chance that you have some secondary process looking at
> your index that's maintaining open searchers? Any custom code that's
> perhaps failing to close searchers? Is this a Unix or Windows system?
>
> And just to be really clear, you're _only_ seeing more segments being
> added, right? If you're only counting files in the index directory, it's
> _possible_ that merging is happening, you're just seeing new files take
> the place of old ones.
>
> Best,
> Erick
>
> On Wed, Mar 4, 2015 at 7:12 PM, Shawn Heisey <apa...@elyograg.org> wrote:
>> On 3/4/2015 4:12 PM, Erick Erickson wrote:
>>> I _think_, but don't know for sure, that the merging stuff doesn't get
>>> triggered until you commit, it doesn't "just happen".
>>>
>>> Shot in the dark...
>>
>> I believe that new segments are created when the indexing buffer
>> (ramBufferSizeMB) fills up, even without commits. I'm pretty sure that
>> anytime a new segment is created, the merge policy is checked to see
>> whether a merge is needed.
>>
>> Thanks,
>> Shawn
>>
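P.S. For anyone who wants to poke at this outside of Solr, the <indexConfig> above maps roughly onto the plain-Lucene writer setup below. This is only a sketch against the Lucene 5.x API (method names shift a bit between 4.x and 5.x) with a made-up index path; it is meant to illustrate Shawn's point that segments get flushed whenever the RAM buffer fills, whether or not a commit happens, and the merge policy can then be consulted.

import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.ConcurrentMergeScheduler;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.PrintStreamInfoStream;

public class WriterSetupSketch {
    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(Paths.get("/tmp/test-index"));  // made-up path

        TieredMergePolicy mp = new TieredMergePolicy();
        mp.setMaxMergeAtOnce(3);
        mp.setSegmentsPerTier(3);

        IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
        iwc.setRAMBufferSizeMB(100);                        // flush a new segment once ~100 MB of docs are buffered
        iwc.setMergePolicy(mp);
        iwc.setMergeScheduler(new ConcurrentMergeScheduler());
        iwc.setUseCompoundFile(false);
        iwc.setInfoStream(new PrintStreamInfoStream(System.out));  // same kind of output as the infoStream file

        try (IndexWriter writer = new IndexWriter(dir, iwc)) {
            // Adding documents here will flush segments as the RAM buffer fills,
            // even if commit() is never called; merges can then be scheduled.
            writer.addDocument(new Document());
        }
    }
}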