Hi All,

Here’s a further update on where I am with this. I enabled infoStream logging and quickly figured out that I needed to get rid of maxBufferedDocs, so Erick, you were absolutely right on that. I increased ramBufferSizeMB to 100 and reduced both maxMergeAtOnce and segmentsPerTier to 3. My config now looks like this:
<indexConfig>
  <useCompoundFile>false</useCompoundFile>
  <ramBufferSizeMB>100</ramBufferSizeMB>
  <!--<maxMergeSizeForForcedMerge>9223372036854775807</maxMergeSizeForForcedMerge>-->
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">3</int>
    <int name="segmentsPerTier">3</int>
  </mergePolicy>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
  <infoStream file="/tmp/INFOSTREAM.txt">true</infoStream>
</indexConfig>

I am attaching a sample infoStream log file. In the infoStream logs you can see how the segments keep accumulating; for example it shows:

  allowedSegmentCount=10 vs count=9 (eligible count=9) tooBigCount=0

I looked at TieredMergePolicy.java to see how allowedSegmentCount is calculated:

  // Compute max allowed segs in the index
  long levelSize = minSegmentBytes;
  long bytesLeft = totIndexBytes;
  double allowedSegCount = 0;
  while(true) {
    final double segCountLevel = bytesLeft / (double) levelSize;
    if (segCountLevel < segsPerTier) {
      allowedSegCount += Math.ceil(segCountLevel);
      break;
    }
    allowedSegCount += segsPerTier;
    bytesLeft -= segsPerTier * levelSize;
    levelSize *= maxMergeAtOnce;
  }
  int allowedSegCountInt = (int) allowedSegCount;

and minSegmentBytes is calculated as follows:

  // Compute total index bytes & print details about the index
  long totIndexBytes = 0;
  long minSegmentBytes = Long.MAX_VALUE;
  for(SegmentInfoPerCommit info : infosSorted) {
    final long segBytes = size(info);
    if (verbose()) {
      String extra = merging.contains(info) ? " [merging]" : "";
      if (segBytes >= maxMergedSegmentBytes/2.0) {
        extra += " [skip: too large]";
      } else if (segBytes < floorSegmentBytes) {
        extra += " [floored]";
      }
      message(" seg=" + writer.get().segString(info) + " size=" + String.format(Locale.ROOT, "%.3f", segBytes/1024/1024.) + " MB" + extra);
    }
    minSegmentBytes = Math.min(segBytes, minSegmentBytes);
    // Accum total byte size
    totIndexBytes += segBytes;
  }

Any input is welcome.
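To make sure I understand that loop, I worked through it by hand with made-up numbers: segsPerTier=3 and maxMergeAtOnce=3 as in my config, plus an assumed smallest segment of 2 MB and an assumed 50 MB total index (those two sizes are hypothetical, not from my actual index). The little standalone class below is just that arithmetic pulled out of TieredMergePolicy so it can be run on its own:

public class AllowedSegCountSketch {
    public static void main(String[] args) {
        // Hypothetical inputs -- only segsPerTier and maxMergeAtOnce come from
        // my actual config; the two sizes are made up for illustration.
        final double segsPerTier = 3;
        final long maxMergeAtOnce = 3;
        final long minSegmentBytes = 2L * 1024 * 1024;   // smallest segment: 2 MB (assumed)
        final long totIndexBytes = 50L * 1024 * 1024;    // total index size: 50 MB (assumed)

        // Same arithmetic as the loop quoted from TieredMergePolicy above.
        long levelSize = minSegmentBytes;
        long bytesLeft = totIndexBytes;
        double allowedSegCount = 0;
        while (true) {
            final double segCountLevel = bytesLeft / (double) levelSize;
            if (segCountLevel < segsPerTier) {
                allowedSegCount += Math.ceil(segCountLevel);
                break;
            }
            allowedSegCount += segsPerTier;
            bytesLeft -= segsPerTier * levelSize;
            levelSize *= maxMergeAtOnce;
        }

        // Level 1: 2 MB segments  -> 3 allowed, 6 MB consumed (44 MB left)
        // Level 2: 6 MB segments  -> 3 allowed, 18 MB consumed (26 MB left)
        // Level 3: 18 MB segments -> 26/18 = 1.44 -> ceil() = 2, then break
        System.out.println("allowedSegCount=" + (int) allowedSegCount);  // prints 8
    }
}

So with those assumed sizes the policy would allow 3 + 3 + 2 = 8 segments before any merge becomes eligible, which matches the kind of "allowedSegmentCount vs count" comparison showing up in my infoStream output.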
thanks,
Summer

> On Mar 5, 2015, at 8:11 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> I would, BTW, either just get rid of the <maxBufferedDocs> altogether or
> make it much higher, i.e. 100000. I don't think this is really your
> problem, but you're creating a lot of segments here.
>
> But I'm kind of at a loss as to what would be different about your setup.
> Is there _any_ chance that you have some secondary process looking at
> your index that's maintaining open searchers? Any custom code that's
> perhaps failing to close searchers? Is this a Unix or Windows system?
>
> And just to be really clear, you're _only_ seeing more segments being
> added, right? If you're only counting files in the index directory, it's
> _possible_ that merging is happening, you're just seeing new files take
> the place of old ones.
>
> Best,
> Erick
>
> On Wed, Mar 4, 2015 at 7:12 PM, Shawn Heisey <apa...@elyograg.org> wrote:
>> On 3/4/2015 4:12 PM, Erick Erickson wrote:
>>> I _think_, but don't know for sure, that the merging stuff doesn't get
>>> triggered until you commit, it doesn't "just happen".
>>>
>>> Shot in the dark...
>>
>> I believe that new segments are created when the indexing buffer
>> (ramBufferSizeMB) fills up, even without commits. I'm pretty sure that
>> anytime a new segment is created, the merge policy is checked to see
>> whether a merge is needed.
>>
>> Thanks,
>> Shawn
>>
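P.S. For anyone who wants to poke at this outside of Solr, the <indexConfig> above maps roughly onto the plain-Lucene writer setup below. This is only a sketch against the Lucene 5.x API (method names shift a bit between 4.x and 5.x) with a made-up index path; it is meant to illustrate Shawn's point that segments get flushed whenever the RAM buffer fills, whether or not a commit happens, and the merge policy can then be consulted.

import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.ConcurrentMergeScheduler;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.PrintStreamInfoStream;

public class WriterSetupSketch {
    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(Paths.get("/tmp/test-index"));  // made-up path

        TieredMergePolicy mp = new TieredMergePolicy();
        mp.setMaxMergeAtOnce(3);
        mp.setSegmentsPerTier(3);

        IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
        iwc.setRAMBufferSizeMB(100);                        // flush a new segment once ~100 MB of docs are buffered
        iwc.setMergePolicy(mp);
        iwc.setMergeScheduler(new ConcurrentMergeScheduler());
        iwc.setUseCompoundFile(false);
        iwc.setInfoStream(new PrintStreamInfoStream(System.out));  // same kind of output as the infoStream file

        try (IndexWriter writer = new IndexWriter(dir, iwc)) {
            // Adding documents here will flush segments as the RAM buffer fills,
            // even if commit() is never called; merges can then be scheduled.
            writer.addDocument(new Document());
        }
    }
}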