Skimming this thread, and adding to what Shawn said about ramBufferSizeMB:

That buffer is almost entirely wasted space, since you've set maxDocs to
10,000. It doesn't matter how big ramBufferSizeMB is; once 10,000 docs
have been indexed, the buffer is flushed and reset to zero.

And +1 to all of Shawn's comments about the changes you've made to
the merge policy. I'd set them all back to the defaults unless you have
some kind of proof that they're helping, since there's ample cause for
concern that they're hurting instead.

Best,
Erick

On Thu, Sep 7, 2017 at 3:53 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 9/6/2017 11:54 PM, yasoobhaider wrote:
>> My team has tasked me with upgrading Solr from the version we are using
>> (5.4) to the latest stable version, 6.6. I have been stuck on the
>> indexing part for a few days now.
>>
>> In total I'm indexing about 2.5 million documents. The average document
>> size is ~5KB. I have 10 (PHP) workers running in parallel, hitting Solr
>> with ~1K docs/minute (this sometimes goes up to ~3K docs/minute).
>>
>> System specifications:
>> RAM: 120G
>> Processors: 16
>>
>> Solr configuration:
>> Heap size: 80G
>
> That's an ENORMOUS heap.  Why is it that big? If the index only has 2.5
> million documents and reaches a size of 10GB, I cannot imagine that
> index ever needing a heap that big.  That's just asking for extreme (but
> perhaps infrequent) garbage collection pauses.  Assuming those numbers
> for all your index data are correct, I'd drop it to something like 4GB.
> If your queries are particularly complex, you might want to go to 8GB.
> Note that this is also going to require that you significantly reduce
> your ramBufferSizeMB value, which I already advised you to do on another
> thread.
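>
> If you're starting Solr with the bin/solr script, the heap change is a
> one-line edit in solr.in.sh. The 4g here is only the rough figure from
> above, not a hard rule:
>
>     SOLR_HEAP="4g"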
>
>> ------------------------------------------------------------------------------------------------------------
>> solrconfig.xml: (Relevant parts; please let me know if there's anything else
>> you would like to look at)
>>
>> <autoCommit>
>>       <maxDocs>10000</maxDocs>
>>       <maxTime>3800000</maxTime>
>>       <openSearcher>true</openSearcher>
>> </autoCommit>
>>
>> <autoSoftCommit>
>>       <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
>> </autoSoftCommit>
>>
>> <ramBufferSizeMB>5000</ramBufferSizeMB>
>> <maxBufferedDocs>10000</maxBufferedDocs>
>>
>> <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
>>       <int name="maxMergeAtOnce">30</int>
>>       <int name="segmentsPerTier">30</int>
>> </mergePolicyFactory>
>>
>> <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
>>       <int name="maxMergeCount">8</int>
>>       <int name="maxThreadCount">7</int>
>> </mergeScheduler>
>
> I've given you suggestions on how to change this part of the config.
> See the message that I sent earlier on another thread -- at 14:21 UTC
> today.  If you change those settings as I recommended, the merging is
> less likely to overwhelm your system.
>
>> ------------------------------------------------------------------------------------------------------------
>>
>> The main problem:
>>
>> When I start indexing, everything is good until I reach about 2 million
>> docs, which takes ~10 hours. But then the commitScheduler thread gets
>> blocked. It is stuck at doStall() in ConcurrentMergeScheduler (CMS).
>> Looking at the logs from InfoStream, I found a "too many merges;
>> stalling" message from the commitScheduler thread, after which it gets
>> stuck in the while loop forever.
>
> This means that there are more merges scheduled than you have allowed
> with maxMergeCount, so the thread that's doing the actual indexing is
> paused.
>
> Best guess is that you are overwhelming your disks with multiple merge
> threads, because you've set maxThreadCount to 7.  In most situations
> that should be 1, so multiple merges are not running simultaneously.
> Instead, they will be run one at a time so that each one can complete
> faster.  You may have plenty of CPU power to run multiple threads, but
> when multiple threads are accessing data on one disk volume, the random
> access can cause serious problems with disk I/O performance.
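>
> As a sketch only (the maxMergeCount value here is illustrative; see the
> other thread for my actual suggestion), a single-threaded scheduler
> looks like this:
>
>     <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
>           <int name="maxMergeCount">6</int>
>           <int name="maxThreadCount">1</int>
>     </mergeScheduler>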
>
>> I also increased my maxMergeAtOnce and segmentsPerTier from 10 to 20 and
>> then to 30, in hopes of having fewer merging threads running at once, but
>> that just results in more segments being created (not sure why this would
>> happen). I also tried going the other way by reducing them to 5, but that
>> experiment failed quickly (commit thread blocked).
>
> When you increase the values in the mergePolicy, you are explicitly
> telling Lucene to allow more segments in the index at any given moment.
> These settings should not be tweaked unless you know for sure that you
> can benefit from changing them.  Higher values should result in less
> merging, but the size of each merge that DOES happen will be larger, so
> it will take longer.
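>
> To get back to the defaults (both maxMergeAtOnce and segmentsPerTier
> default to 10), you can simply drop those settings:
>
>     <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"/>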
>
>> I increased the ramBufferSizeMB to 5000MB so that there are fewer flushes
>> happening, so that fewer segments are created, so that fewer merges happen
>> (I haven't dug deep here, so please correct me if this is something I should
>> revert. Our current (5.x) config has this set at 324MB).
>
> With large ram buffers, commits are more likely to control how big each
> segment is and how frequently they are flushed.  Tests by Solr and
> Lucene developers have shown that increasing the buffer size beyond
> 128MB rarely offers any advantage, unless the documents are huge.  At
> 5KB, yours aren't huge.
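>
> In line with those test results, something like this should be plenty:
>
>     <ramBufferSizeMB>128</ramBufferSizeMB>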
>
>> The autoCommit and autoSoftCommit settings look good to me, as I've turned
>> off softCommits, and I am autoCommitting at 10000 docs (every 5-10 minutes),
>> which finishes smoothly, unless it gets stuck in the first problem described
>> above.
>
> Your autoCommit has openSearcher set to true.  Commits that open a new
> searcher are very expensive.  It should be set to false.  You can rely
> on autoSoftCommit to make documents visible, with a much longer maxTime
> than you use for autoCommit.
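>
> As an illustration only (the times are placeholders; tune them to how
> quickly you need documents to become visible):
>
>     <autoCommit>
>           <maxTime>60000</maxTime>
>           <openSearcher>false</openSearcher>
>     </autoCommit>
>
>     <autoSoftCommit>
>           <maxTime>300000</maxTime>
>     </autoSoftCommit>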
>
> With a schema that's typical and documents that are not enormous, Solr
> should be able to index at several thousand documents per second,
> especially if there are multiple threads or multiple processes sending
> documents.  A few thousand documents per minute should be far less than
> Solr can actually handle.
>
>> Questions:
>> 1a. Why is Lucene spawning so many merging threads?
>
> Because it has been told that it can do so.  Your maxThreadCount
> setting is 7.
>
>> 1b. How can I make sure that there's always room for the Commit thread to go
>> through?
>
> Set things up so that there are fewer simultaneous merges scheduled than
> maxMergeCount allows (see the scheduler sketch above).
>
>> 1c. Is it normal for all MergeThreads to be in the RUNNABLE state at
>> TreeMap.getEntry()?
>
> I do not know what thread states are normal.  It's not something I've
> ever really looked at.  That doesn't sound unusual, though.
>
>> 2a. Is merging slower in 6.x than 5.x?
>> 2b. What can I do to make it go faster?
>> 2c. Could disk IO throttling be an issue? If so, how can I resolve it? I
>> tried providing ioThrottle=false in solrconfig but that just throws an
>> error.
>
> Merging should not be inherently slower in 6.x.  Assuming that
> everything is configured well and there are sufficient resources
> available, I would expect it to get BETTER with a newer version.
>
> I am not aware of any default I/O throttling for merges.  Even if it's
> not throttled, it will not proceed at the full speed of your disk.  The
> merging is NOT just a simple data copy; there is a lot of data
> manipulation and rebuilding that has to happen.  It involves a lot of
> CPU time in addition to the reading and writing I/O.
>
> I believe that the biggest part of your issues is caused by having
> maxThreadCount higher than 1.
>
> Thanks,
> Shawn
>
