Thanks again, everyone. This has been a major blocker for us, and I think
we've made real progress with your advice.

We have gone ahead with Lerh's suggestion and the cluster is operating much
more smoothly while the new node works through its compactions. We read at
QUORUM, so even if we don't make it within the hinted handoff window, at
least reads won't return inconsistent data.
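
In case it's useful to anyone else, this is roughly how we sanity-check the
handoff status and hint window on a node (the cassandra.yaml path is just
where our install keeps it, adjust for yours):

  # Is hinted handoff currently enabled/running on this node?
  nodetool statushandoff

  # Hint window; default is 3 hours (10800000 ms). Path is an assumption.
  grep max_hint_window_in_ms /etc/cassandra/cassandra.yaml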

Kurt - what we've been observing is that after the node finishes having data
streamed to it from the other nodes, it goes into state UN and only then
starts the compactions; in this case it has about 130 pending. While it's
still joining we don't see an I/O bottleneck. I think the reason this may be
an issue for us is that our nodes generally are not OK: they're constantly
maxing out their disk throughput and running long I/O queues, which is why
we're trying to increase capacity by both adding nodes and switching to
RAIDed disks. Under normal operating circumstances they're already pushed to
their limits, so when the new node gets backed up on compactions it really is
enough to tip the cluster over.
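
For reference, this is roughly what we watch while the node joins (the
5-second iostat interval is just what we happen to use):

  # Node state: UJ while joining, UN once it has joined the ring
  nodetool status

  # Pending compactions on the new node (about 130 in our case)
  nodetool compactionstats

  # Disk throughput and queue depth on the data volume
  iostat -x 5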

That's helpful to know regarding sstableofflinerelevel; in my dry run it did
appear that it would shuffle even more SSTables into L0.
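
For anyone else looking at this, the dry run I mentioned was along these
lines, if I remember the invocation right (Cassandra has to be stopped on the
node first; the keyspace and table names here are placeholders):

  # --dry-run only reports which SSTables would move between levels
  sstableofflinerelevel --dry-run my_keyspace my_table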

On Mon, Sep 11, 2017 at 11:50 PM, kurt greaves <k...@instaclustr.com> wrote:

>
>> Kurt - We're on 3.7, and our approach was to try throttling compaction
>> throughput as much as possible rather than the opposite. I had found some
>> resources that suggested unthrottling to let it get it over with, but
>> wasn't sure if this would really help in our situation since the I/O pipe
>> was already fully saturated.
>>
>
> You should unthrottle during bootstrap as the node won't receive read
> queries until it finishes streaming and joins the cluster. It seems
> unlikely that you'd be bottlenecked on I/O during the bootstrapping
> process. If you were, you'd certainly have bigger problems. The aim is to
> clear out the majority of compactions *before* the node joins and starts
> servicing reads. You might also want to increase concurrent_compactors.
> Typical advice is same as # CPU cores, but you might want to increase it
> for the bootstrapping period.
>
> sstableofflinerelevel could help but I wouldn't count on it. Usage is
> pretty straightforward but you may find that a lot of the existing SSTables
> in L0 just get put back in L0 anyways, which is where the main compaction
> backlog comes from. Plus you have to take the node offline which may not be
> ideal. In this case I would suggest the strategy Lerh suggested as being
> more viable.
>
> Regardless, if the rest of your nodes are OK (and you're not using RF=1 or
> CL=ALL), Cassandra should route around the slow node pretty effectively, so
> a single node backed up on compactions shouldn't be a big deal.
>
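
For the archives, the unthrottle and concurrent_compactors changes Kurt
describes look roughly like this on our side (the values are illustrative,
and as far as I can tell on 3.7 concurrent_compactors is a cassandra.yaml
setting, so changing it needs a restart or JMX):

  # Unthrottle compaction while the node is bootstrapping (0 = unlimited)
  nodetool setcompactionthroughput 0

  # Check the current setting
  nodetool getcompactionthroughput

  # cassandra.yaml on the bootstrapping node; bump for the bootstrap period,
  # e.g. to the number of CPU cores (8 here is just an example)
  # concurrent_compactors: 8

  # Re-throttle once the backlog clears (MB/s; 16 is the default)
  nodetool setcompactionthroughput 16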
