It's unlikely to help in this case, but you should be running nodetool decommission on the node you want to remove rather than removenode from another node (and definitely don't force the removal).
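Roughly, the difference between the two looks like this (the host ID below is just a placeholder):

    # run on the node that is leaving the cluster, while it is still up;
    # it streams its own data to the remaining replicas before leaving the ring
    nodetool decommission

    # run from any other live node, only when the node to remove is already down;
    # the surviving replicas stream the missing ranges among themselves instead
    nodetool removenode <host-id-of-dead-node>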
native_transport_max_concurrent_requests_in_bytes defaults to 10% of the heap, which, depending on your heap size, could allow fewer concurrent requests than you effectively had before the upgrade. It's worth setting it higher to see if the issue is related (a sketch of the setting is below the quoted message). Is this the only issue you see on the cluster? I assume load on the cluster is still low/reasonable and the only symptom you're seeing is the increase in pending NTR requests?

raft.so - Cassandra consulting, support, and managed services

On Mon, Mar 8, 2021 at 10:47 PM Gil Ganz <gilg...@gmail.com> wrote:
>
> Hey,
> We have a 3.11.9 cluster (recently upgraded from 2.1.14), and after the
> upgrade we have an issue when we remove a node.
>
> The moment I run the removenode command, 3 servers in the same dc start to
> have a high amount of pending native-transport-requests (getting to around
> 1M) and clients are having issues due to that. We are using vnodes (32), so
> I don't see why I would have 3 servers busier than others (RF is 3 but I
> don't see why it would be related).
>
> Each node has a few TB of data, and in the past we were able to remove a
> node in about half a day. Today, what happens is that in the first 1-2 hours
> we have these issues with some nodes, then things go quiet, the remove is
> still running and clients are ok; a few hours later the same issue is back
> (with the same nodes as the problematic ones) and clients have issues again,
> leading us to run removenode force.
>
> Reducing stream throughput and the number of compactors has helped to
> mitigate the issues a bit, but we still have this issue of pending
> native-transport requests getting to insane numbers and clients suffering,
> eventually causing us to run remove force. Any ideas?
>
> I saw that since 3.11.6 there is a parameter,
> native_transport_max_concurrent_requests_in_bytes. I'm looking into setting
> this; perhaps it will prevent the number of pending tasks from getting so high.
>
> Gil
>
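For reference, here is roughly what the setting mentioned above looks like in cassandra.yaml (available since 3.11.6); the value is purely illustrative and should be sized against your own heap:

    # Maximum total size of in-flight native transport (CQL) requests.
    # When unset it defaults to 10% of the heap; requests beyond the cap
    # queue up and show up as pending Native-Transport-Requests.
    native_transport_max_concurrent_requests_in_bytes: 1610612736    # ~1.5 GB, illustrative only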