It's unlikely to help in this case, but you should be running nodetool decommission on the node you want to remove rather than removenode from another node (and definitely don't force the removal).
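Roughly, the difference between the two looks like this (the host ID below is just a placeholder):

    # run on the node that is leaving the cluster, while it is still up;
    # it streams its own data to the remaining replicas before leaving the ring
    nodetool decommission

    # run from any other live node, only when the node to remove is already down;
    # the surviving replicas stream the missing ranges among themselves instead
    nodetool removenode <host-id-of-dead-node>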
native_transport_max_concurrent_requests_in_bytes defaults to 10% of the heap, which, depending on your heap size, could allow fewer concurrent requests than you effectively had before the upgrade. It's worth setting it higher to see if the issue is related (a sketch of the setting is below the quoted message). Is this the only issue you see on the cluster? I assume load on the cluster is still low/reasonable and the only symptom you're seeing is the increase in pending NTR requests?

raft.so - Cassandra consulting, support, and managed services

On Mon, Mar 8, 2021 at 10:47 PM Gil Ganz <gilg...@gmail.com> wrote:
>
> Hey,
> We have a 3.11.9 cluster (recently upgraded from 2.1.14), and after the
> upgrade we have an issue when we remove a node.
>
> The moment I run the removenode command, 3 servers in the same dc start to
> have a high amount of pending native-transport-requests (getting to around
> 1M) and clients are having issues due to that. We are using vnodes (32), so
> I don't see why I would have 3 servers busier than others (RF is 3 but I
> don't see why it would be related).
>
> Each node has a few TB of data, and in the past we were able to remove a
> node in about half a day. Today, what happens is that in the first 1-2 hours
> we have these issues with some nodes, then things go quiet, the remove is
> still running and clients are ok; a few hours later the same issue is back
> (with the same nodes as the problematic ones) and clients have issues again,
> leading us to run removenode force.
>
> Reducing stream throughput and the number of compactors has helped to
> mitigate the issues a bit, but we still have this issue of pending
> native-transport requests getting to insane numbers and clients suffering,
> eventually causing us to run remove force. Any ideas?
>
> I saw that since 3.11.6 there is a parameter,
> native_transport_max_concurrent_requests_in_bytes. I'm looking into setting
> this; perhaps it will prevent the number of pending tasks from getting so high.
>
> Gil
>
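For reference, here is roughly what the setting mentioned above looks like in cassandra.yaml (available since 3.11.6); the value is purely illustrative and should be sized against your own heap:

    # Maximum total size of in-flight native transport (CQL) requests.
    # When unset it defaults to 10% of the heap; requests beyond the cap
    # queue up and show up as pending Native-Transport-Requests.
    native_transport_max_concurrent_requests_in_bytes: 1610612736    # ~1.5 GB, illustrative only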