Hey Bowen, I agree it's better to have smaller servers in general - this is the smaller-servers version :) In this case I wouldn't say the data model is bad, and we certainly do our best to tune everything so that less hardware is needed. It's just that the data size and the number of requests/s are very big to begin with: multiple datacenters around the world (on-prem), with each datacenter having close to 100 servers. Making the servers smaller would mean an even larger cluster in terms of node count, which has other implications when it's on-prem.
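In case it helps anyone following along, the knobs discussed further down in the thread look roughly like this. The numbers are only illustrative, not what we actually run and not a recommendation - the right values depend on heap size and client load:

  # cassandra.yaml - the two native_transport_* settings exist since 3.11.6
  native_transport_max_concurrent_requests_in_bytes: 16777216         # ~16 MB cap instead of the default ~10% of heap
  native_transport_max_concurrent_requests_in_bytes_per_ip: 4194304   # ~4 MB cap per client IP
  concurrent_compactors: 4                                            # fewer compaction threads

  # runtime throttles while a node removal is streaming
  nodetool setstreamthroughput 50      # megabits/s
  nodetool setcompactionthroughput 32  # MB/s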
On Fri, Mar 12, 2021 at 1:30 AM Bowen Song <bo...@bso.ng.invalid> wrote:

> May I ask why you scale your Cassandra cluster vertically instead of
> horizontally, as recommended?
>
> I'm asking because I had dealt with a vertically scaled cluster before. It
> was because they had a query performance issue and blamed the hardware for
> not being strong enough. Scaling vertically had helped them improve query
> performance, but it turned out the root cause was bad data modelling, and
> it gradually got worse with the ever-increasing data size. Eventually they
> reached the roof of what money can realistically buy - 256GB RAM and 16
> cores of 3.x GHz CPU per server, in their case.
>
> Is that your case too? Bigger RAM, more cores and higher CPU frequency to
> help "fix" the performance issue? I really hope not.
>
>
> On 11/03/2021 09:57, Gil Ganz wrote:
>
> Yes. 192gb.
>
> On Thu, Mar 11, 2021 at 10:29 AM Kane Wilson <k...@raft.so> wrote:
>
>> That is a very large heap. I presume you are using G1GC? How much memory
>> do your servers have?
>>
>> raft.so - Cassandra consulting, support, managed services
>>
>> On Thu., 11 Mar. 2021, 18:29 Gil Ganz, <gilg...@gmail.com> wrote:
>>
>>> I always prefer to do a decommission, but the issue here is that these
>>> servers are on-prem, and disks die from time to time.
>>> It's a very large cluster, in multiple datacenters around the world, so
>>> it can take some time before we have a replacement, so we usually need
>>> to run removenode in such cases.
>>>
>>> Other than that there are no issues in the cluster, the load is
>>> reasonable, and when this issue happens, following a removenode, this
>>> huge number of NTR is what I see; the weird thing is it's only on some
>>> nodes.
>>> I have been running with a very small
>>> native_transport_max_concurrent_requests_in_bytes setting for a few
>>> days now on some nodes (a few MB compared to the default 0.8 of a 60gb
>>> heap); it looks like it's good enough for the app, so I will roll it
>>> out to the entire dc and test removal again.
>>>
>>>
>>> On Tue, Mar 9, 2021 at 10:51 AM Kane Wilson <k...@raft.so> wrote:
>>>
>>>> It's unlikely to help in this case, but you should be using nodetool
>>>> decommission on the node you want to remove rather than removenode
>>>> from another node (and definitely don't force removal).
>>>>
>>>> native_transport_max_concurrent_requests_in_bytes defaults to 10% of
>>>> the heap, which I suppose, depending on your configuration, could
>>>> potentially result in a smaller number of concurrent requests than
>>>> previously. It's worth a shot setting it higher to see if the issue is
>>>> related. Is this the only issue you see on the cluster? I assume load
>>>> on the cluster is still low/reasonable and the only symptom you're
>>>> seeing is the increased NTR requests?
>>>>
>>>> raft.so - Cassandra consulting, support, and managed services
>>>>
>>>>
>>>> On Mon, Mar 8, 2021 at 10:47 PM Gil Ganz <gilg...@gmail.com> wrote:
>>>>
>>>>> Hey,
>>>>> We have a 3.11.9 cluster (recently upgraded from 2.1.14), and after
>>>>> the upgrade we have an issue when we remove a node.
>>>>>
>>>>> The moment I run the removenode command, 3 servers in the same dc
>>>>> start to have a high amount of pending native-transport-requests
>>>>> (getting to around 1M) and clients are having issues due to that. We
>>>>> are using vnodes (32), so I don't see why I would have 3 servers
>>>>> busier than others (RF is 3, but I don't see why that would be
>>>>> related).
>>>>>
>>>>> Each node has a few TB of data, and in the past we were able to
>>>>> remove a node in about half a day. Today what happens is that in the
>>>>> first 1-2 hours we have these issues with some nodes, then things go
>>>>> quiet, the removal is still running and clients are ok; a few hours
>>>>> later the same issue is back (with the same nodes as the problematic
>>>>> ones) and clients have issues again, leading us to run removenode
>>>>> force.
>>>>>
>>>>> Reducing stream throughput and the number of compactors has helped to
>>>>> mitigate the issues a bit, but we still have this issue of pending
>>>>> native-transport requests getting to insane numbers and clients
>>>>> suffering, eventually causing us to run remove force. Any ideas?
>>>>>
>>>>> I saw that since 3.11.6 there is a parameter,
>>>>> native_transport_max_concurrent_requests_in_bytes; I'm looking into
>>>>> setting this, perhaps it will prevent the number of pending tasks
>>>>> from getting so high.
>>>>>
>>>>> Gil
>>>>>
>>>>
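For completeness, a rough sketch of the two removal paths discussed above (<host-id> is a placeholder taken from nodetool status):

  # preferred: run on the node that is leaving, while it is still up,
  # so it streams its own data to the nodes taking over its token ranges
  nodetool decommission

  # when the node is already dead (e.g. a failed disk), run from any live node
  nodetool removenode <host-id>
  nodetool removenode status   # check progress
  nodetool removenode force    # last resort - stops waiting for the remaining streams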