Hey Bowen, I agree it's better to have smaller servers in general - this is the smaller-servers version :) In this case I wouldn't say the data model is bad, and we certainly do our best to tune everything so that less hardware is needed. It's just that the data size and the number of requests/s are very big to begin with: multiple datacenters around the world (on-prem), with each datacenter having close to 100 servers. Making the servers smaller would mean an even larger cluster in terms of node count, which has other implications when it's on-prem.
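In case it helps anyone following along, the knobs discussed further down in the thread look roughly like this. The numbers are only illustrative, not what we actually run and not a recommendation - the right values depend on heap size and client load:

  # cassandra.yaml - the two native_transport_* settings exist since 3.11.6
  native_transport_max_concurrent_requests_in_bytes: 16777216         # ~16 MB cap instead of the default ~10% of heap
  native_transport_max_concurrent_requests_in_bytes_per_ip: 4194304   # ~4 MB cap per client IP
  concurrent_compactors: 4                                            # fewer compaction threads

  # runtime throttles while a node removal is streaming
  nodetool setstreamthroughput 50      # megabits/s
  nodetool setcompactionthroughput 32  # MB/s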
On Fri, Mar 12, 2021 at 1:30 AM Bowen Song <bo...@bso.ng.invalid> wrote:

> May I ask why you scale your Cassandra cluster vertically instead of
> horizontally, as recommended?
>
> I'm asking because I had dealt with a vertically scaled cluster before. It
> was because they had a query performance issue and blamed the hardware for
> not being strong enough. Scaling vertically had helped them improve query
> performance, but it turned out the root cause was bad data modelling, and
> it gradually got worse with the ever-increasing data size. Eventually they
> reached the roof of what money can realistically buy - 256GB RAM and 16
> cores of 3.x GHz CPU per server, in their case.
>
> Is that your case too? Bigger RAM, more cores and higher CPU frequency to
> help "fix" the performance issue? I really hope not.
>
>
> On 11/03/2021 09:57, Gil Ganz wrote:
>
> Yes. 192gb.
>
> On Thu, Mar 11, 2021 at 10:29 AM Kane Wilson <k...@raft.so> wrote:
>
>> That is a very large heap. I presume you are using G1GC? How much memory
>> do your servers have?
>>
>> raft.so - Cassandra consulting, support, managed services
>>
>> On Thu., 11 Mar. 2021, 18:29 Gil Ganz, <gilg...@gmail.com> wrote:
>>
>>> I always prefer to do a decommission, but the issue here is that these
>>> servers are on-prem, and disks die from time to time.
>>> It's a very large cluster, in multiple datacenters around the world, so
>>> it can take some time before we have a replacement, so we usually need
>>> to run removenode in such cases.
>>>
>>> Other than that there are no issues in the cluster, the load is
>>> reasonable, and when this issue happens, following a removenode, this
>>> huge number of NTR is what I see; the weird thing is it's only on some
>>> nodes.
>>> I have been running with a very small
>>> native_transport_max_concurrent_requests_in_bytes setting for a few
>>> days now on some nodes (a few MB compared to the default 0.8 of a 60gb
>>> heap); it looks like it's good enough for the app, so I will roll it
>>> out to the entire dc and test removal again.
>>>
>>>
>>> On Tue, Mar 9, 2021 at 10:51 AM Kane Wilson <k...@raft.so> wrote:
>>>
>>>> It's unlikely to help in this case, but you should be using nodetool
>>>> decommission on the node you want to remove rather than removenode
>>>> from another node (and definitely don't force removal).
>>>>
>>>> native_transport_max_concurrent_requests_in_bytes defaults to 10% of
>>>> the heap, which I suppose, depending on your configuration, could
>>>> potentially result in a smaller number of concurrent requests than
>>>> previously. It's worth a shot setting it higher to see if the issue is
>>>> related. Is this the only issue you see on the cluster? I assume load
>>>> on the cluster is still low/reasonable and the only symptom you're
>>>> seeing is the increased NTR requests?
>>>>
>>>> raft.so - Cassandra consulting, support, and managed services
>>>>
>>>>
>>>> On Mon, Mar 8, 2021 at 10:47 PM Gil Ganz <gilg...@gmail.com> wrote:
>>>>
>>>>> Hey,
>>>>> We have a 3.11.9 cluster (recently upgraded from 2.1.14), and after
>>>>> the upgrade we have an issue when we remove a node.
>>>>>
>>>>> The moment I run the removenode command, 3 servers in the same dc
>>>>> start to have a high amount of pending native-transport-requests
>>>>> (getting to around 1M) and clients are having issues due to that. We
>>>>> are using vnodes (32), so I don't see why I would have 3 servers
>>>>> busier than others (RF is 3, but I don't see why that would be
>>>>> related).
>>>>>
>>>>> Each node has a few TB of data, and in the past we were able to
>>>>> remove a node in about half a day. Today what happens is that in the
>>>>> first 1-2 hours we have these issues with some nodes, then things go
>>>>> quiet, the removal is still running and clients are ok; a few hours
>>>>> later the same issue is back (with the same nodes as the problematic
>>>>> ones) and clients have issues again, leading us to run removenode
>>>>> force.
>>>>>
>>>>> Reducing stream throughput and the number of compactors has helped to
>>>>> mitigate the issues a bit, but we still have this issue of pending
>>>>> native-transport requests getting to insane numbers and clients
>>>>> suffering, eventually causing us to run remove force. Any ideas?
>>>>>
>>>>> I saw that since 3.11.6 there is a parameter,
>>>>> native_transport_max_concurrent_requests_in_bytes; I'm looking into
>>>>> setting this, perhaps it will prevent the number of pending tasks
>>>>> from getting so high.
>>>>>
>>>>> Gil
>>>>>
>>>>
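For completeness, a rough sketch of the two removal paths discussed above (<host-id> is a placeholder taken from nodetool status):

  # preferred: run on the node that is leaving, while it is still up,
  # so it streams its own data to the nodes taking over its token ranges
  nodetool decommission

  # when the node is already dead (e.g. a failed disk), run from any live node
  nodetool removenode <host-id>
  nodetool removenode status   # check progress
  nodetool removenode force    # last resort - stops waiting for the remaining streams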