From the nodetool output you quoted, I seriously suspect your Cassandra
nodes have at least one of the following issues:
* Clock out of sync
* Bad network connectivity between nodes
* Long GC pauses
* Broken disks
* CPU bottleneck
It's not normal to see over 2% of small messages dropped (15740262 dropped
against 569755545 completed). This needs investigation.
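As a quick sanity check on that percentage, here is a minimal sketch that
works it out from the "Small messages" row quoted further down. The helper
name dropped_ratio is purely illustrative (it is not part of nodetool), and
using the Completed column as the denominator is an assumption about how the
ratio should be counted.

```python
def dropped_ratio(completed: int, dropped: int) -> float:
    """Return dropped messages as a fraction of completed messages."""
    if completed <= 0:
        raise ValueError("completed must be positive")
    return dropped / completed


# Figures copied from the "Small messages" row of the quoted netstats output.
small = dropped_ratio(completed=569_755_545, dropped=15_740_262)
print(f"Small messages dropped: {small:.2%}")
```

With those figures the ratio comes out a little under 3%, while the Large and
Gossip rows are effectively at zero.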
Adding nodes one by one is fine. If your data can grow at a speed that's
close to or faster than the speed at which you can add nodes, you have a much
more serious problem.
We usually leave node additions running in the background with an automated
process, and we don't really care whether it takes 1 day or 5 days to
complete, as long as the nodes are added correctly and successfully.
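A rough sketch of what the waiting part of such an automated process could
look like, assuming nodetool is on the PATH and that `nodetool status` prints
one row per node starting with the two-letter status/state code (UN, UJ, ...).
The function names and the polling interval are hypothetical, and this is not
the actual tooling we run.

```python
import subprocess
import time


def joining_nodes(status_output: str) -> list[str]:
    """Return the addresses of nodes that `nodetool status` lists as UJ."""
    joining = []
    for line in status_output.splitlines():
        fields = line.split()
        # Node rows start with the status/state code, then the address.
        if len(fields) >= 2 and fields[0] == "UJ":
            joining.append(fields[1])
    return joining


def wait_until_no_joining(poll_seconds: int = 300) -> None:
    """Block until no node is in the UJ state (hypothetical helper)."""
    while True:
        result = subprocess.run(
            ["nodetool", "status"],
            capture_output=True, text=True, check=True,
        )
        if not joining_nodes(result.stdout):
            return
        time.sleep(poll_seconds)
```

An outer loop would start Cassandra on the next new node only after
wait_until_no_joining() returns, so that nodes join strictly one at a time.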
On 08/07/2022 10:55, Marc Hoppins wrote:
Ifconfig shows RX of 1.1T. This doesn't seem to fit with the LOAD of 145GiB
(nodetool status), unless I am reading that wrong...and the fact that this node
still has a status of UJ.
Netstats on this node shows (other than :
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name         Active  Pending  Completed   Dropped
Large messages       n/a        0          0         0
Small messages       n/a       53  569755545  15740262
Gossip messages      n/a        0     288878         2
None of this addresses the issue of not being able to add more nodes.
-----Original Message-----
From: Bowen Song via user <user@cassandra.apache.org>
Sent: Friday, July 8, 2022 11:47 AM
To: user@cassandra.apache.org
Subject: Re: Adding nodes
I would assume that's 85 GB (i.e. gigabytes) then, which is approximately 79
GiB (i.e. gibibytes). This still sounds awfully slow - less than 1 MB/s over a
full day (24 hours).
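The arithmetic behind those two figures, as a small sketch. It assumes the
85 GB figure is decimal gigabytes and that the transfer took exactly 24 hours;
the variable names are only illustrative.

```python
GB = 10**9    # gigabyte (decimal)
GIB = 2**30   # gibibyte (binary)
MB = 10**6    # megabyte (decimal)
SECONDS_PER_DAY = 24 * 60 * 60

streamed_bytes = 85 * GB

# Same amount of data expressed in binary units.
as_gib = streamed_bytes / GIB

# Average streaming rate if it all arrived within one day.
rate_mb_per_s = streamed_bytes / SECONDS_PER_DAY / MB

print(f"{as_gib:.1f} GiB, {rate_mb_per_s:.2f} MB/s")
```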
You said CPU and network aren't the bottleneck. Have you checked the disk IO?
Also, be mindful of CPU usage. It can still be a bottleneck if one thread
uses 100% of a CPU core while all other cores are idle.
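One way to spot that single-core case is to look at per-core figures rather
than the overall average. The sketch below assumes a Linux host and compares
two snapshots of /proc/stat taken a few seconds apart; the function names and
the 95% threshold are hypothetical choices for illustration.

```python
def per_core_busy(before: str, after: str) -> dict[str, float]:
    """Busy fraction per core between two /proc/stat snapshots."""
    def parse(text: str) -> dict[str, tuple[int, int]]:
        cores = {}
        for line in text.splitlines():
            fields = line.split()
            # Per-core rows look like "cpu0 user nice system idle iowait ...";
            # the aggregate "cpu" row is skipped.
            if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
                # Use user..steal only; guest time is already part of user.
                ticks = [int(x) for x in fields[1:9]]
                idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)
                cores[fields[0]] = (sum(ticks), idle)
        return cores

    first, second = parse(before), parse(after)
    busy = {}
    for name, (total_b, idle_b) in second.items():
        total_a, idle_a = first[name]
        elapsed = total_b - total_a
        busy[name] = 0.0 if elapsed == 0 else 1 - (idle_b - idle_a) / elapsed
    return busy


def saturated_cores(busy: dict[str, float], threshold: float = 0.95) -> list[str]:
    """Names of cores whose busy fraction is at or above the threshold."""
    return [name for name, frac in busy.items() if frac >= threshold]
```

In practice the two snapshots would come from reading /proc/stat twice with a
short sleep in between. One core near 100% while the rest sit near 0% would
point at a single-threaded bottleneck even though the average looks low.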
On 08/07/2022 07:09, Marc Hoppins wrote:
Thank you for pointing that out.
85 gigabytes/gibibytes/GIGABYTES/GIBIBYTES/whatever name you care to
give it
CPU and bandwidth are not the problem.
Version 4.0.3 but, as I stated, all nodes use the same version so the version
is not important either.
Existing nodes have 350-400+ (choose whatever you want to call a
gigabyte).
The problem appears to be that adding new nodes is a serial process, which is
fine when there is no data and each node is added within 2 minutes. It is
hardly practical in production.
-----Original Message-----
From: Bowen Song via user <user@cassandra.apache.org>
Sent: Thursday, July 7, 2022 8:43 PM
To: user@cassandra.apache.org
Subject: Re: Adding nodes
86Gb (that's gigabits, which is 10.75GB, gigabytes) taking an entire day is
obviously too long. I would check the network bandwidth, disk IO and CPU usage
and find out what the bottleneck is.
On 07/07/2022 15:48, Marc Hoppins wrote:
Hi all,
Cluster of 2 DC and 24 nodes
DC1 (RF3) = 12 nodes, 16 tokens each
DC2 (RF3) = 12 nodes, 16 tokens each
Adding 12 more nodes to DC1: I installed Cassandra (version is the same across
all nodes) but, after the first node was added, I couldn't seem to add any
further nodes.
I checked nodetool status and the newly added node is UJ. It remained this way
all day, and only 86Gb of data was added to the node over the entire day
(probably not yet complete). This seems a little slow, and it is more than a
little inconvenient to only be able to add one node at a time - or at least
one node every 2 minutes. When the cluster was created, I timed each node from
service start to status UJ (having a UUID) and it was around 120 seconds. Of
course, there was no data.
Is it possible I have some setting not correctly tuned?
Thanks
Marc