Hi Rubén,

Good luck with your adopted Cassandra cluster :-).
Some thoughts: you are struggling with balancing your cluster, and there is a lot of documentation about how the Cassandra architecture works. Some insight about your setup to keep in mind while reading:

- You are *not* using vnodes (1 token range per node).
- You are using RandomPartitioner.

About your current issue, I'll use an OODA loop as the content is quite long; it helps me structure my ideas, improves completeness, and will hopefully make it easier to read on your side. It's up to you whether to follow the Decide / Act parts of this plan.

Observe

When using 5 nodes, *no* vnodes and RandomPartitioner, the tokens in use (initial_token in cassandra.yaml) should be:

- 0
- 34028236692093846346337460743176821145
- 68056473384187692692674921486353642290
- 102084710076281539039012382229530463435
- 136112946768375385385349842972707284580

For 6 nodes:

- 0
- 28356863910078205288614550619314017621
- *56713727820156410577229101238628035242*
- 85070591730234615865843651857942052863
- 113427455640312821154458202477256070484
- 141784319550391026443072753096570088105

I used http://www.geroba.com/cassandra/cassandra-token-calculator/, but there are many tools like this one, just pick one :-).

Orient

So it looks like your setup was built to run with 6 nodes and as of now only has 5. Your cluster is therefore *not* well balanced. In this cluster a node (60.206) was added recently and another node was removed recently, probably the one sitting on token '56713727820156410577229101238628035242'. Removing that node broke the balance:

10.128.60.106  datacenter1  rack1  Up  Normal  262.12 GB  33.33%  85070591730234615865843651857942052863

This node is holding twice the share of the data initially planned. With a replication factor of 3, two other nodes are holding more data than expected as well. This cluster needs to be balanced to avoid making 3 of those nodes more loaded (size and throughput) than the other 2.
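For what it's worth, the evenly spaced token lists above are easy to reproduce without an online calculator. A minimal sketch, assuming RandomPartitioner's token space of 0 to 2**127 - 1 and the same flooring behaviour as the calculator linked above:

```python
def initial_tokens(node_count):
    """Evenly spaced initial_token values for a RandomPartitioner
    cluster of node_count nodes (no vnodes, one token per node)."""
    step = (2 ** 127) // node_count  # floor division, as the calculators do
    return [i * step for i in range(node_count)]

# Tokens for a 6-node cluster, matching the list above:
for token in initial_tokens(6):
    print(token)
```

Node i simply gets i times the ring size divided by the node count, which is what keeps each primary range the same size.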
Cassandra can live with it; I'm not sure about your servers. If you are using the same hardware everywhere, it makes sense to try balancing it.

Other notes:

- Using 'nodetool status <keyspace>' instead of 'nodetool ring' or 'nodetool status' is a good habit you might want to adopt, as RF and ownership are defined at the keyspace level. I believe this has no impact on the current issue.
- As you added a new node recently, some old nodes might still be holding data they no longer own. To address this you need to run a 'nodetool cleanup' on *all* the nodes *except* the last one you added.

This latter point, combined with the imbalance, would explain why the new node owns a different amount of data.

Decide

So I tend to agree with you that some data should be cleaned:

> Is that difference data that is not cleaned up, such as TTL-expired cell or
> tombstoned data?

But I would say that nodes are holding data outside their primary and replicated ranges. We need to get rid of this extra data, which is no longer used anyway. We also want to balance the cluster if all the nodes use the same hardware.

Act

Here is what I would probably do. First fix the balance, which means you need to either:

- "move" (nodetool move) nodes around in order to have a well balanced 5 node cluster (you might want to read more about it if going this way),

*OR*

- add a 6th node with 'initial_token: 56713727820156410577229101238628035242'.

As 'nodetool move' is quite a heavy operation involving a few nodes and a lot of streaming, to be performed on each node except one, I would recommend adding a node instead, even more so if you are still learning about Cassandra.

Once you're happy with the cluster balance, run "nodetool cleanup" on all the nodes. It's a local operation that can be run simultaneously on many / all the nodes, as long as resources are available, as it is a bit I/O and CPU intensive. Then check the balance again (nodetool status <keyspace>).
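If it helps to see why adding that 6th node fixes the balance, you can compute each node's primary-range ownership directly from the ring tokens, no Cassandra needed. A small sketch (ownership only, replication ignored), using the fact that a node owns the range from the previous token, exclusive, up to its own token, inclusive, wrapping around the ring:

```python
RING = 2 ** 127  # RandomPartitioner token space

def ownership_pct(tokens):
    """Map each token to the percentage of the ring its node owns
    (primary range only, ignoring replication)."""
    tokens = sorted(tokens)
    prevs = [tokens[-1] - RING] + tokens[:-1]  # wrap the first range
    return {t: 100 * (t - prev) / RING for prev, t in zip(prevs, tokens)}

# The 5 tokens currently in your ring:
current = [
    0,
    28356863910078205288614550619314017621,
    85070591730234615865843651857942052863,   # the 33.33% node
    113427455640312821154458202477256070484,
    141784319550391026443072753096570088105,
]
for token, pct in ownership_pct(current).items():
    print(f"{pct:6.2f}%  {token}")

# Adding the missing 6th token halves that node's share:
balanced = current + [56713727820156410577229101238628035242]
for token, pct in ownership_pct(balanced).items():
    print(f"{pct:6.2f}%  {token}")
```

The first loop reproduces the 16.67% / 33.33% split you see in 'nodetool ring'; the second shows every node at 16.67% once the missing token is filled in.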
Due to the compaction state you can have discrepancies, but it should be far better.

C*heers,

-----------------------
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-10-04 15:57 GMT+02:00 Ruben Cardenal <cassan...@ruben.cn>:

> Hi,
>
> We've inherited quite a big Amazon infrastructure from a company we've
> purchased. It has an ancient and obsolete implementation of services,
> the worst (and most expensive) of all of them being a 5-node cluster of
> Cassandra (RF=3). I'm new to Cassandra, and yes, I'm making my way
> through the docs.
>
> I was told that Amazon asked them a few months ago to reboot one of their
> servers (it had been turned on for so long that Amazon had to make some
> changes and needed it rebooted), so they had to add a new node to the
> cluster. If you query nodetool as of now, it shows:
>
> $ nodetool ring
> Note: Ownership information does not include topology, please specify a keyspace.
> Address        DC           Rack   Status  State   Load       Owns    Token
> 10.128.50.130  datacenter1  rack1  Up      Normal  263.06 GB  16.67%  0
> 10.128.50.237  datacenter1  rack1  Up      Normal  253.31 GB  16.67%  28356863910078205288614550619314017621
> 10.128.60.106  datacenter1  rack1  Up      Normal  262.12 GB  33.33%  85070591730234615865843651857942052863
> 10.128.70.41   datacenter1  rack1  Up      Normal  264.28 GB  16.67%  113427455640312821154458202477256070484
> 10.128.60.206  datacenter1  rack1  Up      Normal  65.15 GB   16.67%  141784319550391026443072753096570088105
>
> What puzzles me is the last line. It belongs to the last added node, the
> new one I talked about. While it owns the same share of the ring (16.67%)
> as three other nodes, the Load is about 4 times lower. What does this mean?
> Is that difference data that is not cleaned up, such as TTL-expired cells or
> tombstoned data?
>
> Thanks, and excuse me if I'm asking something stupid.
>
> Rubén.