Hi Rubén,

Good luck with your newly adopted Cassandra cluster :-).

Some thoughts:

You are struggling with balancing your cluster; there is a lot of
documentation out there about how the Cassandra architecture works.

Some insight to help you understand your setup while reading:

- You are *not* using vnodes (1 token range per node)
- You are using the Random Partitioner
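
If you want to double-check both points yourself, a quick look at
cassandra.yaml on any node should confirm it (the path below is only the
common package default, yours may differ):

$ grep -E '^(num_tokens|initial_token|partitioner)' /etc/cassandra/cassandra.yaml

With your setup you should see no active 'num_tokens' line (or
'num_tokens: 1'), a single 'initial_token' value per node, and
'partitioner: org.apache.cassandra.dht.RandomPartitioner'.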

About your current issue, I'll use an OODA loop (Observe / Orient / Decide
/ Act) as the content is quite long; it helps me structure my ideas and
improve completeness, and it will hopefully make it easier to read on your
side. It's up to you whether to follow the Decide / Act parts.

Observe

When using 5 nodes, *no* vnodes and the Random Partitioner, the tokens
(initial_token in cassandra.yaml) in use should be:

- 0
- 34028236692093846346337460743176821145
- 68056473384187692692674921486353642290
- 102084710076281539039012382229530463435
- 136112946768375385385349842972707284580

For 6 nodes:

- 0
- 28356863910078205288614550619314017621
- *56713727820156410577229101238628035242*
- 85070591730234615865843651857942052863
- 113427455640312821154458202477256070484
- 141784319550391026443072753096570088105

I used http://www.geroba.com/cassandra/cassandra-token-calculator/ to get
these, but there are many tools like this one, just pick one :-), or
compute the tokens yourself as in the sketch below.
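
If you prefer to do it without an external tool, the formula for the Random
Partitioner is simply i * (2**127 / N) for node i of N nodes; here is a
quick sketch (python is only used for the big-integer arithmetic):

$ python -c "for i in range(6): print(i * (2**127 // 6))"

Replace 6 with 5 to get the 5-node list above.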

Orient

So it looks like your setup was built to run with 6 nodes and as of now
only has 5.

Your cluster is thus *not* well balanced. A node (60.206) was added to this
cluster recently, and another node, probably sitting on token
'56713727820156410577229101238628035242', was removed recently. Removing
that node broke the balance.

> 10.128.60.106 datacenter1 rack1 Up Normal 262.12 GB 33.33%
> 85070591730234615865843651857942052863


This node is holding twice the share of the data initially planned. With a
replication factor of 3, two other nodes are holding more data than
expected as well. This cluster needs to be balanced to avoid having 3 of
those nodes more loaded (size and throughput) than the other 2. Cassandra
can live with it; I am not sure about your servers. If you are using the
same hardware everywhere, it makes sense to try balancing it.

Other notes:

- Using 'nodetool status <keyspace>' instead of 'nodetool ring' or
'nodetool status' is a good habit you might want to pick up, as RF and
ownership are defined at the keyspace level (see the example right after
these notes). I believe this has no impact on the current issue.

- As you added a new node recently, some old nodes might still be holding
data they no longer own. To address this you need to run 'nodetool cleanup'
on *all* the nodes *except* the last one you added.

This latter point, combined with the imbalance, would explain why the new
node holds a different amount of data.
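
For instance (replace 'my_keyspace' with a keyspace your application
actually uses, the name here is just a placeholder):

$ nodetool status my_keyspace

The 'Owns' column then shows the effective ownership for that keyspace,
replicas included, which is what you really want to compare against 'Load'.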

Decide

So I tend to agree with you that some data should be cleaned:

> Is that difference data that is not cleaned up, such as TTL-expired cells
> or tombstoned data?


But I would say that the older nodes are holding data outside of their
primary and replica ranges. We need to get rid of this extra data, which is
no longer used anyway.

We also want to balance the cluster, assuming you are using the same
hardware on all the nodes.

Act

Here is what I would probably do:

First, fix the balance. This means you need to either:

- "move" (nodetool move) nodes around in order to have a well balanced 5
node cluster (you might want to read more about it if going this way).

*OR*

- add a 6th node with
'initial_token: 56713727820156410577229101238628035242'

As 'nodetool move' is quite a heavy operation, involving a few nodes and a
lot of streaming, and it would have to be performed on each node except
one, I would recommend adding a node instead, all the more so if you are
still learning about Cassandra. A sketch of both options follows.
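
Roughly, and assuming a standard package install (adapt the cassandra.yaml
location to your systems), the 'add a node' option means setting this on
the new, not-yet-started 6th node:

initial_token: 56713727820156410577229101238628035242

(leaving num_tokens unset so it keeps a single token), then starting the
node and letting it bootstrap. The 'move' option would instead mean
running, one node at a time, something like:

$ nodetool move 34028236692093846346337460743176821145

on each node, using the 5-node token list from the Observe section.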

Once you're happy with the cluster balance, run 'nodetool cleanup' on all
the nodes. It's a local operation that can be run simultaneously on many or
all of the nodes, as long as resources are available, since it is a bit I/O
and CPU intensive.
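
For example, on each node:

$ nodetool cleanup

Cleanup goes through the compaction machinery, so if it puts too much
pressure on a node you can throttle it while it runs with
'nodetool setcompactionthroughput <MB/s>'.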

Then check the balance again (nodetool status <keyspace>). Due to the
compaction state you can still see some discrepancies, but it should be far
better.

C*heers,

-----------------------
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com



2016-10-04 15:57 GMT+02:00 Ruben Cardenal <cassan...@ruben.cn>:

> Hi,
>
> We've inherited quite a big Amazon infrastructure from a company we've
> purchased. It has an ancient and obsolete implementation of services,
> the worst (and most expensive) of them being a 5-node Cassandra cluster
> (RF=3). I'm new to Cassandra, and yes, I'm working my way through the
> docs.
>
> I was told that Amazon asked them a few months ago to reboot one of their
> servers (it had been turned on for so long that Amazon had to make some
> changes and needed it rebooted), so they had to add a new node to the
> cluster. If you query nodetool as of now, it shows:
>
> $ nodetool ring
> Note: Ownership information does not include topology, please specify a
> keyspace.
> Address DC Rack Status State Load Owns Token
> 141784319550391026443072753096570088105
> 10.128.50.130 datacenter1 rack1 Up Normal 263.06 GB 16.67% 0
> 10.128.50.237 datacenter1 rack1 Up Normal 253.31 GB 16.67%
> 28356863910078205288614550619314017621
> 10.128.60.106 datacenter1 rack1 Up Normal 262.12 GB 33.33%
> 85070591730234615865843651857942052863
> 10.128.70.41 datacenter1 rack1 Up Normal 264.28 GB 16.67%
> 113427455640312821154458202477256070484
> 10.128.60.206 datacenter1 rack1 Up Normal 65.15 GB 16.67%
> 141784319550391026443072753096570088105
>
> What puzzles me is the last line. It belongs to the last added node, the
> new one I talked about. While it owns the same share of data (16.67%) as
> 3 other nodes, its Load is about 4 times lower. What does this mean?
> Is that difference data that is not cleaned up, such as TTL-expired cells
> or tombstoned data?
>
> Thanks and excuse me if I'm asking something stupid.
>
> Rubén.
