failure node rejoin

Yuji Ito Sun, 16 Oct 2016 22:42:07 -0700

Hi all,

A failed node can rejoin a cluster even after all of its data in
/var/lib/cassandra was deleted.
Is this normal?


I can reproduce it as follows.

cluster:
- C* 2.2.7
- the cluster has three nodes: node1, node2, node3
- node1 is a seed
- replication_factor: 3

steps to reproduce:
1) stop the C* process on node2 and delete all data in /var/lib/cassandra
($ sudo rm -rf /var/lib/cassandra/*)
2) stop C* process on node1 and node3
3) restart C* on node1
4) restart C* on node2

nodetool status after 4):
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
DN  [node3 IP]  ?          256     100.0%            325553c6-3e05-41f6-a1f7-47436743816f  rack1
UN  [node2 IP]  7.76 MB    256     100.0%            05bdb1d4-c39b-48f1-8248-911d61935925  rack1
UN  [node1 IP]  416.13 MB  256     100.0%            a8ec0a31-cb92-44b0-b156-5bcd4f6f2c7b  rack1
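
To check node states from a script instead of by eye, the status output can be parsed. The following is only a minimal sketch in Python, assuming the column layout shown above. The function names and the 192.0.2.x addresses are hypothetical placeholders, and the parser has not been checked against other nodetool versions.

```python
import re

# Two-letter node code: Up/Down followed by Normal/Leaving/Joining/Moving.
STATUS_RE = re.compile(r"^[UD][NLJM]$")


def parse_nodetool_status(text):
    """Return one dict per node row found in `nodetool status` output."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        # Skip headers and separators; node rows start with a code like UN.
        if len(fields) < 7 or not STATUS_RE.match(fields[0]):
            continue
        nodes.append({
            "status": fields[0],
            "address": fields[1],
            # Load is either "?" or "<number> <unit>", so take what is left
            # between the address and the last four columns.
            "load": " ".join(fields[2:-4]),
            "tokens": int(fields[-4]),
            "owns": fields[-3],
            "host_id": fields[-2],
            "rack": fields[-1],
        })
    return nodes


def down_nodes(nodes):
    """Return the addresses of nodes whose status starts with D (down)."""
    return [n["address"] for n in nodes if n["status"].startswith("D")]


# Sample text mirroring the output above, with placeholder addresses.
SAMPLE = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
DN  192.0.2.3  ?          256     100.0%            325553c6-3e05-41f6-a1f7-47436743816f  rack1
UN  192.0.2.2  7.76 MB    256     100.0%            05bdb1d4-c39b-48f1-8248-911d61935925  rack1
UN  192.0.2.1  416.13 MB  256     100.0%            a8ec0a31-cb92-44b0-b156-5bcd4f6f2c7b  rack1
"""

if __name__ == "__main__":
    for node in parse_nodetool_status(SAMPLE):
        print(node["status"], node["address"], node["load"], node["host_id"])
```

Comparing the Load column between nodes this way makes the gap between the wiped node (7.76 MB) and node1 (416.13 MB) easy to flag automatically.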

If I instead restart C* on node2 while C* is still running on node1 and
node3 (that is, skipping steps 2 and 3), a runtime exception occurs:
RuntimeException: "A node with address [node2 IP] already exists,
cancelling join..."

I'm not sure whether this causes data loss. All data can be read properly
right after this rejoin.
But some rows are lost when I kill and restart C* for destructive tests
after this rejoin.

Thanks.
