Vincent White created CASSANDRA-14559:
-----------------------------------------

             Summary: Check for endpoint collision with hibernating nodes 
                 Key: CASSANDRA-14559
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14559
             Project: Cassandra
          Issue Type: Bug
            Reporter: Vincent White


I ran across an edge case when replacing a node with the same address. This 
issue results in the node (and its tokens) being unsafely removed from gossip.

Steps to replicate:

1. Create a 3-node cluster.
2. Stop a node.
3. Replace the stopped node with a node that has the same address, using the 
replace_address flag.
4. Stop the node before it finishes bootstrapping.
5. Remove the replace_address flag and restart the node to resume bootstrapping 
(if the data dir is also cleared at this point, the node will also generate new 
tokens when it starts).
6. Stop the node again before it finishes bootstrapping.
7. 30 seconds later, the node will be removed from gossip because it now 
matches the check for a FatClient.
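
To make step 7 concrete, here is a minimal, self-contained sketch of how I 
understand the FatClient eviction rule. It is a simplified model, not the 
actual Gossiper code: the class name, method names and boolean parameters are 
hypothetical, the 30 second timeout is taken from step 7, and treating 
hibernate as a "dead state" that is exempt from the check is an assumption.

```java
/** Simplified, hypothetical model of the FatClient eviction rule (not Cassandra's real classes). */
public class FatClientSketch {
    /** Assumed timeout, taken from the "30 seconds" observed in step 7. */
    public static final long FAT_CLIENT_TIMEOUT_MS = 30_000;

    /**
     * An endpoint looks like a fat client when it is known to gossip, is not
     * in a dead state, and does not own tokens in the ring.
     * Assumption: a hibernating endpoint counts as being in a dead state.
     */
    public static boolean isFatClient(boolean inDeadState, boolean ownsTokensInRing) {
        return !inDeadState && !ownsTokensInRing;
    }

    /** A fat client is evicted once it has been silent for longer than the timeout. */
    public static boolean shouldEvict(boolean inDeadState,
                                      boolean ownsTokensInRing,
                                      long msSinceLastGossipUpdate) {
        return isFatClient(inDeadState, ownsTokensInRing)
            && msSinceLastGossipUpdate > FAT_CLIENT_TIMEOUT_MS;
    }

    public static void main(String[] args) {
        // Steps 3-4: replacement stopped mid-bootstrap while still hibernating.
        System.out.println(shouldEvict(true, false, 31_000));  // prints "false"

        // Steps 5-7: restarted without replace_address, so no longer hibernating,
        // then stopped again before it owns any tokens.
        System.out.println(shouldEvict(false, false, 31_000)); // prints "true"
    }
}
```

Under these assumptions, the restart in step 5 is what moves the endpoint out 
of the protected state and makes it eligible for eviction in step 7.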

I think this is only an issue when replacing a node with the same address 
because other replacements now use STATUS_BOOTSTRAPPING_REPLACE and leave the 
dead node unchanged.

I believe the simplest fix for this is to add a check that prevents a 
non-bootstrapped node (without the replace_address flag) from starting if there 
is a gossip entry for the same address in the hibernate state.
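
As a rough illustration of the proposed check (this is not the code in the PoC 
linked below), the logic could be sketched as follows. The class, enum and 
method names are hypothetical, gossip state is reduced to a plain map from 
address to status, and the address 10.0.0.3 is made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the proposed startup check; not Cassandra's actual API. */
public class HibernateCollisionCheck {
    /** Simplified stand-in for the gossip STATUS of an endpoint. */
    public enum Status { NORMAL, BOOTSTRAPPING, HIBERNATE }

    /**
     * Returns true when startup should be refused: the local node has not
     * finished bootstrapping, replace_address is not set, and gossip already
     * holds an entry for the same address in the hibernate state.
     */
    public static boolean shouldRefuseStartup(Map<String, Status> gossipState,
                                              String localAddress,
                                              boolean bootstrapCompleted,
                                              boolean replaceAddressSet) {
        if (bootstrapCompleted || replaceAddressSet) {
            return false; // normal restart or explicit replacement: allowed
        }
        return gossipState.get(localAddress) == Status.HIBERNATE;
    }

    public static void main(String[] args) {
        Map<String, Status> gossip = new HashMap<>();
        gossip.put("10.0.0.3", Status.HIBERNATE); // interrupted same-address replacement

        // Step 5 of the reproduction: restart without replace_address.
        System.out.println(shouldRefuseStartup(gossip, "10.0.0.3", false, false)); // prints "true"

        // Same restart with replace_address still set.
        System.out.println(shouldRefuseStartup(gossip, "10.0.0.3", false, true));  // prints "false"
    }
}
```

With a check like this, step 5 would fail fast instead of overwriting the 
hibernate entry, so the operator would have to restart with replace_address 
set.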

[3.11 PoC 
|https://github.com/apache/cassandra/compare/trunk...vincewhite:check_for_hibernate_on_start]


 


