[ https://issues.apache.org/jira/browse/CASSANDRA-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brandon Williams updated CASSANDRA-8336:
----------------------------------------
    Comment: was deleted

(was: v3 addresses the previous issues. For the first problem, it turns out the simplest thing to do is not to make shutdown a dead state, and instead to special-case its detection at the very end of handleMajorStateChange.)

> Quarantine nodes after receiving the gossip shutdown message
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-8336
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8336
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>             Fix For: 2.0.13
>
>         Attachments: 8336-v2.txt, 8336.txt
>
>
> In CASSANDRA-3936 we added a gossip shutdown announcement. The problem here
> is that this isn't sufficient: you can still get TimedOutExceptions (TOEs) and
> have to wait on the failure detector (FD) to figure things out. This happens
> due to gossip propagation time and variance. If node X shuts down and sends
> the message to Y, but Z has a greater gossip version than Y for X and has not
> yet received the message, Z can initiate gossip with Y and thus mark X alive
> again. I propose quarantining to solve this. However, I feel it should be a
> -D parameter you have to specify, so as not to disrupt current dev and test
> practices, since this will mean a node that shuts down will not be able to
> restart until the quarantine expires.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
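The race described in the issue, and how an opt-in quarantine closes it, can be modeled in a few lines of Java. This is a minimal sketch under simplifying assumptions, not Cassandra's Gossiper: the class `GossipSketch`, the 30-second `QUARANTINE_MS`, and the `sketch.quarantineOnShutdown` system property are all hypothetical names chosen for illustration. Endpoint state is reduced to a single version number, and the sketch assumes the shutdown has propagated to every peer by the time the quarantine expires.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the CASSANDRA-8336 race; not Cassandra's API.
class GossipSketch {
    // Hypothetical quarantine length; a real value would be tuned to gossip propagation time.
    static final long QUARANTINE_MS = 30_000;

    final boolean quarantineEnabled;
    // endpoint -> highest gossip version this node has seen for it
    final Map<String, Integer> versions = new HashMap<>();
    final Map<String, Boolean> alive = new HashMap<>();
    // endpoint -> time (ms) at which its quarantine ends
    final Map<String, Long> quarantinedUntil = new HashMap<>();

    GossipSketch(boolean quarantineEnabled) {
        this.quarantineEnabled = quarantineEnabled;
    }

    // Opt-in through a JVM flag, as the issue proposes, e.g.
    // java -Dsketch.quarantineOnShutdown=true ... (hypothetical property name).
    static GossipSketch fromSystemProperty() {
        return new GossipSketch(Boolean.getBoolean("sketch.quarantineOnShutdown"));
    }

    // Node X announced its shutdown directly to this node (Y).
    void onShutdownMessage(String endpoint, long nowMs) {
        alive.put(endpoint, false);
        if (quarantineEnabled) {
            quarantinedUntil.put(endpoint, nowMs + QUARANTINE_MS);
        }
    }

    // Another peer (e.g. Z) gossips its view of the endpoint's state.
    void onGossipState(String endpoint, int version, long nowMs) {
        Long until = quarantinedUntil.get(endpoint);
        if (until != null) {
            if (nowMs < until) {
                return; // still quarantined: ignore state that would revive the endpoint
            }
            quarantinedUntil.remove(endpoint); // quarantine expired
        }
        int known = versions.getOrDefault(endpoint, -1);
        if (version > known) {
            // A newer version is taken as evidence the endpoint is alive.
            versions.put(endpoint, version);
            alive.put(endpoint, true);
        }
    }

    boolean isAlive(String endpoint) {
        return alive.getOrDefault(endpoint, false);
    }
}
```

Without quarantine, Z's stale-but-newer version for X makes Y mark X alive again right after the shutdown announcement. With quarantine enabled, Y ignores that state until the window ends. The same check also rejects a legitimately restarted X during the window, which is the dev/test cost the issue cites as the reason to make the behavior opt-in.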