[ 
https://issues.apache.org/jira/browse/CASSANDRA-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16597160#comment-16597160
 ] 

mck edited comment on CASSANDRA-14679 at 8/30/18 7:54 AM:
----------------------------------------------------------

{quote}If the operator misconfigures the node by removing a directory from 
data_file_directories, I don't really think Cassandra can tell whether it was 
intentional or unintentional.
{quote}
If a node starts up with data but without {{system.local}}, I think it's safe 
to say this is not intentional, and we should prevent new tokens from being 
generated. But I haven't looked through the code to see what's cheap to patch.
{quote}I think the best would be to store a small file alongside cassandra.yaml 
on the node to remember state information.
{quote}
For a better solution I'm left wondering… maybe something through gossip that 
can reject the range movements? Something along the lines of "the other nodes 
know you already and are not (automatically) letting you change your Host ID".


was (Author: michaelsembwever):
{quote}If the operator misconfigures the node by removing a directory from 
data_file_directories, I don't really think Cassandra can tell whether it was 
intentional or unintentional.{quote}

If a node starts up with data but without {{system.local}} I think it's safe to 
say this is not intentional and to halt the process, and should prevent new 
tokens from being generated. But I haven't looked through the code to see 
what's cheap to patch.

{quote}I think the best would be to store a small file alongside cassandra.yaml 
on the node to remember state information.{quote}

For a better solution I'm left wondering… maybe something through gossip that 
can reject the range movements? Something along the lines of "the other nodes 
know you already and are not (automatically) letting you change your Host ID".

 

> Prevent generating new tokens on a node when data exists
> --------------------------------------------------------
>
>                 Key: CASSANDRA-14679
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14679
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: mck
>            Priority: Critical
>
> Data loss is possible if a node starts up without {{system.local}} data 
> available.
> If a node restarts and its {{system.local}} data is unavailable, it will 
> generate new tokens. This will cause range movements in the cluster and 
> potential data loss, as these range movements are not part of a 
> bootstrap/decommission and leave orphaned data around the cluster.
> This can happen if a node restarts without a JBOD entry available, or if the 
> cassandra.yaml changes and leaves a JBOD entry out.
> If a node starts up and finds data but not its {{system.local}}, it should 
> not generate new tokens. Neither should it assign itself a new Host ID.
> This is described in more detail in 
> http://thelastpickle.com/blog/2018/08/22/the-fine-print-when-using-multiple-data-directories.html
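
As a rough illustration of the guard requested in the description, the startup decision could be sketched as below. This is not Cassandra code. The function names, the returned labels, and the assumed on-disk layout (`<data_dir>/<keyspace>/<table>-<id>/`) are assumptions made for the example only, and a real patch would live in Cassandra's Java startup path.

```python
import os
import tempfile


def has_table_dir(data_dir, keyspace, table):
    """Return True if data_dir holds a directory for keyspace.table.

    Assumes table directories are named '<table>-<id>' under the keyspace
    directory (an assumption for this sketch).
    """
    ks_path = os.path.join(data_dir, keyspace)
    if not os.path.isdir(ks_path):
        return False
    return any(
        name.startswith(table + "-") and os.path.isdir(os.path.join(ks_path, name))
        for name in os.listdir(ks_path)
    )


def startup_action(data_dirs):
    """Decide what a starting node should do (hypothetical helper).

    - 'use_saved_tokens': system.local was found in some configured directory.
    - 'bootstrap_fresh':  no data at all, so generating tokens is safe.
    - 'refuse':           data exists but system.local does not, so the node
                          should not generate new tokens or a new Host ID.
    """
    existing = [d for d in data_dirs if os.path.isdir(d)]
    found_local = any(has_table_dir(d, "system", "local") for d in existing)
    if found_local:
        return "use_saved_tokens"
    # Any keyspace directory at all counts as existing data here.
    found_data = any(
        os.path.isdir(os.path.join(d, name))
        for d in existing
        for name in os.listdir(d)
    )
    return "refuse" if found_data else "bootstrap_fresh"


# Demo: the directory that held system.local was left out of the config,
# but another configured directory still holds table data.
with tempfile.TemporaryDirectory() as root:
    disk1 = os.path.join(root, "disk1")
    os.makedirs(os.path.join(disk1, "my_keyspace", "events-0001"))
    print(startup_action([disk1]))  # refuse
```

The sketch treats any keyspace directory as evidence of existing data, which is deliberately conservative: it prefers halting startup over silently generating new tokens.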



