Thanks, Oleksandr,
In my case I'll need to replace all nodes in the cluster (one-by-one), so 
streaming will introduce perceptible overhead.
My question is not about data movement/copy itself, but more about all this 
token magic.

Okay, let's say we stopped the old node and moved its data to the new node.
Once it's started with auto_bootstrap=false, it will be added to the cluster 
like a usual node, just skipping the streaming stage, right?

For a cluster with vnodes enabled, when a new node is added its token ranges 
are calculated automatically by C* on startup.

So, how will C* know that this new node must be responsible for exactly the 
same token ranges as the old node was?
How would the rest of the nodes in the cluster ('peers') figure out that the 
old node should be replaced in the ring by the new one?

Do you know of any limitations for this process in the case of C* 2.1.x with 
vnodes enabled?

Regards,
Kyrill

________________________________
From: Oleksandr Shulgin <oleksandr.shul...@zalando.de>
Sent: Friday, February 2, 2018 4:26:30 PM
To: User
Subject: Re: Cassandra 2.1: replace running node without streaming

On Fri, Feb 2, 2018 at 3:15 PM, Kyrylo Lebediev 
<kyrylo_lebed...@epam.com> wrote:

Hello All!

I've got a pretty standard task - to replace a running C* node [version 2.1.15, 
vnodes=256, Ec2Snitch] (IP address will change after replacement, have no 
control over it).

There are 2 ways described in the C* documentation for doing this:

1) Add a new node, then 'nodetool decommission' the old one [ = 2 data 
streaming runs + 2 token range recalculations],

2) Stop the node, then replace it by setting -Dcassandra.replace_address [ = 1 
data streaming run]

Unfortunately, both of these methods imply data streaming.

Is there a supported way to replace a live, healthy node without data 
streaming / bootstrapping?
Something like: "Stop old node, copy data to new node, start new node with 
auto_bootstrap=false etc..."

On EC2, if you're using EBS it's pretty easy: drain and stop the old node, 
attach the volume to the new one and start it.
If not using EBS, then you have to copy the data to the new node before it is 
started.


I was able to find a couple of manuals on the Internet, like this one: 
http://engineering.mydrivesolutions.com/posts/cassandra_nodes_replacement/, but 
without an understanding of C* internals, I don't know if such hacks are safe.

More or less like that: rsync while the old node is still running, then stop 
the node and rsync again.
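
For illustration only, the two-pass copy could be sketched roughly as below. 
This is a minimal sketch, not a tested procedure: the data directory 
/var/lib/cassandra, the host name "new-node" and the service name "cassandra" 
are assumptions and will differ per installation. The script only prints the 
commands in order; it does not run anything.

```python
import shlex


def replacement_plan(data_dir, new_host):
    """Return the ordered commands for a two-pass copy (nothing is executed)."""
    # A trailing slash makes rsync copy the directory contents, not the directory itself.
    src = data_dir.rstrip("/") + "/"
    dest = f"{new_host}:{src}"
    # -a preserves permissions and timestamps; --delete removes files on the
    # destination that no longer exist on the source (e.g. compacted SSTables).
    rsync = ["rsync", "-a", "--delete", src, dest]
    return [
        rsync,                                      # pass 1: bulk copy, old node still running
        ["nodetool", "drain"],                      # flush memtables and stop accepting writes
        ["sudo", "service", "cassandra", "stop"],   # assumed service name
        rsync,                                      # pass 2: only what changed since pass 1
    ]


if __name__ == "__main__":
    # Hypothetical path and host name, used purely as placeholders.
    for cmd in replacement_plan("/var/lib/cassandra", "new-node"):
        print(shlex.join(cmd))
```

The second pass is normally much shorter than the first, which is what keeps 
the downtime of the old node small.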

But given all the hassle, streaming with replace_address doesn't sound too 
costly to me.

Cheers,
--
Alex
