Hello all, 

I have a question about moving part of a multi-datacenter cluster to a new 
physical datacenter. 
For example, suppose I have a two-datacenter cluster with one DC in San Jose,
California, and one DC in Orlando, Florida, and I want to move all the nodes in
Orlando to a new datacenter in Tampa.


The standard procedure for doing this seems to be to add a third datacenter to
the cluster, stream data to the new datacenter via nodetool rebuild, and then
decommission the old datacenter. A more detailed walkthrough of this procedure
can be found here:
http://thelastpickle.com/blog/2019/02/26/data-center-switch.html
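
To make the order of operations concrete, here is a rough Python sketch that only builds and prints the steps; it does not talk to a cluster. The keyspace name, replication factor, host names, and DC names ("SanJose", "Orlando", "Tampa") are placeholders I made up, the DC names are assumed to match what your snitch reports, and the exact commands should be checked against the docs for your Cassandra version.

```python
from typing import List


def dc_switch_plan(keyspace: str, stable_dc: str, old_dc: str, new_dc: str,
                   rf: int, old_nodes: List[str], new_nodes: List[str]) -> List[str]:
    """Ordered steps for add-DC / rebuild / decommission (placeholder names)."""

    def replication(dcs: List[str]) -> str:
        # NetworkTopologyStrategy with the same (assumed) RF in every listed DC.
        opts = ", ".join(f"'{dc}': {rf}" for dc in dcs)
        return (f"ALTER KEYSPACE {keyspace} WITH replication = "
                f"{{'class': 'NetworkTopologyStrategy', {opts}}};")

    plan: List[str] = []
    # New nodes join without streaming; rebuild does the streaming instead.
    plan.append(f"# start {len(new_nodes)} nodes in {new_dc} with auto_bootstrap: false")
    plan.append(replication([stable_dc, old_dc, new_dc]))
    # Stream existing data into each new node from the old DC.
    plan += [f"{host}: nodetool rebuild -- {old_dc}" for host in new_nodes]
    # This is the application-layer change described below.
    plan.append(f"# point clients at {new_dc}")
    # The expensive full repair on every node in the old DC.
    plan += [f"{host}: nodetool repair -full" for host in old_nodes]
    # Drop the old DC from replication, then remove its nodes.
    plan.append(replication([stable_dc, new_dc]))
    plan += [f"{host}: nodetool decommission" for host in old_nodes]
    return plan


if __name__ == "__main__":
    for step in dc_switch_plan("my_keyspace", "SanJose", "Orlando", "Tampa", 3,
                               ["orl1", "orl2"], ["tpa1", "tpa2"]):
        print(step)
```

The point of listing it this way is that the client switch and the full repair both sit between the rebuild and the decommission.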



However, I see two problems with the above procedure. First, it requires
changes in the application layer because the datacenter name changes; e.g.,
all applications referring to the datacenter ‘Orlando’ will now have to be
changed to refer to ‘Tampa’. Second, it requires that a full repair be run on
every node in the old datacenter before decommissioning it, to ensure that all
writes which went to it are replicated to the new datacenter. For a large
dataset, this repair can be prohibitively expensive.



As such, I was wondering what people’s thoughts are on the following
alternative procedure:

1) Kill one node in the old datacenter

2) Add a new node in the new datacenter, but indicate that it is to REPLACE the
one just shut down; this node will bootstrap, and all the data it is
supposed to be responsible for will be streamed to it

3) Repeat steps one and two until all nodes have been replaced
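
Spelled out per node, the loop I have in mind looks roughly like the Python sketch below (again it only builds the list of steps). It assumes GossipingPropertyFileSnitch, so the new Tampa nodes keep advertising dc=Orlando, and a Cassandra version that supports the -Dcassandra.replace_address_first_boot JVM option; the IP addresses are made-up placeholders.

```python
from typing import List, Tuple


def replace_plan(pairs: List[Tuple[str, str]], logical_dc: str) -> List[str]:
    """Steps for replacing nodes one at a time (placeholder IPs).

    pairs: (old_ip, new_ip) for each Orlando node and its Tampa replacement.
    logical_dc: DC name the new nodes advertise; keeping the old name means
    keyspace replication settings and client configs stay the same.
    """
    plan: List[str] = []
    for old_ip, new_ip in pairs:
        # Step 1: flush and stop the old node (the stop command depends on your setup).
        plan.append(f"{old_ip}: nodetool drain, then stop the Cassandra process")
        # The replacement is physically in Tampa but logically still in the old DC.
        plan.append(f"{new_ip}: set dc={logical_dc} in cassandra-rackdc.properties")
        # Step 2: bootstrap as a replacement, streaming the ranges the old node owned.
        plan.append(f"{new_ip}: start with -Dcassandra.replace_address_first_boot={old_ip}")
        # Finish each replacement before starting the next one (step 3).
        plan.append(f"wait until {new_ip} shows as UN in nodetool status")
    return plan


if __name__ == "__main__":
    for step in replace_plan([("10.0.0.1", "10.1.0.1"), ("10.0.0.2", "10.1.0.2")], "Orlando"):
        print(step)
```

Because only one node is down at a time, the other replicas in the logical DC keep serving requests while each replacement streams.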



In particular, I’m curious whether anybody has insight into what problems can
arise if a “logical” datacenter in Cassandra actually spans two different
physical datacenters, and whether these problems might be mitigated if the two
physical datacenters in question are geographically close together (e.g., Tampa
and Orlando).

Thanks, 
-Saleil 
