Hello all, I have a question about moving part of a multi-datacenter cluster to a new physical datacenter. For example, suppose I have a two-datacenter cluster with one DC in San Jose, California and one DC in Orlando, Florida, and I want to move all the nodes in Orlando to a new datacenter in Tampa.
The standard procedure for doing this seems to be to add a third datacenter to the cluster, stream data to the new datacenter via nodetool rebuild, and then decommission the old datacenter. A more detailed review of this procedure can be found here: http://thelastpickle.com/blog/2019/02/26/data-center-switch.html

However, I see two problems with the above protocol. First, it requires changes on the application layer because of the datacenter name change; e.g. all applications referring to the datacenter ‘Orlando’ will now have to be changed to refer to ‘Tampa’. Second, it requires that a full repair be run on every node in the old datacenter before decommissioning it, to ensure that all writes which went to it are replicated to the new datacenter. This repair (for a large dataset) can be prohibitively expensive.

As such, I was wondering what people’s thoughts were on the following alternative procedure:

1) Kill one node in the old datacenter
2) Add a new node in the new datacenter but indicate that it is to REPLACE the one just shut down; this node will bootstrap, and all the data which it is supposed to be responsible for will be streamed to it
3) Repeat steps one and two until all nodes have been replaced

In particular, I’m curious if anybody has any insight on what problems can arise if a “logical” datacenter in Cassandra actually spans two different physical datacenters, and whether these problems might be mitigated if the two physical datacenters in question are geographically close together (e.g. Tampa and Orlando).

Thanks,
-Saleil
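
P.S. To make the alternative procedure a bit more concrete, here is a rough Python sketch of the per-node plan I have in mind. It is only an illustration under some assumptions: that the cluster uses GossipingPropertyFileSnitch (so the datacenter name comes from cassandra-rackdc.properties), and that each new node is started with the replace_address_first_boot option pointing at the node it replaces. The IP addresses, the rack name, and the names plan_replacements and ReplacementStep are made up for the example. The point is that every new node in Tampa keeps the logical datacenter name ‘Orlando’, so nothing changes for the applications.

```python
from dataclasses import dataclass


@dataclass
class ReplacementStep:
    old_ip: str      # node to stop in the old physical datacenter
    new_ip: str      # replacement node in the new physical datacenter
    jvm_option: str  # startup option naming the address being replaced
    rackdc: str      # contents of cassandra-rackdc.properties on the new node


def plan_replacements(old_ips, new_ips, logical_dc, rack="rack1"):
    """Pair each old node with one new node, to be replaced one at a time.

    The rack default is a placeholder; in practice the new node would
    presumably reuse the rack of the node it replaces.
    """
    if len(old_ips) != len(new_ips):
        raise ValueError("need exactly one new node per old node")
    steps = []
    for old_ip, new_ip in zip(old_ips, new_ips):
        steps.append(ReplacementStep(
            old_ip=old_ip,
            new_ip=new_ip,
            # The new node takes over the tokens of the stopped node.
            jvm_option=f"-Dcassandra.replace_address_first_boot={old_ip}",
            # Logical datacenter name stays the same as before the move.
            rackdc=f"dc={logical_dc}\nrack={rack}",
        ))
    return steps


if __name__ == "__main__":
    # Hypothetical addresses: 10.0.1.x in Orlando, 10.0.2.x in Tampa.
    plan = plan_replacements(
        ["10.0.1.1", "10.0.1.2"],
        ["10.0.2.1", "10.0.2.2"],
        logical_dc="Orlando",
    )
    for number, step in enumerate(plan, start=1):
        print(f"Step {number}: stop {step.old_ip}, then start {step.new_ip}")
        print(f"  JVM option: {step.jvm_option}")
        print(f"  rackdc: {step.rackdc.splitlines()}")
```

Each step would only begin after the previous replacement node has finished bootstrapping, so at most one node of the logical datacenter is down at any time.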