We just migrated a Cassandra cluster on EC2 to another instance type. We replaced one server after another, which creates problems similar to the one you describe.
We simply stopped Cassandra, copied the complete data directory to an EBS volume, terminated the server, launched another server with the same IP, copied the data directory back from the EBS volume, and started Cassandra on the new server. Hinted handoff will replay the writes that the replaced node missed, as long as you finish within the max_hint_window_in_ms duration. We also ran a repair on the new node, but this should not be necessary.

2013/12/5 Philippe Dupont <pdup...@teads.tv>

> Hi,
> We currently have a 28 node C* cluster on m1.XLarge instances using Vnodes
> and are encountering a Raid issue with one of them.
>
> The first solution could be to decommission this node and insert a new one
> in the cluster, since we use vnodes we need to run 28 cleanup after
> adding a node, this value will increase as our cluster grow.
>
> In theory, I would like to duplicate the defective node into a new one and
> switch them without impacting the cluster : that would avoid the
> decommission and all the streaming on the old node which could then be
> instantly removed.
>
> Is there any way to do this?
>
> Thanks,
>
> Philippe
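
As a rough illustration of the timing condition in the reply above (the swap has to finish within max_hint_window_in_ms for hinted handoff to cover the missed writes), here is a minimal Python sketch. The function name `within_hint_window` and the timestamps are made up for illustration and are not part of Cassandra. The 3-hour value is assumed to be the stock default in cassandra.yaml, so check your own configuration.

```python
from datetime import datetime, timedelta

# Assumption: 3 hours (10800000 ms) is the stock max_hint_window_in_ms;
# read the real value from your own cassandra.yaml.
DEFAULT_MAX_HINT_WINDOW_MS = 3 * 60 * 60 * 1000


def within_hint_window(stopped_at, restarted_at,
                       max_hint_window_ms=DEFAULT_MAX_HINT_WINDOW_MS):
    """Return True if the node was down for less than the hint window.

    Hypothetical helper: it only compares two timestamps you recorded
    yourself and does not talk to Cassandra.
    """
    if restarted_at < stopped_at:
        raise ValueError("restarted_at must not be earlier than stopped_at")
    downtime = restarted_at - stopped_at
    return downtime < timedelta(milliseconds=max_hint_window_ms)


if __name__ == "__main__":
    stopped = datetime(2013, 12, 5, 10, 0)
    restarted = datetime(2013, 12, 5, 11, 30)
    if within_hint_window(stopped, restarted):
        print("Downtime is inside the hint window; hints should cover it.")
    else:
        print("Downtime exceeded the hint window; run a repair on the node.")
```

If the check fails, the other nodes will have stopped storing hints for the replaced node partway through the outage, and a repair on the new node is the safer option.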