We just migrated a Cassandra cluster on EC2 to another instance type. We
replaced one server after another, which poses a problem similar to the
one you describe.

We simply stopped Cassandra, copied the complete data dir to an EBS volume,
terminated the server, launched another server with the same IP, copied the
data dir back from the EBS volume and started Cassandra on the new server.
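
For reference, here is a rough sketch of those steps as a dry-run plan in
Python. It only builds and prints the commands, it does not run anything.
The data dir path, the EBS mount point, the use of rsync and the
"service cassandra" commands are all assumptions on my side, not what we
literally typed; adjust them to your own setup.

```python
import shlex


def replacement_plan(data_dir="/var/lib/cassandra", ebs_mount="/mnt/ebs-transfer"):
    """Return the commands for each phase, in order, without running them.

    data_dir and ebs_mount are hypothetical defaults; use the values
    from your own installation.
    """
    backup = f"{ebs_mount}/cassandra"
    return {
        # Run on the node being replaced, before terminating it.
        "old_node": [
            ["sudo", "service", "cassandra", "stop"],
            ["sudo", "rsync", "-a", f"{data_dir}/", f"{backup}/"],
        ],
        # Run on the new instance (same IP), with the EBS volume attached.
        "new_node": [
            ["sudo", "rsync", "-a", f"{backup}/", f"{data_dir}/"],
            ["sudo", "service", "cassandra", "start"],
        ],
    }


if __name__ == "__main__":
    for phase, commands in replacement_plan().items():
        print(f"# on {phase}")
        for command in commands:
            print(shlex.join(command))
```

Terminating the old instance and launching the new one with the same IP
happen between the two phases and are left out of the sketch.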

Hinted handoff will replay the updates that the replaced node has missed, as
long as you finish within the max_hint_window_in_ms duration. We also
repaired the new node, but this should not be necessary.
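
A quick way to sanity-check the timing, as a small Python sketch. The
function name is made up for illustration, and the 3 hour default is what
I believe cassandra.yaml ships with (max_hint_window_in_ms: 10800000);
check the value in your own config before relying on it.

```python
# Assumed default for max_hint_window_in_ms: 10800000 ms = 3 hours.
DEFAULT_MAX_HINT_WINDOW_MS = 3 * 60 * 60 * 1000


def hints_cover_outage(downtime_seconds, max_hint_window_in_ms=DEFAULT_MAX_HINT_WINDOW_MS):
    """Return True if the node was down for less than the hint window.

    If this returns False, the other nodes stopped storing hints for the
    node at some point during the outage, so a repair is needed.
    """
    return downtime_seconds * 1000 < max_hint_window_in_ms


if __name__ == "__main__":
    print(hints_cover_outage(30 * 60))      # 30 minute swap
    print(hints_cover_outage(4 * 60 * 60))  # 4 hour swap
```

With the assumed default, a 30 minute swap is inside the window and a
4 hour swap is not.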




2013/12/5 Philippe Dupont <pdup...@teads.tv>

> Hi,
> We currently have a 28-node C* cluster on m1.xlarge instances using vnodes
> and are encountering a RAID issue with one of them.
>
> The first solution could be to decommission this node and insert a new one
> in the cluster. Since we use vnodes, we would need to run 28 cleanups after
> adding a node, and this number will increase as our cluster grows.
>
> In theory, I would like to duplicate the defective node into a new one and
> switch them without impacting the cluster. That would avoid the
> decommission and all the streaming on the old node, which could then be
> removed immediately.
>
> Is there any way to do this?
>
> Thanks,
>
> Philippe
>
