Howdy folks.  I asked about this in IRC yesterday, but we're hoping to
confirm a couple of things for our sanity.

Yesterday, I was performing an operation on a 21-node cluster (vnodes,
replication factor 3, NetworkTopologyStrategy, nodes balanced across 3
AZs on AWS EC2).  The goal was to swap each node's existing 1TB volume
(where all cassandra data, including the commitlog, is stored) for a 2TB
volume.  The plan for each node (one at a time) was basically the
following, with rough commands sketched after the list:

   - rsync while the node was live (repeated until only minor
   differences from new data remained)
   - stop cassandra on the node
   - rsync again
   - replace the old volume with the new
   - start cassandra
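
For context, the per-node commands looked roughly like the following.
The mount point, service name, and rsync flags here are reconstructed
from memory, so treat them as illustrative rather than exact:

   # live sync, repeated until the delta was small
   rsync -a --delete /var/data/cassandra/ /var/data/cassandra_new/
   sudo service cassandra stop
   # final catch-up sync with the node down
   rsync -a --delete /var/data/cassandra/ /var/data/cassandra_new/
   # unmount the old volume and mount the new one at /var/data/cassandra
   sudo service cassandra start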

However, there was a bug in the rsync command.  Instead of copying the
contents of /var/data/cassandra into /var/data/cassandra_new, it copied
the directory itself, leaving the data under
/var/data/cassandra_new/cassandra.  So, when cassandra was started after
the volume swap, the behavior was partly like bootstrapping a new node
(data started streaming in from other nodes) and partly like a node
replacement (nodetool status showed the same IP address but a different
host ID).  This happened on 3 nodes (one from each AZ).  The nodes had
received 1.4GB, 1.2GB, and 0.6GB of data (whereas the normal load for a
node is around 500-600GB).
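
If it helps anyone reconstruct what happened: my best guess is rsync's
trailing-slash behavior (that's my read of it, not something I've
re-verified against the shell history):

   # without a trailing slash, the source directory itself is copied,
   # so the data lands in /var/data/cassandra_new/cassandra/
   rsync -a /var/data/cassandra /var/data/cassandra_new/

   # with a trailing slash, only the contents are copied (what was intended)
   rsync -a /var/data/cassandra/ /var/data/cassandra_new/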

The cluster was in this state for about 2 hours, at which point
cassandra was stopped on those three nodes.  Later, I moved the data
from the original volumes back into place (so the nodes should be in
their original pre-operation state) and started cassandra back up.

Finally, the questions.  We've accepted the potential loss of data
written during those two hours, but our primary concern now is what was
happening with the bootstrapping nodes.  Would they have taken on the
token ranges of the original nodes, or acted like new nodes and picked
up new token ranges?  If the latter, is it possible that any data moved
from the healthy nodes to the "new" nodes?  And would restarting them
with the original data (and repairing) put the cluster's token ranges
back into a normal state?

Hopefully that was all clear.  Thanks in advance for any info!
