We ran repair -pr on each node after we realized there was data loss and had
added the 4 original nodes back into the cluster. That is, once we realized
there was a problem, we ran repair on the 8-node cluster consisting of the
4 old and the 4 new nodes.
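For the record, the repair pass was essentially the following, run serially
so that only one repair was in flight at a time (the host addresses here are
placeholders, not our real ones):

    # Cassandra 1.2: primary-range anti-entropy repair on each of the
    # 8 nodes (4 old + 4 new), one node at a time.
    for host in 10.0.1.1 10.0.1.2 10.0.1.3 10.0.1.4 \
                10.0.2.1 10.0.2.2 10.0.2.3 10.0.2.4; do
        nodetool -h "$host" repair -pr
    done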
We are using quorum reads and writes. One thing that I didn't mention, and
which I now think may be the culprit after doing a lot of mailing-list
reading, is that when we brought the 4 new nodes into the cluster, they had
themselves listed in the seeds list. I read yesterday that if a node has
itself in the seeds list, it won't bootstrap properly; a node that considers
itself a seed skips the bootstrap/streaming phase entirely (see the
cassandra.yaml excerpt below the quoted thread).

-- C

On Tue, Nov 26, 2013 at 8:14 AM, Janne Jalkanen <janne.jalka...@ecyrd.com> wrote:

> That sounds bad! Did you run repair at any stage? Which CL are you
> reading with?
>
> /Janne
>
> On 25 Nov 2013, at 19:00, Christopher J. Bottaro
> <cjbott...@academicworks.com> wrote:
>
> Hello,
>
> We recently experienced (pretty severe) data loss after moving our 4-node
> Cassandra cluster from one EC2 availability zone to another. Our strategy
> for doing so was as follows:
>
> - One at a time, bring up new nodes in the new availability zone and have
>   them join the cluster.
> - One at a time, decommission the old nodes in the old availability zone
>   and turn them off (stop the Cassandra process).
>
> Everything seemed to work as expected. As we decommissioned each node, we
> checked the logs for messages indicating "yes, this node is done
> decommissioning" before turning the node off.
>
> Pretty quickly after the old nodes left the cluster, we started getting
> client calls about missing data.
>
> We immediately turned the old nodes back on, and when they rejoined the
> cluster *most* of the reported missing data returned. For the rest of the
> missing data, we had to spin up a new cluster from EBS snapshots and copy
> it over.
>
> What did we do wrong?
>
> In hindsight, we noticed a few things which may be clues...
>
> - The new nodes had much lower load after joining the cluster than the
>   old ones (3-4 GB as opposed to 10 GB).
> - We have Ec2Snitch turned on, although we're using SimpleStrategy for
>   replication.
> - The new nodes showed even ownership (via nodetool status) after joining
>   the cluster.
>
> Here's more info about our cluster...
>
> - Cassandra 1.2.10
> - Replication factor of 3
> - Vnodes with 256 tokens
> - All tables made via CQL
> - Data dirs on EBS (yes, we are aware of the performance implications)
>
> Thanks for the help.
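P.S. In case it helps anyone who finds this thread later: the seeds setting
in question lives in cassandra.yaml under seed_provider. If my reading of
the list archives is right, a joining node must not list its own address
there. A minimal sketch of what the new nodes should have had (addresses
are placeholders):

    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              # List only established cluster nodes here. A node that finds
              # its own address in this list treats itself as a seed and
              # skips the bootstrap/streaming phase, so it joins the ring
              # without pulling its share of the existing data.
              - seeds: "10.0.1.1,10.0.1.2"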