Jonathan, thank you for your answers here. To explain this bit ...
On 11 March 2011 20:46, Jonathan Ellis <jbel...@gmail.com> wrote:
> On Thu, Mar 10, 2011 at 6:06 AM, Jedd Rashbrooke <j...@visualdna.com> wrote:
>> Copying a cluster between AWS DC's:
>> We have ~ 150-250GB per node, with a Replication Factor of 4.
>> I ack that 0.6 -> 0.7 is necessarily STW, so in an attempt to
>> minimise that outage period I was wondering if it's possible to
>> drain & stop the cluster, then copy over only the 1st, 5th, 9th,
>> and 13th nodes' worth of data (which should be a full copy of
>> all our actual data - we are nicely partitioned, despite the
>> disparity in GB per node) and have Cassandra re-populate the
>> new destination 16 nodes from those four data sets. If this is
>> feasible, is it likely to be more expensive (in terms of time the
>> new cluster is unresponsive as it rebuilds) than just copying
>> across all 16 sets of data - about 2.7TB.
>
> I'm confused. You're trying to upgrade and add a DC at the same time?

Yeah, I know, it's probably not the sanest route - but the
hardware (virtualised, Amazonish EC2 that it is) will be the
same between the two sites, so that reduces some of the
usual roll in / roll out migration risk.

But more importantly for us it would mean we'd have just the
one major outage, rather than two (relocation and 0.6 -> 0.7).

cheers,
Jedd.
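For what it's worth, the claim that every fourth node carries a full copy
of the data can be sanity-checked in a few lines. This sketch assumes
SimpleStrategy-style placement (each range replicated to the owning node
plus the next RF-1 ring neighbours); the zero-indexed node numbering is
illustrative, not taken from our actual ring:

```python
# Sketch: check that with 16 nodes and RF=4, nodes 1, 5, 9, 13
# (zero-indexed here as 0, 4, 8, 12) hold at least one replica of
# every token range. Assumes SimpleStrategy-style placement: the
# range owned by node i is also replicated to the next RF-1 nodes.

NODES = 16
RF = 4
picked = {0, 4, 8, 12}  # the 1st, 5th, 9th and 13th nodes

def replica_holders(owner: int) -> set:
    """Nodes holding a replica of the range owned by `owner`."""
    return {(owner + k) % NODES for k in range(RF)}

# A range is covered if at least one picked node holds a replica.
uncovered = [i for i in range(NODES) if not (replica_holders(i) & picked)]
print("uncovered ranges:", uncovered)  # expect [] for full coverage
```

Any window of RF=4 consecutive nodes on a 16-node ring contains exactly
one of the four picked nodes, so no range goes uncovered - which is why
those four data sets should amount to one complete copy.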