On Mon, Jul 23, 2012 at 6:26 PM, Eran Chinthaka Withana
<eran.chinth...@gmail.com> wrote:

> Method 1: I copied the data from all the nodes in that data center,
> into the repaired node, and brought it back up. But because of the
> rate of updates happening, the read misses started going up.

That's not really a good method once you scale up and the amount of
data in the cluster no longer fits on a single machine.

> Method 2: I issued a removetoken command for that node's token and
> let the cluster stream the data into the relevant nodes. At the end
> of this process, the dead node was no longer showing up in the ring
> output. Then I brought the node back up. I was expecting Cassandra
> to first stream data into the new node (which happens to be the dead
> node that was in the cluster earlier) and make it serve reads only
> once that was done. But in the server log I can see that as soon as
> the node comes up, it starts serving reads, creating a large number
> of read misses.

Removetoken is for dead nodes, so the node has no way of knowing
locally that it should no longer be a cluster member when it starts
back up. If you had decommissioned it instead, it would have saved a
flag indicating that it should bootstrap at the next startup.
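For reference, the two removal paths look something like this (the
host names and token below are placeholders):

    # Dead node: run from any live node, passing the dead node's
    # token. The survivors stream replicas among themselves, but the
    # dead node itself learns nothing and will serve reads as soon as
    # it is brought back up.
    nodetool -h live-node.example.com removetoken <dead-node-token>

    # Node that is still up: run on the node itself. It streams its
    # data off and saves a flag so that it bootstraps at its next
    # startup instead of serving reads immediately.
    nodetool -h leaving-node.example.com decommission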
> So the question is, what is the best way to bring back a dead node
> (once its hardware issues are fixed) without impacting read misses?

Increase your consistency level. Run a repair on the node once it's
back up, unless the repair takes longer than gc_grace, in which case
you need to removetoken it, delete all of its data, and bootstrap it
back in if you don't want deleted data to resurrect.

-Brandon
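P.S. Concretely, that recovery path looks roughly like this (host
names, the token, and the data directories are placeholders for your
own setup):

    # Repair finishes within gc_grace: repair the node in place.
    nodetool -h repaired-node.example.com repair

    # Repair can't finish within gc_grace: remove the node, wipe it,
    # and bootstrap it back in so deleted data can't resurrect.
    nodetool -h live-node.example.com removetoken <dead-node-token>
    # Then, on the repaired node, before restarting Cassandra:
    rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/* \
           /var/lib/cassandra/saved_caches/*
    # Make sure auto_bootstrap is not set to false in cassandra.yaml,
    # then start the node; it will stream its range before serving
    # reads.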