Thanks, Brandon, for the answer (I didn't know driftx = Brandon Williams;
thanks for your awesome support on the Cassandra IRC channel).

Increasing the CL is tricky for us right now, as our RF in that datacenter is 2
and the CL is set to ONE. If we move the CL to LOCAL_QUORUM, then any single
node going down will cause trouble. If nothing else works out, I will try
increasing the RF to 3 in that datacenter and setting the CL to LOCAL_QUORUM.
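
For reference, that change would look something like the sketch below (the
keyspace and datacenter names are placeholders, and the CQL syntax assumes a
CQL 3 capable version; older releases would use cassandra-cli's
"update keyspace" instead):

    # Raise the RF for the affected datacenter (run once, from any node):
    cqlsh <<'CQL'
    ALTER KEYSPACE my_ks
      WITH replication = {'class': 'NetworkTopologyStrategy',
                          'DC1': 3, 'DC2': 3};
    CQL

    # Repair each node in that datacenter so the new third replica
    # actually receives its data:
    nodetool -h <each-node-in-dc> repair my_ks

    # The consistency level is a client-side, per-request setting;
    # in cqlsh:  CONSISTENCY LOCAL_QUORUM;

With RF=3, LOCAL_QUORUM only needs 2 of the 3 local replicas, so one node in
the datacenter can be down and reads/writes still succeed.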

About decommissioning: if the node has already gone down, there is no way of
running that command on it, right? IIUC, decommission has to be run on the
node that is being decommissioned, while it is still up.
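
Just to make sure I have it straight, here is roughly how I understand the two
commands (a sketch; the host names and token are placeholders):

    # decommission: run ON the node that is leaving, while it is still up;
    # it streams its own data to the remaining replicas before leaving.
    nodetool -h node-being-retired decommission

    # removetoken: run from ANY live node AFTER the target node is already
    # dead; the surviving replicas re-stream the data among themselves.
    nodetool -h some-live-node ring          # find the dead node's token
    nodetool -h some-live-node removetoken <dead-node-token>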

Coming back to the original question: without touching the CL, can we bring
back a dead node (after fixing it) and somehow tell Cassandra that the node is
back up but should not be sent read requests until it has received all of its
data?
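
For concreteness, here is the recovery sequence I take away from your reply
below, as a rough sketch (host names, the token, and data paths are
placeholders; which branch applies depends on whether the outage exceeded
gc_grace):

    # Case 1: the node came back within gc_grace -- start it, then repair it:
    nodetool -h repaired-node repair

    # Case 2: the node was down longer than gc_grace -- remove it, wipe it,
    # and bootstrap it back in so deleted data cannot resurrect:
    nodetool -h some-live-node removetoken <dead-node-token>
    # on the repaired node, with Cassandra stopped:
    rm -rf /var/lib/cassandra/data/* \
           /var/lib/cassandra/commitlog/* \
           /var/lib/cassandra/saved_caches/*
    # then start Cassandra so it bootstraps (streams its data) before
    # it begins serving reads.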

Thanks,
Eran Chinthaka Withana


On Mon, Jul 23, 2012 at 6:48 PM, Brandon Williams <dri...@gmail.com> wrote:

> On Mon, Jul 23, 2012 at 6:26 PM, Eran Chinthaka Withana
> <eran.chinth...@gmail.com> wrote:
> > Method 1: I copied the data from all the nodes in that data center, into
> the
> > repaired node, and brought it back up. But because of the rate of updates
> > happening, the read misses started going up.
>
> That's not really a good method when you scale up and the amount of
> data in the cluster won't fit on a single machine.
>
> > Method 2: I issued a removetoken command for that node's token and let
> the
> > cluster stream the data into relevant nodes. At the end of this process,
> the
> > dead node was not showing up in the ring output. Then I brought the node
> > back up. I was expecting, Cassandra to first stream data into the new
> node
> > (which happens to be the dead node which was in the cluster earlier) and
> > once its done then make it serve reads. But, in the server log, I can
> see as
> > soon the node comes up, it started serving reads, creating a large
> number of
> > read misses.
>
> Removetoken is for dead nodes, so the node has no way of locally
> knowing it shouldn't be a cluster member any longer when it starts up.
>  Instead if you had decommissioned, it would have saved a flag to
> indicate it should bootstrap at the next startup.
>
> > So the question is, what is the best way to bring back a dead node (once
> its
> > hardware issues are fixed) without impacting read misses?
>
> Increase your consistency level.  Run a repair on the node once it's
> back up, unless the repair time took longer than gc_grace, in which
> case you need to removetoken it, delete all the data, and bootstrap it
> back in if you don't want anything deleted to resurrect.
>
> -Brandon
>
