Mir,

You can take a node out of the cluster with nodetool decommission, pointed at the live node you want to retire, or with nodetool removetoken, pointed at any other machine, to remove a dead one. Either way, the ranges the old node was responsible for are reassigned to other nodes, and the appropriate data is replicated there. If decommission is used, the data streams from the decommissioned node; if removetoken is used, it streams from the remaining replicas.
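As a rough sketch, the two commands look like this (host names and the token are placeholders; note that on Cassandra 1.2 and later, removetoken was renamed removenode and takes a host ID):

    # Retire a live node; point nodetool at the node being removed:
    nodetool -h <live-node> decommission

    # Remove a dead node; point nodetool at any other live machine:
    nodetool -h <other-node> removetoken <token-of-dead-node>
    # on 1.2+: nodetool -h <other-node> removenode <host-id-of-dead-node>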


Hope this helps.
Jan

--------------------------------------------
On Thu, 4/21/16, Anubhav Kale <anubhav.k...@microsoft.com> wrote:

 Subject: RE: Problem Replacing a Dead Node
 To: "user@cassandra.apache.org" <user@cassandra.apache.org>
 Date: Thursday, April 21, 2016, 6:34 PM
 
 
Reusing the bootstrapping node could have caused this, but it's hard to tell. Since you have only 7 nodes, have you tried doing a few rolling restarts of all nodes to let gossip settle? Also, the node is pingable from other nodes even though it says UNREACHABLE below, correct?

Based on nodetool status, it appears the node has streamed all the data it needs, but it doesn't think it has joined the ring yet. Does cqlsh work on that node?
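(A quick sketch of those checks, using addresses from this thread:)

    # Rolling restart, one node at a time, letting gossip settle in between:
    sudo service cassandra restart   # assumes a service-managed install
    nodetool status                  # healthy nodes show UN (Up/Normal)
    nodetool gossipinfo              # gossip's view of every endpoint

    # From another node, confirm the replacement is reachable and answers CQL:
    ping 10.0.7.4
    cqlsh 10.0.7.4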
    
From: Mir Tanvir Hossain [mailto:mir.tanvir.hoss...@gmail.com]
Sent: Thursday, April 21, 2016 11:51 AM
To: user@cassandra.apache.org
Subject: Re: Problem Replacing a Dead Node
 
    
 
Here is a bit more detail of the whole situation. I am hoping someone can help me out here.

We have a seven node cluster. One of the nodes started to have issues, but it was running. We decided to add a new node and remove the problematic node after the new node joined. However, the new node did not join the cluster even after three days. Hence, we decided to go with the replacement option: we shut down the problematic node, then stopped cassandra on the bootstrapping node, deleted all the data, and restarted that node as the replacement node for the problematic node.

Since we reused the bootstrapping node as the replacement node, I am wondering whether that is causing any issue. Any insights are appreciated.
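(For reference, the replacement steps described above look roughly like this; the data paths are the defaults and the replace_address flag is the mechanism from the DataStax procedure linked below:)

    # On the node being reused as the replacement:
    sudo service cassandra stop
    sudo rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/* /var/lib/cassandra/saved_caches/*
    # In cassandra-env.sh, add the dead node's address:
    #   JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.7.91"
    sudo service cassandra start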
 
 
    
 
 
This is the output of nodetool describecluster from the replacement node and two other nodes.

mhossain@cassandra-24:~$ nodetool describecluster
Cluster Information:
        Name: App
        Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
        Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
        Schema versions:
                80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 10.0.7.4, 10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]

mhossain@cassandra-13:~$ nodetool describecluster
Cluster Information:
        Name: App
        Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
        Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
        Schema versions:
                80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]

                UNREACHABLE: [10.0.7.91, 10.0.7.4]

mhossain@cassandra-09:~$ nodetool describecluster
Cluster Information:
        Name: App
        Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
        Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
        Schema versions:
                80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]

                UNREACHABLE: [10.0.7.91, 10.0.7.4]

cassandra-24 (10.0.7.4) is the replacement node. 10.0.7.91 is the IP address of the dead node.

-Mir
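(One way to check whether the replacement node thinks it is still joining or streaming, run on 10.0.7.4 itself:)

    nodetool netstats   # shows the node's mode (e.g. JOINING) and any active streams
    nodetool ring       # does the node list itself with tokens assigned?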
 
 
 
 
    
 
On Thu, Apr 21, 2016 at 10:02 AM, Mir Tanvir Hossain <mir.tanvir.hoss...@gmail.com> wrote:

Hi, I am trying to replace a dead node by following https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_replace_node_t.html. It's been 3 full days since the replacement node started, and the node is still not showing up as part of the cluster on OpsCenter. I was wondering whether the delay is due to the fact that I have a test keyspace with a replication factor of one? If I delete that keyspace, would the new node successfully replace the dead node? Any general insight will be hugely appreciated.

Thanks,
Mir
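(If the RF=1 test keyspace is a suspect, one way to rule it out without deleting it is to raise its replication factor and repair; the keyspace name below is a placeholder:)

    cqlsh> SELECT keyspace_name, strategy_options FROM system.schema_keyspaces;
    cqlsh> ALTER KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

    # afterwards, run "nodetool repair" on each node so the new replicas receive data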
 
    
 
 
    
 
 
 
 
 
    
 
 
