Re: Problem Replacing a Dead Node

Mir Tanvir Hossain Thu, 21 Apr 2016 17:41:12 -0700

I will try a rolling restart to see whether that helps. The replacement
node is pingable from other Cassandra nodes. I was also able to telnet to
the storage port (7000) of the replacement node from another node.
cqlsh doesn't work on the new node. When does gossip settle?
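
The telnet check described above can be scripted so it is easy to repeat
from every node. This is only a minimal sketch: the function names are made
up for illustration, 7000 is the storage port mentioned in this message, and
9042 (native transport) and 9160 (Thrift) are assumed to be the unchanged
default client ports, which may not match this cluster's configuration.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port to True/False depending on whether it accepts connections."""
    return {port: port_open(host, port, timeout) for port in ports}
```

For example, `check_ports("10.0.7.4", [7000, 9042, 9160])` run from another
node would show whether the storage port answers while the client ports do
not. An open port only proves that something is listening; it says nothing
about whether gossip has marked the node as up.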


Is there any way to force the node to join the ring?

-Mir

On Thu, Apr 21, 2016 at 4:34 PM, Anubhav Kale <anubhav.k...@microsoft.com>
wrote:

> Reusing the bootstrapping node could have caused this, but it is hard to
> tell. Since you have only 7 nodes, have you tried doing a few rolling
> restarts of all nodes to let gossip settle? Also, the node is pingable from
> other nodes even though it says Unreachable below, correct?
>
>
>
> Based on nodetool status, it appears the node has streamed all the data it
> needs, but it doesn’t think it has joined the ring yet. Does cqlsh work on
> that node?
>
>
>
> *From:* Mir Tanvir Hossain [mailto:mir.tanvir.hoss...@gmail.com]
> *Sent:* Thursday, April 21, 2016 11:51 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Problem Replacing a Dead Node
>
>
>
> Here is a bit more detail of the whole situation. I am hoping someone can
> help me out here.
>
>
>
> We have a seven-node cluster. One of the nodes started to have issues, but
> it was still running. We decided to add a new node and remove the
> problematic node after the new node joined. However, the new node did not
> join the cluster even after three days, so we decided to go with the
> replacement option. We shut down the problematic node. After that, we
> stopped Cassandra on the bootstrapping node, deleted all its data, and
> restarted that node as the replacement node for the problematic node.
>
>
>
> Since we reused the bootstrapping node as the replacement node, I am
> wondering whether that is causing any issue. Any insights are appreciated.
>
>
>
> This is the output of nodetool describecluster from the replacement node,
> and two other nodes.
>
>
>
> mhossain@cassandra-24:~$ nodetool describecluster
> Cluster Information:
>     Name: App
>     Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
>     Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>     Schema versions:
>         80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 10.0.7.4, 10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]
>
>
>
>
>
> mhossain@cassandra-13:~$ nodetool describecluster
> Cluster Information:
>     Name: App
>     Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
>     Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>     Schema versions:
>         80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]
>
>         UNREACHABLE: [10.0.7.91, 10.0.7.4]
>
>
>
>
>
> mhossain@cassandra-09:~$ nodetool describecluster
> Cluster Information:
>     Name: App
>     Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
>     Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>     Schema versions:
>         80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]
>
>         UNREACHABLE: [10.0.7.91, 10.0.7.4]
>
>
>
>
>
> cassandra-24 (10.0.7.4) is the replacement node. 10.0.7.91 is the IP
> address of the dead node.
>
>
>
> -Mir
>
>
>
> On Thu, Apr 21, 2016 at 10:02 AM, Mir Tanvir Hossain <
> mir.tanvir.hoss...@gmail.com> wrote:
>
> Hi, I am trying to replace a dead node by following
> <https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_replace_node_t.html>.
> It's been 3 full days since the replacement node started, and the node is
> still not showing up as part of the cluster on OpsCenter. I was wondering
> whether the delay is due to the fact that I have a test keyspace with
> replication factor of one? If I delete that keyspace, would the new node
> successfully replace the dead node? Any general insight will be hugely
> appreciated.
>
>
>
> Thanks,
>
> Mir
>
>
>
>
>
>
>
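
The describecluster outputs quoted above differ only in which nodes each
host lists under a schema version and which it lists as UNREACHABLE. A rough
sketch for pulling those two lists out of the text, so the views from
several nodes can be compared, might look like the following. The function
name is hypothetical, and the parsing assumes the output has the same shape
as the samples in this thread (a schema UUID or the word UNREACHABLE,
followed by a bracketed, comma-separated IP list); other Cassandra versions
may print it differently.

```python
import re

# A schema version UUID or the literal UNREACHABLE, then a bracketed IP list.
ENTRY = re.compile(r"([0-9a-fA-F-]{36}|UNREACHABLE):\s*\[([^\]]*)\]")


def parse_describecluster(text):
    """Return (schema_versions, unreachable) parsed from describecluster text.

    schema_versions maps each schema UUID to the list of IPs reporting it;
    unreachable is the list of IPs under UNREACHABLE (empty if absent).
    """
    # Strip email quote markers and rejoin IP lists wrapped across lines.
    cleaned = " ".join(line.lstrip("> ").strip() for line in text.splitlines())
    schema_versions = {}
    unreachable = []
    for match in ENTRY.finditer(cleaned):
        ips = [ip.strip() for ip in match.group(2).split(",") if ip.strip()]
        if match.group(1) == "UNREACHABLE":
            unreachable = ips
        else:
            schema_versions[match.group(1)] = ips
    return schema_versions, unreachable
```

Fed the cassandra-13 output, this would report 10.0.7.91 and 10.0.7.4 as
unreachable, while the cassandra-24 output yields an empty unreachable list
and includes 10.0.7.4 under the schema version. That mismatch is the
asymmetry discussed in this thread: the replacement node sees the cluster,
but the cluster does not see it.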
