Re: Problem Replacing a Dead Node

Mir Tanvir Hossain Thu, 21 Apr 2016 11:03:07 -0700

Hi Anubhav, thanks for getting back to me. Here is the information you
requested.

The DataStax agent is running on the node. However, in the agent log I see:

ERROR [clojure-agent-send-off-pool-4] 2016-04-21 17:51:46,055 Can't connect to Cassandra (All host(s) tried for query failed (tried: /10.0.7.4:9042 (com.datastax.driver.core.TransportException: [/10.0.7.4:9042] Cannot connect))), retrying soon.
ERROR [clojure-agent-send-off-pool-5] 2016-04-21 17:51:46,056 Can't connect to Cassandra (All host(s) tried for query failed (tried: /10.0.7.4:9042 (com.datastax.driver.core.TransportException: [/10.0.7.4:9042] Cannot connect))), retrying soon.

I am guessing this is because the node is not accepting any reads yet.
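
One way to test that guess is to check whether anything is listening on the port named in the agent log. The sketch below is only an illustration: the helper name `port_open` and the 2-second timeout are assumptions, and it relies only on Python's standard `socket` module. Port 9042 is Cassandra's default native transport (CQL client) port, which a node normally opens only after it has finished joining.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call, using the address from the agent log:
# port_open("10.0.7.4", 9042)
```

A False result here would be consistent with the node still bootstrapping, but it does not by itself explain why the bootstrap has not completed.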

Here is the output of nodetool status from the replacement node:

mhossain@cassandra-24:~$ nodetool status
Note: Ownership information does not include topology; for complete
information, specify a keyspace
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns   Host ID                               Rack
UN  10.0.7.80   117.32 GB  256     14.2%  508b6503-e342-41bf-9baf-1f16ed1ebfc8  1a
UN  10.0.7.4    68.46 GB   256     15.1%  b9a99507-ef83-441a-a007-da91144fae8f  1a
UN  10.0.7.190  106.54 GB  256     15.0%  32cb119e-e13d-45db-89d1-a4385c47cee2  1a
UN  10.0.7.100  80.99 GB   256     13.5%  efe3f327-48e8-4105-b096-a7f5c85736f9  1a
UN  10.0.7.195  105.9 GB   256     14.1%  96403b7e-57fd-4b84-9607-745ec2d826df  1a
UN  10.0.7.160  98.42 GB   256     13.9%  3a788a95-63f9-44f2-af91-9f49de75db63  1a
UN  10.0.7.176  93.04 GB   256     14.3%  d9124ced-847d-474e-a230-ea67ba46dfa8  1a

Here is the output of nodetool status from another existing node in the
cluster:

mhossain@cassandra-13:~$ nodetool status
Note: Ownership information does not include topology; for complete
information, specify a keyspace
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns   Host ID                               Rack
UN  10.0.7.80   117.32 GB  256     14.2%  508b6503-e342-41bf-9baf-1f16ed1ebfc8  1a
UN  10.0.7.190  106.54 GB  256     15.0%  32cb119e-e13d-45db-89d1-a4385c47cee2  1a
UN  10.0.7.100  80.99 GB   256     13.5%  efe3f327-48e8-4105-b096-a7f5c85736f9  1a
UN  10.0.7.195  105.9 GB   256     14.1%  96403b7e-57fd-4b84-9607-745ec2d826df  1a
DN  10.0.7.91   115.97 GB  256     15.1%  b9a99507-ef83-441a-a007-da91144fae8f  1a
UN  10.0.7.160  98.42 GB   256     13.9%  3a788a95-63f9-44f2-af91-9f49de75db63  1a
UN  10.0.7.176  93.04 GB   256     14.3%  d9124ced-847d-474e-a230-ea67ba46dfa8  1a

10.0.7.91 is the node I am trying to replace.
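
The two views above differ in one row: Host ID b9a99507-ef83-441a-a007-da91144fae8f is listed as 10.0.7.4 (UN) by the replacement node and as 10.0.7.91 (DN) by the existing node. A comparison like this can be automated. The following is a rough sketch, not an official tool: the function names `host_ids` and `conflicting_views` are made up for illustration, and the regular expression assumes the row layout shown in the output above.

```python
import re

# One data row of `nodetool status`: state, address, then the first UUID
# that follows. Whitespace is flattened first so mail-wrapped rows still match.
ROW = re.compile(
    r"\b(?P<state>[UD][NLJM])\s+(?P<addr>\d+\.\d+\.\d+\.\d+)\s+.*?"
    r"(?P<host_id>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
)


def host_ids(status_text):
    """Map Host ID -> (address, state) from nodetool status text."""
    flat = " ".join(status_text.split())  # undo line wrapping
    return {m["host_id"]: (m["addr"], m["state"]) for m in ROW.finditer(flat)}


def conflicting_views(view_a, view_b):
    """Return Host IDs that the two views attach to different addresses."""
    a, b = host_ids(view_a), host_ids(view_b)
    return {h: (a[h], b[h]) for h in a.keys() & b.keys() if a[h][0] != b[h][0]}
```

Passing the two nodetool status outputs as strings should report only the b9a99507 Host ID, with the address and state each node sees for it.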

Here is the output of tail -n 50 /var/log/cassandra/system.log:

INFO [GossipStage:1] 2016-04-21 17:58:01,812 StorageService.java (line 1665) Nodes /10.0.7.91 and /10.0.7.4 have the same token 8885324152940404221.  Ignoring /10.0.7.91
DEBUG [GossipStage:1] 2016-04-21 17:58:01,812 StorageService.java (line 1671) Relocating ranges:
 INFO [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line 1665) Nodes /10.0.7.91 and /10.0.7.4 have the same token 8894597858951527418.  Ignoring /10.0.7.91
DEBUG [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line 1671) Relocating ranges:
 INFO [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line 1665) Nodes /10.0.7.91 and /10.0.7.4 have the same token 8895886558199074220.  Ignoring /10.0.7.91
DEBUG [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line 1671) Relocating ranges:
 INFO [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line 1665) Nodes /10.0.7.91 and /10.0.7.4 have the same token 8943679396315445898.  Ignoring /10.0.7.91
DEBUG [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line 1671) Relocating ranges:
 INFO [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line 1665) Nodes /10.0.7.91 and /10.0.7.4 have the same token 8971093763454238578.  Ignoring /10.0.7.91
DEBUG [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line 1671) Relocating ranges:
 INFO [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line 1665) Nodes /10.0.7.91 and /10.0.7.4 have the same token 9147567932890414079.  Ignoring /10.0.7.91
DEBUG [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line 1671) Relocating ranges:
 INFO [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line 1665) Nodes /10.0.7.91 and /10.0.7.4 have the same token 9201669617284985565.  Ignoring /10.0.7.91
DEBUG [GossipStage:1] 2016-04-21 17:58:01,814 StorageService.java (line 1671) Relocating ranges:
 INFO [GossipStage:1] 2016-04-21 17:58:01,814 StorageService.java (line 1665) Nodes /10.0.7.91 and /10.0.7.4 have the same token 930880968512921941.  Ignoring /10.0.7.91
DEBUG [GossipStage:1] 2016-04-21 17:58:01,814 StorageService.java (line 1671) Relocating ranges:
DEBUG [PendingRangeCalculator:1] 2016-04-21 17:58:01,814 PendingRangeCalculatorService.java (line 128) No bootstrapping, leaving or moving nodes, and no relocating tokens -> empty pending ranges for test_shadow
DEBUG [PendingRangeCalculator:1] 2016-04-21 17:58:01,814 PendingRangeCalculatorService.java (line 128) No bootstrapping, leaving or moving nodes, and no relocating tokens -> empty pending ranges for app
DEBUG [PendingRangeCalculator:1] 2016-04-21 17:58:01,814 PendingRangeCalculatorService.java (line 128) No bootstrapping, leaving or moving nodes, and no relocating tokens -> empty pending ranges for test
DEBUG [PendingRangeCalculator:1] 2016-04-21 17:58:01,814 PendingRangeCalculatorService.java (line 128) No bootstrapping, leaving or moving nodes, and no relocating tokens -> empty pending ranges for OpsCenter
DEBUG [PendingRangeCalculator:1] 2016-04-21 17:58:01,815 PendingRangeCalculatorService.java (line 128) No bootstrapping, leaving or moving nodes, and no relocating tokens -> empty pending ranges for system_traces
DEBUG [PendingRangeCalculator:1] 2016-04-21 17:58:01,815 PendingRangeCalculatorService.java (line 68) finished calculation for 6 keyspaces in 1ms
DEBUG [GossipStage:1] 2016-04-21 17:58:01,816 MigrationManager.java (line 95) Not pulling schema because versions match or shouldPullSchemaFrom returned false
DEBUG [GossipStage:1] 2016-04-21 17:58:02,703 FailureDetector.java (line 338) Ignoring interval time of 2000848005
DEBUG [GossipStage:1] 2016-04-21 17:58:02,703 FailureDetector.java (line 338) Ignoring interval time of 2403611432
DEBUG [Background_Reporter:1] 2016-04-21 17:58:02,727 StorageService.java (line 1401) Ignoring state change for dead or unknown endpoint: /10.0.7.4
DEBUG [Background_Reporter:1] 2016-04-21 17:58:03,727 StorageService.java (line 1401) Ignoring state change for dead or unknown endpoint: /10.0.7.4
DEBUG [Background_Reporter:1] 2016-04-21 17:58:04,727 StorageService.java (line 1401) Ignoring state change for dead or unknown endpoint: /10.0.7.4
DEBUG [Background_Reporter:1] 2016-04-21 17:58:05,727 StorageService.java (line 1401) Ignoring state change for dead or unknown endpoint: /10.0.7.4
DEBUG [Background_Reporter:1] 2016-04-21 17:58:06,727 StorageService.java (line 1401) Ignoring state change for dead or unknown endpoint: /10.0.7.4
DEBUG [GossipStage:1] 2016-04-21 17:58:07,022 FailureDetector.java (line 338) Ignoring interval time of 2000269169
DEBUG [Background_Reporter:1] 2016-04-21 17:58:07,727 StorageService.java (line 1401) Ignoring state change for dead or unknown endpoint: /10.0.7.4
DEBUG [Background_Reporter:1] 2016-04-21 17:58:08,727 StorageService.java (line 1401) Ignoring state change for dead or unknown endpoint: /10.0.7.4
DEBUG [GossipStage:1] 2016-04-21 17:58:09,315 FailureDetector.java (line 338) Ignoring interval time of 3981962280
DEBUG [GossipStage:1] 2016-04-21 17:58:09,316 FailureDetector.java (line 338) Ignoring interval time of 2293538710
DEBUG [GossipStage:1] 2016-04-21 17:58:09,705 FailureDetector.java (line 338) Ignoring interval time of 2000566794
DEBUG [Background_Reporter:1] 2016-04-21 17:58:09,727 StorageService.java (line 1401) Ignoring state change for dead or unknown endpoint: /10.0.7.4
DEBUG [Background_Reporter:1] 2016-04-21 17:58:10,727 StorageService.java (line 1401) Ignoring state change for dead or unknown endpoint: /10.0.7.4
DEBUG [GossipStage:1] 2016-04-21 17:58:11,706 FailureDetector.java (line 338) Ignoring interval time of 2390060889
DEBUG [Background_Reporter:1] 2016-04-21 17:58:11,727 StorageService.java (line 1401) Ignoring state change for dead or unknown endpoint: /10.0.7.4
DEBUG [Background_Reporter:1] 2016-04-21 17:58:12,727 StorageService.java (line 1401) Ignoring state change for dead or unknown endpoint: /10.0.7.4
DEBUG [Background_Reporter:1] 2016-04-21 17:58:13,727 StorageService.java (line 1401) Ignoring state change for dead or unknown endpoint: /10.0.7.4
DEBUG [Background_Reporter:1] 2016-04-21 17:58:14,727 StorageService.java (line 1401) Ignoring state change for dead or unknown endpoint: /10.0.7.4
DEBUG [Background_Reporter:1] 2016-04-21 17:58:15,727 StorageService.java (line 1401) Ignoring state change for dead or unknown endpoint: /10.0.7.4
DEBUG [GossipStage:1] 2016-04-21 17:58:16,707 FailureDetector.java (line 338) Ignoring interval time of 2557244523
DEBUG [GossipStage:1] 2016-04-21 17:58:16,707 FailureDetector.java (line 338) Ignoring interval time of 2000746960
DEBUG [GossipStage:1] 2016-04-21 17:58:16,707 FailureDetector.java (line 338) Ignoring interval time of 2557596167
DEBUG [Background_Reporter:1] 2016-04-21 17:58:16,727 StorageService.java (line 1401) Ignoring state change for dead or unknown endpoint: /10.0.7.4
DEBUG [ScheduledTasks:1] 2016-04-21 17:58:17,187 GCInspector.java (line 118) GC for ParNew: 140 ms for 1 collections, 2557797536 used; max is 4273995776
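
The excerpt above repeats the same "have the same token" message for one token after another. To see how many tokens are affected across the whole log rather than the last 50 lines, the messages can be tallied. This is a minimal sketch under stated assumptions: the name `token_collisions` is hypothetical, and the pattern is written against the message wording shown in this excerpt, so it may need adjusting for other Cassandra versions.

```python
import re
from collections import Counter

# \s+ between words lets the pattern match whether or not the mail client
# wrapped the log line in the middle of the message.
SAME_TOKEN = re.compile(
    r"Nodes\s+/(\S+)\s+and\s+/(\S+)\s+have\s+the\s+same\s+token\s+(-?\d+)\.\s+"
    r"Ignoring\s+/(\S+)"
)


def token_collisions(log_text):
    """Return (Counter keyed by (node_a, node_b, ignored), set of tokens)."""
    counts = Counter()
    tokens = set()
    for node_a, node_b, token, ignored in SAME_TOKEN.findall(log_text):
        counts[(node_a, node_b, ignored)] += 1
        tokens.add(int(token))
    return counts, tokens
```

Reading /var/log/cassandra/system.log into a string and passing it to `token_collisions` gives the number of distinct tokens reported, which can then be compared with the 256 tokens per node shown in nodetool status.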

Please do let me know if you need anything else.

Thanks,
Mir

On Thu, Apr 21, 2016 at 10:22 AM, Anubhav Kale <anubhav.k...@microsoft.com>
wrote:

> Is the datastax-agent running fine on the node? What do nodetool status
> and system.log show?
>
>
>
> *From:* Mir Tanvir Hossain [mailto:mir.tanvir.hoss...@gmail.com]
> *Sent:* Thursday, April 21, 2016 10:02 AM
> *To:* user@cassandra.apache.org
> *Subject:* Problem Replacing a Dead Node
>
>
>
> Hi, I am trying to replace a dead node by following
> https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_replace_node_t.html.
> It's been 3 full days since the replacement node started, and the node is
> still not showing up as part of the cluster on OpsCenter. I was wondering
> whether the delay is due to the fact that I have a test keyspace with
> replication factor of one? If I delete that keyspace, would the new node
> successfully replace the dead node? Any general insight will be hugely
> appreciated.
>
>
>
> Thanks,
>
> Mir
>
>
>
>
>
