Hi Anubhav, thanks for getting back to me. Here is the information you requested.

The datastax-agent is running on the node. However, in the agent log I see:

ERROR [clojure-agent-send-off-pool-4] 2016-04-21 17:51:46,055 Can't connect to Cassandra (All host(s) tried for query failed (tried: /10.0.7.4:9042 (com.datastax.driver.core.TransportException: [/10.0.7.4:9042] Cannot connect))), retrying soon.
ERROR [clojure-agent-send-off-pool-5] 2016-04-21 17:51:46,056 Can't connect to Cassandra (All host(s) tried for query failed (tried: /10.0.7.4:9042 (com.datastax.driver.core.TransportException: [/10.0.7.4:9042] Cannot connect))), retrying soon.

I am guessing this is because the node is not accepting any reads yet.

Here is the output of nodetool status from the replacement node:

mhossain@cassandra-24:~$ nodetool status
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns   Host ID                               Rack
UN  10.0.7.80   117.32 GB  256     14.2%  508b6503-e342-41bf-9baf-1f16ed1ebfc8  1a
UN  10.0.7.4    68.46 GB   256     15.1%  b9a99507-ef83-441a-a007-da91144fae8f  1a
UN  10.0.7.190  106.54 GB  256     15.0%  32cb119e-e13d-45db-89d1-a4385c47cee2  1a
UN  10.0.7.100  80.99 GB   256     13.5%  efe3f327-48e8-4105-b096-a7f5c85736f9  1a
UN  10.0.7.195  105.9 GB   256     14.1%  96403b7e-57fd-4b84-9607-745ec2d826df  1a
UN  10.0.7.160  98.42 GB   256     13.9%  3a788a95-63f9-44f2-af91-9f49de75db63  1a
UN  10.0.7.176  93.04 GB   256     14.3%  d9124ced-847d-474e-a230-ea67ba46dfa8  1a

Here is the output of nodetool status from another existing node on the cluster:

mhossain@cassandra-13:~$ nodetool status
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns   Host ID                               Rack
UN  10.0.7.80   117.32 GB  256     14.2%  508b6503-e342-41bf-9baf-1f16ed1ebfc8  1a
UN  10.0.7.190  106.54 GB  256     15.0%  32cb119e-e13d-45db-89d1-a4385c47cee2  1a
UN  10.0.7.100  80.99 GB   256     13.5%  efe3f327-48e8-4105-b096-a7f5c85736f9  1a
UN  10.0.7.195  105.9 GB   256     14.1%  96403b7e-57fd-4b84-9607-745ec2d826df  1a
DN  10.0.7.91   115.97 GB  256     15.1%  b9a99507-ef83-441a-a007-da91144fae8f  1a
UN  10.0.7.160  98.42 GB   256     13.9%  3a788a95-63f9-44f2-af91-9f49de75db63  1a
UN  10.0.7.176  93.04 GB   256     14.3%  d9124ced-847d-474e-a230-ea67ba46dfa8  1a

10.0.7.91 is the node I am trying to replace.

Here is the output of tail -n 50 /var/log/cassandra/system.log:

INFO [GossipStage:1] 2016-04-21 17:58:01,812 StorageService.java (line 1665) Nodes /10.0.7.91 and /10.0.7.4 have the same token 8885324152940404221. Ignoring /10.0.7.91
DEBUG [GossipStage:1] 2016-04-21 17:58:01,812 StorageService.java (line 1671) Relocating ranges:
INFO [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line 1665) Nodes /10.0.7.91 and /10.0.7.4 have the same token 8894597858951527418. Ignoring /10.0.7.91
DEBUG [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line 1671) Relocating ranges:
INFO [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line 1665) Nodes /10.0.7.91 and /10.0.7.4 have the same token 8895886558199074220. Ignoring /10.0.7.91
DEBUG [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line 1671) Relocating ranges:
INFO [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line 1665) Nodes /10.0.7.91 and /10.0.7.4 have the same token 8943679396315445898. Ignoring /10.0.7.91
DEBUG [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line 1671) Relocating ranges:
INFO [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line 1665) Nodes /10.0.7.91 and /10.0.7.4 have the same token 8971093763454238578. Ignoring /10.0.7.91
DEBUG [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line 1671) Relocating ranges:
INFO [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line 1665) Nodes /10.0.7.91 and /10.0.7.4 have the same token 9147567932890414079. Ignoring /10.0.7.91
DEBUG [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line 1671) Relocating ranges:
INFO [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line 1665) Nodes /10.0.7.91 and /10.0.7.4 have the same token 9201669617284985565. Ignoring /10.0.7.91
DEBUG [GossipStage:1] 2016-04-21 17:58:01,814 StorageService.java (line 1671) Relocating ranges:
INFO [GossipStage:1] 2016-04-21 17:58:01,814 StorageService.java (line 1665) Nodes /10.0.7.91 and /10.0.7.4 have the same token 930880968512921941. Ignoring /10.0.7.91
DEBUG [GossipStage:1] 2016-04-21 17:58:01,814 StorageService.java (line 1671) Relocating ranges:
DEBUG [PendingRangeCalculator:1] 2016-04-21 17:58:01,814 PendingRangeCalculatorService.java (line 128) No bootstrapping, leaving or moving nodes, and no relocating tokens -> empty pending ranges for test_shadow
DEBUG [PendingRangeCalculator:1] 2016-04-21 17:58:01,814 PendingRangeCalculatorService.java (line 128) No bootstrapping, leaving or moving nodes, and no relocating tokens -> empty pending ranges for app
DEBUG [PendingRangeCalculator:1] 2016-04-21 17:58:01,814 PendingRangeCalculatorService.java (line 128) No bootstrapping, leaving or moving nodes, and no relocating tokens -> empty pending ranges for test
DEBUG [PendingRangeCalculator:1] 2016-04-21 17:58:01,814 PendingRangeCalculatorService.java (line 128) No bootstrapping, leaving or moving nodes, and no relocating tokens -> empty pending ranges for OpsCenter
DEBUG [PendingRangeCalculator:1] 2016-04-21 17:58:01,815 PendingRangeCalculatorService.java (line 128) No bootstrapping, leaving or moving nodes, and no relocating tokens -> empty pending ranges for system_traces
DEBUG [PendingRangeCalculator:1] 2016-04-21 17:58:01,815 PendingRangeCalculatorService.java (line 68) finished calculation for 6 keyspaces in 1ms
DEBUG [GossipStage:1] 2016-04-21 17:58:01,816 MigrationManager.java (line 95) Not pulling schema because versions match or shouldPullSchemaFrom returned false
DEBUG [GossipStage:1] 2016-04-21 17:58:02,703 FailureDetector.java (line 338) Ignoring interval time of 2000848005
DEBUG [GossipStage:1] 2016-04-21 17:58:02,703 FailureDetector.java (line 338) Ignoring interval time of 2403611432
DEBUG [Background_Reporter:1] 2016-04-21 17:58:02,727 StorageService.java (line 1401) Ignoring state change for dead or unknown endpoint: /10.0.7.4
DEBUG [Background_Reporter:1] 2016-04-21 17:58:03,727 StorageService.java (line 1401) Ignoring state change for dead or unknown endpoint: /10.0.7.4
DEBUG [Background_Reporter:1] 2016-04-21 17:58:04,727 StorageService.java (line 1401) Ignoring state change for dead or unknown endpoint: /10.0.7.4
DEBUG [Background_Reporter:1] 2016-04-21 17:58:05,727 StorageService.java (line 1401) Ignoring state change for dead or unknown endpoint: /10.0.7.4
DEBUG [Background_Reporter:1] 2016-04-21 17:58:06,727 StorageService.java (line 1401) Ignoring state change for dead or unknown endpoint: /10.0.7.4
DEBUG [GossipStage:1] 2016-04-21 17:58:07,022 FailureDetector.java (line 338) Ignoring interval time of 2000269169
DEBUG [Background_Reporter:1] 2016-04-21 17:58:07,727 StorageService.java (line 1401) Ignoring state change for dead or unknown endpoint: /10.0.7.4
DEBUG [Background_Reporter:1] 2016-04-21 17:58:08,727 StorageService.java (line 1401) Ignoring state change for dead or unknown endpoint: /10.0.7.4
DEBUG [GossipStage:1] 2016-04-21 17:58:09,315 FailureDetector.java (line 338) Ignoring interval time of 3981962280
DEBUG [GossipStage:1] 2016-04-21 17:58:09,316 FailureDetector.java (line 338) Ignoring interval time of 2293538710
DEBUG [GossipStage:1] 2016-04-21 17:58:09,705 FailureDetector.java (line 338) Ignoring interval time of 2000566794
DEBUG [Background_Reporter:1] 2016-04-21 17:58:09,727 StorageService.java (line 1401) Ignoring state change for dead or unknown endpoint: /10.0.7.4
DEBUG [Background_Reporter:1] 2016-04-21 17:58:10,727 StorageService.java (line 1401) Ignoring state change for dead or unknown endpoint: /10.0.7.4
DEBUG [GossipStage:1] 2016-04-21 17:58:11,706 FailureDetector.java (line 338) Ignoring interval time of 2390060889
DEBUG [Background_Reporter:1] 2016-04-21 17:58:11,727 StorageService.java (line 1401) Ignoring state change for dead or unknown endpoint: /10.0.7.4
DEBUG [Background_Reporter:1] 2016-04-21 17:58:12,727 StorageService.java (line 1401) Ignoring state change for dead or unknown endpoint: /10.0.7.4
DEBUG [Background_Reporter:1] 2016-04-21 17:58:13,727 StorageService.java (line 1401) Ignoring state change for dead or unknown endpoint: /10.0.7.4
DEBUG [Background_Reporter:1] 2016-04-21 17:58:14,727 StorageService.java (line 1401) Ignoring state change for dead or unknown endpoint: /10.0.7.4
DEBUG [Background_Reporter:1] 2016-04-21 17:58:15,727 StorageService.java (line 1401) Ignoring state change for dead or unknown endpoint: /10.0.7.4
DEBUG [GossipStage:1] 2016-04-21 17:58:16,707 FailureDetector.java (line 338) Ignoring interval time of 2557244523
DEBUG [GossipStage:1] 2016-04-21 17:58:16,707 FailureDetector.java (line 338) Ignoring interval time of 2000746960
DEBUG [GossipStage:1] 2016-04-21 17:58:16,707 FailureDetector.java (line 338) Ignoring interval time of 2557596167
DEBUG [Background_Reporter:1] 2016-04-21 17:58:16,727 StorageService.java (line 1401) Ignoring state change for dead or unknown endpoint: /10.0.7.4
DEBUG [ScheduledTasks:1] 2016-04-21 17:58:17,187 GCInspector.java (line 118) GC for ParNew: 140 ms for 1 collections, 2557797536 used; max is 4273995776

Please do let me know if you need anything else.

Thanks,
Mir

On Thu, Apr 21, 2016 at 10:22 AM, Anubhav Kale <anubhav.k...@microsoft.com> wrote:

> Is the datastax-agent running fine on the node ? What does nodetool status
> and system.log show ?
>
> *From:* Mir Tanvir Hossain [mailto:mir.tanvir.hoss...@gmail.com]
> *Sent:* Thursday, April 21, 2016 10:02 AM
> *To:* user@cassandra.apache.org
> *Subject:* Problem Replacing a Dead Node
>
> Hi, I am trying to replace a dead node by following
> https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_replace_node_t.html.
> It's been 3 full days since the replacement node started, and the node is
> still not showing up as part of the cluster on OpsCenter. I was wondering
> whether the delay is due to the fact that I have a test keyspace with a
> replication factor of one? If I delete that keyspace, would the new node
> successfully replace the dead node? Any general insight will be hugely
> appreciated.
>
> Thanks,
>
> Mir
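
P.S. The two nodetool status outputs in this message disagree about one host ID: b9a99507-ef83-441a-a007-da91144fae8f is listed as UN at 10.0.7.4 on the replacement node and as DN at 10.0.7.91 on the existing node. In case it helps with spotting this kind of mismatch, here is a rough Python sketch that compares two saved nodetool status outputs by host ID. The helper names parse_status and diff_views are made up for illustration, the sample rows are copied from the outputs in this message, and the parsing assumes the column layout shown here (state first, address second, a UUID host ID somewhere in the row), which may differ in other Cassandra versions.

```python
import re

# A host ID is a standard 8-4-4-4-12 hex UUID.
UUID = re.compile(r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$")
# Two-letter state code: Up/Down followed by Normal/Leaving/Joining/Moving.
STATE = re.compile(r"^[UD][NLJM]$")


def parse_status(text):
    """Return {host_id: (state, address)} for each node row in the text.

    Hypothetical helper: header and banner lines are skipped because
    their first field is not a two-letter state code.
    """
    nodes = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or not STATE.match(fields[0]):
            continue
        host_ids = [f for f in fields if UUID.match(f)]
        if host_ids:
            nodes[host_ids[0]] = (fields[0], fields[1])
    return nodes


def diff_views(view_a, view_b):
    """List (host_id, entry_in_a, entry_in_b) where the two views differ.

    An entry is None when that view does not list the host ID at all.
    """
    diffs = []
    for host_id in sorted(set(view_a) | set(view_b)):
        a = view_a.get(host_id)
        b = view_b.get(host_id)
        if a != b:
            diffs.append((host_id, a, b))
    return diffs


# Two rows from each view, copied from the outputs in this message.
replacement_view = """\
UN  10.0.7.80  117.32 GB  256  14.2%  508b6503-e342-41bf-9baf-1f16ed1ebfc8  1a
UN  10.0.7.4   68.46 GB   256  15.1%  b9a99507-ef83-441a-a007-da91144fae8f  1a
"""
existing_view = """\
UN  10.0.7.80  117.32 GB  256  14.2%  508b6503-e342-41bf-9baf-1f16ed1ebfc8  1a
DN  10.0.7.91  115.97 GB  256  15.1%  b9a99507-ef83-441a-a007-da91144fae8f  1a
"""

for host_id, a, b in diff_views(parse_status(replacement_view),
                                parse_status(existing_view)):
    print(host_id, a, b)
```

Keying on host ID rather than address is deliberate: in a replacement the address changes while the host ID stays the same, so the host ID is what lines the two views up.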