Hi Alex,
You might have inconsistent data in your system tables. Try setting the consistency level to ALL, then run a read query against the system tables to force a repair.

Kenneth Brotman

From: Alex [mailto:m...@aca-o.com]
Sent: Thursday, April 04, 2019 1:58 AM
To: user@cassandra.apache.org
Subject: Re: Assassinate fails

Hi Anthony,

Thanks for your help.

I tried running it multiple times in quick succession, but it fails with:

-- StackTrace --
java.lang.RuntimeException: Endpoint still alive: /192.168.1.18 generation changed while trying to assassinate it
    at org.apache.cassandra.gms.Gossiper.assassinateEndpoint(Gossiper.java:592)

I can see that the generation number for this node increases by 1 every time I call nodetool assassinate, and the command itself waits 30 seconds before assassinating the node. When run multiple times in quick succession, the command fails because the generation number has been changed by the previous invocation.

In 'nodetool gossipinfo', the node is marked as "LEFT" on every node. However, in 'nodetool describecluster', this node is marked as "unreachable" on 3 nodes out of 5.

Alex

On 04.04.2019 00:56, Anthony Grasso wrote:

Hi Alex,

We wrote a blog post on this topic late last year: http://thelastpickle.com/blog/2018/09/18/assassinate.html.

In short, you will need to run the assassinate command on each node simultaneously a number of times in quick succession. This will generate a number of messages requesting that all nodes completely forget there used to be an entry within the gossip state for the given IP address.

Regards,
Anthony

On Thu, 4 Apr 2019 at 03:32, Alex <m...@aca-o.com> wrote:

Same result, it seems:

Welcome to JMX terminal. Type "help" for available commands.
$>open localhost:7199
#Connection to localhost:7199 is opened
$>bean org.apache.cassandra.net:type=Gossiper
#bean is set to org.apache.cassandra.net:type=Gossiper
$>run unsafeAssassinateEndpoint 192.168.1.18
#calling operation unsafeAssassinateEndpoint of mbean org.apache.cassandra.net:type=Gossiper
#RuntimeMBeanException: java.lang.NullPointerException

There is not much more to see in the log files:

WARN [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:13,626 Gossiper.java:575 - Assassinating /192.168.1.18 via gossip
INFO [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:13,627 Gossiper.java:585 - Sleeping for 30000ms to ensure /192.168.1.18 does not change
INFO [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:43,628 Gossiper.java:1029 - InetAddress /192.168.1.18 is now DOWN
INFO [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:43,631 StorageService.java:2324 - Removing tokens [..] for /192.168.1.18

On 03.04.2019 17:10, Nick Hatfield wrote:
> Run assassinate the old way. It works very well...
>
> wget -q -O jmxterm.jar http://downloads.sourceforge.net/cyclops-group/jmxterm-1.0-alpha-4-uber.jar
>
> java -jar ./jmxterm.jar
>
> $>open localhost:7199
>
> $>bean org.apache.cassandra.net:type=Gossiper
>
> $>run unsafeAssassinateEndpoint 192.168.1.18
>
> $>quit
>
>
> Happy deleting
>
> -----Original Message-----
> From: Alex [mailto:m...@aca-o.com]
> Sent: Wednesday, April 03, 2019 10:42 AM
> To: user@cassandra.apache.org
> Subject: Assassinate fails
>
> Hello,
>
> Short story:
> - I had to replace a dead node in my cluster
> - 1 week later, the dead node is still seen as DN by 3 out of 5 nodes
> - the dead node has a null host_id
> - assassinate on the dead node fails with an error
>
> How can I get rid of this dead node?
>
>
> Long story:
> I had a 3-node cluster (Cassandra 3.9); one node went dead.
> I built a new node from scratch and "replaced" the dead node using the
> information from this page:
> https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsReplaceNode.html.
> It looked like the replacement went OK.
>
> I added two more nodes to strengthen the cluster.
>
> A few days have passed and the dead node is still visible and marked
> as "down" on 3 of 5 nodes in nodetool status:
>
> --  Address       Load       Tokens  Owns (effective)  Host ID                               Rack
> UN  192.168.1.9   16 GiB     256     35.0%             76223d4c-9d9f-417f-be27-cebb791cddcc  rack1
> UN  192.168.1.12  16.09 GiB  256     34.0%             719601e2-54a6-440e-a379-c9cf2dc20564  rack1
> UN  192.168.1.14  14.16 GiB  256     32.6%             d8017a03-7e4e-47b7-89b9-cd9ec472d74f  rack1
> UN  192.168.1.17  15.4 GiB   256     34.1%             fa238b21-1db1-47dc-bfb7-beedc6c9967a  rack1
> DN  192.168.1.18  24.3 GiB   256     33.7%             null                                  rack1
> UN  192.168.1.22  19.06 GiB  256     30.7%             09d24557-4e98-44c3-8c9d-53c4c31066e1  rack1
>
> Its host ID is null, so I cannot use nodetool removenode. Moreover,
> nodetool assassinate 192.168.1.18 fails with:
>
> error: null
> -- StackTrace --
> java.lang.NullPointerException
>
> And in system.log:
>
> INFO [RMI TCP Connection(16)-127.0.0.1] 2019-03-27 17:39:38,595 Gossiper.java:585 - Sleeping for 30000ms to ensure /192.168.1.18 does not change
> INFO [CompactionExecutor:547] 2019-03-27 17:39:38,669 AutoSavingCache.java:393 - Saved KeyCache (27316 items) in 163 ms
> INFO [IndexSummaryManager:1] 2019-03-27 17:40:03,620 IndexSummaryRedistribution.java:75 - Redistributing index summaries
> INFO [RMI TCP Connection(16)-127.0.0.1] 2019-03-27 17:40:08,597 Gossiper.java:1029 - InetAddress /192.168.1.18 is now DOWN
> INFO [RMI TCP Connection(16)-127.0.0.1] 2019-03-27 17:40:08,599 StorageService.java:2324 - Removing tokens [-1061369577393671924,...]
> ERROR [GossipStage:1] 2019-03-27 17:40:08,600 CassandraDaemon.java:226 - Exception in thread Thread[GossipStage:1,5,main]
> java.lang.NullPointerException: null
>
>
> In system.peers, the dead node still appears and has the same host ID as
> the replacing node:
>
> cqlsh> select peer, host_id from system.peers;
>
>  peer         | host_id
> --------------+--------------------------------------
>  192.168.1.18 | 09d24557-4e98-44c3-8c9d-53c4c31066e1
>  192.168.1.22 | 09d24557-4e98-44c3-8c9d-53c4c31066e1
>  192.168.1.9  | 76223d4c-9d9f-417f-be27-cebb791cddcc
>  192.168.1.14 | d8017a03-7e4e-47b7-89b9-cd9ec472d74f
>  192.168.1.12 | 719601e2-54a6-440e-a379-c9cf2dc20564
>
> The dead node and the replacing node have different tokens in system.peers.
>
> I should add that I also tried decommissioning a node that still has
> 192.168.1.18 in its peers; it is still marked as "leaving" 5 days
> later. There is nothing in nodetool netstats or nodetool compactionstats.
>
>
> Thank you for taking the time to read this. Hope you can help.
>
> Alex
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
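The system.peers output in Alex's original report shows 192.168.1.18 and 192.168.1.22 sharing the host_id 09d24557-4e98-44c3-8c9d-53c4c31066e1, alongside the null Host ID for 192.168.1.18 in nodetool status. Because each node keeps its own local copy of system.peers, a quick way to check every node for this inconsistency is to paste each node's cqlsh output into a small script. The following is a minimal sketch, not something from the thread: it assumes the plain cqlsh tabular format shown above, and the names parse_peers and shared_host_ids are hypothetical helpers.

```python
from collections import defaultdict

# Output of `select peer, host_id from system.peers;` as pasted in the
# thread above (cqlsh tabular format, assumed unchanged).
PEERS_OUTPUT = """\
 peer         | host_id
--------------+--------------------------------------
 192.168.1.18 | 09d24557-4e98-44c3-8c9d-53c4c31066e1
 192.168.1.22 | 09d24557-4e98-44c3-8c9d-53c4c31066e1
 192.168.1.9  | 76223d4c-9d9f-417f-be27-cebb791cddcc
 192.168.1.14 | d8017a03-7e4e-47b7-89b9-cd9ec472d74f
 192.168.1.12 | 719601e2-54a6-440e-a379-c9cf2dc20564
"""


def parse_peers(text):
    """Return (peer, host_id) tuples from cqlsh tabular output (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        # The dashed separator uses '+', and blank or "(N rows)" lines have
        # no '|', so only the header and data rows pass this check.
        if "|" not in line:
            continue
        peer, host_id = (cell.strip() for cell in line.split("|", 1))
        if peer == "peer":  # column header row
            continue
        rows.append((peer, host_id))
    return rows


def shared_host_ids(rows):
    """Map each host_id claimed by more than one peer to those peers."""
    peers_by_id = defaultdict(list)
    for peer, host_id in rows:
        peers_by_id[host_id].append(peer)
    return {hid: peers for hid, peers in peers_by_id.items() if len(peers) > 1}


if __name__ == "__main__":
    for host_id, peers in shared_host_ids(parse_peers(PEERS_OUTPUT)).items():
        print(f"host_id {host_id} is claimed by: {', '.join(peers)}")
```

On the output from this thread, the script reports 09d24557-4e98-44c3-8c9d-53c4c31066e1 as claimed by both 192.168.1.18 and 192.168.1.22. It only detects the duplicate; it does not change any cluster state.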