RE: Assassinate fails

Kenneth Brotman Thu, 04 Apr 2019 07:51:17 -0700

Hi Alex,


You might have inconsistent data in your system tables.  Try setting the 
consistency level to ALL, then run a read query against the system tables to 
force a repair.

 

Kenneth Brotman

 

From: Alex [mailto:m...@aca-o.com] 
Sent: Thursday, April 04, 2019 1:58 AM
To: user@cassandra.apache.org
Subject: Re: Assassinate fails

 

Hi Anthony,

Thanks for your help.

I tried running it multiple times in quick succession, but it fails with:

-- StackTrace --
java.lang.RuntimeException: Endpoint still alive: /192.168.1.18 generation changed while trying to assassinate it
        at org.apache.cassandra.gms.Gossiper.assassinateEndpoint(Gossiper.java:592)

I can see that the generation number for this node increases by 1 every time I 
call nodetool assassinate, and the command itself waits for 30 seconds before 
assassinating the node. When run multiple times in quick succession, the 
command fails because the generation number has been changed by the previous 
instance.
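
The generation change described above can be checked from the shell by reading the value for the endpoint out of 'nodetool gossipinfo' before and after an attempt. This is a minimal sketch, not a tested procedure: the function name generation_of is made up for illustration, and the output layout it parses (an endpoint header line ending in "/<ip>", followed by indented "generation:<n>" lines) is an assumption about the gossipinfo format.

```shell
# Print the gossip generation recorded for one endpoint.
# Reads `nodetool gossipinfo` output on stdin. ASSUMPTION: each endpoint
# starts with an unindented header ending in "/<ip>", followed by indented
# "generation:<n>" and "heartbeat:<n>" lines.
generation_of() {
    awk -v ip="$1" '
        # Unindented line: remember which endpoint the next lines belong to.
        /^[^ \t]/ { current = $1; sub(/^.*\//, "", current) }
        # Indented generation line for the endpoint we were asked about.
        $1 ~ /^generation:/ && current == ip {
            sub(/^generation:/, "", $1)
            print $1
        }
    '
}
```

Usage would be along the lines of `nodetool gossipinfo | generation_of 192.168.1.18`, run once before and once after an assassinate attempt to compare the two values.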

 

In 'nodetool gossipinfo', the node is marked as "LEFT" on every node.

However, in 'nodetool describecluster', this node is marked as "unreachable" 
on 3 nodes out of 5.

 

Alex

 

On 04.04.2019 00:56, Anthony Grasso wrote:

Hi Alex, 

 

We wrote a blog post on this topic late last year: 
http://thelastpickle.com/blog/2018/09/18/assassinate.html.

 

In short, you will need to run the assassinate command on each node 
simultaneously a number of times in quick succession. This will generate a 
number of messages requesting all nodes completely forget there used to be an 
entry within the gossip state for the given IP address.
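
A rough sketch of what running the command on every node at once might look like. Everything here beyond 'nodetool assassinate' itself is an assumption for illustration: the function name, passwordless ssh to each node, nodetool being on the PATH of the remote user, and the ROUNDS and PAUSE values, none of which come from the blog post.

```shell
# Sketch only: fire `nodetool assassinate` at every live node at the same
# time, several rounds in quick succession. All values are illustrative.
DEAD_IP="192.168.1.18"
NODES="192.168.1.9 192.168.1.12 192.168.1.14 192.168.1.17 192.168.1.22"
ROUNDS="${ROUNDS:-3}"     # how many times to repeat (assumption)
PAUSE="${PAUSE:-1}"       # seconds between rounds (assumption)
RUNNER="${RUNNER:-ssh}"   # set RUNNER=echo to print commands instead

assassinate_everywhere() {
    local round node
    for round in $(seq 1 "$ROUNDS"); do
        for node in $NODES; do
            # Background each call so all nodes receive it together.
            $RUNNER "$node" nodetool assassinate "$DEAD_IP" &
        done
        sleep "$PAUSE"
    done
    wait   # block until every background call has returned
}
```

Calling `RUNNER=echo assassinate_everywhere` first prints the commands that would be sent without touching the cluster.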

 

Regards,

Anthony

 

On Thu, 4 Apr 2019 at 03:32, Alex <m...@aca-o.com> wrote:

Same result it seems:
Welcome to JMX terminal. Type "help" for available commands.
$>open localhost:7199
#Connection to localhost:7199 is opened
$>bean org.apache.cassandra.net:type=Gossiper
#bean is set to org.apache.cassandra.net:type=Gossiper
$>run unsafeAssassinateEndpoint 192.168.1.18
#calling operation unsafeAssassinateEndpoint of mbean 
org.apache.cassandra.net:type=Gossiper
#RuntimeMBeanException: java.lang.NullPointerException


There is not much more to see in the log files:
WARN  [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:13,626 Gossiper.java:575 - Assassinating /192.168.1.18 via gossip
INFO  [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:13,627 Gossiper.java:585 - Sleeping for 30000ms to ensure /192.168.1.18 does not change
INFO  [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:43,628 Gossiper.java:1029 - InetAddress /192.168.1.18 is now DOWN
INFO  [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:43,631 StorageService.java:2324 - Removing tokens [..] for /192.168.1.18




On 03.04.2019 17:10, Nick Hatfield wrote:
> Run assassinate the old way. It works very well...
> 
> wget -q -O jmxterm.jar http://downloads.sourceforge.net/cyclops-group/jmxterm-1.0-alpha-4-uber.jar
> 
> java -jar ./jmxterm.jar
> 
> $>open localhost:7199
> 
> $>bean org.apache.cassandra.net:type=Gossiper
> 
> $>run unsafeAssassinateEndpoint 192.168.1.18
> 
> $>quit
> 
> 
> Happy deleting
> 
> -----Original Message-----
> From: Alex [mailto:m...@aca-o.com]
> Sent: Wednesday, April 03, 2019 10:42 AM
> To: user@cassandra.apache.org
> Subject: Assassinate fails
> 
> Hello,
> 
> Short story:
> - I had to replace a dead node in my cluster
> - 1 week after, dead node is still seen as DN by 3 out of 5 nodes
> - dead node has null host_id
> - assassinate on dead node fails with error
> 
> How can I get rid of this dead node ?
> 
> 
> Long story:
> I had a 3-node cluster (Cassandra 3.9); one node went dead. I built
> a new node from scratch and "replaced" the dead node using the
> information from this page
> https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsReplaceNode.html.
> It looked like the replacement went ok.
> 
> I added two more nodes to strengthen the cluster.
> 
> A few days have passed and the dead node is still visible and marked
> as "down" on 3 of 5 nodes in nodetool status:
> 
> --  Address       Load       Tokens  Owns (effective)  Host ID                               Rack
> UN  192.168.1.9   16 GiB     256     35.0%             76223d4c-9d9f-417f-be27-cebb791cddcc  rack1
> UN  192.168.1.12  16.09 GiB  256     34.0%             719601e2-54a6-440e-a379-c9cf2dc20564  rack1
> UN  192.168.1.14  14.16 GiB  256     32.6%             d8017a03-7e4e-47b7-89b9-cd9ec472d74f  rack1
> UN  192.168.1.17  15.4 GiB   256     34.1%             fa238b21-1db1-47dc-bfb7-beedc6c9967a  rack1
> DN  192.168.1.18  24.3 GiB   256     33.7%             null                                  rack1
> UN  192.168.1.22  19.06 GiB  256     30.7%             09d24557-4e98-44c3-8c9d-53c4c31066e1  rack1
> 
> Its host ID is null, so I cannot use nodetool removenode. Moreover
> nodetool assassinate 192.168.1.18 fails with :
> 
> error: null
> -- StackTrace --
> java.lang.NullPointerException
> 
> And in system.log:
> 
> INFO  [RMI TCP Connection(16)-127.0.0.1] 2019-03-27 17:39:38,595 Gossiper.java:585 - Sleeping for 30000ms to ensure /192.168.1.18 does not change
> INFO  [CompactionExecutor:547] 2019-03-27 17:39:38,669 AutoSavingCache.java:393 - Saved KeyCache (27316 items) in 163 ms
> INFO  [IndexSummaryManager:1] 2019-03-27 17:40:03,620 IndexSummaryRedistribution.java:75 - Redistributing index summaries
> INFO  [RMI TCP Connection(16)-127.0.0.1] 2019-03-27 17:40:08,597 Gossiper.java:1029 - InetAddress /192.168.1.18 is now DOWN
> INFO  [RMI TCP Connection(16)-127.0.0.1] 2019-03-27 17:40:08,599 StorageService.java:2324 - Removing tokens [-1061369577393671924,...]
> ERROR [GossipStage:1] 2019-03-27 17:40:08,600 CassandraDaemon.java:226 - Exception in thread Thread[GossipStage:1,5,main]
> java.lang.NullPointerException: null
> 
> 
> In system.peers, the dead node still appears and has the same host ID as
> the replacing node:
> 
> cqlsh> select peer, host_id from system.peers;
> 
>   peer         | host_id
> --------------+--------------------------------------
>   192.168.1.18 | 09d24557-4e98-44c3-8c9d-53c4c31066e1
>   192.168.1.22 | 09d24557-4e98-44c3-8c9d-53c4c31066e1
>    192.168.1.9 | 76223d4c-9d9f-417f-be27-cebb791cddcc
>   192.168.1.14 | d8017a03-7e4e-47b7-89b9-cd9ec472d74f
>   192.168.1.12 | 719601e2-54a6-440e-a379-c9cf2dc20564
> 
> Dead node and replacing node have different tokens in system.peers.
> 
> I should add that I also tried decommission on a node that still has
> 192.168.1.18 in its peers. It is still marked as "leaving" 5 days
> later. Nothing in nodetool netstats or nodetool compactionstats.
> 
> 
> Thank you for taking the time to read this. Hope you can help.
> 
> Alex
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
> 
