Hi Anthony, 

Thanks for your help. 

I tried running it multiple times in quick succession, but it fails with:

-- StackTrace --
java.lang.RuntimeException: Endpoint still alive: /192.168.1.18 generation changed while trying to assassinate it
        at org.apache.cassandra.gms.Gossiper.assassinateEndpoint(Gossiper.java:592)


I can see that the generation number for this node increases by 1 every
time I call nodetool assassinate, and the command itself waits for 30
seconds before assassinating the node. When run multiple times in quick
succession, the command fails because the generation number has been
changed by the previous instance.
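
To watch the generation between attempts, a small helper along these lines
might do. It is only a sketch: it assumes the 'nodetool gossipinfo' layout
seen on Cassandra 3.x (an address line ending in "/<ip>", followed by
indented "key:value" lines), and gossip_generation is a made-up name, not a
nodetool feature.

```shell
#!/bin/sh
# Hypothetical helper: print the gossip generation recorded for one endpoint.
# Reads `nodetool gossipinfo` output on stdin.
# Assumption: each endpoint block starts with an unindented line ending in
# "/<ip>" and contains an indented "generation:<number>" line.
gossip_generation() {
    awk -v ip="$1" '
        # Unindented line: start of a new endpoint block.
        /^[^ \t]/ {
            n = split($0, parts, "/")
            in_block = (parts[n] == ip)
        }
        # Inside the wanted block: print the generation value and stop.
        in_block && /^[ \t]+generation:/ {
            sub(/^[ \t]+generation:/, "")
            print
            exit
        }
    '
}

# Usage, on any live node (before and after an assassinate attempt):
#   nodetool gossipinfo | gossip_generation 192.168.1.18
```

If the two values differ, a previous assassinate call has already bumped the
generation, which matches the "generation changed" error above.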

In 'nodetool gossipinfo', the node is marked as "LEFT" on every node.

However, in 'nodetool describecluster', this node is marked as
"unreachable" on 3 nodes out of 5.

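To check which nodes still list it as unreachable, something like the sketch
below could be run against each node. It assumes that 'nodetool
describecluster' prints a line of the form "UNREACHABLE: [ip, ip, ...]" (as
it does on 3.x) and that remote JMX access is allowed for 'nodetool -h';
is_unreachable is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the given IP appears in the
# UNREACHABLE list of `nodetool describecluster` output read from stdin.
# Assumption: the list is printed as "UNREACHABLE: [ip1, ip2, ...]".
is_unreachable() {
    awk -v ip="$1" '
        /UNREACHABLE:/ {
            line = $0
            sub(/^.*\[/, "", line)     # drop everything up to "["
            sub(/\].*$/, "", line)     # drop "]" and anything after it
            n = split(line, hosts, /, */)
            for (i = 1; i <= n; i++)
                if (hosts[i] == ip) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Usage, looping over the live nodes (assumes remote JMX is reachable):
#   for h in 192.168.1.9 192.168.1.12 192.168.1.14 192.168.1.17 192.168.1.22; do
#       nodetool -h "$h" describecluster | is_unreachable 192.168.1.18 \
#           && echo "$h still lists 192.168.1.18 as unreachable"
#   done
```
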
Alex 

On 04.04.2019 00:56, Anthony Grasso wrote:

> Hi Alex, 
> 
> We wrote a blog post on this topic late last year: 
> http://thelastpickle.com/blog/2018/09/18/assassinate.html. 
> 
> In short, you will need to run the assassinate command on each node
> simultaneously, a number of times in quick succession. This will generate a
> number of messages requesting that all nodes completely forget there used to
> be an entry within the gossip state for the given IP address.
> 
> Regards, 
> Anthony 
> 
> On Thu, 4 Apr 2019 at 03:32, Alex <m...@aca-o.com> wrote: 
> 
>> Same result it seems:
>> Welcome to JMX terminal. Type "help" for available commands.
>> $>open localhost:7199
>> #Connection to localhost:7199 is opened
>> $>bean org.apache.cassandra.net:type=Gossiper
>> #bean is set to org.apache.cassandra.net:type=Gossiper
>> $>run unsafeAssassinateEndpoint 192.168.1.18
>> #calling operation unsafeAssassinateEndpoint of mbean org.apache.cassandra.net:type=Gossiper
>> #RuntimeMBeanException: java.lang.NullPointerException
>> 
>> There is not much more to see in the log files:
>> WARN  [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:13,626 Gossiper.java:575 - Assassinating /192.168.1.18 [1] via gossip
>> INFO  [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:13,627 Gossiper.java:585 - Sleeping for 30000ms to ensure /192.168.1.18 [1] does not change
>> INFO  [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:43,628 Gossiper.java:1029 - InetAddress /192.168.1.18 [1] is now DOWN
>> INFO  [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:43,631 StorageService.java:2324 - Removing tokens [..] for /192.168.1.18 [1]
>> 
>> On 03.04.2019 17:10, Nick Hatfield wrote:
>>> Run assassinate the old way. It works very well...
>>> 
>>> wget -q -O jmxterm.jar http://downloads.sourceforge.net/cyclops-group/jmxterm-1.0-alpha-4-uber.jar
>>> 
>>> java -jar ./jmxterm.jar
>>> 
>>> $>open localhost:7199
>>> 
>>> $>bean org.apache.cassandra.net:type=Gossiper
>>> 
>>> $>run unsafeAssassinateEndpoint 192.168.1.18
>>> 
>>> $>quit
>>> 
>>> 
>>> Happy deleting
>>> 
>>> -----Original Message-----
>>> From: Alex [mailto:m...@aca-o.com]
>>> Sent: Wednesday, April 03, 2019 10:42 AM
>>> To: user@cassandra.apache.org
>>> Subject: Assassinate fails
>>> 
>>> Hello,
>>> 
>>> Short story:
>>> - I had to replace a dead node in my cluster
>>> - 1 week later, the dead node is still seen as DN by 3 out of 5 nodes
>>> - the dead node has a null host_id
>>> - assassinate on the dead node fails with an error
>>> 
>>> How can I get rid of this dead node?
>>> 
>>> 
>>> Long story:
>>> I had a 3-node cluster (Cassandra 3.9); one node died. I built
>>> a new node from scratch and "replaced" the dead node using the
>>> information from this page:
>>> https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsReplaceNode.html
>>> It looked like the replacement went OK.
>>> 
>>> I added two more nodes to strengthen the cluster.
>>> 
>>> A few days have passed and the dead node is still visible and marked
>>> as "down" on 3 of 5 nodes in nodetool status:
>>> 
>>> --  Address       Load       Tokens  Owns (effective)  Host ID                               Rack
>>> UN  192.168.1.9   16 GiB     256     35.0%             76223d4c-9d9f-417f-be27-cebb791cddcc  rack1
>>> UN  192.168.1.12  16.09 GiB  256     34.0%             719601e2-54a6-440e-a379-c9cf2dc20564  rack1
>>> UN  192.168.1.14  14.16 GiB  256     32.6%             d8017a03-7e4e-47b7-89b9-cd9ec472d74f  rack1
>>> UN  192.168.1.17  15.4 GiB   256     34.1%             fa238b21-1db1-47dc-bfb7-beedc6c9967a  rack1
>>> DN  192.168.1.18  24.3 GiB   256     33.7%             null                                  rack1
>>> UN  192.168.1.22  19.06 GiB  256     30.7%             09d24557-4e98-44c3-8c9d-53c4c31066e1  rack1
>>> 
>>> Its host ID is null, so I cannot use nodetool removenode. Moreover,
>>> nodetool assassinate 192.168.1.18 fails with:
>>> 
>>> error: null
>>> -- StackTrace --
>>> java.lang.NullPointerException
>>> 
>>> And in system.log:
>>> 
>>> INFO  [RMI TCP Connection(16)-127.0.0.1] 2019-03-27 17:39:38,595 Gossiper.java:585 - Sleeping for 30000ms to ensure /192.168.1.18 [1] does not change
>>> INFO  [CompactionExecutor:547] 2019-03-27 17:39:38,669 AutoSavingCache.java:393 - Saved KeyCache (27316 items) in 163 ms
>>> INFO  [IndexSummaryManager:1] 2019-03-27 17:40:03,620 IndexSummaryRedistribution.java:75 - Redistributing index summaries
>>> INFO  [RMI TCP Connection(16)-127.0.0.1] 2019-03-27 17:40:08,597 Gossiper.java:1029 - InetAddress /192.168.1.18 [1] is now DOWN
>>> INFO  [RMI TCP Connection(16)-127.0.0.1] 2019-03-27 17:40:08,599 StorageService.java:2324 - Removing tokens [-1061369577393671924,...]
>>> ERROR [GossipStage:1] 2019-03-27 17:40:08,600 CassandraDaemon.java:226 - Exception in thread Thread[GossipStage:1,5,main]
>>> java.lang.NullPointerException: null
>>> 
>>> 
>>> In system.peers, the dead node is listed with the same host ID as the
>>> replacement node:
>>> 
>>> cqlsh> select peer, host_id from system.peers;
>>> 
>>>  peer         | host_id
>>> --------------+--------------------------------------
>>>  192.168.1.18 | 09d24557-4e98-44c3-8c9d-53c4c31066e1
>>>  192.168.1.22 | 09d24557-4e98-44c3-8c9d-53c4c31066e1
>>>  192.168.1.9  | 76223d4c-9d9f-417f-be27-cebb791cddcc
>>>  192.168.1.14 | d8017a03-7e4e-47b7-89b9-cd9ec472d74f
>>>  192.168.1.12 | 719601e2-54a6-440e-a379-c9cf2dc20564
>>> 
>>> Dead node and replacing node have different tokens in system.peers.
>>> 
>>> I should add that I also tried decommission on a node that still has
>>> 192.168.1.18 in its peers; it is still marked as "leaving" 5 days
>>> later. Nothing shows up in nodetool netstats or nodetool compactionstats.
>>> 
>>> 
>>> Thank you for taking the time to read this. Hope you can help.
>>> 
>>> Alex
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: user-h...@cassandra.apache.org
>> 

 

Links:
------
[1] http://192.168.1.18
