Is this using GPFS (GossipingPropertyFileSnitch)?  If so, can you open a
JIRA? It feels like GPFS is potentially not persisting the rack/DC info into
system.peers and loses the DC on restart. This is somewhat understandable,
but it definitely deserves a JIRA.
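
In the meantime, a quick way to check whether the rack/DC info survives on
disk (just a suggestion, assuming cqlsh access on one of the restarted nodes)
is to compare what the node has persisted about its peers with what it
reports about itself:

cqlsh -e "SELECT peer, data_center, rack, host_id FROM system.peers;"
cqlsh -e "SELECT data_center, rack FROM system.local;"

If the down node's row in system.peers shows a null or unexpected
data_center/rack after the restart, that would point at the snitch info not
being persisted.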

On Thu, Mar 14, 2019 at 11:44 PM Stefan Miklosovic <
stefan.mikloso...@instaclustr.com> wrote:

> Hi Fd,
>
> I tried this on a 3-node cluster. I killed node2, and both node1 and node3
> reported node2 as DN. Then I killed node1 and node3, restarted them, and
> node2 was reported like this:
>
> [root@spark-master-1 /]# nodetool status
> Datacenter: DC1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address     Load        Tokens  Owns (effective)  Host ID                               Rack
> DN  172.19.0.8  ?           256     64.0%             bd75a5e2-2890-44c5-8f7a-fca1b4ce94ab  r1
> Datacenter: dc1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address     Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  172.19.0.5  382.75 KiB  256     64.4%             2a062140-2428-4092-b48b-7495d083d7f9  rack1
> UN  172.19.0.9  171.41 KiB  256     71.6%             9590b791-ad53-4b5a-b4c7-b00408ed02dd  rack3
>
> Prior to killing node1 and node3, node2 was indeed marked as DN, but it
> was part of the "Datacenter: dc1" output where both node1 and node3 were.
>
> But after killing both node1 and node3 (so the cluster was completely down)
> and then restarting them, node2 was reported as shown above.
>
> I do not know what makes the difference here. Is gossip data stored
> somewhere on disk? I would say so, otherwise there would be no way for
> node1/node3 to report that node2 is down, but at the same time I don't get
> why node2 ends up outside the list that node1 and node3 are in.
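>
> One way to dig into this, I guess, would be to compare the live gossip
> state with what each node has persisted locally, e.g.:
>
> nodetool gossipinfo
> cqlsh -e "SELECT peer, data_center, rack FROM system.peers;"
>
> to check whether node2's DC/rack actually survives on disk across the
> restart.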
>
>
> On Fri, 15 Mar 2019 at 02:42, Fd Habash <fmhab...@gmail.com> wrote:
>
>> I can conclusively say that none of these commands were run. However, I
>> think this is the likely scenario …
>>
>>
>>
>> If you have a cluster of three nodes 1, 2, 3 …
>>
>>    - If node 3 shows as DN
>>    - Restart C* on nodes 1 & 2
>>    - Nodetool status should NOT show node 3's IP at all.
>>
>>
>>
>> Restarting the cluster while a node is down resets gossip state.
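>>
>> Roughly, the sequence I have in mind (assuming a package install where
>> Cassandra runs under systemd; adjust the service name for your setup):
>>
>> # on node 3
>> systemctl stop cassandra
>>
>> # on nodes 1 and 2: node 3 should show as DN at this point
>> nodetool status
>> systemctl restart cassandra
>>
>> # on nodes 1 and 2 again: check whether node 3's IP is still listed
>> nodetool status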
>>
>>
>>
>> There is a good chance this is what happened.
>>
>>
>>
>> Plausible?
>>
>>
>>
>> ----------------
>> Thank you
>>
>>
>>
>> *From: *Jeff Jirsa <jji...@gmail.com>
>> *Sent: *Thursday, March 14, 2019 11:06 AM
>> *To: *cassandra <user@cassandra.apache.org>
>> *Subject: *Re: Cannot replace_address /10.xx.xx.xx because it doesn't
>> exist in gossip
>>
>>
>>
>> Two things that wouldn't be a bug:
>>
>>
>>
>> You could have run removenode
>>
>> You could have run assassinate
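>>
>> For reference, those would look something like this (the host ID and IP
>> below are placeholders for the dead node):
>>
>> nodetool removenode <host-id-of-dead-node>
>> nodetool assassinate <ip-of-dead-node>
>>
>> Either of those removes the node's state from gossip, which would explain
>> the replace_address error above.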
>>
>>
>>
>> Also could be some new bug, but that's much less likely.
>>
>>
>>
>>
>>
>> On Thu, Mar 14, 2019 at 2:50 PM Fd Habash <fmhab...@gmail.com> wrote:
>>
>> I have a node which I know for certain was a cluster member last week. It
>> showed in nodetool status as DN. When I attempted to replace it today, I
>> got this message
>>
>>
>>
>> ERROR [main] 2019-03-14 14:40:49,208 CassandraDaemon.java:654 - Exception encountered during startup
>> java.lang.RuntimeException: Cannot replace_address /10.xx.xx.xxx.xx because it doesn't exist in gossip
>>         at org.apache.cassandra.service.StorageService.prepareReplacementInfo(StorageService.java:449) ~[apache-cassandra-2.2.8.jar:2.2.8]
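>>
>> (For reference, the replacement here would typically be started with the
>> replace_address JVM option, e.g. by adding something like the following to
>> cassandra-env.sh on the replacement node, with the real IP in place of the
>> masked one:)
>>
>> JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.xx.xx.xxx.xx"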
>>
>>
>>
>>
>>
>> DN  10.xx.xx.xx  388.43 KB  256          6.9%  bdbd632a-bf5d-44d4-b220-f17f258c4701  1e
>>
>>
>>
>> Under what conditions does this happen?
>>
>>
>>
>>
>>
>> ----------------
>> Thank you
>>
>>
>>
>>
>>
>
> Stefan Miklosovic
>
>
