Semen Boikov created IGNITE-6700:
------------------------------------

             Summary: Node considered as failed can cause failure of others 
nodes
                 Key: IGNITE-6700
                 URL: https://issues.apache.org/jira/browse/IGNITE-6700
             Project: Ignite
          Issue Type: Bug
      Security Level: Public (Viewable by anyone)
          Components: general
            Reporter: Semen Boikov
            Assignee: Semen Boikov
            Priority: Critical


A node that is considered failed can cause other nodes in the cluster to fail.

There is an issue in TcpDiscoveryAbstractMessage.failedNodes processing: if a 
message is received from a node that is already considered failed, then its 
failedNodes should be ignored.

Possible scenario:
- there are 4 nodes (1 -> 2 -> 3 -> 4)
- node 3 temporarily loses its connection to the other nodes
- node 2 considers 3 failed, and a node failed event is fired for 3
- node 3 considers 4 failed and adds 4 to nodeFailedList; it then restores its 
connection with 1, and currently 1 will process the nodeFailedList from 3 (even 
though 3 is already considered failed)
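
The proposed behavior can be sketched roughly as follows. This is only an 
illustration of the rule "ignore failedNodes from a sender that is already 
considered failed"; the class FailedNodesFilter and its methods are 
hypothetical names and are not taken from the Ignite code base, and node IDs 
are simplified to integers.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch only; not Ignite's actual discovery implementation. */
class FailedNodesFilter {
    /** IDs of nodes that the local node already considers failed. */
    private final Set<Integer> locallyFailed = new HashSet<>();

    /** Records that the local node considers the given node failed. */
    void markFailed(int nodeId) {
        locallyFailed.add(nodeId);
    }

    /**
     * Handles the failure reports carried by a discovery message.
     * Reports from a sender that is itself considered failed are ignored.
     *
     * @return true if the reports were accepted, false if they were ignored.
     */
    boolean processFailedNodes(int senderId, Set<Integer> reportedFailed) {
        if (locallyFailed.contains(senderId))
            return false; // Sender is considered failed: do not trust its view.

        locallyFailed.addAll(reportedFailed);

        return true;
    }

    /** Tells whether the local node considers the given node failed. */
    boolean isFailed(int nodeId) {
        return locallyFailed.contains(nodeId);
    }
}
```

In the scenario above, node 1 would call markFailed(3) when the node failed 
event for 3 arrives, so a later processFailedNodes(3, {4}) would be ignored 
and node 4 would stay in the topology.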



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
