Semen Boikov created IGNITE-6700:
------------------------------------

             Summary: Node considered as failed can cause failure of others nodes
                 Key: IGNITE-6700
                 URL: https://issues.apache.org/jira/browse/IGNITE-6700
             Project: Ignite
          Issue Type: Bug
      Security Level: Public (Viewable by anyone)
          Components: general
            Reporter: Semen Boikov
            Assignee: Semen Boikov
            Priority: Critical

A node that is considered failed can cause the failure of other nodes in the cluster.

There is an issue in TcpDiscoveryAbstractMessage.failedNodes processing: if a message is received from a node that is already considered failed, then its failedNodes should be ignored.

Possible scenario:
- there are 4 nodes (1 -> 2 -> 3 -> 4)
- node 3 temporarily loses connection with the others
- node 2 considers 3 as failed, and a node failed event is fired for 3
- node 3 considers 4 as failed and adds 4 to nodeFailedList; it then restores its connection with 1, and currently 1 will process the nodeFailedList from 3 (even though 3 is already considered failed)
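
The scenario above can be sketched with a small, self-contained model. This is only an illustration under stated assumptions, not Ignite's actual code: the classes FailedNodesSketch, Msg and NodeView and the methods processUnguarded and processGuarded are hypothetical names, and the sketch assumes node 1 has already processed the node failed event for 3 before the message from 3 arrives.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the scenario; none of these types exist in Ignite.
final class FailedNodesSketch {
    /** Stand-in for a discovery message: the sender's id plus the failedNodes it carries. */
    static final class Msg {
        final int senderId;
        final List<Integer> failedNodes;

        Msg(int senderId, List<Integer> failedNodes) {
            this.senderId = senderId;
            this.failedNodes = failedNodes;
        }
    }

    /** Local view of one node: the ids it currently considers failed. */
    static final class NodeView {
        final Set<Integer> failed = new HashSet<>();

        /** Behaviour described in the issue: failedNodes is applied regardless of the sender. */
        void processUnguarded(Msg msg) {
            failed.addAll(msg.failedNodes);
        }

        /** Proposed behaviour: ignore failedNodes when the sender is itself considered failed. */
        void processGuarded(Msg msg) {
            if (failed.contains(msg.senderId))
                return; // Sender already failed from our point of view; do not trust its list.

            failed.addAll(msg.failedNodes);
        }
    }

    public static void main(String[] args) {
        // Node 1's view after the node failed event for 3 has been processed (assumption).
        NodeView unguarded = new NodeView();
        unguarded.failed.add(3);

        NodeView guarded = new NodeView();
        guarded.failed.add(3);

        // Node 3 reconnects to node 1 and reports node 4 as failed.
        Msg fromNode3 = new Msg(3, Arrays.asList(4));

        unguarded.processUnguarded(fromNode3);
        guarded.processGuarded(fromNode3);

        System.out.println("unguarded view fails node 4: " + unguarded.failed.contains(4));
        System.out.println("guarded view fails node 4:   " + guarded.failed.contains(4));
    }
}
```

In this model the unguarded view ends up treating the healthy node 4 as failed, while the guarded view leaves it alone. Where such a check would belong in the real discovery code is not stated in this issue.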