BTW, you can try ZooKeeper Discovery. I think it's an easier way to resolve the split-brain problem: https://www.gridgain.com/docs/latest/developers-guide/clustering/zookeeper-discovery

Fri, Sep 11, 2020 at 14:13, Michael Cherkasov <michael.cherka...@gmail.com>:

> Make sure you first stop all nodes in one segment and only then start
> them again; a rolling restart might not fix cluster segmentation.
>
> Fri, Sep 11, 2020 at 09:08, Denis Magda <dma...@apache.org>:
>
>> Hi Samuel,
>>
>> With the current behavior, the segments will not rejoin automatically.
>> Once the network has recovered from a network partitioning event, you need
>> to restart all the nodes of one of the segments. Those nodes will join the
>> other nodes and the cluster will become fully operational.
>>
>> Let me know if you have any other questions or need guidance with this.
>>
>> -
>> Denis
>>
>> On Fri, Sep 11, 2020 at 7:38 AM Samuel Ueltschi <
>> samuel.uelts...@bsi-software.com> wrote:
>>
>>> Hi
>>>
>>> I've been testing Ignite (2.8.1) and its behaviour under network
>>> segmentation.
>>>
>>> According to the docs, Ignite nodes should be able to detect network
>>> segmentation and apply the configured SegmentationPolicy.
>>>
>>> However, the segmentation handling didn't trigger as I would have
>>> expected.
>>>
>>> For my tests, I set up three cluster nodes c1, c2 and c3 running in
>>> docker containers, all competing for a shared IgniteLock instance in a loop.
>>>
>>> Then I used iptables in container c2 to drop all incoming and outgoing
>>> packets on that node.
>>>
>>> After a few seconds I got the following events:
>>>
>>> c1:
>>> - EVT_NODE_FAILED for c2
>>>
>>> c2:
>>> - EVT_NODE_FAILED for c1
>>> - EVT_NODE_FAILED for c3
>>>
>>> c3:
>>> - EVT_NODE_FAILED for c2
>>>
>>> Then I reset the iptables rules, expecting that c2 would rejoin the
>>> cluster and detect segmentation.
>>>
>>> However, this didn't happen; c2 just kept running as a second standalone
>>> cluster instance.
>>>
>>> Only after I restarted c2 did it rejoin the cluster.
>>>
>>> Eventually I was able to trigger the EVT_NODE_SEGMENTED event by pausing
>>> the c2 container for 1 minute. After resuming, c2 detects the segmentation
>>> and runs the segmentation policy as expected.
>>>
>>> Is this behaviour correct? Shouldn't the Ignite cluster be able to
>>> recover from the first scenario?
>>>
>>> During a network segmentation no packets would be able to move between
>>> nodes, so the iptables approach should be realistic in my opinion.
>>>
>>> Maybe I have some wrong assumptions about network segmentation, so any
>>> feedback would be greatly appreciated.
>>>
>>> Cheers Sam
>>>
>>> --
>>> Software Engineer
>>> BSI Business Systems Integration AG
>>> Erlachstrasse 16B, CH-3012 Bern
>>> Phone +41 31 850 12 06
>>>
>>> www.bsi-software.com
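
The restart advice in this thread (stop every node of one segment before starting any of them again, rather than doing a rolling restart) can be illustrated with a small toy model. This is only a sketch in plain Java and does not use the Ignite API. It assumes, as a simplification, that a starting node joins the cluster of the first running peer in a fixed address list and forms its own cluster if no peer is running. The node names a1, a2, b1, b2, the address order, and the class SegmentRestartSketch are all hypothetical; real discovery behaviour depends on the configured discovery SPI and IP finder.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: NOT Ignite code. It mimics a cluster that has already
// split into segment A = {a1, a2} and segment B = {b1, b2}.
class SegmentRestartSketch {
    // Running node name -> id of the cluster it belongs to; stopped nodes are absent.
    private final Map<String, String> running = new LinkedHashMap<>();
    // Assumed fixed order in which a starting node probes its peers.
    private final List<String> addresses = Arrays.asList("b1", "b2", "a1", "a2");

    SegmentRestartSketch() {
        running.put("a1", "A");
        running.put("a2", "A");
        running.put("b1", "B");
        running.put("b2", "B");
    }

    void stop(String node) {
        running.remove(node);
    }

    // Assumption: join the cluster of the first running peer, else start a new one.
    void start(String node) {
        for (String peer : addresses) {
            String cluster = running.get(peer);
            if (!peer.equals(node) && cluster != null) {
                running.put(node, cluster);
                return;
            }
        }
        running.put(node, "cluster-" + node);
    }

    int clusterCount() {
        return new HashSet<>(running.values()).size();
    }

    // Restart the nodes of segment B one at a time.
    static int rollingRestart() {
        SegmentRestartSketch s = new SegmentRestartSketch();
        s.stop("b1");
        s.start("b1"); // b2 is still running in B, so b1 rejoins B
        s.stop("b2");
        s.start("b2"); // b1 is running in B, so b2 rejoins B
        return s.clusterCount();
    }

    // Stop all nodes of segment B first, then start them.
    static int fullSegmentRestart() {
        SegmentRestartSketch s = new SegmentRestartSketch();
        s.stop("b1");
        s.stop("b2");
        s.start("b1"); // no B node is running, so b1 finds a1 and joins A
        s.start("b2"); // b1 is now in A, so b2 joins A as well
        return s.clusterCount();
    }

    public static void main(String[] args) {
        System.out.println("rolling restart, clusters left: " + rollingRestart());
        System.out.println("full segment restart, clusters left: " + fullSegmentRestart());
    }
}
```

Under these assumptions the rolling restart leaves two clusters, because each restarted node finds a surviving member of its old segment first, while stopping the whole segment before starting it leaves one. With a different address order the rolling restart could happen to succeed, which is consistent with the "might not fix" wording above.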