In Ignite, a node can enter the "segmented" state in two cases:
1. The node was unavailable (sleeping, hanging in a full GC pause, etc.) for a long time.
2. The cluster detected a possible split-brain situation and marked the node as "segmented".
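As a rough illustration of the second case, here is a minimal Java sketch of a majority-quorum rule, which is one common way, in theory, to decide which side of a network split is allowed to stay alive. The class name QuorumSegmentCheck and its isValidSegment method are hypothetical names made up for this example. This is not how Ignite or GridGain implement segmentation resolution, only a sketch of the general idea.

```java
/**
 * Hypothetical illustration only: NOT part of the Ignite or GridGain API.
 * Models a majority-quorum rule for deciding whether a segment may stay alive.
 */
final class QuorumSegmentCheck {
    /**
     * @param clusterSize    number of nodes expected in the whole cluster
     * @param reachableNodes number of nodes visible from this segment,
     *                       including the local node
     * @return true only if this segment holds a strict majority
     */
    static boolean isValidSegment(int clusterSize, int reachableNodes) {
        if (clusterSize <= 0 || reachableNodes < 0 || reachableNodes > clusterSize)
            throw new IllegalArgumentException("Inconsistent node counts");

        // A strict majority guarantees that at most one segment is valid,
        // so two segments can never be alive at the same time.
        return 2 * reachableNodes > clusterSize;
    }

    public static void main(String[] args) {
        // A five-node cluster splits into segments of three and two nodes.
        System.out.println("3 of 5 valid: " + isValidSegment(5, 3));
        System.out.println("2 of 5 valid: " + isValidSegment(5, 2));

        // An even split leaves no majority, so neither half is valid.
        System.out.println("2 of 4 valid: " + isValidSegment(4, 2));
    }
}
```

Under this rule, only the three-node side of a 3/2 split passes the check. The nodes on the two-node side would be treated as segmented and then stopped or restarted, depending on how the cluster is configured.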
Yes, split-brain protection (both in the GridGain implementation and in theory) doesn't protect your node from being stopped. It protects you from having two segments alive at the same time, which could lead to data inconsistency over time.

Regarding discovery and large clusters: if your cluster is too big for the ring-based TcpDiscoverySpi to work well, then you should use ZooKeeper Discovery, which was created specifically to support large clusters.

Stan

On Mon, Dec 9, 2019 at 4:02 PM Prasad Bhalerao <prasadbhalerao1...@gmail.com> wrote:

> Can someone please advise on this?
>
>> ---------- Forwarded message ---------
>> From: Prasad Bhalerao <prasadbhalerao1...@gmail.com>
>> Date: Fri, Nov 29, 2019 at 7:53 AM
>> Subject: Re: Local node terminated after segmentation
>> To: <user@ignite.apache.org>
>>
>> I had checked the resource you mentioned, but I was confused by the
>> GridGain doc describing it as protection against split-brain, because if
>> the node is segmented, the only thing one can do is stop/restart/noop.
>> I was just wondering how it provides protection against split-brain.
>> Now I think that by protection it means killing the segmented node or
>> nodes, or restarting them and bringing them back into the cluster.
>>
>> Ignite uses TcpDiscoverySpi to send a heartbeat to the next node in the
>> ring to check whether that node is reachable.
>> So the question is: in what situation does one need additional ways to
>> check whether a node is reachable, using different resolvers?
>>
>> Please let me know if my understanding is correct.
>>
>> As for the article you mentioned, I had checked that code. It requires a
>> node to be configured in advance so that the resolver can check whether
>> that node is reachable from the local host. It does not check whether all
>> the nodes are reachable from the local host.
>>
>> E.g., node1 will check node2, node2 will check node3, and node3 will
>> check node1 to complete the ring.
>> Just wondering how to configure this plugin in a prod env with a large
>> cluster.
>> I tried to check the GridGain doc to see if they have provided any sample
>> code for configuring their plugins, just to get an idea, but did not find
>> any.
>>
>> Can you please advise?
>>
>> Thanks,
>> Prasad
>>
>> On Thu 28 Nov, 2019, 11:41 PM akurbanov <antkr....@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> Basically this is a mechanism to implement custom logical/network
>>> split-brain protection. Segmentation resolvers allow you to implement a
>>> way to determine whether a node has to be segmented/stopped/etc. in the
>>> method isValidSegment(), and possibly to use different combinations of
>>> resolvers within the processor.
>>>
>>> If you want to check out how it could be done, some articles/source
>>> samples that might give you good insight can easily be found on the
>>> web, like:
>>>
>>> https://medium.com/@aamargajbhiye/how-to-handle-network-segmentation-in-apache-ignite-35dc5fa6f239
>>>
>>> http://apache-ignite-users.70518.x6.nabble.com/Segmentation-Plugin-blog-or-article-td27955.html
>>>
>>> 2-3 are described in the documentation, copying the link just to point
>>> out which one:
>>> https://apacheignite.readme.io/docs/critical-failures-handling
>>>
>>> By default the answer to 2 is: Ignite doesn't ignore the FailureType
>>> SEGMENTATION and calls the failure handler in this case. The actions
>>> that are taken are defined in the failure handler.
>>>
>>> The AbstractFailureHandler class has only SYSTEM_WORKER_BLOCKED and
>>> SYSTEM_CRITICAL_OPERATION_TIMEOUT ignored by default. However, you might
>>> override the failure handler and call .setIgnoredFailureTypes().
>>>
>>> Links:
>>> Extend this class:
>>> https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/failure/AbstractFailureHandler.java
>>> Check for custom implementations used in Ignite tests and how they are
>>> used.
>>>
>>> Sample from tests:
>>> https://github.com/apache/ignite/blob/master/modules/core/src/test/java/org/apache/ignite/failure/SystemWorkersBlockingTest.java
>>>
>>> Failure processor:
>>> https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/failure/FailureProcessor.java
>>>
>>> Best regards,
>>> Anton
>>>
>>> --
>>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/