Can someone please help me with this?

On Thu, Dec 12, 2019 at 1:11 PM Akash Shinde <akashshi...@gmail.com> wrote:
> Hi,
>
> Can you please explain at a high level how the GridGain implementation
> protects against having two segments alive at the same time, which could
> lead to data inconsistency over time? What exactly does it do to achieve this?
>
> Regards,
> A.
>
> On Wed, Dec 11, 2019 at 5:48 PM Stanislav Lukyanov <stanlukya...@gmail.com>
> wrote:
>
>> In Ignite a node can go into the "segmented" state in two cases:
>> 1. The node was unavailable (sleeping, hanging in a full GC, etc.) for a
>> long time.
>> 2. The cluster detected a possible split-brain situation and marked the
>> node as "segmented".
>>
>> Yes, split-brain protection (in the GridGain implementation and in theory
>> too) doesn't protect your node from stopping. It protects you from having
>> two segments that are alive at the same time, which could lead to data
>> inconsistency over time.
>>
>> Regarding discovery and large clusters: if your cluster is too big for
>> the ring-based TcpDiscoverySpi to work well, then you should use Zookeeper
>> Discovery, which was created specifically to support large clusters.
>>
>> Stan
>>
>> On Mon, Dec 9, 2019 at 4:02 PM Prasad Bhalerao <
>> prasadbhalerao1...@gmail.com> wrote:
>>
>>> Can someone please advise on this?
>>>>
>>>> ---------- Forwarded message ---------
>>>> From: Prasad Bhalerao <prasadbhalerao1...@gmail.com>
>>>> Date: Fri, Nov 29, 2019 at 7:53 AM
>>>> Subject: Re: Local node terminated after segmentation
>>>> To: <user@ignite.apache.org>
>>>>
>>>>
>>>> I had checked the resource you mentioned, but I was confused by the
>>>> GridGain doc describing it as protection against split-brain, because if
>>>> the node is segmented the only thing one can do is stop/restart/noop.
>>>> I was just wondering how it provides protection against split-brain.
>>>> Now I think that by protection it means killing the segmented node(s) or
>>>> restarting them and bringing them back into the cluster.
>>>>
>>>> Ignite uses TcpDiscoverySpi to send a heartbeat to the next node in the
>>>> ring to check whether that node is reachable.
>>>> So the question is: in what situation does one need additional ways to
>>>> check whether a node is reachable, using different resolvers?
>>>>
>>>> Please let me know if my understanding is correct.
>>>>
>>>> I had checked the code in the article you mentioned. It requires a node
>>>> to be configured in advance so that the resolver can check whether that
>>>> node is reachable from the local host. It does not check whether all the
>>>> nodes are reachable from the local host.
>>>>
>>>> E.g.: node1 will check node2, node2 will check node3, and node3 will
>>>> check node1 to complete the ring.
>>>> Just wondering how to configure this plugin in a prod env with a large
>>>> cluster.
>>>> I tried to check the GridGain docs to see whether they provide any sample
>>>> code for configuring their plugins, just to get an idea, but did not find
>>>> any.
>>>>
>>>> Can you please advise?
>>>>
>>>>
>>>> Thanks,
>>>> Prasad
>>>>
>>>> On Thu 28 Nov, 2019, 11:41 PM akurbanov <antkr....@gmail.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Basically this is a mechanism to implement custom logical/network
>>>>> split-brain protection. Segmentation resolvers allow you to implement
>>>>> a way to determine whether a node has to be segmented/stopped/etc. in
>>>>> the isValidSegment() method, and possibly to use different combinations
>>>>> of resolvers within the processor.
>>>>>
>>>>> If you want to see how it could be done, some articles and source
>>>>> samples that may give you good insight can easily be found on the web,
>>>>> for example:
>>>>>
>>>>> https://medium.com/@aamargajbhiye/how-to-handle-network-segmentation-in-apache-ignite-35dc5fa6f239
>>>>>
>>>>> http://apache-ignite-users.70518.x6.nabble.com/Segmentation-Plugin-blog-or-article-td27955.html
>>>>>
>>>>> 2-3 are described in the documentation; I am copying the link just to
>>>>> point out which page:
>>>>> https://apacheignite.readme.io/docs/critical-failures-handling
>>>>>
>>>>> By default, the answer to 2 is: Ignite doesn't ignore the node
>>>>> FailureType SEGMENTATION and calls the failure handler in this case.
>>>>> The actions that are taken are defined in the failure handler.
>>>>>
>>>>> The AbstractFailureHandler class has only SYSTEM_WORKER_BLOCKED and
>>>>> SYSTEM_CRITICAL_OPERATION_TIMEOUT ignored by default. However, you
>>>>> can override the failure handler and call .setIgnoredFailureTypes().
>>>>>
>>>>> Links:
>>>>> Extend this class:
>>>>>
>>>>> https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/failure/AbstractFailureHandler.java
>>>>> Check the custom implementations used in the Ignite tests and how they
>>>>> are used.
>>>>>
>>>>> Sample from tests:
>>>>>
>>>>> https://github.com/apache/ignite/blob/master/modules/core/src/test/java/org/apache/ignite/failure/SystemWorkersBlockingTest.java
>>>>>
>>>>> Failure processor:
>>>>>
>>>>> https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/failure/FailureProcessor.java
>>>>>
>>>>> Best regards,
>>>>> Anton
>>>>>
>>>>>
>>>>> --
>>>>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>>>>
>>>>
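
On the question at the top of the thread (how two segments are kept from both staying alive): the replies do not show the GridGain resolver internals, so what follows is only a rough sketch of one general technique, a strict-majority check. It is not GridGain's or Ignite's actual code. The names SegmentationSketch, SegmentCheck and MajorityCheck are hypothetical; only the method name isValidSegment() is taken from the thread. The sketch assumes every node probes the same fixed, pre-agreed member list (itself included), and the probes are plain boolean stand-ins rather than real network checks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BooleanSupplier;

final class SegmentationSketch {
    /** Hypothetical contract; it only borrows the isValidSegment() name from the thread. */
    interface SegmentCheck {
        boolean isValidSegment();
    }

    /** Treats the local segment as valid only when a strict majority of members is reachable. */
    static final class MajorityCheck implements SegmentCheck {
        private final List<BooleanSupplier> probes;

        /** One probe per member of a fixed, pre-agreed member list (the local node included). */
        MajorityCheck(List<BooleanSupplier> probes) {
            if (probes.isEmpty())
                throw new IllegalArgumentException("at least one probe is required");
            this.probes = new ArrayList<>(probes);
        }

        @Override
        public boolean isValidSegment() {
            long reachable = probes.stream().filter(BooleanSupplier::getAsBoolean).count();
            // Two disjoint segments cannot both hold more than half of the same list.
            return reachable * 2 > probes.size();
        }
    }

    public static void main(String[] args) {
        BooleanSupplier up = () -> true;    // stand-in for a successful reachability probe
        BooleanSupplier down = () -> false; // stand-in for a failed probe

        // A 5-member cluster split 3/2, seen from one node on each side.
        SegmentCheck largerSide = new MajorityCheck(Arrays.asList(up, up, up, down, down));
        SegmentCheck smallerSide = new MajorityCheck(Arrays.asList(down, down, down, up, up));

        System.out.println("larger side valid: " + largerSide.isValidSegment());
        System.out.println("smaller side valid: " + smallerSide.isValidSegment());
    }
}
```

In the 3/2 split above, only the larger side reports a valid segment, so the smaller side would be the one handed to the failure handler (stop/restart/noop, as discussed earlier in the thread). With an even member count and an exact half/half split, neither side passes the check, which trades availability for consistency; a real deployment might add a tie-breaker such as a shared resource, and that is outside this sketch.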