>>> Eric Robinson <eric.robin...@psmnv.com> wrote on 09.10.2016 at 22:33 in message <dm5pr03mb2729630c1a1453e4eb61ed8afa...@dm5pr03mb2729.namprd03.prod.outlook.com>:
> I've been working on a script for preventing split-brain in 2-node clusters
> and I would appreciate comments from everyone. If someone already has a
> solution like this, let me know!

Hi!

I'd try to avoid working on the wrong problem: if you have FC in addition to
LAN (assuming you don't run FC over IP on those LANs), I'd strongly suggest
using SBD and fencing, or a third "witness node" for quorum. It's not obvious
to me what problem you are really trying to solve.

> Most of my database clusters are 2-node clusters, with each node in a
> geographically separate data center. Our layout looks like the following
> diagram. Each server node has three physical connections to the world. LANs
> A, B, C, and D are all physically separate cable plants and cross-connects
> between the data centers (using different switches, routers, power, fiber
> paths, etc.). This is to ensure maximum cluster communication intelligence.
> LANs A and B (Corosync ring 0) are bonded at the NICs, as are LANs C and D
> (Corosync ring 1).
>
> Hopefully this diagram will come through intact...
>
>                   +----------------+
>                   |                |
>                   |  Third party   |
>                   |  Web Hosting   |
>                   +---+--------+---+
>                       |        |
>               +-------+--------+-------+
>               |                        |
>     +---------+     The Interwebs      +----------+
>     |         |                        |          |
>     |         +------------------------+          |
>     |                                             |
>     | Internet                                    | Internet
>     |                                             |
>     |                    LAN A                    |
>     |   +-------------------------------------+   |
>     |   |                LAN B                |   |
>     |   |   +-----------------------------+   |   |
>     |   |   |                             |   |   |
> +---+---+---+----+                  +-----+---+---+--+
> |                |                  |                |
> |     Node 1     |                  |     Node 2     |
> |                |                  |                |
> +------+---+-----+                  +-----+---+------+
>        |   |            LAN C             |   |
>        |   +------------------------------+   |
>        |                LAN D                 |
>        +--------------------------------------+
>
> Even with all that connectivity, it is possible that something could happen
> to interrupt communication between the 2 data centers, or the connectivity
> between one of the data centers and the Internet, and split brain would
> result. I have been working on a way to prevent this using a concept I call
> a "dead drop." This idea takes its name from the spy world, where spies
> cannot communicate directly, but they are able to pass simple information
> and status messages to each other by using a blind drop in a previously
> agreed location. Spy X makes a mark on a tree. Later, spy Y comes by and
> sees the mark, and knows that spy X is okay. He leaves a mark of his own on
> the tree, and later spy X sees it and knows that spy Y is okay. Neither spy
> owns the tree or the land it is on.
>
> The same idea applies here. Suppose all direct TCP/IP connectivity were to
> be severed between Nodes 1 and 2, but both of them are still able to reach
> the Internet. Normally, split brain would result. But SUPPOSE they were both
> running scripts that use curl requests to post and retrieve simple status
> messages to and from a third party web host.
> In other words, even though the nodes cannot talk to each other directly,
> they can still leave messages at a dead drop location for each other to
> read. If Node 2 was in standby mode, normally it would switch to primary.
> However, if it checks the dead drop and sees a message from Node 1 that
> says, "I'm still okay and communicating with customers," then Node 2 knows
> not to become cluster primary. This script could possibly be implemented as
> a cluster resource, with most other resources dependent on it.
>
> The dead drop needs no intelligence other than the ability to read and write
> simple text files, and it can run on any third-party web host (or on
> multiple web sites). It does not fill the role of a quorum or arbitrator.
> The 2 nodes themselves remain in control of their own failover decisions.
>
> I'm SURE this has been attempted already and I don't want to re-invent the
> wheel, but I have not seen this approach anywhere. Maybe there's a good
> reason for that because it simply won't work? The arbitration solutions I
> have seen all rely on a third machine that plays a complex role in
> arbitration.
>
> Thoughts?
>
> --
> Eric Robinson

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
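
For illustration only, the dead-drop exchange described in the quoted message might look roughly like the Python sketch below. Everything named here is an assumption and not part of Eric's script: the URL, the one-line "node state timestamp" message format, the 30-second staleness limit, and all function names are hypothetical, and the web host is assumed to accept a plain HTTP PUT. The sketch also assumes both nodes keep their clocks in sync (e.g. via NTP), since the reader compares the writer's timestamp against its own clock.

```python
import time
import urllib.request
from typing import Optional

DROP_URL = "https://deaddrop.example.com/status"  # hypothetical dead-drop location
MAX_AGE = 30.0  # assumed: seconds after which a peer's mark counts as stale


def format_status(node: str, state: str, now: float) -> str:
    """Build the one-line mark a node leaves, e.g. 'node1 primary 1476045180'."""
    return f"{node} {state} {int(now)}"


def parse_status(text: str) -> Optional[tuple]:
    """Return (node, state, timestamp), or None if the mark is malformed."""
    parts = text.split()
    if len(parts) != 3 or not parts[2].isdigit():
        return None
    return parts[0], parts[1], int(parts[2])


def should_promote(peer_mark: Optional[str], now: float,
                   max_age: float = MAX_AGE) -> bool:
    """Decide whether a standby that lost direct contact may become primary.

    peer_mark is the text read from the dead drop, or None if the dead drop
    itself could not be reached. With no readable, well-formed mark the
    standby has learned nothing about its peer, so it stays put.
    """
    if peer_mark is None:
        return False
    status = parse_status(peer_mark)
    if status is None:
        return False
    _node, state, stamp = status
    peer_is_fresh_primary = state == "primary" and (now - stamp) <= max_age
    return not peer_is_fresh_primary


def leave_mark(node: str, state: str, url: str = DROP_URL) -> None:
    """Write this node's mark to the dead drop (assumes the host accepts PUT)."""
    data = format_status(node, state, time.time()).encode()
    req = urllib.request.Request(f"{url}/{node}", data=data, method="PUT")
    with urllib.request.urlopen(req, timeout=5):
        pass


def read_mark(peer: str, url: str = DROP_URL) -> Optional[str]:
    """Read the peer's mark; None means the dead drop was unreachable."""
    try:
        with urllib.request.urlopen(f"{url}/{peer}", timeout=5) as resp:
            return resp.read().decode(errors="replace")
    except OSError:  # URLError and socket timeouts are both OSError subclasses
        return None
```

A standby would call something like should_promote(read_mark("node1"), time.time()) before taking over. Note the deliberate asymmetry: an unreachable dead drop never leads to promotion, because a node that cannot read the drop cannot tell a dead peer from its own lost Internet link. A stale mark still only says that the peer stopped writing, not that it stopped serving, which is the gap that fencing closes.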