On 2018-07-09 11:45 AM, Klaus Wenninger wrote:
> On 07/09/2018 05:33 PM, Digimer wrote:
>> On 2018-07-09 09:56 AM, Klaus Wenninger wrote:
>>> On 07/09/2018 03:49 PM, Digimer wrote:
>>>> On 2018-07-09 08:31 AM, Klaus Wenninger wrote:
>>>>> On 07/09/2018 02:04 PM, Confidential Company wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Any ideas what triggers fencing script or stonith?
>>>>>>
>>>>>> Given the setup below:
>>>>>> 1. I have two nodes
>>>>>> 2. Configured fencing on both nodes
>>>>>> 3. Configured delay=15 and delay=30 on fence1 (for Node1) and
>>>>>> fence2 (for Node2) respectively
>>>>>>
>>>>>> *What does it mean to configure a delay in stonith? Wait for 15 seconds
>>>>>> before it fences the node?
>>>>> Given that on a 2-node-cluster you don't have real quorum to make one
>>>>> partial cluster fence the rest of the nodes, the different delays are meant
>>>>> to prevent a fencing-race.
>>>>> Without different delays that would lead to both nodes fencing each
>>>>> other at the same time - finally both being down.
>>>> Not true, the faster node will kill the slower node first. It is
>>>> possible that through misconfiguration, both could die, but it's rare
>>>> and easily avoided with a 'delay="15"' set on the fence config for the
>>>> node you want to win.
>>> What exactly is not true? Aren't we saying the same?
>>> Of course one of the delays can be 0 (most important is that
>>> they are different).
>> Perhaps I misunderstood your message. It seemed to me that the
>> implication was that fencing in 2-node without a delay always ends up
>> with both nodes being down, which isn't the case. It can happen if the
>> fence methods are not set up right (ie: the node isn't set to immediately
>> power off on ACPI power button event).
> Yes, a misunderstanding I guess.
>
> Should have been more verbose in saying that due to the
> time between the fencing-command fired off to the fencing
> device and the actual fencing taking place (as you state,
> dependent on how it is configured in detail, but a measurable
> time in all cases) there is a certain probability that when
> both nodes start fencing at roughly the same time we will
> end up with 2 nodes down.
>
> Everybody has to find his own tradeoff between how reliably
> fence-races are prevented and fencing delay, I guess.
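A toy model of the race Klaus describes may make the role of fence latency and per-device delays concrete. It is a sketch under simplifying assumptions, not how Pacemaker or any fence agent is implemented: each node waits for the delay configured on the fence device that targets its peer, then fires the fence command; the out-of-band device (IPMI, PDU) cuts power `latency` seconds later even if the sender has died in the meantime; a node that is already off cannot fire. `Node`, `fence_race`, and all timings are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    detect_at: float    # hypothetical: when this node decides its peer must be fenced (s)
    delay_on_me: float  # delay configured on the fence device that targets this node (s)


def fence_race(a: Node, b: Node, latency: float) -> set[str]:
    """Return the names of the nodes still powered on after both try to fence each other.

    Simplified model (an assumption, not Pacemaker's code): a node sends its
    fence command after waiting for the delay on its peer's fence device, and
    the device powers the target off `latency` seconds after the command is
    sent, whether or not the sender survives that long.
    """
    send = {a.name: a.detect_at + b.delay_on_me,
            b.name: b.detect_at + a.delay_on_me}
    peer = {a.name: b.name, b.name: a.name}
    power_off = {a.name: math.inf, b.name: math.inf}

    # Process fence commands in the order they would be sent.
    for sender in sorted(send, key=send.get):
        # A node can only send its command if it is still powered on at that moment.
        if power_off[sender] > send[sender]:
            target = peer[sender]
            power_off[target] = min(power_off[target], send[sender] + latency)

    # Survivors are the nodes that never got a power-off time.
    return {name for name, t in power_off.items() if math.isinf(t)}
```

In this model, with no delays and a 2-second fence latency, two nodes that detect the failure 0.5 s apart both end up off; with a 0.2 s latency (an immediate power-off) the faster node wins, which is Digimer's point. Setting `delay_on_me=15.0` only on node1 lets node1 survive even when node2 detects the failure first. The original poster's config (15 s on fence1, 30 s on fence2) also avoids the race, but here node2 wins, and fencing node2 always waits at least 30 s.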
We've used this:

1. IPMI (with the guest OS set to immediately power off) as primary,
with a 15-second delay on the active node.

2. Two switched PDUs (two power circuits, two PSUs) as backup fencing
for when IPMI fails, with no delay.

In ~8 years, across dozens and dozens of clusters and countless fence
actions, we've never had a dual-fence event (where both nodes go down).
So it can be done safely, but as always, test, test, test before prod.

>> If the delay is set on both nodes, and they are different, it will work
>> fine. The reason not to do this is that if you use 0, you may as well
>> not set anything at all (0 is the default), and any other value causes
>> avoidable fence delays.
>>
>>>> Don't use a delay on the other node, just the node you want to live in
>>>> such a case.
>>>>
>>>>>> *Given Node1 is active and Node2 goes down, does it mean fence1 will
>>>>>> first execute and shut down Node1 even though Node2 goes down?
>>>>> If Node2 managed to sign off properly it will not.
>>>>> If the network-connection is down so that Node2 can't inform Node1 that
>>>>> it is going down and finally has stopped all resources, it will be
>>>>> fenced by Node1.
>>>>>
>>>>> Regards,
>>>>> Klaus
>>>> Fencing occurs in two cases:
>>>>
>>>> 1. The node stops responding (meaning it's in an unknown state, so it is
>>>> fenced to force it into a known state).
>>>> 2. A resource / service fails to stop. In this case, the service is
>>>> in an unknown state, so the node is fenced to force the service into a
>>>> known state so that it can be safely recovered on the peer.
>>>>
>>>> Graceful withdrawal of the node from the cluster, and graceful stopping
>>>> of services will not lead to a fence (because in both cases, the node /
>>>> service are in a known state - off).
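The two fencing triggers Digimer lists, plus the graceful cases that do not lead to a fence, can be summarised as a small decision function. This is a hypothetical sketch of the reasoning only: `PeerState` and `fence_reason` are invented names, and Pacemaker's actual decision logic considers more than these three inputs.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PeerState:
    """Hypothetical summary of what the surviving node knows about its peer."""
    left_gracefully: bool = False  # peer stopped its services and signed off the cluster
    responding: bool = True        # peer still answers over the cluster network
    failed_stops: list[str] = field(default_factory=list)  # resources that failed to stop on the peer


def fence_reason(peer: PeerState) -> Optional[str]:
    """Return why the peer must be fenced, or None if its state is already known."""
    if peer.left_gracefully:
        # Known state: the node and its services are off, so no fence is needed.
        return None
    if not peer.responding:
        # Case 1: the node's state is unknown; fence it to force a known (off) state.
        return "node stopped responding"
    if peer.failed_stops:
        # Case 2: a service is in an unknown state; fence so it can be recovered on this node.
        return "failed to stop: " + ", ".join(peer.failed_stops)
    # Node responding and all services stopped cleanly: nothing to do.
    return None
```

For the original question: if Node2 signs off cleanly, `PeerState(left_gracefully=True)` yields `None` and nothing is fenced; if the network link drops, `PeerState(responding=False)` yields a reason and Node1 fences Node2, after whatever delay is configured on fence2.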
-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org