11.07.2018 05:45, Confidential Company wrote:
> Not true, the faster node will kill the slower node first. It is
> possible that through misconfiguration, both could die, but it's rare
> and easily avoided with a 'delay="15"' set on the fence config for the
> node you want to win.
>
> Don't use a delay on the other node, just the node you want to live in
> such a case.
>
> **
> 1. Given Active/Passive setup, resources are active on Node1
> 2. fence1(prefers to Node1, delay=15) and fence2(prefers to
>    Node2, delay=30)
> 3. Node2 goes down
> 4. Node1 thinks Node2 goes down / Node2 thinks Node1 goes
>    down

If node2 is down, it cannot think anything.

> 5. fence1 counts 15 seconds before he fence Node1 while
>    fence2 counts 30 seconds before he fence Node2
> 6. Since fence1 do have shorter time than fence2, fence1
>    executes and shutdown Node1.
> 7. fence1(action: shutdown Node1) will trigger first
>    always because it has shorter delay than fence2.
>
> ** Okay what's important is that they should be different. But in the case
> above, even though Node2 goes down but Node1 has shorter delay, Node1 gets
> fenced/shutdown. This is a sample scenario. I don't get the point. Can you
> comment on this?
>
> Thanks
>
> On Tue, Jul 10, 2018 at 12:18 AM, Klaus Wenninger <[email protected]>
> wrote:
>
>> On 07/09/2018 05:53 PM, Digimer wrote:
>>> On 2018-07-09 11:45 AM, Klaus Wenninger wrote:
>>>> On 07/09/2018 05:33 PM, Digimer wrote:
>>>>> On 2018-07-09 09:56 AM, Klaus Wenninger wrote:
>>>>>> On 07/09/2018 03:49 PM, Digimer wrote:
>>>>>>> On 2018-07-09 08:31 AM, Klaus Wenninger wrote:
>>>>>>>> On 07/09/2018 02:04 PM, Confidential Company wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Any ideas what triggers fencing script or stonith?
>>>>>>>>>
>>>>>>>>> Given the setup below:
>>>>>>>>> 1. I have two nodes
>>>>>>>>> 2. Configured fencing on both nodes
>>>>>>>>> 3. Configured delay=15 and delay=30 on fence1(for Node1) and
>>>>>>>>>    fence2(for Node2) respectively
>>>>>>>>>
>>>>>>>>> *What does it mean to configured delay in stonith? wait for 15 seconds
>>>>>>>>> before it fence the node?
>>>>>>>> Given that on a 2-node-cluster you don't have real quorum to make one
>>>>>>>> partial cluster fence the rest of the nodes the different delays are meant
>>>>>>>> to prevent a fencing-race.
>>>>>>>> Without different delays that would lead to both nodes fencing each
>>>>>>>> other at the same time - finally both being down.
>>>>>>> Not true, the faster node will kill the slower node first.
>>>>>>> It is possible that through misconfiguration, both could die, but it's rare
>>>>>>> and easily avoided with a 'delay="15"' set on the fence config for the
>>>>>>> node you want to win.
>>>>>> What exactly is not true? Aren't we saying the same?
>>>>>> Of course one of the delays can be 0 (most important is that
>>>>>> they are different).
>>>>> Perhaps I misunderstood your message. It seemed to me that the
>>>>> implication was that fencing in 2-node without a delay always ends up
>>>>> with both nodes being down, which isn't the case. It can happen if the
>>>>> fence methods are not setup right (ie: the node isn't set to immediately
>>>>> power off on ACPI power button event).
>>>> Yes, a misunderstanding I guess.
>>>>
>>>> Should have been more verbose in saying that due to the
>>>> time between the fencing-command fired off to the fencing
>>>> device and the actual fencing taking place (as you state
>>>> dependent on how it is configured in detail - but a measurable
>>>> time in all cases) there is a certain probability that when
>>>> both nodes start fencing at roughly the same time we will
>>>> end up with 2 nodes down.
>>>>
>>>> Everybody has to find his own tradeoff between reliability
>>>> fence-races are prevented and fencing delay I guess.
>>> We've used this;
>>>
>>> 1. IPMI (with the guest OS set to immediately power off) as primary,
>>>    with a 15 second delay on the active node.
>>>
>>> 2. Two Switched PDUs (two power circuits, two PSUs) as backup fencing
>>>    for when IPMI fails, with no delay.
>>>
>>> In ~8 years, across dozens and dozens of clusters and countless fence
>>> actions, we've never had a dual-fence event (where both nodes go down).
>>> So it can be done safely, but as always, test test test before prod.
>>
>> No doubt about that this setup is working reliably.
>> You just have to know your fencing-devices and
>> which delays they involve.
>>
>> If we are talking about SBD (with disk as otherwise
>> it doesn't work in a sensible way in 2-node-clusters)
>> for instance I would strongly advise using a delay.
>>
>> So I guess it is important to understand the basic
>> idea behind this different delay-based fence-race
>> avoidance.
>> Afterwards you can still decide why it is no issue
>> in your own setup.
>>
>>>
>>>>> If the delay is set on both nodes, and they are different, it will work
>>>>> fine. The reason not to do this is that if you use 0, then don't use
>>>>> anything at all (0 is default), and any other value causes avoidable
>>>>> fence delays.
>>>>>
>>>>>>> Don't use a delay on the other node, just the node you want to live in
>>>>>>> such a case.
>>>>>>>
>>>>>>>>> *Given Node1 is active and Node2 goes down, does it mean fence1 will
>>>>>>>>> first execute and shutdowns Node1 even though Node2 goes down?
>>>>>>>> If Node2 managed to sign off properly it will not.
>>>>>>>> If network-connection is down so that Node2 can't inform Node1 that it
>>>>>>>> is going down and finally has stopped all resources it will be fenced by
>>>>>>>> Node1.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Klaus
>>>>>>> Fencing occurs in two cases;
>>>>>>>
>>>>>>> 1. The node stops responding (meaning it's in an unknown state, so it is
>>>>>>>    fenced to force it into a known state).
>>>>>>> 2. A resource / service fails to stop. In this case, the service is
>>>>>>>    in an unknown state, so the node is fenced to force the service into a
>>>>>>>    known state so that it can be safely recovered on the peer.
>>>>>>>
>>>>>>> Graceful withdrawal of the node from the cluster, and graceful stopping
>>>>>>> of services will not lead to a fence (because in both cases, the node /
>>>>>>> service are in a known state - off).
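
To make the delay logic in this thread concrete, here is a minimal Python
sketch of a two-node fence race. It is a toy model, not how Pacemaker is
implemented: the function name fence_race, the dict of per-target delays and
the latency parameter (time between the fence command being fired and the
power-off taking effect) are all assumptions made for illustration. It also
assumes both nodes notice the failure at the same instant and are equally
fast, which is the worst case for a race.

```python
def fence_race(delays, alive, latency=0.0):
    """Toy model of a two-node fence race (illustrative only).

    delays  -- dict: node name -> delay (seconds) configured on the fence
               device that powers off that node
    alive   -- set of nodes that are still running and have lost contact
               with their peer; only these can issue a fence command
    latency -- assumed time between firing the command and the target
               actually being powered off

    Returns a dict: node name -> time at which it is powered off.
    """
    if len(delays) != 2:
        raise ValueError("this sketch only models a two-node cluster")

    # Every running node tries to fence its peer. The command fires once
    # the delay configured on the peer's fence device has elapsed.
    attempts = []
    for attacker in alive:
        (target,) = [node for node in delays if node != attacker]
        attempts.append((delays[target], attacker, target))
    attempts.sort()  # earliest command first

    killed_at = {}
    for fire_time, attacker, target in attempts:
        # A node that was already powered off cannot fire its command.
        if attacker in killed_at and killed_at[attacker] <= fire_time:
            continue
        killed_at[target] = fire_time + latency
    return killed_at


delays = {"node1": 15, "node2": 30}

# Node2 is really down: only node1 can act, so node2 is the one fenced.
print(fence_race(delays, alive={"node1"}))

# Both nodes are up but cannot see each other (split brain).
print(fence_race(delays, alive={"node1", "node2"}))
```

In this model, with node2 really down, only node1 issues a fence command, so
node2 is fenced after 30 seconds and node1 is never touched. Only in the
split-brain case, where both nodes are up, does the shorter delay on fence1
get node1 fenced first. With equal delays and a non-zero latency the model
powers off both nodes, which is the race the different delays are meant to
avoid.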
>
> _______________________________________________
> Users mailing list: [email protected]
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
