On 6/29/20 9:56 AM, Klaus Wenninger wrote:
> On 6/24/20 8:09 AM, Andrei Borzenkov wrote:
>> Two-node is what I almost exclusively deal with. It works reasonably
>> well in one location, where failures to perform fencing are rare and
>> can be mitigated by two different fencing methods. Usually SBD is
>> reliable enough, as failure of the shared storage also implies failure
>> of the whole cluster.
>>
>> When the two nodes are located on separate sites (not necessarily
>> Asia/America; two buildings across the street are already enough), we
>> have the issue of complete site isolation, where normal fencing becomes
>> impossible together with the missing node (power outage, network
>> outage, etc.).
>>
>> The usual recommendation is a third site which functions as a witness.
>> This works fine up to the failure of this third site itself.
>> Unavailability of the witness makes normal maintenance of either of the
>> two nodes impossible. If the witness is not available and (pacemaker on)
>> one of the two nodes needs to be restarted, the remaining node goes out
>> of quorum or commits suicide. At most we can statically designate one
>> node as tiebreaker (and this is already incompatible with qdevice).
>>
>> I think I can finally formulate what I miss. The behavior that I would
>> really want is:
>>
>> - if (pacemaker on) one node performs a normal shutdown, the remaining
>>   node continues managing services, independently of witness state or
>>   availability. Usually this is achieved either by two_node or by
>>   no-quorum-policy=ignore, but that absolutely requires successful
>>   fencing, so it cannot be used alone. Such a feature likely mandates
>>   WFA, but that is probably unavoidable.
>>
>> - if the other node is lost unexpectedly, first try normal fencing
>>   between the two nodes, independently of witness state or
>>   availability. If fencing succeeds, we can continue managing services.
>>
>> - if normal fencing fails (due to isolation of the other site), consult
>>   the witness and follow the normal procedure. If the witness is not
>>   available or does not grant us quorum, commit suicide/go out of
>>   quorum; if the witness is available and grants us quorum, continue
>>   managing services.
>>
>> Any potential issues with this? If it is possible to implement this
>> using current tools, I did not find it.
> I see the idea but I see a couple of issues:

My mailer was confused by all these combinations of "Antw: Re: Antw:"
and didn't compose the mails into a thread properly, which is why I
missed the further discussion, where it was definitely still about
shared storage and not watchdog fencing. I had guessed, from the initial
post, that there was a shift in the direction of qdevice. But maybe the
thoughts below are still interesting in that light ...

>
> - watchdog-fencing is timing critical. So when losing quorum we have
>   to suicide after a defined time. So just trying normal fencing first
>   and then going for watchdog-fencing is not an option.
>   But what could be considered is starting with watchdog fencing right
>   away upon quorum loss (remember I said defined, which doesn't
>   necessarily mean short) and trying other means of fencing in
>   parallel; if that succeeds, somehow regain quorum (additional voting
>   or something one would have to think over a little more).
>
> - usually we are using quorum to prevent a fence race, which this
>   approach jeopardizes. Of course we can introduce an additional wait
>   before normal fencing on the node that doesn't have quorum to
>   mitigate that effect.
>
> - why I think current configuration possibilities won't give you your
>   desired behavior is that we finally end up with 2 quorum sources.
>   The only case where I'm aware of a similar thing is 2-node + shared
>   disk, where sbd decides not to go with the quorum gotten from
>   pacemaker but does node-counting internally instead.
>
> Klaus
>> And note that this is not actually limited to two-node clusters; we
>> have more or less the same issue with any 50-50 split cluster and a
>> witness on a third site.
>> _______________________________________________
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
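A minimal sketch, for illustration only, contrasting two decision orders from the thread. The first is the order Andrei proposes. The second is the variant Klaus suggests, where the watchdog deadline starts at quorum loss and other fencing runs against it in parallel. All names here (Decision, on_peer_lost, try_fence_peer, witness_grants_quorum, on_quorum_loss_parallel) are hypothetical and are not pacemaker, corosync or sbd APIs. The callables stand in for whatever fencing agent and witness a real setup would use.

```python
import concurrent.futures as cf
from enum import Enum, auto
from typing import Callable, Optional


class Decision(Enum):
    CONTINUE = auto()  # keep managing services
    STOP = auto()      # go out of quorum / self-fence


def on_peer_lost(graceful_shutdown: bool,
                 try_fence_peer: Callable[[], bool],
                 witness_grants_quorum: Callable[[], Optional[bool]]) -> Decision:
    """Andrei's proposed order (hypothetical helper, not a real cluster API)."""
    # 1. Peer shut down cleanly: keep running regardless of the witness
    #    (the behavior two_node or no-quorum-policy=ignore provide today).
    if graceful_shutdown:
        return Decision.CONTINUE
    # 2. Peer vanished unexpectedly: try normal fencing first, ignoring the witness.
    if try_fence_peer():
        return Decision.CONTINUE
    # 3. Fencing failed (e.g. site isolation): ask the witness.
    #    None models "witness unreachable", which is treated like a refusal.
    return Decision.CONTINUE if witness_grants_quorum() is True else Decision.STOP


def on_quorum_loss_parallel(try_fence_peer: Callable[[], bool],
                            watchdog_timeout_s: float) -> Decision:
    """Klaus's variant (hypothetical): the watchdog deadline starts at quorum
    loss and fencing runs in parallel; only a success before the deadline counts."""
    pool = cf.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(try_fence_peer)
    try:
        fenced = future.result(timeout=watchdog_timeout_s)
    except Exception:  # deadline reached, or the fencing callable raised
        fenced = False
    finally:
        pool.shutdown(wait=False)  # do not block past the deadline
    # On STOP a real node would be reset by its watchdog. How quorum is
    # regained after a successful fence is left open in the thread.
    return Decision.CONTINUE if fenced else Decision.STOP
```

Note that on_peer_lost is exactly the sequential "fence first, then fall back" order that Klaus argues against for watchdog fencing. If the fencing attempt in step 2 hangs, nothing bounds the time until the node stops. The parallel variant bounds that time by construction. Neither sketch models the extra delay Klaus mentions for the node without quorum to avoid a fence race.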
-- 
Klaus Wenninger

Senior Software Engineer, EMEA ENG Base Operating Systems

Red Hat

kwenn...@redhat.com

Red Hat GmbH, http://www.de.redhat.com/, Registered office: Grasbrunn,
Commercial register: Amtsgericht München, HRB 153243,
Managing directors: Charles Cachera, Laurie Krebs, Michael O'Neill, Thomas Savage