On Mon, Sep 4, 2023 at 4:44 PM David Dolan <daithido...@gmail.com> wrote:
>
> Thanks Klaus/Andrei,
>
> So if I understand correctly, what I'm trying probably shouldn't work.
It is impossible to configure corosync (or any other cluster system, for
that matter) to keep the *arbitrary* last node quorate. It is possible to
designate one node as "preferred" and to keep it quorate. Returning to
your example:

> I tried adding this line to corosync.conf and I could then bring down the
> services on node 1 and 2 or node 2 and 3 but if I left node 2 until last, the
> cluster failed
>
> auto_tie_breaker_node: 1 3

Correct. In your scenario the tie breaker is only relevant with two nodes.
When the first node is down, the remaining two nodes select the tiebreaker,
and it can only be node 1 or 3.

> This line had the same outcome as using 1 3
>
> auto_tie_breaker_node: 1 2 3

If it really has the same outcome (i.e. the cluster fails when node 2 is
left last), that is a bug. This line makes node 1 or node 2 a possible
tiebreaker, so the cluster should fail when node 3 is left last, not node 2.

What most certainly *is* possible: no-quorum-policy=ignore + reliable
fencing. This worked just fine in two-node clusters without two_node. It
does not make the last node quorate, but it allows pacemaker to keep
providing services on this node *and* to take over services from other
nodes if they were fenced successfully. (A rough sketch of these settings
is at the end of this mail.)

> And I should attempt setting auto_tie_breaker in corosync and remove
> last_man_standing.
> Then, I should set up another server with qdevice and configure that using
> the LMS algorithm.
>
> Thanks
> David
>
> On Mon, 4 Sept 2023 at 13:32, Klaus Wenninger <kwenn...@redhat.com> wrote:
>>
>> On Mon, Sep 4, 2023 at 1:50 PM Andrei Borzenkov <arvidj...@gmail.com> wrote:
>>>
>>> On Mon, Sep 4, 2023 at 2:18 PM Klaus Wenninger <kwenn...@redhat.com> wrote:
>>> >
>>> > On Mon, Sep 4, 2023 at 12:45 PM David Dolan <daithido...@gmail.com> wrote:
>>> >>
>>> >> Hi Klaus,
>>> >>
>>> >> With default quorum options I've performed the following on my 3 node
>>> >> cluster:
>>> >>
>>> >> Bring down cluster services on one node - the running services migrate
>>> >> to another node.
>>> >> Wait 3 minutes.
>>> >> Bring down cluster services on one of the two remaining nodes - the
>>> >> surviving node in the cluster is then fenced.
>>> >>
>>> >> Instead of the surviving node being fenced, I hoped that the services
>>> >> would migrate and run on that remaining node.
>>> >>
>>> >> Just looking for confirmation that my understanding is ok and if I'm
>>> >> missing something?
>>> >
>>> > As said, I've never used it ...
>>> > Well, when down to 2 nodes LMS by definition gets into trouble, as after
>>> > another outage either of them is going to be alone. In case of an ordered
>>> > shutdown this could possibly be circumvented, though. So I guess your
>>> > first attempt to enable auto_tie_breaker was the right idea. That way you
>>> > will still have service on at least one of the nodes.
>>> > So I guess what you were seeing is the right - and unfortunately only
>>> > possible - behavior.
>>>
>>> I still do not see where fencing comes from. Pacemaker requests
>>> fencing of the missing nodes. It also may request self-fencing, but
>>> not in the default settings. It is rather hard to tell what happens
>>> without logs from the last remaining node.
>>>
>>> That said, the default action is to stop all resources, so the end
>>> result is not very different :)
>>
>> But you are of course right. The expected behaviour would be that
>> the leftover node stops the resources.
>> But maybe we're missing something here. Hard to tell without
>> the exact configuration including fencing.
>> Again, as already said, I don't know anything about the LMS
>> implementation with corosync. In theory there are arguments both for
>> suicide (but that would have to be done by pacemaker) and for
>> automatically switching to some 2-node mode once the remaining
>> partition is reduced to just 2, followed by a fence race (when done
>> without the precautions otherwise used for 2-node clusters).
>> But I guess in this case it is neither of those two.
>>
>> Klaus
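
For reference, here is a rough, untested sketch of the variants discussed
above. It only illustrates the idea; the node ids, node names and the qnetd
host name ("qnetd-host") are made up and need to be adapted to your cluster.

corosync.conf quorum section with auto_tie_breaker instead of
last_man_standing:

    quorum {
        provider: corosync_votequorum
        auto_tie_breaker: 1
        # on an even split, the partition that contains the first node
        # from this list which is still a member stays quorate
        auto_tie_breaker_node: 1 2 3
    }

Pacemaker side of the "no-quorum-policy=ignore + reliable fencing"
approach (assuming pcs; working fencing is a prerequisite):

    pcs property set no-quorum-policy=ignore
    pcs property set stonith-enabled=true

If you instead go the qdevice route with the LMS algorithm, roughly
(again assuming pcs, with the qdevice/qnetd packages installed):

    # on the separate quorum host
    pcs qdevice setup model net --enable --start

    # on one of the cluster nodes
    pcs quorum device add model net host=qnetd-host algorithm=lms

In any case the corosync.conf change has to be distributed to all nodes
and corosync reloaded or restarted before it takes effect.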