On 29/04/18 13:22, Andrei Borzenkov wrote:
> 29.04.2018 04:19, Wei Shan wrote:
>> Hi,
>>
>> I'm using Red Hat Cluster Suite 7 with a watchdog-timer-based fence agent. I
>> understand this is a really bad setup, but this is what the end-user wants.
>>
>> ATB => auto_tie_breaker
>>
>> "When the auto_tie_breaker is used in even-number member clusters, then the
>> failure of the partition containing the auto_tie_breaker_node (by default
>> the node with lowest ID) will cause other partition to become inquorate and
>> it will self-fence. In 2-node clusters with auto_tie_breaker this means
>> that failure of node favoured by auto_tie_breaker_node (typically nodeid 1)
>> will result in reboot of other node (typically nodeid 2) that detects the
>> inquorate state. If this is undesirable then corosync-qdevice can be used
>> instead of the auto_tie_breaker to provide additional vote to quorum making
>> behaviour closer to odd-number member clusters."
>>
>
> That's not what the upstream corosync manual pages say. Corosync itself
> won't initiate self-fencing; it just marks the node as being out of quorum.
> What happens later depends on higher layers such as Pacemaker. Pacemaker
> can be configured to commit suicide, but it can also be configured to
> ignore quorum completely. I am not familiar with the details of how RHCS
> behaves by default.
>
> I just tested on vanilla corosync+pacemaker (openSUSE Tumbleweed) and
> nothing happens when I kill the lowest node in a two-node configuration.
>
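For reference, the configuration being discussed would look roughly like the quorum section below. This is only a sketch built from the votequorum options named in the thread (auto_tie_breaker, auto_tie_breaker_node, wait_for_all); check votequorum(5) on your corosync version for exact semantics and option interactions before using it:

```
quorum {
    provider: corosync_votequorum
    # ATB: in a 50/50 split, only the partition containing the
    # tie-breaker node keeps quorum; the other goes inquorate.
    auto_tie_breaker: 1
    # Default tie-breaker is the lowest node id.
    auto_tie_breaker_node: lowest
    # A (re)starting node waits until it has seen all members at
    # least once before quorum can be granted.
    wait_for_all: 1
}
```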
That is the expected behaviour for a 2-node ATB cluster. If the preferred node is not available, the remaining node will stall until it comes back. It sounds odd, but that's what happens. A preferred node is a preferred node: if the preference can move from one node to the other when it fails, then it's not a preferred node ... it's just a node :)

If you need fully resilient failover for 2 nodes, then qdevice is more likely what you need.

Chrissie

> If your cluster nodes are configured to commit suicide, what happens
> after reboot depends on at least the wait_for_all corosync setting. With
> wait_for_all=1 (the default in two_node mode), and without a) ignoring quorum
> state and b) having a fencing resource, Pacemaker on your node will wait
> indefinitely after reboot because the partner is not available.
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
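P.S. For anyone following up on the qdevice suggestion, the corosync side of it is roughly the quorum section below. This is a sketch only: the host name is hypothetical, and the real option set is documented in corosync-qdevice(8):

```
quorum {
    provider: corosync_votequorum
    device {
        # The quorum device contributes one extra vote, so one of the
        # two nodes can stay quorate after a split.
        votes: 1
        model: net
        net {
            # Hypothetical corosync-qnetd server outside the cluster.
            host: qnetd.example.com
            # ffsplit: on a 50/50 split, grant the vote to exactly one
            # partition (fifty-fifty split algorithm).
            algorithm: ffsplit
        }
    }
}
```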