On 29.10.2021 18:16, Andrei Borzenkov wrote:
> On 29.10.2021 17:53, Ken Gaillot wrote:
>> On Fri, 2021-10-29 at 13:59 +0000, Gerry R Sommerville wrote:
>>> Hey Andrei,
>>>
>>> Thanks for your response again. The cluster nodes and remote hosts
>>> each share two networks, however there is no routing between them. I
>>> don't suppose there is a configuration parameter we can set to tell
>>> Pacemaker to try communicating with the remotes using multiple IP
>>> addresses?
>>>
>>> Gerry Sommerville
>>> E-mail: ge...@ca.ibm.com
>>
>> Hi,
>>
>> No, but you can use bonding if you want to have interface redundancy
>> for a remote connection. To be clear, there is no requirement that
>> remote nodes and cluster nodes have the same level of redundancy, it's
>> just a design choice.
>>
>> To address the original question, this is the log sequence I find most
>> relevant:
>>
>>> Oct 22 12:21:09.389 jangcluster-srv-2 pacemaker-schedulerd[776553]
>>> (unpack_rsc_op_failure) warning: Unexpected result (error) was
>>> recorded for monitor of jangcluster-srv-4 on jangcluster-srv-2 at Oct
>>> 22 12:21:09 2021 | rc=1 id=jangcluster-srv-4_last_failure_0
>>
>>> Oct 22 12:21:09.389 jangcluster-srv-2 pacemaker-schedulerd[776553]
>>> (unpack_rsc_op_failure) notice: jangcluster-srv-4 will not be
>>> started under current conditions
>>
>>> Oct 22 12:21:09.389 jangcluster-srv-2 pacemaker-schedulerd[776553]
>>> (pe_fence_node) warning: Remote node jangcluster-srv-4
>>> will be fenced: remote connection is unrecoverable
>>
>> The "will not be started" is why the node had to be fenced. There was
>
> OK so it implies that remote resource should fail over if connection to
> remote node fails. Thank you, that was not exactly clear from documentation.
>
>> nowhere to recover the connection. I'd need to see the CIB from that
>> time to know why; it's possible you had an old constraint banning the
>> connection from the other node (e.g. from a ban or move command), or
>> something like that.
>>
>
> Hmm ... looking in (current) sources it seems this message is emitted
> only in case of on-fail=stop operation property ...
>

Well ...

/* For remote nodes, ensure that any failure that results in dropping an
 * active connection to the node results in fencing of the node.
 *
 * There are only two action failures that don't result in fencing.
 * 1. probes - probe failures are expected.
 * 2. start - a start failure indicates that an active connection does not already
 * exist. The user can set op on-fail=fence if they really want to fence start
 * failures.
 */

Pacemaker will forcibly set on-fail=stop for a remote resource.
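
For illustration only, the rule stated in the quoted source comment can be written down as a small Python toy model. This is not Pacemaker code. The function name and its parameters are made up for this sketch, and it assumes fencing is enabled, that a probe is a monitor with interval 0, and that every other failed action on the remote connection counts as dropping an active connection. The comment only mentions on-fail=fence for start, so the sketch leaves probes unfenced regardless of on-fail.

```python
from typing import Optional


def remote_failure_fences(action: str, interval_ms: int,
                          on_fail: Optional[str] = None) -> bool:
    """Toy model of the quoted comment (hypothetical helper, not a Pacemaker API).

    Returns True if a failure of this action on a remote connection
    resource would lead to fencing of the remote node.
    """
    # Exception 1: probes (assumed here to be one-shot monitors, interval 0).
    # Probe failures are expected, so they do not fence.
    if action == "monitor" and interval_ms == 0:
        return False

    # Exception 2: a start failure means no active connection existed yet.
    # It fences only if the user explicitly asked for on-fail=fence.
    if action == "start":
        return on_fail == "fence"

    # Anything else (e.g. a recurring monitor) is treated as dropping an
    # active connection, which results in fencing.
    return True


if __name__ == "__main__":
    # The failed operation in the logs was a monitor of the remote connection;
    # the 30000 ms interval is an arbitrary value chosen for this example.
    print(remote_failure_fences("monitor", 30000))
    print(remote_failure_fences("start", 0))
    print(remote_failure_fences("start", 0, on_fail="fence"))
```

Under this reading, the failed recurring monitor of jangcluster-srv-4 in the logs falls into the last branch, which is consistent with the "will be fenced" message.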