From: Klaus Wenninger <kwenn...@redhat.com>
To: renayama19661...@ybb.ne.jp; Cluster Labs - All topics related to open-source
clustering welcomed <users@clusterlabs.org>
Cc:
Date: 2021/4/9, Fri 21:12
Subject: Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control
fails.
On 4/8/21 11:21 PM, renayama19661...@ybb.ne.jp wrote:
Hi Ken,
Hi All,
The pgsql resource agent runs crm_mon during its demote and stop actions
and processes the output.
However, with the Pacemaker included in RHEL 8.4 beta, this crm_mon call fails.
- The problem also occurs on GitHub master
(c40e18f085fad9ef1d9d79f671ed8a69eb3e753f).
The problem can be reproduced easily as follows.
Step 1. Modify the Dummy resource agent so that its stop action runs crm_mon.
----
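dummy_stop() {
    # Added for reproduction: run crm_mon and log its exit status and output.
    mon=$(crm_mon -1)
    ret=$?
    ocf_log info "### YAMAUCHI #### crm_mon[${ret}] : ${mon}"
    # Remaining stop logic of the Dummy agent.
    dummy_monitor
    if [ $? = $OCF_SUCCESS ]; then
        rm ${OCF_RESKEY_state}
    fi
    return $OCF_SUCCESS
}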
----
Step 2. Configure a two-node cluster.
----
[root@rh84-beta01 ~]# crm_mon -rfA1
Cluster Summary:
* Stack: corosync
* Current DC: rh84-beta01 (version 2.0.5-8.el8-ba59be7122) - partition with quorum
* Last updated: Thu Apr 8 18:00:52 2021
* Last change: Thu Apr 8 18:00:38 2021 by root via cibadmin on rh84-beta01
* 2 nodes configured
* 1 resource instance configured
Node List:
* Online: [ rh84-beta01 rh84-beta02 ]
Full List of Resources:
* dummy-1 (ocf::heartbeat:Dummy): Started rh84-beta01
Migration Summary:
----
Step 3. Stop the node where the Dummy resource is running. The resource
fails over to the other node.
----
[root@rh84-beta02 ~]# crm_mon -rfA1
Cluster Summary:
* Stack: corosync
* Current DC: rh84-beta02 (version 2.0.5-8.el8-ba59be7122) - partition with quorum
* Last updated: Thu Apr 8 18:08:56 2021
* Last change: Thu Apr 8 18:05:08 2021 by root via cibadmin on rh84-beta01
* 2 nodes configured
* 1 resource instance configured
Node List:
* Online: [ rh84-beta02 ]
* OFFLINE: [ rh84-beta01 ]
Full List of Resources:
* dummy-1 (ocf::heartbeat:Dummy): Started rh84-beta02
----
However, the log shows that crm_mon failed when it was run from the stop
action of the Dummy resource.
----
Apr 08 18:05:17 Dummy(dummy-1)[2631]: INFO: ### YAMAUCHI #### crm_mon[102] : Pacemaker daemons shutting down ...
Apr 08 18:05:17 rh84-beta01 pacemaker-execd [2219] (log_op_output) notice: dummy-1_stop_0[2631] error output [ crm_mon: Error: cluster is not available on this node ]
----
Hmm ... is that with SELinux enabled?
If so, do you see any related AVC messages?
Klaus
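One way to check the SELinux question is sketched below. It assumes the standard SELinux and audit tools (`getenforce`, `ausearch`) are installed on the node; `filter_pacemaker_avc` is a hypothetical helper name used only for illustration and is not part of Pacemaker or resource-agents. It simply keeps AVC denial records that mention crm_mon or pacemaker.

```shell
# Hypothetical helper (not part of Pacemaker or resource-agents):
# read audit records on stdin and keep only AVC denials that
# mention crm_mon or pacemaker.
filter_pacemaker_avc() {
    grep 'avc:' | grep 'denied' | grep -E 'crm_mon|pacemaker'
}

# Possible usage on the affected node (as root, with auditd running):
#   getenforce
#   ausearch -m avc -ts recent | filter_pacemaker_avc
```

If `getenforce` reports Enforcing and the filter prints denials around the time of the stop action, SELinux is a likely suspect; empty output points elsewhere.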
Similarly, pgsql runs crm_mon during demote and stop, so those actions
fail as well.
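To make the failure mode concrete: an agent that calls crm_mon during stop or demote has to cope with a nonzero exit status. The following is a minimal sketch only, assuming a hypothetical helper named `run_crm_mon` (it is not taken from the pgsql or Dummy agents). It separates the exit status from the output so the caller can decide how to react.

```shell
# Hypothetical helper, not part of the pgsql or Dummy agents.
# Runs crm_mon once and prints its output; on failure, reports the
# exit status on stderr and passes it back to the caller.
run_crm_mon() {
    mon=$(crm_mon -1 2>&1)
    ret=$?
    if [ "$ret" -ne 0 ]; then
        echo "crm_mon failed with exit status ${ret}: ${mon}" >&2
        return "$ret"
    fi
    printf '%s\n' "$mon"
}
```

With the behavior shown in the log above, such a helper would return 102 during shutdown instead of handing an error message to code that expects cluster status.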
The problem seems to be related to the following change.
* Report pacemakerd in state waiting for sbd
- https://github.com/ClusterLabs/pacemaker/pull/2278
The problem does not occur with the release version of Pacemaker 2.0.5 or
the Pacemaker included with RHEL8.3.
This issue has a large impact on users.
It probably also affects other resource agents that call crm_mon.
Please make sure that the release version of RHEL 8.4 includes a Pacemaker
that does not have this problem.
* Distributions other than RHEL may also be affected in future releases.
----
The same report has been filed in the following Bugzilla entry.
- https://bugs.clusterlabs.org/show_bug.cgi?id=5471
----
Best Regards,
Hideo Yamauchi.