Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.

Klaus Wenninger Fri, 09 Apr 2021 06:46:10 -0700

On 4/9/21 3:36 PM, Klaus Wenninger wrote:

On 4/9/21 2:37 PM, renayama19661...@ybb.ne.jp wrote:

Hi Klaus,
Thanks for your comment.
Hmm ... is that with SELinux enabled?
Alternatively, do you see any related AVC messages?
SELinux is not enabled.
Isn't crm_mon failing because pacemakerd does not return a response while it prepares to stop?

Yep ... that doesn't look good.
While in pcmk_shutdown_worker, IPC isn't handled.
The question is why that didn't cause issues earlier.
Probably I didn't test with resources that had crm_mon in
their stop/monitor actions, but sbd should have run into
issues.

Klaus

But when shutting down a node, the resources should be
shut down before pacemakerd goes down.
But let me check whether pacemakerd can stop reacting
to the IPC pings before that. That, by the way, might be
lethal for sbd scenarios (if the phase is too long, and it
might actually not be defined).

My idea with SELinux would have been that it might block
the IPC if crm_mon is issued by execd. But well, forget
about it, as it is not enabled ;-)
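For anyone who does run with SELinux enabled and wants to rule out the IPC-blocking idea, a minimal shell sketch of the check might look like this. The function name `check_selinux_avc` is hypothetical; it assumes `getenforce` (libselinux-utils) and `ausearch` (audit) are installed, and it only looks at AVC records from roughly the last ten minutes (`-ts recent`) that mention crm_mon.

```shell
# Hypothetical helper: print the SELinux mode and, unless SELinux is
# disabled, any recent AVC records that mention crm_mon.
# Assumes getenforce (libselinux-utils) and ausearch (audit) exist.
check_selinux_avc() {
    mode=$(getenforce 2>/dev/null) || mode="unknown"
    echo "SELinux mode: ${mode}"
    if [ "$mode" = "Disabled" ]; then
        # Nothing can be denied by SELinux, so skip the audit search
        return 0
    fi
    # -m AVC selects AVC records; -ts recent means "last 10 minutes"
    ausearch -m AVC -ts recent 2>/dev/null | grep -i 'crm_mon'
}
```

An empty result after the mode line (grep exits non-zero) means no recent denial mentioned crm_mon.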


Klaus

pgsql needs the result of crm_mon in its demote and stop processing. crm_mon should return a response even after pacemakerd has begun its stop operation.


Best Regards,
Hideo Yamauchi.


----- Original Message -----

From: Klaus Wenninger <kwenn...@redhat.com>

To: renayama19661...@ybb.ne.jp; Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>

Cc:
Date: 2021/4/9, Fri 21:12

Subject: Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.


On 4/8/21 11:21 PM, renayama19661...@ybb.ne.jp wrote:

  Hi Ken,
  Hi All,
  In the pgsql resource, crm_mon is executed during demote and stop, and its result is processed.

  However, the pacemaker included in RHEL8.4beta fails to execute this crm_mon.
    - The problem also occurs on GitHub master (c40e18f085fad9ef1d9d79f671ed8a69eb3e753f).

  The problem can be easily reproduced in the following ways.

  Step 1. Modify the stop process of the Dummy resource to execute crm_mon.

  ----

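  dummy_stop() {
       # Run crm_mon once; stdout goes into $mon, stderr ends up in the execd log
       mon=$(crm_mon -1)
       ret=$?
       ocf_log info "### YAMAUCHI #### crm_mon[${ret}] : ${mon}"
       # Remove the state file only if the resource is currently running
       dummy_monitor
       if [ $? -eq $OCF_SUCCESS ]; then
           rm "${OCF_RESKEY_state}"
       fi
       return $OCF_SUCCESS
  }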
  ----

  Step 2. Configure a cluster with two nodes.
  ----

  [root@rh84-beta01 ~]# crm_mon -rfA1
  Cluster Summary:
     * Stack: corosync

     * Current DC: rh84-beta01 (version 2.0.5-8.el8-ba59be7122) - partition with quorum
     * Last updated: Thu Apr  8 18:00:52 2021
     * Last change:  Thu Apr  8 18:00:38 2021 by root via cibadmin on rh84-beta01

     * 2 nodes configured
     * 1 resource instance configured

  Node List:
     * Online: [ rh84-beta01 rh84-beta02 ]

  Full List of Resources:
     * dummy-1     (ocf::heartbeat:Dummy):  Started rh84-beta01

  Migration Summary:
  ----

  Step 3. Stop the node where the Dummy resource is running. The resource will fail over.

  ----
  [root@rh84-beta02 ~]# crm_mon -rfA1
  Cluster Summary:
     * Stack: corosync
     * Current DC: rh84-beta02 (version 2.0.5-8.el8-ba59be7122) - partition with quorum
     * Last updated: Thu Apr  8 18:08:56 2021
     * Last change:  Thu Apr  8 18:05:08 2021 by root via cibadmin on rh84-beta01

     * 2 nodes configured
     * 1 resource instance configured

  Node List:
     * Online: [ rh84-beta02 ]
     * OFFLINE: [ rh84-beta01 ]

  Full List of Resources:
     * dummy-1     (ocf::heartbeat:Dummy):  Started rh84-beta02
  ----

  However, if you look at the log, you can see that the execution of crm_mon in the stop processing of the Dummy resource has failed.

  ----
  Apr 08 18:05:17  Dummy(dummy-1)[2631]:    INFO: ### YAMAUCHI #### crm_mon[102] : Pacemaker daemons shutting down ...
  Apr 08 18:05:17 rh84-beta01 pacemaker-execd [2219] (log_op_output) notice: dummy-1_stop_0[2631] error output [ crm_mon: Error: cluster is not available on this node ]
  ----

Hmm ... is that with SELinux enabled?
Alternatively, do you see any related AVC messages?

Klaus

  Similarly, pgsql also executes crm_mon during demote and stop, so control fails.
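Because `$(crm_mon -1)` in the Step 1 reproduction captures only stdout, the error text ends up in pacemaker-execd's "error output" log line rather than in the agent's own message. A minimal shell sketch, assuming an agent wants to log all three pieces and recognize the shutdown-time case, could keep stdout, stderr and the exit code apart. The helper names and the `CRM_MON_*` variables are hypothetical (not part of the Dummy or pgsql agents), and matching on the "Pacemaker daemons shutting down" text from the log is an assumption about crm_mon's output that may change between versions.

```shell
# Hypothetical helpers, not part of any shipped resource agent.
CRM_MON_OUT=""
CRM_MON_ERR=""
CRM_MON_RC=0

# Run "crm_mon -1" once, keeping stdout, stderr and the exit code apart.
run_crm_mon_once() {
    _crm_mon_err_file=$(mktemp) || return 1
    CRM_MON_OUT=$(crm_mon -1 2>"$_crm_mon_err_file")
    CRM_MON_RC=$?
    CRM_MON_ERR=$(cat "$_crm_mon_err_file")
    rm -f "$_crm_mon_err_file"
    return "$CRM_MON_RC"
}

# Succeed (return 0) if the last run failed and its stdout carried the
# shutdown message seen in the log (assumed text, may vary by version).
crm_mon_refused_during_shutdown() {
    [ "$CRM_MON_RC" -ne 0 ] || return 1
    case "$CRM_MON_OUT" in
        *"Pacemaker daemons shutting down"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

This only makes the failure visible to the agent; it does not make crm_mon answer while pacemakerd is in pcmk_shutdown_worker, which is the underlying problem.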

  The problem seems to be related to the following fix.
    * Report pacemakerd in state waiting for sbd
     - https://github.com/ClusterLabs/pacemaker/pull/2278

  The problem does not occur with the release version of Pacemaker 2.0.5 or the Pacemaker included with RHEL 8.3.

  This issue has a huge impact on users.

  Perhaps it also affects the control of other resources that use crm_mon.

  Please fix the release version of RHEL 8.4 so that it includes a Pacemaker that does not cause this problem.

  * Distributions other than RHEL may also be affected in future releases.


  ----
  This content is the same as the following Bugzilla.
    - https://bugs.clusterlabs.org/show_bug.cgi?id=5471
  ----

  Best Regards,
  Hideo Yamauchi.

  _______________________________________________
  Manage your subscription:
  https://lists.clusterlabs.org/mailman/listinfo/users

  ClusterLabs home: https://www.clusterlabs.org/

