Hi All, Sorry...
The formatting of my previous email collapsed. I will resend it later.

Best Regards,
Hideo Yamauchi.

----- Original Message -----
> From: "renayama19661...@ybb.ne.jp" <renayama19661...@ybb.ne.jp>
> To: ClusterLabs-ML <users@clusterlabs.org>
> Cc:
> Date: 2016/10/5, Wed 23:25
> Subject: [ClusterLabs] [Problem] When a group resource does not stop in a trouble node, the movement of the group resource is started in other nodes.
>
> Hi All,
>
> Since Pacemaker 1.1.14, there may be a problem with the stop order of group resources.
> The problem occurs in a cluster configured without STONITH.
>
> I can reproduce it with the following procedure.
>
> Step 1) Copy the Dummy resource agent to make the Dummy1 and Dummy2 resource agents.
>
> Step 2) Configure a cluster.
>
> [root@rh72-01 ~]# crm_mon -1 -Af
> Stack: corosync
> Current DC: rh72-02 (version 1.1.15-e174ec8) - partition with quorum
> Last updated: Wed Oct 5 16:24:21 2016          Last change: Wed Oct 5 16:24:15 2016 by root via cibadmin on rh72-01
>
> 2 nodes and 2 resources configured
>
> Online: [ rh72-01 rh72-02 ]
>
>  Resource Group: grpDummy
>      prmDummy1  (ocf::pacemaker:Dummy1):        Started rh72-01
>      prmDummy2  (ocf::pacemaker:Dummy2):        Started rh72-01
>
> Node Attributes:
> * Node rh72-01:
> * Node rh72-02:
>
> Migration Summary:
> * Node rh72-01:
> * Node rh72-02:
>
> Step 3) Inject a pseudo-failure into the stop action of Dummy2.
>
> (snip)
> dummy_stop() {
>     # Injected failure: stop always returns an error.
>     return $OCF_ERR_GENERIC
>     dummy_monitor
>     if [ $? -eq $OCF_SUCCESS ]; then
>         rm ${OCF_RESKEY_state}
>     fi
>     rm -f "${VERIFY_SERIALIZED_FILE}"
>     return $OCF_SUCCESS
> }
> (snip)
>
> Step 4) Put the rh72-01 node into standby. The stop of the Dummy2 resource fails, and the resource does not move.
>
> [root@rh72-01 ~]# crm_standby -N rh72-01 -v on
> [root@rh72-01 ~]# crm_mon -1 -Af
> Stack: corosync
> Current DC: rh72-02 (version 1.1.15-e174ec8) - partition with quorum
> Last updated: Wed Oct 5 16:27:49 2016          Last change: Wed Oct 5 16:27:47 2016 by root via crm_attribute on rh72-01
>
> 2 nodes and 2 resources configured
>
> Node rh72-01: standby
> Online: [ rh72-02 ]
>
>  Resource Group: grpDummy
>      prmDummy1  (ocf::pacemaker:Dummy1):        Started rh72-01
>      prmDummy2  (ocf::pacemaker:Dummy2):        FAILED rh72-01 (blocked)
>
> Node Attributes:
> * Node rh72-01:
> * Node rh72-02:
>
> Migration Summary:
> * Node rh72-01:
>    prmDummy2: migration-threshold=1 fail-count=1000000 last-failure='Wed Oct 5 16:29:29 2016'
> * Node rh72-02:
>
> Failed Actions:
> * prmDummy2_stop_0 on rh72-01 'unknown error' (1): call=15, status=complete, exitreason='none',
>     last-rc-change='Wed Oct 5 16:27:47 2016', queued=1ms, exec=34ms
>
> Step 5) Clean Dummy2 resource.
>
> [root@rh72-01 ~]# crm_resource -C -r prmDummy2 -H rh72-01 -f
> Cleaning up prmDummy2 on rh72-01, removing fail-count-prmDummy2
> Waiting for 1 replies from the CRMd.
> OK
> [root@rh72-01 ~]# crm_mon -1 -Af
> Stack: corosync
> Current DC: rh72-02 (version 1.1.15-e174ec8) - partition with quorum
> Last updated: Wed Oct 5 16:30:55 2016          Last change: Wed Oct 5 16:30:53 2016 by hacluster via crmd on rh72-01
>
> 2 nodes and 2 resources configured
>
> Node rh72-01: standby
> Online: [ rh72-02 ]
>
>  Resource Group: grpDummy
>      prmDummy1  (ocf::pacemaker:Dummy1):        Started rh72-02
>      prmDummy2  (ocf::pacemaker:Dummy2):        FAILED rh72-01 (blocked)
>
> Node Attributes:
> * Node rh72-01:
> * Node rh72-02:
>
> Migration Summary:
> * Node rh72-01:
>    prmDummy2: migration-threshold=1 fail-count=1000000 last-failure='Wed Oct 5 16:32:35 2016'
> * Node rh72-02:
>
> Failed Actions:
> * prmDummy2_stop_0 on rh72-01 'unknown error' (1): call=23, status=complete, exitreason='none',
>     last-rc-change='Wed Oct 5 16:30:54 2016', queued=0ms, exec=35ms
>
> The stop fails again and the Dummy2 resource does not move, but the Dummy1 resource moves to the rh72-02 node.
> Unless every resource in the group has stopped, no resource in the group should move.
> The problem does not occur in Pacemaker 1.1.13.
>
> The probe_complete event was removed in Pacemaker 1.1.14.
> I suspect the problem was introduced around the following changes.
>
>  * https://github.com/ClusterLabs/pacemaker/commit/c1438ae489d791cc689625332b8ced21bfd4d143#diff-8e7ae81c93497126538c2a82fe183692
>  * https://github.com/ClusterLabs/pacemaker/commit/8f76b782133857b40a583e947d743d45c7d05dc8#diff-8e7ae81c93497126538c2a82fe183692
>
> I registered this problem with Bugzilla.
>  * http://bugs.clusterlabs.org/show_bug.cgi?id=5301
>
> Best Regards,
> Hideo Yamauchi.
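
For anyone who wants to watch for the symptom seen after Step 5 (prmDummy1 reported on rh72-02 while prmDummy2 is still FAILED on rh72-01), a rough sketch of a check follows. It only parses the text of `crm_mon -1` and assumes the output layout shown in this report for 1.1.15; other versions may print groups differently. The function names group_nodes and group_is_split are made up for this example and are not part of Pacemaker.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Pacemaker.
# Assumes member lines look like:
#   prmDummy1 (ocf::pacemaker:Dummy1): Started rh72-02

# Print the distinct nodes that host members of group $1,
# reading `crm_mon -1` text on stdin.
group_nodes() {
    awk -v grp="$1" '
        # A group header starts (or ends) the section we care about.
        /Resource Group:/ { in_grp = ($NF == grp); next }
        # Member lines: field 3 is the state, field 4 the node.
        in_grp && /\(ocf::/ {
            if ($3 == "Started" || $3 == "FAILED") print $4
            next
        }
        # Any other line ends the group section.
        { in_grp = 0 }
    ' | sort -u
}

# Exit 0 when members of group $1 are reported on more than one node.
group_is_split() {
    group_nodes "$1" | awk 'END { exit !(NR > 1) }'
}
```

It could be used as `crm_mon -1 | group_is_split grpDummy && echo "grpDummy is split across nodes"`. Members in the Stopped state are ignored because crm_mon prints no node for them.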
>
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org