See the attached file. This one was produced and tested with Pacemaker 1.1.16 (RHEL 7).


--

Pavel


On 08.12.2020 10:14, Reid Wahl wrote:
Can you provide the state4.xml file that you're using? I'm unable to
reproduce this issue by causing a clone instance to fail on one node.

Might need some logs as well.

On Mon, Dec 7, 2020 at 10:40 PM Pavel Levshin <l...@581.spb.su> wrote:
Hello.


Despite many years of Pacemaker use, it never stops fooling me...


This time, I have run into a trivial problem. In my new setup, the cluster
consists of several identical nodes. A clone resource (vg.sanlock) is started
on every node, ensuring each node has access to the SAN storage. Almost all other
resources are colocated with, and ordered after, vg.sanlock.


Today I started a node, and vg.sanlock failed to start on it. The cluster then
decided to stop all the clone instances "due to node availability", taking down
all other resources through their dependencies. This seems illogical to me. When a
clone instance fails, I would prefer it to stop on that one node only. How do I
achieve this properly?


I've tried this config with Pacemaker 2.0.3 and 1.1.16; the behaviour is the
same in both.


Reduced test config here:


pcs cluster auth test-pcmk0 test-pcmk1 <>/dev/tty

pcs cluster setup --name test-pcmk test-pcmk0 test-pcmk1 --transport udpu \
    --auto_tie_breaker 1

pcs cluster start --all --wait=60

pcs cluster cib tmp-cib.xml
cp tmp-cib.xml tmp-cib.xml.deltasrc

pcs -f tmp-cib.xml property set stonith-enabled=false
pcs -f tmp-cib.xml resource defaults resource-stickiness=100

pcs -f tmp-cib.xml resource create vg.sanlock ocf:pacemaker:Dummy \
    op monitor interval=10 timeout=20 start interval=0s stop interval=0s \
    timeout=20
pcs -f tmp-cib.xml resource clone vg.sanlock interleave=true

pcs cluster cib-push tmp-cib.xml diff-against=tmp-cib.xml.deltasrc
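For anyone wanting to replay this without breaking a live node: the failed start can also be injected into a saved CIB with crm_simulate's fault-injection options. This is a sketch; the operation-key format is taken from crm_simulate(8), and rc=6 ("not configured") matches the rc-code recorded in the attached state file.

```shell
# Save the live CIB to a file (hypothetical filename).
pcs cluster cib > state4.xml

# Replay the scheduler's decisions, injecting a failed start of the
# clone instance on test-pcmk0. --op-inject takes an operation spec
# of the form "<resource>_<task>_<interval>@<node>=<rc>".
crm_simulate -x state4.xml -S \
    --op-inject vg.sanlock_start_0@test-pcmk0=6
```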



And here is the cluster's reaction to the failure:


# crm_simulate -x state4.xml -S

Current cluster status:
Online: [ test-pcmk0 test-pcmk1 ]

 Clone Set: vg.sanlock-clone [vg.sanlock]
     vg.sanlock      (ocf::pacemaker:Dummy): FAILED test-pcmk0
     Started: [ test-pcmk1 ]

Transition Summary:
 * Stop       vg.sanlock:0     ( test-pcmk1 )   due to node availability
 * Stop       vg.sanlock:1     ( test-pcmk0 )   due to node availability

Executing cluster transition:
 * Pseudo action:   vg.sanlock-clone_stop_0
 * Resource action: vg.sanlock      stop on test-pcmk1
 * Resource action: vg.sanlock      stop on test-pcmk0
 * Pseudo action:   vg.sanlock-clone_stopped_0
 * Pseudo action:   all_stopped

Revised cluster status:
Online: [ test-pcmk0 test-pcmk1 ]

 Clone Set: vg.sanlock-clone [vg.sanlock]
     Stopped: [ test-pcmk0 test-pcmk1 ]


As a side note, if I make those clones globally-unique, they seem to behave
properly. But I have found no reference anywhere to this as a solution. In general,
globally-unique clones are mentioned only where the resource agent makes a
distinction between clone instances, which is not the case here.
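For reference, the globally-unique workaround described above would be set roughly like this (a sketch using current pcs syntax; I have not verified it against the 1.1.16-era pcs):

```shell
# Hypothetical: mark the clone's instances as globally unique, so a
# failure of one instance no longer stops the instances on other nodes.
pcs resource meta vg.sanlock-clone globally-unique=true
```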


--

Thanks,

Pavel



_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


<cib crm_feature_set="3.0.12" validate-with="pacemaker-2.8" epoch="129" num_updates="3" admin_epoch="0" cib-last-written="Mon Dec  7 22:19:24 2020" update-origin="test-pcmk0" update-client="cibadmin" update-user="root" have-quorum="1" dc-uuid="1">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-have-watchdog" name="have-watchdog" value="false"/>
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.16-12.el7_4.8-94ff4df"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
        <nvpair id="cib-bootstrap-options-cluster-name" name="cluster-name" value="test-pcmk"/>
        <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
        <nvpair id="cib-bootstrap-options-maintenance-mode" name="maintenance-mode" value="false"/>
        <nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1580196538"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="1" uname="test-pcmk0"/>
      <node id="2" uname="test-pcmk1"/>
    </nodes>
    <resources>
      <clone id="vg.bv_sanlock-clone">
        <primitive class="ocf" id="vg.bv_sanlock" provider="pacemaker" type="Dummy">
          <operations>
            <op id="vg.bv_sanlock-monitor-interval-10" interval="10" name="monitor" timeout="20"/>
            <op id="vg.bv_sanlock-start-interval-0s" interval="0s" name="start"/>
            <op id="vg.bv_sanlock-stop-interval-0s" interval="0s" name="stop" timeout="20"/>
          </operations>
        </primitive>
        <meta_attributes id="vg.bv_sanlock-clone-meta_attributes">
          <nvpair id="vg.bv_sanlock-clone-meta_attributes-interleave" name="interleave" value="true"/>
        </meta_attributes>
      </clone>
    </resources>
    <constraints/>
    <rsc_defaults>
      <meta_attributes id="rsc_defaults-options">
        <nvpair id="rsc_defaults-options-resource-stickiness" name="resource-stickiness" value="100"/>
      </meta_attributes>
    </rsc_defaults>
  </configuration>
  <status>
    <node_state id="2" uname="test-pcmk1" in_ccm="true" crmd="online" crm-debug-origin="do_update_resource" join="member" expected="member">
      <transient_attributes id="2">
        <instance_attributes id="status-2">
          <nvpair id="status-2-shutdown" name="shutdown" value="0"/>
        </instance_attributes>
      </transient_attributes>
      <lrm id="2">
        <lrm_resources>
          <lrm_resource id="vg.bv_sanlock" type="Dummy" class="ocf" provider="pacemaker">
            <lrm_rsc_op id="vg.bv_sanlock_last_0" operation_key="vg.bv_sanlock_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.12" transition-key="5:43472:0:c5373a03-319c-45e8-9d01-cb0dbd0e7cc2" transition-magic="0:0;5:43472:0:c5373a03-319c-45e8-9d01-cb0dbd0e7cc2" on_node="test-pcmk1" call-id="174" rc-code="0" op-status="0" interval="0" last-run="1607368574" last-rc-change="1607368574" exec-time="38" queue-time="1" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8" op-force-restart=" state  passwd  op_sleep  envfile " op-restart-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8" op-secure-params=" passwd " op-secure-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
            <lrm_rsc_op id="vg.bv_sanlock_monitor_10000" operation_key="vg.bv_sanlock_monitor_10000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.12" transition-key="6:43472:0:c5373a03-319c-45e8-9d01-cb0dbd0e7cc2" transition-magic="0:0;6:43472:0:c5373a03-319c-45e8-9d01-cb0dbd0e7cc2" on_node="test-pcmk1" call-id="175" rc-code="0" op-status="0" interval="10000" last-rc-change="1607368574" exec-time="27" queue-time="0" op-digest="4811cef7f7f94e3a35a70be7916cb2fd" op-secure-params=" passwd " op-secure-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
          </lrm_resource>
        </lrm_resources>
      </lrm>
    </node_state>
    <node_state id="1" uname="test-pcmk0" in_ccm="true" crmd="online" crm-debug-origin="do_update_resource" join="member" expected="member">
      <transient_attributes id="1">
        <instance_attributes id="status-1">
          <nvpair id="status-1-shutdown" name="shutdown" value="0"/>
        </instance_attributes>
      </transient_attributes>
      <lrm id="1">
        <lrm_resources>
          <lrm_resource id="vg.bv_sanlock" type="Dummy" class="ocf" provider="pacemaker">
            <lrm_rsc_op id="vg.bv_sanlock_last_0" operation_key="vg.bv_sanlock_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.12" transition-key="4:43471:0:c5373a03-319c-45e8-9d01-cb0dbd0e7cc2" transition-magic="0:0;4:43471:0:c5373a03-319c-45e8-9d01-cb0dbd0e7cc2" on_node="test-pcmk0" call-id="186" rc-code="6" op-status="0" interval="0" last-run="1607368764" last-rc-change="1607368764" exec-time="98" queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8" op-force-restart=" state  passwd  op_sleep  envfile " op-restart-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8" op-secure-params=" passwd " op-secure-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
            <lrm_rsc_op id="vg.bv_sanlock_monitor_10000" operation_key="vg.bv_sanlock_monitor_10000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.12" transition-key="5:43471:0:c5373a03-319c-45e8-9d01-cb0dbd0e7cc2" transition-magic="0:0;5:43471:0:c5373a03-319c-45e8-9d01-cb0dbd0e7cc2" on_node="test-pcmk0" call-id="187" rc-code="0" op-status="0" interval="10000" last-rc-change="1607368764" exec-time="55" queue-time="0" op-digest="4811cef7f7f94e3a35a70be7916cb2fd" op-secure-params=" passwd " op-secure-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
          </lrm_resource>
        </lrm_resources>
      </lrm>
    </node_state>
  </status>
</cib>