On 11.04.2022 19:02, Salatiel Filho wrote:
> Hi, I am deploying pacemaker + drbd to provide high-availability
> storage, and during troubleshooting tests I ran into strange
> behaviour where the colocation constraint between the remaining
> resources and the cloned group appears to be simply ignored.
>
> These are the constraints I have:
>
> Location Constraints:
> Ordering Constraints:
>   start DRBDData-clone then start nfs (kind:Mandatory)
> Colocation Constraints:
>   nfs with DRBDData-clone (score:INFINITY)
> Ticket Constraints:
>
> The environment: I have a two-node cluster with a remote quorum
> device. The test was to stop the quorum device and afterwards stop
> the node currently running all the services (node1).
> The expected behaviour would be that the remaining node could not do
> anything (partition without quorum) until it regains quorum.
> This is the output of pcs status on node2 after powering off the
> quorum device and node1.
>
> Some resources have been removed from the output to keep this email
> cleaner.
> Cluster name: storage-drbd
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: node2 (version 2.1.0-8.el8-7c3f660707) - partition WITHOUT quorum
>   * Last updated: Mon Apr 11 12:28:06 2022
>   * Last change: Mon Apr 11 12:26:10 2022 by root via cibadmin on node2
>   * 2 nodes configured
>   * 11 resource instances configured
>
> Node List:
>   * Node node1: UNCLEAN (offline)
>   * Online: [ node2 ]
>
> Full List of Resources:
>   * fence-node1 (stonith:fence_vmware_rest): Started node2
>   * fence-node2 (stonith:fence_vmware_rest): Started node1 (UNCLEAN)
>   * Clone Set: DRBDData-clone [DRBDData] (promotable):
>     * DRBDData (ocf::linbit:drbd): Master node1 (UNCLEAN)
>     * Slaves: [ node2 ]
>   * Resource Group: nfs:
>     * vip_nfs (ocf::heartbeat:IPaddr2): Started node1 (UNCLEAN)
>     * drbd_fs (ocf::heartbeat:Filesystem): Started node1 (UNCLEAN)
>     * nfsd (ocf::heartbeat:nfsserver): Started node1 (UNCLEAN)
>
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
>
> As expected, node2 is without quorum and waiting. The problem
> happened when I turned node1 back on. Quorum was re-established, but
> the DRBD master was promoted on one node while the nfs group started
> on the other, even though I have both a start order and a colocation
> constraint to make the cloned resource and the NFS group run on the
> same node.
No, you do not.

> Cluster name: storage-drbd
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: node2 (version 2.1.0-8.el8-7c3f660707) - partition with quorum
>   * Last updated: Mon Apr 11 12:29:08 2022
>   * Last change: Mon Apr 11 12:26:10 2022 by root via cibadmin on node2
>   * 2 nodes configured
>   * 11 resource instances configured
>
> Node List:
>   * Online: [ node1 node2 ]
>
> Full List of Resources:
>   * fence-node1 (stonith:fence_vmware_rest): Started node2
>   * fence-node2 (stonith:fence_vmware_rest): Started node1
>   * Clone Set: DRBDData-clone [DRBDData] (promotable):
>     * Masters: [ node2 ]
>     * Slaves: [ node1 ]
>   * Resource Group: nfs:
>     * vip_nfs (ocf::heartbeat:IPaddr2): Started node1
>     * drbd_fs (ocf::heartbeat:Filesystem): FAILED node1
>     * nfsd (ocf::heartbeat:nfsserver): Stopped
>
> Failed Resource Actions:
>   * drbd_fs_start_0 on node1 'error' (1): call=90, status='complete',
>     exitreason='Couldn't mount device [/dev/drbd0] as /exports/drbd0',
>     last-rc-change='2022-04-11 12:29:05 -03:00', queued=0ms, exec=2567ms
>
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
>
> Can anyone explain to me why the constraints are being ignored?

Your order/colocation constraints are against the *start* of the clone
resource, not against its master role. If you need to order/colocate a
resource against the master, you need to say so explicitly.
Colocating/ordering against "start" is satisfied as soon as the cloned
resource is started as a slave, before it gets promoted.

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
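[Editor's note] A sketch of how the constraints from the thread could be
retargeted at the promoted role, using the resource names shown above and
pcs 0.10.x-era syntax (the version in the thread); this is illustrative,
not tested against the poster's cluster:

```shell
# First remove the existing role-agnostic constraints; look up their IDs
# with `pcs constraint --full` (IDs vary, so none are guessed here).

# Start the nfs group only after DRBDData-clone has been promoted
# somewhere, not merely started as a slave...
pcs constraint order promote DRBDData-clone then start nfs

# ...and keep the nfs group on whichever node holds the master copy.
pcs constraint colocation add nfs with master DRBDData-clone INFINITY
```

On pcs 0.11+ the role keyword changed, so the colocation would be spelled
`pcs constraint colocation add nfs with Promoted DRBDData-clone INFINITY`;
check `pcs constraint colocation --help` on your version.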