We have a two-node cluster managing two iSCSI targets on top of DRBD. Except for the fence agents, the resources are configured to reside on only one node at a time. These are CentOS 7 servers running Pacemaker 1.1.19-8.el7_6.4 and DRBD 9.0.16-1.
The two iSCSI targets are served up by Pacemaker. I configured these servers so that the targets are 100% controlled by Pacemaker -- or at least that was my intention. My iSCSI targets are part of a resource group called ISCSIGroup, and I have a colocation constraint that, I believe, restricts this group so that it can only run on the node that hosts my DRBD resource. DRBD is configured as primary/secondary.

-----
ISCSIGroup: storageLVM ISCSICentipedeIP ISCSIMillipedeIP targetRHEVM lunRHEVM targetVMStorage lunVMStorage

Colocation Constraints:
  ISCSIGroup with storageClusterClone (score:INFINITY) (with-rsc-role:Master)
-----

I was doing OS maintenance on the node that was NOT hosting the iSCSI targets (storage2). I took the following steps to do the updates:

1. I put the cluster into maintenance mode and ran a yum update on storage2.
2. I took the cluster out of maintenance mode (this completed successfully).
3. I put storage2 into standby mode and then stopped the cluster on storage2.
4. I rebooted storage2.
5. After storage2 came back online, I started the cluster on storage2.

All of these steps were successful. Then, as a final step, I instructed Pacemaker to unstandby storage2. That is when things went horribly wrong.

What happened is that storage1 (the primary node) complained that the monitor action for the iSCSI targets had failed on storage2, with an error suggesting that the targets were already running on storage2. Are my constraints not sufficient? I would really appreciate some input on what I've got wrong, or what is missing, that would cause the target resources to start on both nodes.
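For concreteness, the maintenance sequence described above corresponds to roughly the following pcs commands. This is only a sketch: it assumes the pcs CLI shipped with CentOS 7, and that maintenance mode was set cluster-wide while standby was set per-node, as described; nothing here beyond the node names comes from the original post.

```shell
# Sketch of the maintenance procedure described above, assuming the
# pcs CLI on CentOS 7. Node names are from the post; everything else
# is illustrative.

pcs property set maintenance-mode=true   # 1. cluster-wide maintenance mode
yum update                               #    run on storage2
pcs property set maintenance-mode=false  # 2. leave maintenance mode

pcs cluster standby storage2             # 3. standby, then stop the cluster
pcs cluster stop storage2                #    on storage2
reboot                                   # 4. reboot storage2

pcs cluster start storage2               # 5. rejoin the cluster
pcs cluster unstandby storage2           # 6. final step -- this is where the
                                         #    duplicate-target probes appeared
```

These are administrative commands against a live cluster, so they are shown for reference rather than as a runnable script.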
Thanks,
Bryan

Here are logs from storage1, which was the node hosting the resources:

Jun 20 11:48:35 storage1 crmd[240695]: notice: Initiating monitor operation targetRHEVM_monitor_0 on storage2
Jun 20 11:48:35 storage1 crmd[240695]: notice: Initiating monitor operation lunRHEVM_monitor_0 on storage2
Jun 20 11:48:35 storage1 crmd[240695]: notice: Initiating monitor operation targetVMStorage_monitor_0 on storage2
Jun 20 11:48:35 storage1 crmd[240695]: notice: Initiating monitor operation lunVMStorage_monitor_0 on storage2
Jun 20 11:48:36 storage1 crmd[240695]: warning: Action 18 (targetRHEVM_monitor_0) on storage2 failed (target: 7 vs. rc: 0): Error
Jun 20 11:48:36 storage1 crmd[240695]: notice: Transition aborted by operation targetRHEVM_monitor_0 'modify' on storage2: Event failed
Jun 20 11:48:36 storage1 crmd[240695]: warning: Action 20 (targetVMStorage_monitor_0) on storage2 failed (target: 7 vs. rc: 0): Error
Jun 20 11:48:36 storage1 crmd[240695]: notice: Transition 1 (Complete=12, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1054.bz2): Complete
Jun 20 11:48:36 storage1 pengine[240694]: error: Resource targetRHEVM is active on 2 nodes (attempting recovery)
Jun 20 11:48:36 storage1 pengine[240694]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
Jun 20 11:48:36 storage1 pengine[240694]: error: Resource targetVMStorage is active on 2 nodes (attempting recovery)
Jun 20 11:48:36 storage1 pengine[240694]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
Jun 20 11:48:36 storage1 pengine[240694]: notice: * Move targetRHEVM ( storage2 -> storage1 )
Jun 20 11:48:36 storage1 pengine[240694]: notice: * Restart lunRHEVM ( storage1 ) due to required targetRHEVM start
Jun 20 11:48:36 storage1 pengine[240694]: notice: * Move targetVMStorage ( storage2 -> storage1 )
Jun 20 11:48:36 storage1 pengine[240694]: notice: * Restart lunVMStorage ( storage1 ) due to required targetVMStorage start
Jun 20 11:48:36 storage1 pengine[240694]: error: Calculated transition 2 (with errors), saving inputs in /var/lib/pacemaker/pengine/pe-error-7.bz2
Jun 20 11:48:36 storage1 crmd[240695]: notice: Initiating stop operation lunVMStorage_stop_0 locally on storage1
Jun 20 11:48:36 storage1 iSCSILogicalUnit(lunVMStorage)[41725]: INFO: Deleted LUN 1.
Jun 20 11:48:37 storage1 iSCSILogicalUnit(lunVMStorage)[41725]: INFO: Deleted storage object lunVMStorage.
Jun 20 11:48:37 storage1 crmd[240695]: notice: Result of stop operation for lunVMStorage on storage1: 0 (ok)
Jun 20 11:48:37 storage1 crmd[240695]: notice: Initiating stop operation targetVMStorage_stop_0 locally on storage1
Jun 20 11:48:37 storage1 crmd[240695]: notice: Initiating stop operation targetVMStorage_stop_0 on storage2
Jun 20 11:48:37 storage1 iSCSITarget(targetVMStorage)[41806]: INFO: Deleted Target iqn.2019-03.com.leepfrog:storage.vmstorage.
Jun 20 11:48:37 storage1 crmd[240695]: notice: Result of stop operation for targetVMStorage on storage1: 0 (ok)
Jun 20 11:48:37 storage1 crmd[240695]: notice: Initiating stop operation lunRHEVM_stop_0 locally on storage1
Jun 20 11:48:37 storage1 iSCSILogicalUnit(lunRHEVM)[41863]: INFO: Deleted LUN 0.
Jun 20 11:48:38 storage1 iSCSILogicalUnit(lunRHEVM)[41863]: INFO: Deleted storage object lunRHEVM.
Jun 20 11:48:38 storage1 crmd[240695]: notice: Result of stop operation for lunRHEVM on storage1: 0 (ok)
Jun 20 11:48:38 storage1 crmd[240695]: notice: Initiating stop operation targetRHEVM_stop_0 locally on storage1
Jun 20 11:48:38 storage1 crmd[240695]: notice: Initiating stop operation targetRHEVM_stop_0 on storage2
Jun 20 11:48:38 storage1 iSCSITarget(targetRHEVM)[42032]: INFO: Deleted Target iqn.2019-02.com.leepfrog:storage.rhevm.
Jun 20 11:48:38 storage1 crmd[240695]: notice: Result of stop operation for targetRHEVM on storage1: 0 (ok)
Jun 20 11:48:38 storage1 crmd[240695]: notice: Initiating start operation targetRHEVM_start_0 locally on storage1
Jun 20 11:48:38 storage1 iSCSITarget(targetRHEVM)[42089]: INFO: Parameter auto_add_default_portal is now 'false'.
Jun 20 11:48:39 storage1 iSCSITarget(targetRHEVM)[42089]: INFO: Created target iqn.2019-02.com.leepfrog:storage.rhevm. Created TPG 1.
Jun 20 11:48:39 storage1 iSCSITarget(targetRHEVM)[42089]: INFO: Created Node ACL for iqn.1994-05.com.redhat:3d066d1f423e
Jun 20 11:48:39 storage1 iSCSITarget(targetRHEVM)[42089]: INFO: Created Node ACL for iqn.1994-05.com.redhat:84f0f7458c58
Jun 20 11:48:39 storage1 crmd[240695]: notice: Result of start operation for targetRHEVM on storage1: 0 (ok)
Jun 20 11:48:39 storage1 crmd[240695]: notice: Initiating monitor operation targetRHEVM_monitor_10000 locally on storage1
Jun 20 11:48:39 storage1 crmd[240695]: notice: Initiating start operation lunRHEVM_start_0 locally on storage1
Jun 20 11:48:39 storage1 iSCSILogicalUnit(lunRHEVM)[42221]: INFO: Created block storage object lunRHEVM using /dev/storage/rhevm.
Jun 20 11:48:40 storage1 iSCSILogicalUnit(lunRHEVM)[42221]: INFO: Created LUN 0.
Created LUN 0->0 mapping in node ACL iqn.1994-05.com.redhat:84f0f7458c58
Created LUN 0->0 mapping in node ACL iqn.1994-05.com.redhat:3d066d1f423e
Jun 20 11:48:40 storage1 iSCSILogicalUnit(lunRHEVM)[42221]: ERROR: No such NetworkPortal in configfs: /sys/kernel/config/target/iscsi/iqn.2019-02.com.leepfrog:storage.rhevm/tpgt_1/np/0.0.0.0:3260
Jun 20 11:48:40 storage1 crmd[240695]: notice: Result of start operation for lunRHEVM on storage1: 0 (ok)
Jun 20 11:48:40 storage1 crmd[240695]: notice: Initiating monitor operation lunRHEVM_monitor_10000 locally on storage1
Jun 20 11:48:40 storage1 crmd[240695]: notice: Initiating start operation targetVMStorage_start_0 locally on storage1
Jun 20 11:48:40 storage1 iSCSITarget(targetVMStorage)[42383]: INFO: Parameter auto_add_default_portal is now 'false'.
Jun 20 11:48:41 storage1 iSCSITarget(targetVMStorage)[42383]: INFO: Created target iqn.2019-03.com.leepfrog:storage.vmstorage. Created TPG 1.
Jun 20 11:48:41 storage1 iSCSITarget(targetVMStorage)[42383]: INFO: Created Node ACL for iqn.1994-05.com.redhat:3d066d1f423e
Jun 20 11:48:41 storage1 iSCSITarget(targetVMStorage)[42383]: INFO: Created Node ACL for iqn.1994-05.com.redhat:84f0f7458c58
Jun 20 11:48:41 storage1 crmd[240695]: notice: Result of start operation for targetVMStorage on storage1: 0 (ok)
Jun 20 11:48:41 storage1 crmd[240695]: notice: Initiating monitor operation targetVMStorage_monitor_10000 locally on storage1
Jun 20 11:48:41 storage1 crmd[240695]: notice: Initiating start operation lunVMStorage_start_0 locally on storage1
Jun 20 11:48:41 storage1 iSCSILogicalUnit(lunVMStorage)[42555]: INFO: Created block storage object lunVMStorage using /dev/storage/vmstorage.
Jun 20 11:48:42 storage1 iSCSILogicalUnit(lunVMStorage)[42555]: INFO: Created LUN 1.
Created LUN 1->1 mapping in node ACL iqn.1994-05.com.redhat:84f0f7458c58
Created LUN 1->1 mapping in node ACL iqn.1994-05.com.redhat:3d066d1f423e
Jun 20 11:48:42 storage1 iSCSILogicalUnit(lunVMStorage)[42555]: ERROR: No such NetworkPortal in configfs: /sys/kernel/config/target/iscsi/iqn.2019-03.com.leepfrog:storage.vmstorage/tpgt_1/np/0.0.0.0:3260
Jun 20 11:48:42 storage1 lrmd[240692]: notice: lunVMStorage_start_0:42555:stderr [ /usr/lib/ocf/resource.d/heartbeat/iSCSILogicalUnit: line 410: /sys/kernel/config/target/core/iblock_0/lunVMStorage/wwn/vpd_unit_serial: No such file or directory ]
Jun 20 11:48:42 storage1 crmd[240695]: notice: Result of start operation for lunVMStorage on storage1: 0 (ok)
Jun 20 11:48:42 storage1 crmd[240695]: notice: Initiating monitor operation lunVMStorage_monitor_10000 locally on storage1
Jun 20 11:48:42 storage1 crmd[240695]: notice: Transition 2 (Complete=18, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-error-7.bz2): Complete
Jun 20 11:48:42 storage1 crmd[240695]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE

Here are the logs from storage2, which was coming back online:

Jun 20 11:48:36 storage2 crmd[22305]: notice: Result of probe operation for ISCSIMillipedeIP on storage2: 7 (not running)
Jun 20 11:48:36 storage2 crmd[22305]: notice: Result of probe operation for ISCSICentipedeIP on storage2: 7 (not running)
Jun 20 11:48:36 storage2 crmd[22305]: notice: Result of probe operation for targetRHEVM on storage2: 0 (ok)
Jun 20 11:48:36 storage2 crmd[22305]: notice: Result of probe operation for targetVMStorage on storage2: 0 (ok)
Jun 20 11:48:36 storage2 crmd[22305]: notice: Result of probe operation for lunRHEVM on storage2: 7 (not running)
Jun 20 11:48:36 storage2 crmd[22305]: notice: Result of probe operation for lunVMStorage on storage2: 7 (not running)

--
Bryan K. Walton
Linux Systems Administrator
Leepfrog Technologies, Inc
319-337-3877