Hi! Trying to reproduce a problem that had occurred in the past after a "crm resource refresh" ("reprobe"), I noticed something on the DC that looks odd to me:
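For completeness, the reprobe was triggered more or less like this (I mostly use the crm shell; the crm_resource call is, as far as I know, the equivalent lower-level command):

h16:~ # crm resource refresh        ### reprobe all resources on all nodes
h16:~ # crm_resource --refresh      ### lower-level equivalent, as far as I know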
Jan 08 11:13:21 h16 pacemaker-controld[4478]: notice: Forcing the status of all resources to be redetected
Jan 08 11:13:21 h16 pacemaker-controld[4478]: warning: new_event_notification (4478-26817-13): Broken pipe (32)
### We had that before, already...
Jan 08 11:13:21 h16 pacemaker-controld[4478]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Jan 08 11:13:21 h16 pacemaker-schedulerd[4477]: notice: Watchdog will be used via SBD if fencing is required and stonith-watchdog-timeout is nonzero
Jan 08 11:13:21 h16 pacemaker-schedulerd[4477]: notice:  * Start      prm_stonith_sbd     ( h16 )
Jan 08 11:13:21 h16 pacemaker-schedulerd[4477]: notice:  * Start      prm_DLM:0           ( h18 )
Jan 08 11:13:21 h16 pacemaker-schedulerd[4477]: notice:  * Start      prm_DLM:1           ( h19 )
Jan 08 11:13:21 h16 pacemaker-schedulerd[4477]: notice:  * Start      prm_DLM:2           ( h16 )
...
### So basically an announcement to START everything that's running (everything is running); shouldn't that be "monitoring" (probe) instead?
Jan 08 11:13:21 h16 pacemaker-controld[4478]: notice: Initiating monitor operation prm_stonith_sbd_monitor_0 on h19
Jan 08 11:13:21 h16 pacemaker-controld[4478]: notice: Initiating monitor operation prm_stonith_sbd_monitor_0 on h18
Jan 08 11:13:21 h16 pacemaker-controld[4478]: notice: Initiating monitor operation prm_stonith_sbd_monitor_0 locally on h16
...
### So _probes_ are started,
Jan 08 11:13:21 h16 pacemaker-controld[4478]: notice: Transition 139 aborted by operation prm_testVG_testLV_activate_monitor_0 'modify' on h16: Event failed
Jan 08 11:13:21 h16 pacemaker-controld[4478]: notice: Transition 139 action 7 (prm_testVG_testLV_activate_monitor_0 on h16): expected 'not running' but got 'ok'
Jan 08 11:13:21 h16 pacemaker-controld[4478]: notice: Transition 139 action 19 (prm_testVG_testLV_activate_monitor_0 on h18): expected 'not running' but got 'ok'
Jan 08 11:13:21 h16 pacemaker-controld[4478]: notice: Transition 139 action 31 (prm_testVG_testLV_activate_monitor_0 on h19): expected 'not running' but got 'ok'
...
### That's odd, because the clone WAS running on each node. (Similar results were reported for other clones)
Jan 08 11:13:43 h16 pacemaker-controld[4478]: notice: Transition 140 (Complete=34, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-79.bz2): Complete
Jan 08 11:13:43 h16 pacemaker-controld[4478]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
### So in the end nothing was actually started, but those messages are quite confusing.

Pacemaker version was "(version 2.0.4+20200616.2deceaa3a-3.3.1-2.0.4+20200616.2deceaa3a)" on all three nodes (latest for SLES).
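In case someone wants to have a look, I assume the scheduler input can be replayed offline with something like this (pe-input-79.bz2 is the file named in the transition 140 summary above; the aborted transition 139 presumably has its own, slightly earlier file):

h16:~ # crm_simulate --simulate --xml-file /var/lib/pacemaker/pengine/pe-input-79.bz2   ### replay saved scheduler input

As far as I understand, that only replays the saved CIB/status and prints the planned actions; it does not touch the live cluster.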
For reference, here are the primitives that had the odd results:

primitive prm_testVG_testLV_activate LVM-activate \
        params vgname=testVG lvname=testLV vg_access_mode=lvmlockd activation_mode=shared \
        op start timeout=90s interval=0 \
        op stop timeout=90s interval=0 \
        op monitor interval=60s timeout=90s \
        meta priority=9000
clone cln_testVG_activate prm_testVG_testLV_activate \
        meta interleave=true priority=9800 target-role=Started
primitive prm_lvmlockd lvmlockd \
        op start timeout=90 interval=0 \
        op stop timeout=100 interval=0 \
        op monitor interval=60 timeout=90 \
        meta priority=9800
clone cln_lvmlockd prm_lvmlockd \
        meta interleave=true priority=9800
order ord_lvmlockd__lvm_activate Mandatory: cln_lvmlockd ( cln_testVG_activate )
colocation col_lvm_activate__lvmlockd inf: ( cln_testVG_activate ) cln_lvmlockd
### lvmlockd similarly depends on DLM (order, colocation), so I don't see a problem

Finally:

h16:~ # vgs
  VG     #PV #LV #SN Attr   VSize   VFree
  sys      1   3   0 wz--n- 222.50g       0
  testVG   1   1   0 wz--ns 299.81g 289.81g

Regards,
Ulrich