Hi! Trying to reproduce a problem that had occurred in the past after a "crm resource refresh" ("reprobe"), I noticed something on the DC that looks odd to me:
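For completeness, the reprobe was triggered more or less like this (I mostly use the crm shell; the crm_resource call is, as far as I know, the equivalent lower-level command):

h16:~ # crm resource refresh        ### reprobe all resources on all nodes
h16:~ # crm_resource --refresh      ### lower-level equivalent, as far as I know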
Jan 08 11:13:21 h16 pacemaker-controld[4478]: notice: Forcing the status of all resources to be redetected
Jan 08 11:13:21 h16 pacemaker-controld[4478]: warning: new_event_notification (4478-26817-13): Broken pipe (32)
### We had that before, already...
Jan 08 11:13:21 h16 pacemaker-controld[4478]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Jan 08 11:13:21 h16 pacemaker-schedulerd[4477]: notice: Watchdog will be used via SBD if fencing is required and stonith-watchdog-timeout is nonzero
Jan 08 11:13:21 h16 pacemaker-schedulerd[4477]: notice:  * Start      prm_stonith_sbd     ( h16 )
Jan 08 11:13:21 h16 pacemaker-schedulerd[4477]: notice:  * Start      prm_DLM:0           ( h18 )
Jan 08 11:13:21 h16 pacemaker-schedulerd[4477]: notice:  * Start      prm_DLM:1           ( h19 )
Jan 08 11:13:21 h16 pacemaker-schedulerd[4477]: notice:  * Start      prm_DLM:2           ( h16 )
...
### So basically an announcement to START everything that's running (everything is running); shouldn't that be "monitoring" (probe) instead?
Jan 08 11:13:21 h16 pacemaker-controld[4478]: notice: Initiating monitor operation prm_stonith_sbd_monitor_0 on h19
Jan 08 11:13:21 h16 pacemaker-controld[4478]: notice: Initiating monitor operation prm_stonith_sbd_monitor_0 on h18
Jan 08 11:13:21 h16 pacemaker-controld[4478]: notice: Initiating monitor operation prm_stonith_sbd_monitor_0 locally on h16
...
### So _probes_ are started,
Jan 08 11:13:21 h16 pacemaker-controld[4478]: notice: Transition 139 aborted by operation prm_testVG_testLV_activate_monitor_0 'modify' on h16: Event failed
Jan 08 11:13:21 h16 pacemaker-controld[4478]: notice: Transition 139 action 7 (prm_testVG_testLV_activate_monitor_0 on h16): expected 'not running' but got 'ok'
Jan 08 11:13:21 h16 pacemaker-controld[4478]: notice: Transition 139 action 19 (prm_testVG_testLV_activate_monitor_0 on h18): expected 'not running' but got 'ok'
Jan 08 11:13:21 h16 pacemaker-controld[4478]: notice: Transition 139 action 31 (prm_testVG_testLV_activate_monitor_0 on h19): expected 'not running' but got 'ok'
...
### That's odd, because the clone WAS running on each node. (Similar results were reported for other clones)
Jan 08 11:13:43 h16 pacemaker-controld[4478]: notice: Transition 140 (Complete=34, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-79.bz2): Complete
Jan 08 11:13:43 h16 pacemaker-controld[4478]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
### So in the end nothing was actually started, but those messages are quite confusing.

Pacemaker version was "(version 2.0.4+20200616.2deceaa3a-3.3.1-2.0.4+20200616.2deceaa3a)" on all three nodes (latest for SLES).
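In case someone wants to have a look, I assume the scheduler input can be replayed offline with something like this (pe-input-79.bz2 is the file named in the transition 140 summary above; the aborted transition 139 presumably has its own, slightly earlier file):

h16:~ # crm_simulate --simulate --xml-file /var/lib/pacemaker/pengine/pe-input-79.bz2   ### replay saved scheduler input

As far as I understand, that only replays the saved CIB/status and prints the planned actions; it does not touch the live cluster.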
For reference, here are the primitives that had the odd results:

primitive prm_testVG_testLV_activate LVM-activate \
        params vgname=testVG lvname=testLV vg_access_mode=lvmlockd activation_mode=shared \
        op start timeout=90s interval=0 \
        op stop timeout=90s interval=0 \
        op monitor interval=60s timeout=90s \
        meta priority=9000
clone cln_testVG_activate prm_testVG_testLV_activate \
        meta interleave=true priority=9800 target-role=Started
primitive prm_lvmlockd lvmlockd \
        op start timeout=90 interval=0 \
        op stop timeout=100 interval=0 \
        op monitor interval=60 timeout=90 \
        meta priority=9800
clone cln_lvmlockd prm_lvmlockd \
        meta interleave=true priority=9800
order ord_lvmlockd__lvm_activate Mandatory: cln_lvmlockd ( cln_testVG_activate )
colocation col_lvm_activate__lvmlockd inf: ( cln_testVG_activate ) cln_lvmlockd
### lvmlockd similarly depends on DLM (order, colocation), so I don't see a problem

Finally:

h16:~ # vgs
  VG     #PV #LV #SN Attr   VSize   VFree
  sys      1   3   0 wz--n- 222.50g       0
  testVG   1   1   0 wz--ns 299.81g 289.81g

Regards,
Ulrich