On Thu, 2020-03-05 at 13:14 +0000, Jaap Winius wrote:
> Hi folks,
>
> My test system, which includes support for a filesystem resource
> called 'mount', works fine otherwise, but every day or so I see
> monitor errors like the following when I run 'pcs status':
>
>   Failed Resource Actions:
>   * mount_monitor_20000 on bd3c7 'unknown error' (1): call=23,
>     status=Error, exitreason='',
>     last-rc-change='Thu Mar 5 04:57:55 2020', queued=0ms, exec=0ms
>
> The corosync.log shows some more information (see log fragments
> below), but I'm unable to identify a cause. The resource monitor
> bombs out, produces a core dump and then starts up again about 2
> seconds later. I've also seen this happen with the monitor for my
> nfsserver resource. Apart from the monitor stopping for a few
> seconds, the other problem is that the filesystem holding the
> ./pacemaker/cores/ directory will eventually fill up with core
> files (so far, each is less than 1MB).
>
> Could this be a bug, or is my software not configured correctly
> (see cfg below)?
>
> Thanks,
>
> Jaap
>
> PS -- I'm using CentOS 7.7.1908, Corosync 2.4.3, Pacemaker 1.1.20,
> PCS 0.9.167 and DRBD 9.10.0.
>
> ################# corosync.log #########
>
> Mar 05 04:57:55 [15652] bd3c7.umrk.nl       lrmd:    error:
> child_waitpid:  Managed process 22553 (mount_monitor_20000) dumped
> core

This would have to be a bug in the resource agent. I'd build it with
debug symbols to get a backtrace from the core.
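For example, roughly like this on the node that dumped the core (a
sketch, not verified against your setup: the core file name depends on
your kernel.core_pattern, /var/lib/pacemaker/cores is only the usual
location behind the ./pacemaker/cores/ directory you mentioned, and
debuginfo-install comes from yum-utils):

  # Install debugging tools plus symbols for the likely suspects
  ~# yum install -y gdb yum-utils
  ~# debuginfo-install -y pacemaker resource-agents glibc

  # See which executable actually crashed; 'file' prints the command
  # the core was dumped from
  ~# file /var/lib/pacemaker/cores/core.22553

  # Pull a full backtrace (substitute the binary 'file' reported)
  ~# gdb -batch -ex 'thread apply all bt full' /path/to/that/binary \
        /var/lib/pacemaker/cores/core.22553

Since the Filesystem agent is a shell script, the signal 11 is
presumably coming from the shell itself or from one of the binaries
the script calls; the backtrace should show which.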
> Mar 05 04:57:55 [15652] bd3c7.umrk.nl       lrmd:  warning:
> operation_finished:  mount_monitor_20000:22553 - terminated with
> signal 11
> Mar 05 04:57:55 [15655] bd3c7.umrk.nl       crmd:    error:
> process_lrm_event:  Result of monitor operation for mount on bd3c7:
> Error | call=23 key=mount_monitor_20000 confirmed=false status=4
> cib-update=143
> ...
> Mar 05 04:57:55 [15655] bd3c7.umrk.nl       crmd:     info:
> abort_transition_graph:  Transition aborted by operation
> mount_monitor_20000 'create' on bd3c7: Old event |
> magic=4:1;40:2:0:37dad885-d4be-4dcd-8d5f-fd9663e9f953 cib=0.22.62
> source=process_graph_event:499 complete=true
> ...
> Mar 05 04:57:55 [15655] bd3c7.umrk.nl       crmd:     info:
> process_graph_event:  Detected action (2.40)
> mount_monitor_20000.23=unknown error: failed
> ...
> Mar 05 04:57:56 [15652] bd3c7.umrk.nl       lrmd:     info:
> cancel_recurring_action:  Cancelling ocf operation
> mount_monitor_20000
> ...
> Mar 05 04:57:57 [15655] bd3c7.umrk.nl       crmd:   notice:
> te_rsc_command:  Initiating monitor operation mount_monitor_20000
> locally on bd3c7 | action 1
> Mar 05 04:57:57 [15655] bd3c7.umrk.nl       crmd:     info:
> do_lrm_rsc_op:  Performing
> key=1:71:0:37dad885-d4be-4dcd-8d5f-fd9663e9f953
> op=mount_monitor_20000
> ...
> Mar 05 04:57:57 [15650] bd3c7.umrk.nl        cib:     info:
> cib_perform_op:  +
> /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='mount']/lrm_rsc_op[@id='mount_monitor_20000']:
> @transition-key=1:71:0:37dad885-d4be-4dcd-8d5f-fd9663e9f953,
> @transition-magic=-1:193;1:71:0:37dad885-d4be-4dcd-8d5f-fd9663e9f953,
> @call-id=-1, @rc-code=193, @op-status=-1, @last-rc-change=1583380677,
> @exec-time=0
> ...
> Mar 05 04:57:57 [15655] bd3c7.umrk.nl       crmd:     info:
> process_lrm_event:  Result of monitor operation for mount on bd3c7:
> 0 (ok) | call=51 key=mount_monitor_20000 confirmed=false
> cib-update=159
> ...
> Mar 05 04:57:57 [15650] bd3c7.umrk.nl        cib:     info:
> cib_perform_op:  +
> /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='mount']/lrm_rsc_op[@id='mount_monitor_20000']:
> @transition-magic=0:0;1:71:0:37dad885-d4be-4dcd-8d5f-fd9663e9f953,
> @call-id=51, @rc-code=0, @op-status=0, @exec-time=70
> Mar 05 04:57:57 [15650] bd3c7.umrk.nl        cib:     info:
> cib_process_request:  Completed cib_modify operation for section
> status: OK (rc=0, origin=bd3c7/crmd/159, version=0.22.77)
> Mar 05 04:57:57 [15655] bd3c7.umrk.nl       crmd:     info:
> match_graph_event:  Action mount_monitor_20000 (1) confirmed on
> bd3c7 (rc=0)
>
> ########################################
>
> ################# Pacemaker cfg ########
>
> ~# pcs resource defaults resource-stickiness=100 ; \
>    pcs resource create drbd ocf:linbit:drbd drbd_resource=r0 op monitor interval=60s ; \
>    pcs resource master drbd master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true ; \
>    pcs resource create mount Filesystem device="/dev/drbd0" directory="/data" fstype="ext4" ; \
>    pcs constraint colocation add mount with drbd-master INFINITY with-rsc-role=Master ; \
>    pcs constraint order promote drbd-master then mount ; \
>    pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.168.2.73 cidr_netmask=24 op monitor interval=30s ; \
>    pcs constraint colocation add vip with drbd-master INFINITY with-rsc-role=Master ; \
>    pcs constraint order mount then vip ; \
>    pcs resource create nfsd nfsserver nfs_shared_infodir=/data ; \
>    pcs resource create nfscfg exportfs clientspec="192.168.2.55" options=rw,no_subtree_check,no_root_squash directory=/data fsid=0 ; \
>    pcs constraint colocation add nfsd with vip ; \
>    pcs constraint colocation add nfscfg with nfsd ; \
>    pcs constraint order vip then nfsd ; \
>    pcs constraint order nfsd then nfscfg
>
> ########################################
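As an aside, once you've saved a core for analysis, you can clear the
recorded failure so 'pcs status' goes back to clean (plain pcs
commands; 'mount' as named in your cfg above):

  ~# pcs resource failcount show mount
  ~# pcs resource cleanup mount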
-- 
Ken Gaillot <kgail...@redhat.com>

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/