25.06.2019 16:53, Harvey Shepherd wrote:
> Hi All,
>
>
> I have a 2 node cluster running under Pacemaker 2.0.2, with around 20
> resources configured, the majority of which are LSB resources, but there are
> also a few OCF ones. One of the LSB resources is controlled via an init
> script called "logging" and runs only on the master node. The CIB
> configuration for it is as follows:
>
>
> <primitive id="logging" class="lsb" type="logging">
>   <operations>
>     <op name="monitor" interval="10s" id="logging-monitor-10s"/>
>     <op name="start" interval="0" id="logging-start-30s"/>
>     <op name="stop" interval="0" on-fail="restart" id="logging-stop-30s"/>
>   </operations>
> </primitive>
>
>
> There is a global setting which sets the default timeout:
>
>
> <op_defaults>
>   <meta_attributes id="op-options">
>     <nvpair name="timeout" value="30s" id="op-options-timeout"/>
>   </meta_attributes>
> </op_defaults>
>
>
> All of the other LSB resources are configured in the same way, and seem to
> work correctly, but for some reason I see the following recurring logs for
> the logging resource:
>
>
> Jun 25 13:16:22 ctr_qemu pacemaker-execd [1234] (recurring_action_timer)
> debug: Scheduling another invocation of logging_status_10000
> Jun 25 13:16:22 ctr_qemu pacemaker-execd [1234] (operation_finished)
> debug: logging_status_10000:8423 - exited with rc=0
> Jun 25 13:16:22 ctr_qemu pacemaker-execd [1234] (operation_finished)
> debug: logging_status_10000:8423:stdout [ Remote syslog service is running ]
> Jun 25 13:16:22 ctr_qemu pacemaker-execd [1234] (log_finished)
> debug: finished - rsc:logging action:monitor call_id:123 pid:8423 exit-code:0
> exec-time:0ms queue-time:0ms
> Jun 25 13:16:23 ctr_qemu crm_resource [8436] (determine_op_status)
> debug: logging_monitor_10000 on primary returned 'not running' (7) instead
> of the expected value: 'ok' (0)

These messages come from the crm_resource command, not from lrmd
(pacemaker-execd). They also say "on primary", while this log was written on
ctr_qemu. Do you have a monitoring tool that calls crm_resource?

> Jun 25 13:16:23 ctr_qemu crm_resource [8436] (unpack_rsc_op_failure)
> warning: Processing failed monitor of logging on primary: not running | rc=7
> Jun 25 13:16:30 ctr_qemu crm_resource [8571] (determine_op_status)
> debug: logging_monitor_10000 on primary returned 'not running' (7) instead
> of the expected value: 'ok' (0)
> Jun 25 13:16:30 ctr_qemu crm_resource [8571] (unpack_rsc_op_failure)
> warning: Processing failed monitor of logging on primary: not running | rc=7
> Jun 25 13:16:32 ctr_qemu pacemaker-execd [1234] (recurring_action_timer)
> debug: Scheduling another invocation of logging_status_10000
> Jun 25 13:16:32 ctr_qemu pacemaker-execd [1234] (operation_finished)
> debug: logging_status_10000:8670 - exited with rc=0
> Jun 25 13:16:32 ctr_qemu pacemaker-execd [1234] (operation_finished)
> debug: logging_status_10000:8670:stdout [ Remote syslog service is running ]
> Jun 25 13:16:32 ctr_qemu pacemaker-execd [1234] (log_finished)
> debug: finished - rsc:logging action:monitor call_id:123 pid:8670 exit-code:0
> exec-time:0ms queue-time:0ms
> Jun 25 13:16:33 ctr_qemu crm_resource [8683] (determine_op_status)
> debug: logging_monitor_10000 on primary returned 'not running' (7) instead
> of the expected value: 'ok' (0)
> Jun 25 13:16:33 ctr_qemu crm_resource [8683] (unpack_rsc_op_failure)
> warning: Processing failed monitor of logging on primary: not running | rc=7
> Jun 25 13:16:40 ctr_qemu crm_resource [8818] (determine_op_status)
> debug: logging_monitor_10000 on primary returned 'not running' (7) instead
> of the expected value: 'ok' (0)
> Jun 25 13:16:40 ctr_qemu crm_resource [8818] (unpack_rsc_op_failure)
> warning: Processing failed monitor of logging on primary: not running | rc=7
>
> Pacemaker is reporting failed resource actions, but fail-count is not
> incremented for the resource:
>
>
> Migration Summary:
> * Node primary:
> * Node secondary:
>
> Failed Resource Actions:
> * logging_monitor_10000 on primary 'not running' (7): call=119,
> status=complete, exitreason='',
>     last-rc-change='Tue Jun 25 13:13:12 2019', queued=0ms, exec=0ms
>
>
> I have checked the operation of the LSB script manually and it always
> correctly exits with a return code of 0 when I run it manually, indicating
> that the resource is running. So my questions are:
>
>
> 1. Why does Pacemaker seem to be running a monitor operation in parallel with
> a status operation, with conflicting results? A monitor operation returning 7
> "not running" would only make sense for an OCF resource, but it is clearly
> defined as LSB in the CIB.
>
> 2. Why does the status operation always return 0 (running) and the monitor
> operation always returns 7 (not running)?
>
> 3. Why is fail-count not being incremented even though failures are being
> logged?
>
>
> I would really appreciate any pointers that anyone could give me. Perhaps
> I've made an error in the configuration.
>
>
>
> Thanks,
>
> Harvey Shepherd
>
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
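
If it helps to see at a glance which process wrote which message, here is a rough sketch that groups log lines by the emitting process. It assumes the log layout shown in the quoted excerpt (timestamp, host, process name, PID in brackets, function in parentheses, level, message). The regular expression, the name group_by_process and the three sample lines (taken from the excerpt and joined onto single lines) are illustrative only and are not part of Pacemaker.

```python
import re
from collections import defaultdict

# Assumed layout: "<Mon DD HH:MM:SS> <host> <process> [<pid>] (<function>) <level>: <message>"
LOG_LINE = re.compile(
    r"^(?P<time>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+(?P<proc>\S+)\s+"
    r"\[(?P<pid>\d+)\]\s+\((?P<func>\w+)\)\s+(?P<level>\w+):\s+(?P<msg>.*)$"
)


def group_by_process(lines):
    """Return {process name: [(pid, function, message), ...]} for matching lines."""
    groups = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:  # lines that do not fit the assumed layout are skipped
            groups[match["proc"]].append((match["pid"], match["func"], match["msg"]))
    return dict(groups)


# Sample lines from the excerpt above, unwrapped onto one line each.
SAMPLE = [
    "Jun 25 13:16:22 ctr_qemu pacemaker-execd [1234] (operation_finished) "
    "debug: logging_status_10000:8423 - exited with rc=0",
    "Jun 25 13:16:23 ctr_qemu crm_resource [8436] (determine_op_status) "
    "debug: logging_monitor_10000 on primary returned 'not running' (7) "
    "instead of the expected value: 'ok' (0)",
    "Jun 25 13:16:30 ctr_qemu crm_resource [8571] (unpack_rsc_op_failure) "
    "warning: Processing failed monitor of logging on primary: not running | rc=7",
]

groups = group_by_process(SAMPLE)
for proc, entries in groups.items():
    pids = sorted({pid for pid, _, _ in entries})
    print(proc, "pids:", pids)
```

In the sample, the executor keeps a single long-lived PID, while every crm_resource message carries a different PID, which is what you would expect if some external tool starts the command repeatedly.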