Re: [ClusterLabs] Strange monitor return code log for LSB resource

Andrei Borzenkov Tue, 25 Jun 2019 09:47:32 -0700

25.06.2019 16:53, Harvey Shepherd wrote:
> Hi All,
> 
> I have a 2 node cluster running under Pacemaker 2.0.2, with around 20 
> resources configured, the majority of which are LSB resources, but there are 
> also a few OCF ones. One of the LSB resources is controlled via an init 
> script called "logging" and runs only on the master node. The CIB 
> configuration for it is as follows:
> 
>     <primitive id="logging" class="lsb" type="logging">
>       <operations>
>         <op name="monitor" interval="10s" id="logging-monitor-10s"/>
>         <op name="start" interval="0" id="logging-start-30s"/>
>         <op name="stop" interval="0" on-fail="restart" id="logging-stop-30s"/>
>       </operations>
>     </primitive>
> 
> There is a global setting which sets the default timeout:
> 
>   <op_defaults>
>     <meta_attributes id="op-options">
>       <nvpair name="timeout" value="30s" id="op-options-timeout"/>
>     </meta_attributes>
>   </op_defaults>
> 
> All of the other LSB resources are configured in the same way, and seem to 
> work correctly, but for some reason I see the following recurring logs for 
> the logging resource:
> 
> Jun 25 13:16:22 ctr_qemu pacemaker-execd     [1234] (recurring_action_timer) debug: Scheduling another invocation of logging_status_10000
> Jun 25 13:16:22 ctr_qemu pacemaker-execd     [1234] (operation_finished) debug: logging_status_10000:8423 - exited with rc=0
> Jun 25 13:16:22 ctr_qemu pacemaker-execd     [1234] (operation_finished) debug: logging_status_10000:8423:stdout [ Remote syslog service is running ]
> Jun 25 13:16:22 ctr_qemu pacemaker-execd     [1234] (log_finished) debug: finished - rsc:logging action:monitor call_id:123 pid:8423 exit-code:0 exec-time:0ms queue-time:0ms
> Jun 25 13:16:23 ctr_qemu crm_resource        [8436] (determine_op_status) debug: logging_monitor_10000 on primary returned 'not running' (7) instead of the expected value: 'ok' (0)


These messages do not come from lrmd (pacemaker-execd in Pacemaker 2.0) but
from the crm_resource command. Also, they say "on primary" while the log is
from ctr_qemu. Do you have some monitoring tool that calls crm_resource?

> Jun 25 13:16:23 ctr_qemu crm_resource        [8436] (unpack_rsc_op_failure) warning: Processing failed monitor of logging on primary: not running | rc=7
> Jun 25 13:16:30 ctr_qemu crm_resource        [8571] (determine_op_status) debug: logging_monitor_10000 on primary returned 'not running' (7) instead of the expected value: 'ok' (0)
> Jun 25 13:16:30 ctr_qemu crm_resource        [8571] (unpack_rsc_op_failure) warning: Processing failed monitor of logging on primary: not running | rc=7
> Jun 25 13:16:32 ctr_qemu pacemaker-execd     [1234] (recurring_action_timer) debug: Scheduling another invocation of logging_status_10000
> Jun 25 13:16:32 ctr_qemu pacemaker-execd     [1234] (operation_finished) debug: logging_status_10000:8670 - exited with rc=0
> Jun 25 13:16:32 ctr_qemu pacemaker-execd     [1234] (operation_finished) debug: logging_status_10000:8670:stdout [ Remote syslog service is running ]
> Jun 25 13:16:32 ctr_qemu pacemaker-execd     [1234] (log_finished) debug: finished - rsc:logging action:monitor call_id:123 pid:8670 exit-code:0 exec-time:0ms queue-time:0ms
> Jun 25 13:16:33 ctr_qemu crm_resource        [8683] (determine_op_status) debug: logging_monitor_10000 on primary returned 'not running' (7) instead of the expected value: 'ok' (0)
> Jun 25 13:16:33 ctr_qemu crm_resource        [8683] (unpack_rsc_op_failure) warning: Processing failed monitor of logging on primary: not running | rc=7
> Jun 25 13:16:40 ctr_qemu crm_resource        [8818] (determine_op_status) debug: logging_monitor_10000 on primary returned 'not running' (7) instead of the expected value: 'ok' (0)
> Jun 25 13:16:40 ctr_qemu crm_resource        [8818] (unpack_rsc_op_failure) warning: Processing failed monitor of logging on primary: not running | rc=7
> 
> Pacemaker is reporting failed resource actions, but fail-count is not 
> incremented for the resource:
> 
> Migration Summary:
> * Node primary:
> * Node secondary:
> 
> Failed Resource Actions:
> * logging_monitor_10000 on primary 'not running' (7): call=119, status=complete, exitreason='',
>     last-rc-change='Tue Jun 25 13:13:12 2019', queued=0ms, exec=0ms
> 
> I have checked the operation of the LSB script manually, and it always exits
> with a return code of 0, indicating that the resource is running. So my
> questions are:
> 
> 1. Why does Pacemaker seem to be running a monitor operation in parallel with 
> a status operation, with conflicting results? A monitor operation returning 7 
> "not running" would only make sense for an OCF resource, but it is clearly 
> defined as LSB in the CIB.
> 
> 2. Why does the status operation always return 0 (running) while the monitor
> operation always returns 7 (not running)?
> 
> 3. Why is fail-count not being incremented even though failures are being
> logged?
> 
> I would really appreciate any pointers that anyone could give me. Perhaps 
> I've made an error in the configuration.
> 
> Thanks,
> 
> Harvey Shepherd
> 
> 
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
> 
