[ClusterLabs] Why not retry a monitor (pacemaker-execd) that got a segmentation fault?

Ulrich Windl Tue, 14 Jun 2022 05:37:01 -0700

Hi!

I had a case where a VirtualDomain monitor operation ended in a core dump 
(the process that crashed was actually pacemaker-execd, but it was counted as 
a failed "monitor" operation), and the cluster decided to restart the VM. 
Wouldn't it be worth retrying the monitor operation first?
Chances are that a retried monitor operation returns a better status than 
"segmentation fault".
Or does the logic just ignore that the process died on a signal?
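
To make the idea concrete, here is a minimal sketch of the retry policy I have 
in mind. It is NOT how pacemaker-execd works; run_monitor and 
max_signal_retries are hypothetical names used only for illustration. It 
relies on Python's subprocess convention that a negative return code -N means 
the child was terminated by signal N (POSIX only).

```python
import signal
import subprocess
import sys


def run_monitor(cmd, max_signal_retries=1):
    """Run cmd and retry only when the child was killed by a signal.

    Hypothetical helper, not part of Pacemaker.
    Returns (returncode, attempts). A negative returncode means that the
    last attempt was also terminated by a signal.
    """
    attempts = 0
    while True:
        attempts += 1
        rc = subprocess.run(cmd).returncode
        if rc >= 0:
            # The command exited on its own: trust its status, no retry.
            return rc, attempts
        # Negative return code: the child died on signal number -rc.
        name = signal.Signals(-rc).name
        print(f"attempt {attempts}: terminated by {name}", file=sys.stderr)
        if attempts > max_signal_retries:
            # Retries exhausted: report the signal death to the caller.
            return rc, attempts
```

A normal exit status (zero or non-zero) is returned immediately; only a death 
by signal triggers another attempt, and the signal result is reported once 
the retries are used up.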


20201202.ba59be712-150300.4.21.1.x86_64 (SLES15 SP3)

Jun 14 14:09:16 h19 systemd-coredump[28788]: Process 28786 (pacemaker-execd) of user 0 dumped core.
Jun 14 14:09:16 h19 pacemaker-execd[7440]:  warning: prm_xen_v04_monitor_600000[28786] terminated with signal: Segmentation fault
Jun 14 14:09:16 h19 pacemaker-controld[7443]:  error: Result of monitor operation for prm_xen_v04 on h19: Error
Jun 14 14:09:16 h19 pacemaker-controld[7443]:  notice: Transition 9 action 107 (prm_xen_v04_monitor_600000 on h19): expected 'ok' but got 'error'
...
Jun 14 14:09:16 h19 pacemaker-schedulerd[7442]:  notice:  * Recover    prm_xen_v04              (             h19 )

Regards,
ulrich



