Hi, I have a four-node cluster that uses iLO as the fencing agent. When I simulate a node crash (either by killing corosync or with echo c > /proc/sysrq-trigger), the node is marked UNCLEAN and the stonith agent is asked to restart it. However, every time that happens, another node in the cluster is also marked UNCLEAN and rebooted. After the nodes reboot, they are marked online again and the cluster resumes operation without problems.
I have reviewed the corosync and pacemaker logs but found nothing that explains why the other node is also rebooted. Any hint about what to check or look for would be appreciated.

-----------------Cluster conf----------------------------------
node 1239211542: e1b12 \
        attributes standby=off
node 1239211543: e1b13
node 1239211581: e1b03 \
        attributes standby=off
node 1239211582: e1b07 \
        attributes standby=off
primitive fence-e1b03 stonith:fence_ilo \
        params ipaddr=e1b03-ilo login=fence_agent passwd=XXX ssl_insecure=1 \
        op monitor interval=300 timeout=120 \
        meta migration-threshold=2 target-role=Started
primitive fence-e1b07 stonith:fence_ilo \
        params ipaddr=e1b07-ilo login=fence_agent passwd=XXX ssl_insecure=1 \
        op monitor interval=300 timeout=120 \
        meta migration-threshold=2 target-role=Started
primitive fence-e1b12 stonith:fence_ilo \
        params ipaddr=e1b12-ilo login=fence_agent passwd=XXX ssl_insecure=1 \
        op monitor interval=300 timeout=120 \
        meta migration-threshold=2 target-role=Started
primitive fence-e1b13 stonith:fence_ilo \
        params ipaddr=e1b13-ilo login=fence_agent passwd=XXX ssl_insecure=1 \
        op monitor interval=300 timeout=120 \
        meta migration-threshold=2 target-role=Started
..... extra resources ......
location l-f-e1b03 fence-e1b03 \
        rule -inf: #uname eq e1b03 \
        rule 10000: #uname eq e1b07
location l-f-e1b07 fence-e1b07 \
        rule -inf: #uname eq e1b07 \
        rule 10000: #uname eq e1b03
location l-f-e1b12 fence-e1b12 \
        rule -inf: #uname eq e1b12 \
        rule 10000: #uname eq e1b13
location l-f-e1b13 fence-e1b13 \
        rule -inf: #uname eq e1b13 \
        rule 10000: #uname eq e1b12
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.15-e174ec8 \
        cluster-infrastructure=corosync \
        stonith-enabled=true \
        cluster-name=test-cluster \
        no-quorum-policy=freeze \
        last-lrm-refresh=1483125286
----------------------------------------------------------------------------------------

Regards,
Ali