Hi!

Seeing a detailed log of the events would be helpful. That said, we had a similar issue when using multicast (after adding a new node to an existing cluster). Switching to UDPU helped in our case, but unless we see the details, it's all just guessing...
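
For reference, a minimal sketch of what switching corosync from multicast to UDPU (unicast UDP) might look like. It assumes corosync 2.x and a config at /etc/corosync/corosync.conf, and that the node names from the quoted configuration resolve to the ring addresses; none of this is confirmed by the original post, so adjust addresses and any existing interface settings to the real setup.

```
# /etc/corosync/corosync.conf (fragment; sketch under the assumptions above)
totem {
    version: 2
    cluster_name: test-cluster
    # Use unicast UDP instead of multicast
    transport: udpu
}

# With udpu every cluster member has to be listed explicitly.
# nodeid values are taken from the quoted cluster configuration;
# ring0_addr values are assumed to be resolvable host names.
nodelist {
    node {
        ring0_addr: e1b12
        nodeid: 1239211542
    }
    node {
        ring0_addr: e1b13
        nodeid: 1239211543
    }
    node {
        ring0_addr: e1b03
        nodeid: 1239211581
    }
    node {
        ring0_addr: e1b07
        nodeid: 1239211582
    }
}
```

The same file has to be present on all four nodes, and corosync needs a restart on each node for a transport change to take effect.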
Ulrich

P.S. Happy New Year to everyone!

>>> Alfonso Ali <alfonso....@gmail.com> wrote on 30.12.2016 at 21:40 in
>>> message <CANeoTMcuNGw_T9e4WNEEK-nmHnV-NwiX2Ck0UBDnVeuoiC=r...@mail.gmail.com>:
> Hi,
>
> I have a four-node cluster that uses iLO as the fencing agent. When I simulate
> a node crash (either by killing corosync or with echo c > /proc/sysrq-trigger), the
> node is marked as UNCLEAN and the stonith agent is asked to restart it,
> but every time that happens another node in the cluster is also
> marked as UNCLEAN and rebooted as well. After the nodes are rebooted they
> are marked as online again and the cluster resumes operation without problems.
>
> I have reviewed the corosync and pacemaker logs but found nothing that explains
> why the other node is also rebooted.
>
> Any hint of what to check or what to look for would be appreciated.
>
> -----------------Cluster conf----------------------------------
> node 1239211542: e1b12 \
>         attributes standby=off
> node 1239211543: e1b13
> node 1239211581: e1b03 \
>         attributes standby=off
> node 1239211582: e1b07 \
>         attributes standby=off
> primitive fence-e1b03 stonith:fence_ilo \
>         params ipaddr=e1b03-ilo login=fence_agent passwd=XXX ssl_insecure=1 \
>         op monitor interval=300 timeout=120 \
>         meta migration-threshold=2 target-role=Started
> primitive fence-e1b07 stonith:fence_ilo \
>         params ipaddr=e1b07-ilo login=fence_agent passwd=XXX ssl_insecure=1 \
>         op monitor interval=300 timeout=120 \
>         meta migration-threshold=2 target-role=Started
> primitive fence-e1b12 stonith:fence_ilo \
>         params ipaddr=e1b12-ilo login=fence_agent passwd=XXX ssl_insecure=1 \
>         op monitor interval=300 timeout=120 \
>         meta migration-threshold=2 target-role=Started
> primitive fence-e1b13 stonith:fence_ilo \
>         params ipaddr=e1b13-ilo login=fence_agent passwd=XXX ssl_insecure=1 \
>         op monitor interval=300 timeout=120 \
>         meta migration-threshold=2 target-role=Started
> ..... extra resources ......
> location l-f-e1b03 fence-e1b03 \
>         rule -inf: #uname eq e1b03 \
>         rule 10000: #uname eq e1b07
> location l-f-e1b07 fence-e1b07 \
>         rule -inf: #uname eq e1b07 \
>         rule 10000: #uname eq e1b03
> location l-f-e1b12 fence-e1b12 \
>         rule -inf: #uname eq e1b12 \
>         rule 10000: #uname eq e1b13
> location l-f-e1b13 fence-e1b13 \
>         rule -inf: #uname eq e1b13 \
>         rule 10000: #uname eq e1b12
> property cib-bootstrap-options: \
>         have-watchdog=false \
>         dc-version=1.1.15-e174ec8 \
>         cluster-infrastructure=corosync \
>         stonith-enabled=true \
>         cluster-name=test-cluster \
>         no-quorum-policy=freeze \
>         last-lrm-refresh=1483125286
> ----------------------------------------------------------------------------------------
>
> Regards,
> Ali