On Wed, 9 Oct 2019, Ken Gaillot wrote:

> > One of the nodes has got a failure ("watchdog: BUG: soft lockup -
> > CPU#7 stuck for 23s"), which resulted that the node could process
> > traffic on the backend interface but not on the frontend one. Thus the
> > services became unavailable but the cluster thought the node is all
> > right and did not stonith it.
> >
> > How could we protect the cluster against such failures?
>
> See the ocf:heartbeat:ethmonitor agent (to monitor the interface itself)
> and/or the ocf:pacemaker:ping agent (to monitor reachability of some IP
> such as a gateway)
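For reference, a minimal sketch of how the two agents mentioned above might be configured with pcs. Both agents record their result in a node attribute, and a location rule is what ties a service to that attribute. The interface name eth1, the gateway address 192.0.2.1, the resource names frontend-mon and gw-ping, and the service name my-service are all hypothetical, and the exact pcs syntax may differ between versions.

```shell
# Hypothetical names throughout: eth1 (frontend NIC), 192.0.2.1 (gateway),
# frontend-mon / gw-ping (monitor resources), my-service (the protected resource).

# Run ethmonitor as a clone on every node. By default it maintains a node
# attribute named "ethmonitor-<interface>": 1 while the link is healthy, 0 otherwise.
pcs resource create frontend-mon ocf:heartbeat:ethmonitor interface=eth1 clone

# Ban my-service from any node whose frontend interface is reported down.
pcs constraint location my-service rule score=-INFINITY ethmonitor-eth1 eq 0

# Run ping as a clone on every node. By default it stores a connectivity
# score in the "pingd" node attribute.
pcs resource create gw-ping ocf:pacemaker:ping host_list=192.0.2.1 dampen=5s multiplier=1000 clone

# Ban my-service from any node that cannot reach the gateway.
pcs constraint location my-service rule score=-INFINITY pingd lt 1 or not_defined pingd
```

With rules like these the affected node is not fenced; the service is only moved away from it, so whether that is sufficient depends on the failure being guarded against.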
This looks really promising, thank you! Does the cluster regard it as a
failure when an ocf:heartbeat:ethmonitor agent clone on a node does not
run? :-)

Best regards,
Jozsef
--
E-mail : kadlecsik.joz...@wigner.mta.hu
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: Wigner Research Centre for Physics
         H-1525 Budapest 114, POB. 49, Hungary