On 09/08/2016 08:55 AM, Digimer wrote:
> On 08/09/16 03:47 PM, Ulrich Windl wrote:
>>>>> Shermal Fernando <sherma...@millenniumit.com> wrote on 08.09.2016 at 06:41 in
>> message
>> <8ce6e8d87f896546b9c65ed80d30a4336578c...@lg-spmb-mbx02.lseg.stockex.local>:
>>> The whole cluster will fail if the DC (crm daemon) is frozen due to CPU
>>> starvation or hangs while trying to perform an I/O operation.
>>> Please share some thoughts on this issue.
>> What do you mean by "the whole cluster will fail"? If the DC times out, some
>> recovery will take place.
> Yup. The starved node should be declared lost by corosync, the remaining
> nodes reform and, if they're still quorate, the hung node should be
> fenced. Recovery occurs and life goes on.

That didn't happen in my test (SIGSTOP to crmd). It might be a configuration
mistake, though... I even had sbd with a watchdog active (alongside other,
real fencing devices).
I'm wondering whether it might make sense to tickle the crmd-API from
sbd-pacemaker-watcher...

>
> Unless you don't have fencing, then may $deity have mercy. ;)
>
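A minimal Python sketch of the idea of tickling crmd from the sbd watcher, assuming a watcher that keeps feeding the watchdog only while the daemon answers a liveness probe within a timeout. `probe`, `pet_watchdog` and the helper names are hypothetical stand-ins, not sbd's actual code or a real crmd call. A SIGSTOPped crmd would never answer, the watcher would stop petting, and the watchdog would then reset the node.

```python
import threading
import time
from typing import Callable


def probe_with_timeout(probe: Callable[[], bool], timeout_s: float) -> bool:
    """Run a (hypothetical) liveness probe; no answer within timeout_s counts as failure."""
    result = {"ok": False}

    def run() -> None:
        try:
            result["ok"] = bool(probe())
        except Exception:
            # A probe that errors out is treated the same as an unhealthy daemon.
            result["ok"] = False

    # Daemon thread, so a hung probe cannot keep the watcher process alive.
    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        # The daemon never answered, e.g. because it was stopped with SIGSTOP.
        return False
    return result["ok"]


def watcher_cycle(probe: Callable[[], bool],
                  pet_watchdog: Callable[[], None],
                  timeout_s: float) -> bool:
    """One check: pet the (hypothetical) watchdog only if the probe succeeded in time."""
    healthy = probe_with_timeout(probe, timeout_s)
    if healthy:
        pet_watchdog()
    return healthy


def run_watcher(probe: Callable[[], bool],
                pet_watchdog: Callable[[], None],
                timeout_s: float,
                interval_s: float,
                cycles: int) -> bool:
    """Loop for a fixed number of cycles; stop feeding at the first failed check."""
    for _ in range(cycles):
        if not watcher_cycle(probe, pet_watchdog, timeout_s):
            # From here on nothing pets the watchdog, so it eventually fires.
            return False
        time.sleep(interval_s)
    return True
```

Running the probe in a separate thread is what makes a hung daemon detectable at all: a direct blocking call would simply hang the watcher together with crmd.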
_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org