Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

Klaus Wenninger Thu, 08 Sep 2016 00:23:09 -0700

On 09/08/2016 08:55 AM, Digimer wrote:
> On 08/09/16 03:47 PM, Ulrich Windl wrote:
>>>>> Shermal Fernando <sherma...@millenniumit.com> wrote on 08.09.2016 at 
>>>>> 06:41 in
>> message
>> <8ce6e8d87f896546b9c65ed80d30a4336578c...@lg-spmb-mbx02.lseg.stockex.local>:
>>> The whole cluster will fail if the DC (crm daemon) is frozen due to CPU 
>>> starvation or hanging while trying to perform an I/O operation.
>>> Please share some thoughts on this issue.
>> What is "the whole cluster will fail"? If the DC times out, some recovery 
>> will take place.
> Yup. The starved node should be declared lost by corosync, the remaining
> nodes reform and if they're still quorate, the hung node should be
> fenced. Recovery occurs and life goes on.
That didn't happen in my test (SIGSTOP to crmd).
Might be a configuration mistake, though...
I even had sbd with a watchdog active (amongst
other, real, fencing devices).
I'm wondering whether it might make sense to tickle the
crmd API from sbd-pacemaker-watcher ...
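
For anyone who wants to repeat the SIGSTOP test, here is a minimal sketch of
how a daemon can be frozen and thawed from a script. It assumes Linux (it reads
/proc/<pid>/stat) and that the caller already knows the PID of crmd and has
permission to signal it. The helper names (process_state, wait_for_state,
freeze, thaw) are made up for this illustration and are not part of pacemaker
or sbd.

```python
import os
import signal
import time


def process_state(pid: int) -> str:
    """Return the one-letter Linux process state (e.g. 'R', 'S', 'T')."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the last ')'.
    return stat.rpartition(")")[2].split()[0]


def wait_for_state(pid: int, wanted: str, timeout: float = 2.0) -> bool:
    """Poll until the process state is one of the letters in `wanted`."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if process_state(pid) in wanted:
            return True
        time.sleep(0.01)
    return process_state(pid) in wanted


def freeze(pid: int) -> bool:
    """Send SIGSTOP and confirm the process is reported as stopped ('T')."""
    os.kill(pid, signal.SIGSTOP)
    return wait_for_state(pid, "Tt")


def thaw(pid: int) -> bool:
    """Send SIGCONT and confirm the process is no longer stopped."""
    os.kill(pid, signal.SIGCONT)
    return wait_for_state(pid, "RSD")
```

While the process is frozen, one would watch the other nodes to see whether
the cluster notices and fences the node; calling thaw(pid) afterwards lets
the daemon continue.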
>
> Unless you don't have fencing, then may $deity have mercy. ;)
>



_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
