Re: [ClusterLabs] [Pacemaker] large cluster - failure recovery

2015-11-19 Thread Cédric Dufour - Idiap Research Institute
[coming over from the old mailing list pacema...@oss.clusterlabs.org; sorry for any thread discrepancy] Hello, We've also set up a fairly large cluster - 24 nodes / 348 resources (pacemaker 1.1.12, corosync 1.4.7) - and pacemaker 1.1.12 is definitely the minimum version you'll want, thanks to

Re: [ClusterLabs] required nodes for quorum policy

2015-11-19 Thread Radoslaw Garbacz
Thank you, Christine and Andrei. I took a look at the corosync quorum policy configuration options, and I would actually need a more conservative approach, i.e. to consider the cluster quorate only if all the nodes are present - any node loss is a quorum loss event for me. At present I check it in an agent,

[ClusterLabs] Failcount not resetting to zero after failure-timeout

2015-11-19 Thread Pritam Kharat
Hi All, I have a 2-node HA setup. I have added migration_threshold=5 and failure-timeout=120s for my resources. When the migration threshold of 5 is reached, the resources are migrated to the other node. But on one occasion I observed that the fail-count was not reset to zero after 2 minutes. The setup was in the same state almost