Re: [Pacemaker] large cluster - failure recovery

2015-11-19 Thread Cédric Dufour - Idiap Research Institute
Hello, we've also set up a fairly large cluster (24 nodes / 348 resources; pacemaker 1.1.12, corosync 1.4.7), and pacemaker 1.1.12 is definitely the minimum version you'll want, thanks to changes in how the CIB is handled. If you're going to handle a large number (~several hundreds) of

Re: [Pacemaker] large cluster - failure recovery

2015-11-04 Thread Radoslaw Garbacz
Thank you, will give it a try.

On Wed, Nov 4, 2015 at 12:50 PM, Trevor Hemsley wrote:
> On 04/11/15 18:41, Radoslaw Garbacz wrote:
> > Details:
> > OS: CentOS 6
> > Pacemaker: Pacemaker 1.1.9-1512.el6
> > Corosync: Corosync Cluster Engine, version '2.3.2'
>
> yum update

Re: [Pacemaker] large cluster - failure recovery

2015-11-04 Thread Trevor Hemsley
On 04/11/15 18:41, Radoslaw Garbacz wrote:
> Details:
> OS: CentOS 6
> Pacemaker: Pacemaker 1.1.9-1512.el6
> Corosync: Corosync Cluster Engine, version '2.3.2'

yum update

Pacemaker is currently 1.1.12 and corosync 1.4.7 on CentOS 6. There were major improvements in speed with later versions of
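A quick way to check an installed version against the 1.1.12 minimum recommended in this thread is to compare the dotted components numerically. A plain string comparison gets it wrong, because "1.1.9" sorts after "1.1.12" character by character. The sketch below is illustrative only: the helper names are hypothetical, and it assumes version strings shaped like the ones quoted here (a dotted numeric part, optionally followed by a "-release" suffix such as "-1512.el6").

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, ...]:
    """Hypothetical helper: turn '1.1.9-1512.el6' into (1, 1, 9).

    Assumes the upstream version is the part before the first '-',
    and that it consists only of dot-separated integers.
    """
    upstream = version.split("-", 1)[0]
    return tuple(int(part) for part in upstream.split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Hypothetical helper: numeric, component-by-component comparison."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    installed = "1.1.9-1512.el6"  # Pacemaker version reported in this thread
    minimum = "1.1.12"            # minimum version recommended in this thread
    # Naive string comparison: '9' > '1', so this wrongly says 1.1.9 is newer.
    print(installed > minimum)
    # Numeric comparison: (1, 1, 9) < (1, 1, 12), so the upgrade is needed.
    print(meets_minimum(installed, minimum))
```

Tuples compare element by element, so (1, 1, 9) falls below (1, 1, 12) as intended. Versions with a different number of components (e.g. "1.1" vs "1.1.0") would need padding before comparison; the sketch does not handle that case.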

[Pacemaker] large cluster - failure recovery

2015-11-04 Thread Radoslaw Garbacz
Hi, I have a 32-node cluster. After some tuning I was able to get it started and running, but it does not recover from a node disconnect/reconnect failure. It regains quorum, but the CIB does not return to a synchronized state, and "cibadmin -Q" times out. Is there anything with corosync or