Re: [Linux-HA] Antw: Re: Max number of resources under Pacemaker ?

2013-09-03 Thread Andrew Beekhof
On 04/09/2013, at 4:26 PM, "Ulrich Windl" wrote: > Hi! > > In my experience network traffic grows somewhat linearly with the size of the > CIB. At some point you probably have to change communication parameters to > keep > the cluster in a happy communication state. Yes. Tuning corosync.conf f

Re: [Linux-HA] Max number of resources under Pacemaker ?

2013-09-03 Thread Andrew Beekhof
On 04/09/2013, at 4:09 PM, Vladislav Bogdanov wrote: > 04.09.2013 07:16, Andrew Beekhof wrote: >> >> On 03/09/2013, at 9:20 PM, Moullé Alain >> wrote: >> >>> Hello, >>> >>> A simple question : is there a maximum number of resources (let's >>> say simple primitives) that Pacemaker can support

[Linux-HA] Antw: Re: Max number of resources under Pacemaker ?

2013-09-03 Thread Ulrich Windl
Hi! In my experience network traffic grows somewhat linearly with the size of the CIB. At some point you probably have to change communication parameters to keep the cluster in a happy communication state. Regardless of the cluster internals, there may be problems if a node goes online and hundreds of

Re: [Linux-HA] Antw: Quick 'death match cycle' question.

2013-09-03 Thread Ulrich Windl
>>> Andrew Beekhof wrote on 04.09.2013 at 05:44 in message : > On 04/09/2013, at 3:02 AM, Lars Marowsky-Bree wrote: > >> On 2013-09-03T10:25:58, Digimer wrote: >> >>> I've run only 2-node clusters and I've not seen this problem. That said, >>> I've long-ago moved off of openais in favour

Re: [Linux-HA] Max number of resources under Pacemaker ?

2013-09-03 Thread Vladislav Bogdanov
04.09.2013 07:16, Andrew Beekhof wrote: > > On 03/09/2013, at 9:20 PM, Moullé Alain > wrote: > >> Hello, >> >> A simple question : is there a maximum number of resources (let's >> say simple primitives) that Pacemaker can support at first at >> configuration of resources via crm, and of course

Re: [Linux-HA] Antw: Quick 'death match cycle' question.

2013-09-03 Thread Ulrich Windl
>>> Lars Marowsky-Bree wrote on 03.09.2013 at 19:02 in >>> message <20130903170257.go7...@suse.de>: [...] > This sounds more like a DC election race window or some such in > pacemaker. If Ulrich files a bug report with proper logs, I'm sure we > can resolve this (perhaps with an update to SP3

Re: [Linux-HA] error: te_connect_stonith: Sign-in failed: triggered a retry

2013-09-03 Thread Andrew Beekhof
On 30/08/2013, at 1:37 PM, Tom Parker wrote: > This is happening when I am using the really large CIB It's really hard to imagine one causing the other. Next time, can you set PCMK_blackbox=yes in your environment and retest? The log file will indicate a file with more information. http://blo

Re: [Linux-HA] Antw: Quick 'death match cycle' question.

2013-09-03 Thread Andrew Beekhof
On 04/09/2013, at 3:10 AM, Digimer wrote: > On 03/09/13 13:08, Lars Marowsky-Bree wrote: >> On 2013-09-03T13:04:52, Digimer wrote: >> >>> My mistake then. I had assumed that corosync was just a stripped down >>> openais, so I figured openais provided the same functions. My personal >>> experie

Re: [Linux-HA] Antw: Quick 'death match cycle' question.

2013-09-03 Thread Andrew Beekhof
On 04/09/2013, at 3:02 AM, Lars Marowsky-Bree wrote: > On 2013-09-03T10:25:58, Digimer wrote: > >> I've run only 2-node clusters and I've not seen this problem. That said, >> I've long-ago moved off of openais in favour of corosync. Given that >> membership is handled there, I would look at op

Re: [Linux-HA] Antw: Quick 'death match cycle' question.

2013-09-03 Thread Andrew Beekhof
On 03/09/2013, at 4:32 PM, Ulrich Windl wrote: > Hi! > > I don't have a real answer for this, but I can report another bad experience > with a 2-node cluster like yours: > > If the DC is fenced, the other node tries to become DC, but if the other node > (who still thinks he's DC) reboots just b

Re: [Linux-HA] Max number of resources under Pacemaker ?

2013-09-03 Thread Andrew Beekhof
On 03/09/2013, at 9:20 PM, Moullé Alain wrote: > Hello, > > A simple question : is there a maximum number of resources (let's say simple > primitives) that Pacemaker can support at first at configuration of > resources via crm, and of course after configuration when Pacemaker has to > monit

Re: [Linux-HA] Quick 'death match cycle' question.

2013-09-03 Thread Vladislav Bogdanov
03.09.2013 21:36, Lars Marowsky-Bree wrote: > On 2013-09-03T21:14:02, Vladislav Bogdanov wrote: > >>> To solve problem 2, simply disable corosync/pacemaker from starting on >>> boot. This way, the fenced node will be (hopefully) back up and running, >>> so you can ssh into it and look at what hap

Re: [Linux-HA] Quick 'death match cycle' question.

2013-09-03 Thread Vladislav Bogdanov
03.09.2013 21:45, Digimer wrote: > On 03/09/13 14:14, Vladislav Bogdanov wrote: >> 03.09.2013 07:04, Digimer wrote: >> ... >>> To solve problem 1, you can set a delay against one of the nodes. Say >>> you set the fence primitive for node 01 to have 'delay="15"'. When node >>> 1 goes to fence node 2

Re: [Linux-HA] Quick 'death match cycle' question.

2013-09-03 Thread Lars Marowsky-Bree
On 2013-09-03T21:14:02, Vladislav Bogdanov wrote: > > To solve problem 2, simply disable corosync/pacemaker from starting on > > boot. This way, the fenced node will be (hopefully) back up and running, > > so you can ssh into it and look at what happened. It won't try to rejoin > > the cluster th

Re: [Linux-HA] Quick 'death match cycle' question.

2013-09-03 Thread Digimer
On 03/09/13 14:14, Vladislav Bogdanov wrote: 03.09.2013 07:04, Digimer wrote: ... To solve problem 1, you can set a delay against one of the nodes. Say you set the fence primitive for node 01 to have 'delay="15"'. When node 1 goes to fence node 2, it starts immediately. When node 2 starts to fen

Re: [Linux-HA] Quick 'death match cycle' question.

2013-09-03 Thread Vladislav Bogdanov
03.09.2013 07:04, Digimer wrote: ... > To solve problem 1, you can set a delay against one of the nodes. Say > you set the fence primitive for node 01 to have 'delay="15"'. When node > 1 goes to fence node 2, it starts immediately. When node 2 starts to > fence node 1, it sees the 15 second delay a
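
The delay idea quoted above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Pacemaker or fence-agent code: the function name `fence_race_survivor`, the `fence_duration` parameter, and the node names are all hypothetical, and the model assumes each node starts fencing its peer at the same instant and that a fence action against a target waits for the delay configured on that target's fence primitive.

```python
def fence_race_survivor(delay_on_target, fence_duration=1.0):
    """Toy model of a 2-node fence race (hypothetical, not Pacemaker code).

    delay_on_target maps each node name to the delay (in seconds) set on
    the fence primitive that targets that node. Returns the name of the
    node whose fence action completes first, or None if both complete at
    the same time (neither side is favoured).
    """
    if len(delay_on_target) != 2:
        raise ValueError("this sketch only models a 2-node cluster")

    node_a, node_b = sorted(delay_on_target)

    # node_a fences node_b, so it waits for the delay configured on node_b.
    a_completes = delay_on_target[node_b] + fence_duration
    # node_b fences node_a, so it waits for the delay configured on node_a.
    b_completes = delay_on_target[node_a] + fence_duration

    if a_completes == b_completes:
        return None
    return node_a if a_completes < b_completes else node_b


if __name__ == "__main__":
    # A 15 second delay protects node1: node2 must wait before fencing it.
    print(fence_race_survivor({"node1": 15, "node2": 0}))
    # With equal delays neither node has an advantage.
    print(fence_race_survivor({"node1": 0, "node2": 0}))
```

In this model the node with the larger delay on its own fence primitive wins the race, which matches the intent described in the quoted message; real timing depends on the fence agent and the cluster stack.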

Re: [Linux-HA] Antw: Quick 'death match cycle' question.

2013-09-03 Thread Digimer
On 03/09/13 13:08, Lars Marowsky-Bree wrote: On 2013-09-03T13:04:52, Digimer wrote: My mistake then. I had assumed that corosync was just a stripped down openais, so I figured openais provided the same functions. My personal experience with openais is limited to my early days of learning HA cl

Re: [Linux-HA] Antw: Quick 'death match cycle' question.

2013-09-03 Thread Lars Marowsky-Bree
On 2013-09-03T13:04:52, Digimer wrote: > My mistake then. I had assumed that corosync was just a stripped down > openais, so I figured openais provided the same functions. My personal > experience with openais is limited to my early days of learning HA > clustering on EL5. Yes and no. SLE HA 11

Re: [Linux-HA] Antw: Quick 'death match cycle' question.

2013-09-03 Thread Digimer
On 03/09/13 13:02, Lars Marowsky-Bree wrote: On 2013-09-03T10:25:58, Digimer wrote: I've run only 2-node clusters and I've not seen this problem. That said, I've long-ago moved off of openais in favour of corosync. Given that membership is handled there, I would look at openais as the source o

Re: [Linux-HA] Antw: Quick 'death match cycle' question.

2013-09-03 Thread Lars Marowsky-Bree
On 2013-09-03T10:25:58, Digimer wrote: > I've run only 2-node clusters and I've not seen this problem. That said, > I've long-ago moved off of openais in favour of corosync. Given that > membership is handled there, I would look at openais as the source of your > trouble. This is, sorry, entirel

Re: [Linux-HA] ERROR: Client child command [/usr/lib64/heartbeat/cib] is not executable

2013-09-03 Thread tanghuifeng
ln -s /usr/libexec/pacemaker/cib /usr/lib64/heartbeat/cib
ln -s /usr/libexec/pacemaker/stonith /usr/lib64/heartbeat/stonithd
ln -s /usr/libexec/pacemaker/attrd /usr/lib64/heartbeat/attrd
ln -s /usr/libexec/pacemaker/crmd /usr/lib64/heartbeat/crmd
-- View this message in context: http://linux-h

Re: [Linux-HA] ERROR: Client child command [/usr/lib64/heartbeat/cib] is not executable

2013-09-03 Thread supertren
Thank you a lot! -- View this message in context: http://linux-ha.996297.n3.nabble.com/ERROR-Client-child-command-usr-lib64-heartbeat-cib-is-not-executable-tp14984p15073.html

Re: [Linux-HA] Antw: Quick 'death match cycle' question.

2013-09-03 Thread Digimer
On 03/09/13 02:32, Ulrich Windl wrote: Hi! I don't have a real answer for this, but I can report another bad experience with a 2-node cluster like yours: If the DC is fenced, the other node tries to become DC, but if the other node (who still thinks he's DC) reboots just before the other node has

[Linux-HA] Max number of resources under Pacemaker ?

2013-09-03 Thread Moullé Alain
Hello, A simple question : is there a maximum number of resources (let's say simple primitives) that Pacemaker can support at first at configuration of resources via crm, and of course after configuration when Pacemaker has to monitor all the primitives ? (more precisely, could we envisage