Re: [Pacemaker] hangs pending

2014-01-13 Andrey Groshev
13.01.2014, 02:51, "Andrew Beekhof" : > On 10 Jan 2014, at 9:55 pm, Andrey Groshev wrote: > >>  10.01.2014, 14:31, "Andrey Groshev" : >>>  10.01.2014, 14:01, "Andrew Beekhof" :   On 10 Jan 2014, at 5:03 pm, Andrey Groshev wrote: >    10.01.2014, 05:29, "Andrew Beekhof" : >> On

[Pacemaker] Better way to change master in 3 node pgsql cluster

2014-01-13 Andrey Rogovsky
Hi, I have a 3-node PostgreSQL cluster. It works well, but I have some trouble changing the master. For now, if I need to change the master, I must: 1) Stop PGSQL on each node and the cluster service 2) Start and set up new manual PGSQL replication 3) Change attributes on each node to point to the new master 4) Stop PGS

Re: [Pacemaker] again "return code", now in crm_attribute

2014-01-13 Andrey Groshev
13.01.2014, 02:51, "Andrew Beekhof" : > On 10 Jan 2014, at 6:18 pm, Andrey Groshev wrote: > >>  10.01.2014, 10:15, "Andrew Beekhof" : >>>  On 10 Jan 2014, at 4:38 pm, Andrey Groshev wrote:   10.01.2014, 09:06, "Andrew Beekhof" : >   On 10 Jan 2014, at 3:51 pm, Andrey Groshev wrote: >>

Re: [Pacemaker] How to configure heartbeat using private network?

2014-01-13 Lars Marowsky-Bree
On 2014-01-12T18:53:50, John Wei wrote: > I believe corosync does support this. Can someone point me to the document > on how to do this. Just configure corosync to use the private network interface via the bindnetaddr. -- Architect Storage/HA SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennife
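Lars's answer comes down to one setting. As a rough sketch, assuming the private interface is 192.168.5.92/24 (an example address, not taken from this thread), the Python below derives the network address that `bindnetaddr` can be set to and renders a totem `interface` stanza. The multicast address and port are example values, and the helper name `totem_interface_block` is hypothetical. corosync.conf(5) accepts either the interface's own address or its network address here.

```python
import ipaddress


def totem_interface_block(interface_cidr: str, ringnumber: int = 0,
                          mcastaddr: str = "239.255.1.1",
                          mcastport: int = 5405) -> str:
    """Render a corosync totem 'interface' stanza for a private network.

    interface_cidr is the private NIC's address plus prefix, e.g.
    "192.168.5.92/24" (example value). bindnetaddr is set to the
    network address derived from it.
    """
    network = ipaddress.ip_interface(interface_cidr).network
    return (
        "interface {\n"
        f"    ringnumber: {ringnumber}\n"
        f"    bindnetaddr: {network.network_address}\n"
        f"    mcastaddr: {mcastaddr}\n"
        f"    mcastport: {mcastport}\n"
        "}\n"
    )


print(totem_interface_block("192.168.5.92/24"))
```

The stanza goes inside the `totem { }` section of corosync.conf on every node; each node derives the same network address from its own private IP.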

Re: [Pacemaker] Location / Colocation constraints issue

2014-01-13 Gaëtan Slongo
Hi! Thanks for the answer. I'm not trying to use shorewall as an ms resource. Let me explain: I have 2 nodes. All resources always run on the same node (using a group and constraints), but what I want to do is start shorewall on the "passive" node.

[Pacemaker] crm_resource -L not trustable right after restart

2014-01-13 Brian J. Murrell (brian)
Hi, I found a situation using pacemaker 1.1.10 on RHEL6.5 where the output of "crm_resource -L" is not trust-able, shortly after a node is booted. Here is the output from crm_resource -L on one of the nodes in a two node cluster (the one that was not rebooted): st-fencing (stonith:fence_foo

Re: [Pacemaker] hangs pending

2014-01-13 Andrew Beekhof
On 13 Jan 2014, at 8:31 pm, Andrey Groshev wrote: > > > 13.01.2014, 02:51, "Andrew Beekhof" : >> On 10 Jan 2014, at 9:55 pm, Andrey Groshev wrote: >> >>> 10.01.2014, 14:31, "Andrey Groshev" : 10.01.2014, 14:01, "Andrew Beekhof" : > On 10 Jan 2014, at 5:03 pm, Andrey Groshev wro

Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-13 Andrew Beekhof
On 14 Jan 2014, at 5:13 am, Brian J. Murrell (brian) wrote: > Hi, > > I found a situation using pacemaker 1.1.10 on RHEL6.5 where the output > of "crm_resource -L" is not trust-able, shortly after a node is booted. > > Here is the output from crm_resource -L on one of the nodes in a two > nod

Re: [Pacemaker] [Linux-HA] Better way to change master in 3 node pgsql cluster

2014-01-13 Andrew Beekhof
On 13 Jan 2014, at 8:32 pm, Andrey Rogovsky wrote: > Hi > > I have a 3-node PostgreSQL cluster. > It works well, but I have some trouble changing the master. > > For now, if I need to change the master, I must: > 1) Stop PGSQL on each node and the cluster service > 2) Start and set up new manual PGSQL replication

Re: [Pacemaker] Location / Colocation constraints issue

2014-01-13 Andrew Beekhof
On 19 Dec 2013, at 1:08 am, Gaëtan Slongo wrote: > Hi! > > I'm currently building a 2-node cluster for firewalling. > I would like to run shorewall on both the master and the "Slave" > node. I tried many things, but nothing works as expected. The Shorewall > configurations are good. > What I w

Re: [Pacemaker] pgsql RA - slave is in HS:ASYNC status and won't promote

2014-01-13 東一彦
Hi, > but after some tests something went wrong and I don't know what, why, or how to get it working again... now when I start crm, the master is PRI, but the slave goes into HS:ASYNC state... and when the master fails, the slave goes into HS:alone state. It is PostgreSQL that decides whether a node is "sync
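The reply says PostgreSQL itself decides which standby is synchronous. Below is a simplified sketch of PostgreSQL's documented rule for a single synchronous standby (versions 9.1 to 9.5). The first name in `synchronous_standby_names` that belongs to a connected, streaming standby becomes `sync`. Other listed standbys are `potential`, and unlisted ones are `async`. The sketch assumes exact name matching and ignores the `*` wildcard. The function name is hypothetical, and it does not model how the pgsql RA manages that setting.

```python
def sync_states(synchronous_standby_names, standbys):
    """Simplified PostgreSQL 9.1-9.5 single-sync-standby rule.

    synchronous_standby_names: comma-separated priority list of names.
    standbys: application_name -> True if connected and streaming,
              False if connected but still catching up.
    Returns application_name -> 'sync' | 'potential' | 'async'.
    """
    priority = [n.strip() for n in synchronous_standby_names.split(",")
                if n.strip()]
    # The highest-priority listed standby that is streaming becomes sync.
    sync = next((n for n in priority if standbys.get(n)), None)
    states = {}
    for name in standbys:
        if name == sync:
            states[name] = "sync"
        elif name in priority:
            states[name] = "potential"  # listed, but not the chosen one
        else:
            states[name] = "async"      # not listed at all
    return states


print(sync_states("node2", {"node2": True}))  # {'node2': 'sync'}
print(sync_states("", {"node2": True}))       # {'node2': 'async'}
```

Under this rule, a standby whose name is missing from the list, or an empty list, is one possible way for a slave to be reported as async even while replication itself is working.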

Re: [Pacemaker] hangs pending

2014-01-13 Andrew Beekhof
Apart from anything else, your timeout needs to be bigger: Jan 13 12:21:36 [17223] dev-cluster2-node1.unix.tensor.ru stonith-ng: ( commands.c:1321 ) error: log_operation: Operation 'reboot' [11331] (call 2 from crmd.17227) for host 'dev-cluster2-node2.unix.tensor.ru' with device 'st1' r
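If you want to pull fencing operations like the one quoted above out of several logs, a small parser helps. This Python sketch matches only the visible part of that stonith-ng line. The archive truncates the rest, including the result. The field names and the `parse_fence_line` helper belong to this sketch, not to any Pacemaker interface, and other Pacemaker versions may log a different format.

```python
import re
from typing import Optional

# Field names are this sketch's own labels, not Pacemaker terminology.
FENCE_OP = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:]{8}) \[(?P<pid>\d+)\] (?P<node>\S+) "
    r"(?P<daemon>[\w-]+):\s*\(\s*(?P<src>[\w.]+:\d+)\s*\)\s*"
    r"(?P<level>\w+):\s*(?P<func>\w+):\s*"
    r"Operation '(?P<op>\w+)' \[(?P<op_pid>\d+)\] "
    r"\(call (?P<call>\d+) from (?P<client>[\w.]+)\) "
    r"for host '(?P<target>[^']+)' with device '(?P<device>[^']+)'"
)


def parse_fence_line(line: str) -> Optional[dict]:
    """Return the fields of a stonith-ng operation log line, or None."""
    m = FENCE_OP.match(line)
    return m.groupdict() if m else None
```

The `\s*` around the parenthesised source location tolerates the extra spacing seen when the same line is quoted later in the thread.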

Re: [Pacemaker] hangs pending

2014-01-13 Andrew Beekhof
On 14 Jan 2014, at 1:19 pm, Andrew Beekhof wrote: > Apart from anything else, your timeout needs to be bigger: > > Jan 13 12:21:36 [17223] dev-cluster2-node1.unix.tensor.ru stonith-ng: ( > commands.c:1321 ) error: log_operation: Operation 'reboot' [11331] (call > 2 from crmd.17227) for

Re: [Pacemaker] hangs pending

2014-01-13 Andrew Beekhof
Ok, here's what happens: 1. node2 is lost 2. fencing of node2 starts 3. node2 reboots (and cluster starts) 4. node2 returns to the membership 5. node2 is marked as a cluster member 6. DC tries to bring it into the cluster, but needs to cancel the active transition first. Which is a problem sin
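In the list above, fencing of node2 never completes before node2 returns to the membership in step 4. The hypothetical Python sketch below encodes the six steps as events and checks for exactly that ordering. `FENCING_COMPLETED` is an extra event added only for contrast, and none of these names are Pacemaker internals.

```python
from enum import Enum, auto


class Event(Enum):
    NODE_LOST = auto()            # 1. node2 is lost
    FENCING_STARTED = auto()      # 2. fencing of node2 starts
    NODE_REBOOTED = auto()        # 3. node2 reboots (and cluster starts)
    REJOINED_MEMBERSHIP = auto()  # 4. node2 returns to the membership
    MARKED_MEMBER = auto()        # 5. node2 is marked as a cluster member
    DC_JOIN_ATTEMPT = auto()      # 6. DC tries to bring it into the cluster
    FENCING_COMPLETED = auto()    # hypothetical: not in the reported list


# The six steps in the order reported in the thread.
REPORTED_SEQUENCE = [
    Event.NODE_LOST,
    Event.FENCING_STARTED,
    Event.NODE_REBOOTED,
    Event.REJOINED_MEMBERSHIP,
    Event.MARKED_MEMBER,
    Event.DC_JOIN_ATTEMPT,
]


def rejoined_while_fencing_pending(events):
    """True if the node rejoins the membership while its fencing is in flight."""
    fencing_pending = False
    for event in events:
        if event is Event.FENCING_STARTED:
            fencing_pending = True
        elif event is Event.FENCING_COMPLETED:
            fencing_pending = False
        elif event is Event.REJOINED_MEMBERSHIP and fencing_pending:
            return True
    return False


print(rejoined_while_fencing_pending(REPORTED_SEQUENCE))  # True
```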

Re: [Pacemaker] hangs pending

2014-01-13 Andrey Groshev
14.01.2014, 06:25, "Andrew Beekhof" : > Apart from anything else, your timeout needs to be bigger: > > Jan 13 12:21:36 [17223] dev-cluster2-node1.unix.tensor.ru stonith-ng: (   > commands.c:1321  )   error: log_operation: Operation 'reboot' [11331] (call 2 > from crmd.17227) for host 'dev-cluste

Re: [Pacemaker] hangs pending

2014-01-13 Andrey Groshev
14.01.2014, 07:00, "Andrew Beekhof" : > On 14 Jan 2014, at 1:19 pm, Andrew Beekhof wrote: > >>  Apart from anything else, your timeout needs to be bigger: >> >>  Jan 13 12:21:36 [17223] dev-cluster2-node1.unix.tensor.ru stonith-ng: (   >> commands.c:1321  )   error: log_operation: Operation 'reb

Re: [Pacemaker] hangs pending

2014-01-13 Andrew Beekhof
On 14 Jan 2014, at 3:34 pm, Andrey Groshev wrote: > > > 14.01.2014, 06:25, "Andrew Beekhof" : >> Apart from anything else, your timeout needs to be bigger: >> >> Jan 13 12:21:36 [17223] dev-cluster2-node1.unix.tensor.ru stonith-ng: ( >> commands.c:1321 ) error: log_operation: Operation '

Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-13 Brian J. Murrell (brian)
On Tue, 2014-01-14 at 08:09 +1100, Andrew Beekhof wrote: > > The local cib hasn't caught up yet by the looks of it. Should crm_resource actually be [mis-]reporting as if it were knowledgeable when it's not though? IOW is this expected behaviour or should it be considered a bug? Should I open a

[Pacemaker] A resource starts with a standby node.(Latest attrd does not serve as the crmd-transition-delay parameter)

2014-01-13 renayama19661014
Hi All, I previously filed the following Bugzilla entry about a problem caused by differences in the timing of attribute updates by attrd. * https://developerbugs.linuxfoundation.org/show_bug.cgi?id=2528 We can currently avoid this problem by using the crmd-transition-delay parameter. I confirmed whether I cou
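Pacemaker documents `crmd-transition-delay` as delaying cluster recovery by the configured interval so that additional or related events, such as attribute updates, can arrive first. The toy Python model below illustrates only that idea. With no delay, a transition computed when the first update lands misses a later update from another node. Arrival times, attribute names, and helpers are hypothetical, and the model does not reproduce the real behaviour of crmd or attrd.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttrUpdate:
    node: str
    name: str
    value: str
    arrival: float  # seconds, relative to an arbitrary start (hypothetical)


def visible_at_transition(updates, transition_delay):
    """Updates a transition would see if computed transition_delay seconds
    after the first update arrives (toy model, not crmd's algorithm)."""
    if not updates:
        return []
    cutoff = min(u.arrival for u in updates) + transition_delay
    return [u for u in updates if u.arrival <= cutoff]


# node2's update reaches the CIB 1.5 s after node1's (hypothetical timing).
updates = [
    AttrUpdate("node1", "example_attr", "1", 0.0),
    AttrUpdate("node2", "example_attr", "1", 1.5),
]
print([u.node for u in visible_at_transition(updates, 0.0)])  # ['node1']
print([u.node for u in visible_at_transition(updates, 2.0)])  # ['node1', 'node2']
```

The trade-off mirrors the documented one: a longer delay lets late updates in, but it also slows every recovery by the same interval.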

Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-13 Andrew Beekhof
On 14 Jan 2014, at 3:41 pm, Brian J. Murrell (brian) wrote: > On Tue, 2014-01-14 at 08:09 +1100, Andrew Beekhof wrote: >> >> The local cib hasn't caught up yet by the looks of it. > > Should crm_resource actually be [mis-]reporting as if it were > knowledgeable when it's not though? IOW is t

Re: [Pacemaker] A resource starts with a standby node.(Latest attrd does not serve as the crmd-transition-delay parameter)

2014-01-13 Andrew Beekhof
On 14 Jan 2014, at 3:52 pm, renayama19661...@ybb.ne.jp wrote: > Hi All, > > I previously filed the following Bugzilla entry about a problem caused by differences in the > timing of attribute updates by attrd. > * https://developerbugs.linuxfoundation.org/show_bug.cgi?id=2528 > > We can currently avoid this pro

Re: [Pacemaker] A resource starts with a standby node.(Latest attrd does not serve as the crmd-transition-delay parameter)

2014-01-13 renayama19661014
Hi Andrew, Thank you for your comments. > Are you using the new attrd code or the legacy stuff? I use the new attrd. > > If you're not using corosync 2.x or see: > >     crm_notice("Starting mainloop..."); > > then it's the old code.  The new code could also be used with CMAN but isn't > configured t

Re: [Pacemaker] A resource starts with a standby node.(Latest attrd does not serve as the crmd-transition-delay parameter)

2014-01-13 Andrew Beekhof
On 14 Jan 2014, at 4:13 pm, renayama19661...@ybb.ne.jp wrote: > Hi Andrew, > > Thank you for your comments. > >> Are you using the new attrd code or the legacy stuff? > > I use the new attrd. And the values are not being sent to the cib at the same time? > >> >> If you're not using corosync 2.x or

Re: [Pacemaker] A resource starts with a standby node.(Latest attrd does not serve as the crmd-transition-delay parameter)

2014-01-13 renayama19661014
Hi Andrew, > >> Are you using the new attrd code or the legacy stuff? > > > > I use the new attrd. > > And the values are not being sent to the cib at the same time? As far as I could see... When attrd on one node was late in sending its attribute, the attrd leader seemed to send an a

Re: [Pacemaker] A resource starts with a standby node.(Latest attrd does not serve as the crmd-transition-delay parameter)

2014-01-13 Andrew Beekhof
On 14 Jan 2014, at 4:33 pm, renayama19661...@ybb.ne.jp wrote: > Hi Andrew, > Are you using the new attrd code or the legacy stuff? >>> >>> I use the new attrd. >> >> And the values are not being sent to the cib at the same time? > > As far as I could see... > When attrd on one node was late in sending its att

Re: [Pacemaker] hangs pending

2014-01-13 Andrey Groshev
14.01.2014, 07:47, "Andrew Beekhof" : > Ok, here's what happens: > > 1. node2 is lost > 2. fencing of node2 starts > 3. node2 reboots (and cluster starts) > 4. node2 returns to the membership > 5. node2 is marked as a cluster member > 6. DC tries to bring it into the cluster, but needs to cancel