[Linux-HA] Antw: Re: op monition on-fail option

2011-04-04 Thread Ulrich Windl
>>> Dejan Muhamedagic wrote on 04.04.2011 at 14:51 in message <20110404125103.GK3553@squib>: > Hi, > > On Tue, Mar 22, 2011 at 06:39:08PM +0200, Pavlos Polianidis wrote: > > Dear all, > > > > I am looking for a way to add/modify an op-option "op monitor > on-fail="restart"" through a single

[Linux-HA] Antw: Re: Question about max_child_count

2011-04-04 Thread Ulrich Windl
>>> Dejan Muhamedagic wrote on 04.04.2011 at 13:56 in message <20110404115618.GD3553@squib>: > Hi, > > On Mon, Apr 04, 2011 at 08:32:10AM +0200, Alain.Moulle wrote: > > Hi > > > > I got a strange message about "... max_child_count (4) reached, > > postponing execution of operation stop ..."

Re: [Linux-HA] Does heartbeat only use ping to check health of otherserver?

2011-04-04 Thread Greg Woods
On Mon, 2011-04-04 at 13:38 -0500, Neil Aggarwal wrote: > > crm configure primitive ClusterIP ocf:heartbeat:IPaddr2 \ > params ip=192.168.9.101 cidr_netmask=32 \ > op monitor interval=30s > > Does that mean heartbeat is being used to detect > when to move the IP address to the standby server?

Re: [Linux-HA] DRBD and pacemaker interaction

2011-04-04 Thread Lars Ellenberg
On Mon, Apr 04, 2011 at 09:43:27AM +0200, Andrew Beekhof wrote: > I am missing the state: running degraded or suboptimal. > >>> > >>> Yep, "degraded" is not a state available for pacemaker. > >>> Pacemaker cannot do much about "suboptimal". > Maybe we need to add OCF_RUNNING_

Re: [Linux-HA] Does heartbeat only use ping to check health of otherserver?

2011-04-04 Thread Neil Aggarwal
Greg: > you can set up ldirectord as a Pacemaker resource and let > Pacemaker handle the monitoring I have been reading the Pacemaker user guide. It says when adding an IP address to manager, use this command: crm configure primitive ClusterIP ocf:heartbeat:IPaddr2 \ params ip=192.168.9.101 c

Re: [Linux-HA] Does heartbeat only use ping to check health of otherserver?

2011-04-04 Thread Neil Aggarwal
Greg: > You don't say whether or not you are using Pacemaker. > If you are, then you can set up ldirectord as a Pacemaker > resource and let Pacemaker handle the monitoring. Thanks for the extensive background info. It really helps me understand the layout of the individual pieces. I have not

Re: [Linux-HA] Does heartbeat only use ping to check health of otherserver?

2011-04-04 Thread Greg Woods
On Mon, 2011-04-04 at 11:44 -0500, Neil Aggarwal wrote: > From what I can figure out from the ha.cf file, heartbeat > uses ping to tell if the peer is up. Not really. It uses special heartbeat packets to tell if the peer is up. Ping is used to tell the difference between a dead peer and a bad N

Re: [Linux-HA] Does heartbeat only use ping to check health of otherserver?

2011-04-04 Thread Dimitri Maziuk
Neil Aggarwal wrote: > I want to switch the virtual IP if the ldirectord process > is not running or locked up. That may happen even if the > network card is ok. > > Is there a way to do that? Depends on your config. E.g. snmpd: "proc" for ldirectord with "procfix" to run /usr/share/heartbeat

[Linux-HA] Does heartbeat only use ping to check health of otherserver?

2011-04-04 Thread Neil Aggarwal
Hello: I am trying to configure heartbeat to manage a shared virtual IP between load balancers. I used yum to install the heartbeat, heartbeat-ldirectord, and ipvsadm packages on a CentOS 5.5 server. From what I can figure out from the ha.cf file, heartbeat uses ping to tell if the peer is

Re: [Linux-HA] op monition on-fail option

2011-04-04 Thread Dejan Muhamedagic
Hi, On Mon, Apr 04, 2011 at 05:51:28PM +0300, Pavlos Polianidis wrote: > I know that this is wrong but I found a workaround :) > > If you login as a user, which is a member of haclient group, without loading > its profile (su TEST instead su - TEST), and all users in haclient group have > the p

Re: [Linux-HA] op monition on-fail option

2011-04-04 Thread Pavlos Polianidis
I know that this is wrong but I found a workaround :) If you login as a user, which is a member of haclient group, without loading its profile (su TEST instead su - TEST), and all users in haclient group have the permission to write into /var/lib/heartbeat/crm then you can run the crm command a

Re: [Linux-HA] op monition on-fail option

2011-04-04 Thread Pavlos Polianidis
Dear Dejan The versions I have used are: pacemaker-1.0.10-1.4.el5 cluster-glue-1.0.6-1.6.el5 below are the actions I took: crm(live)options# user TEST crm(live)options# save crm(live)options# show editor "vim" pager "less" user "TEST" skill-level "expert" output "color" colorscheme "yellow,n

Re: [Linux-HA] Linux-HA Over WAN Advisable?

2011-04-04 Thread Dejan Muhamedagic
Hi, On Thu, Mar 31, 2011 at 10:23:32AM -0700, Robinson, Eric wrote: > Greetings! We have a few Corosync+PaceMaker+DRBD clusters and a couple > older Heartbeat+DRBD clusters. Our infrastructure is currently located > in a single facility. We have the opportunity to establish a DR site in > another

Re: [Linux-HA] op monition on-fail option

2011-04-04 Thread Dejan Muhamedagic
On Mon, Apr 04, 2011 at 03:58:06PM +0300, Pavlos Polianidis wrote: > Thank for the respond. > > What I did is in order to run the crm_resource command as a user (let's say > TEST) I added the user "TEST" to the haclient group. But the command "crm" is > refuses to run as TEST. I will try to add

Re: [Linux-HA] op monition on-fail option

2011-04-04 Thread Pavlos Polianidis
Thank for the respond. What I did is in order to run the crm_resource command as a user (let's say TEST) I added the user "TEST" to the haclient group. But the command "crm" is refuses to run as TEST. I will try to add a specific command to sudoers and see if it works Kind regards, Pavlos Pol

Re: [Linux-HA] op monition on-fail option

2011-04-04 Thread Dejan Muhamedagic
Hi, On Tue, Mar 22, 2011 at 06:39:08PM +0200, Pavlos Polianidis wrote: > Dear all, > > I am looking for a way to add/modify an op-option "op monitor > on-fail="restart"" through a single command besides using the crm cli. I > tried with crm_resource but no luck. I need the command to use it in

Re: [Linux-HA] Failover NFS using Pacemaker.

2011-04-04 Thread Caspar Smit
Hi, Thanks to all input, I finally managed to get a working NFS cluster using the exportfs RA and works like it should. Only i HAVE to mount the nfs share using UDP, because when I mount using TCP the connection is not reinstated after a failover. Is there a workaround for this so I can use TCP? W

[Linux-HA] Broadcast Strom from Heartbeat

2011-04-04 Thread Rainer Schwemmer
Dear all, I'm having a bit of peculiar problem with Heartbeat. I'm installing a new HA cluster with Heartbeat and pacemaker. Right now there is only one node installed, because I'm preparing the install image for the entire cluster. Anyway, since the last restart of the machine, heartbeat is in

Re: [Linux-HA] Question about max_child_count

2011-04-04 Thread Dejan Muhamedagic
On Mon, Apr 04, 2011 at 02:00:26PM +0200, Alain.Moulle wrote: > OK that's clear, but it sounds a little risky too to increase this > parameter ? That depends on the resources you run. Unfortunately, there's no way (yet) to assign "weight" to resources. BTW, you don't need to increase it if you c

Re: [Linux-HA] How to change by crm an "op" value ?

2011-04-04 Thread Dejan Muhamedagic
On Mon, Apr 04, 2011 at 01:56:07PM +0200, Alain.Moulle wrote: > Hi Dejan, > I had without doubt miss it because effectively your command works fine ... > > PS :just for my knowledge : what's the meaning of the "-" at the end ? It means "read the stdin". Need to update the doc. Thanks, Dejan >

Re: [Linux-HA] Question about max_child_count

2011-04-04 Thread Alain.Moulle
OK that's clear, but it sounds a little risky too to increase this parameter ? By the way I'm working with corosync, not heartbeat, so do you think it is all the same "tunable" ? Thanks Alain > Hi, > > On Mon, Apr 04, 2011 at 08:32:10AM +0200, Alain.Moulle wrote: > >> Hi >> >> I got a strange

Re: [Linux-HA] How to change by crm an "op" value ?

2011-04-04 Thread Alain.Moulle
Hi Dejan, I had without doubt miss it because effectively your command works fine ... PS :just for my knowledge : what's the meaning of the "-" at the end ? Thanks a lot Alain Dejan Muhamedagic wrote: > Hi, > > On Wed, Mar 30, 2011 at 02:52:37PM +0200, Alain.Moulle wrote: > >> Hi, >> >> S

Re: [Linux-HA] Question about max_child_count

2011-04-04 Thread Dejan Muhamedagic
Hi, On Mon, Apr 04, 2011 at 08:32:10AM +0200, Alain.Moulle wrote: > Hi > > I got a strange message about "... max_child_count (4) reached, > postponing execution of operation stop ..." on a resource. > > What is the meaning of this max_child_count ? lrmd (the local resource manager) won't run

Re: [Linux-HA] NFS cluster after node crash

2011-04-04 Thread Christoph Bartoschek
On 04.04.2011 10:32, Andrew Beekhof wrote: > On Thu, Mar 24, 2011 at 9:58 PM, Christoph Bartoschek > wrote: >> It seems as if the g_nfs service is stopped on the surviving node when >> the other one comes up again. > > To me it looks like the service gets stopped after it fails: > > p_expo

Re: [Linux-HA] why Cluster "restarts" A, before starting B on surviving node.

2011-04-04 Thread Muhammad Sharfuddin
On Mon, 2011-04-04 at 10:42 +0200, Andrew Beekhof wrote: > On Thu, Mar 24, 2011 at 7:42 PM, Muhammad Sharfuddin > wrote: > > we have two resources A and B > > Cluster starts A on node1, and B on node2, while failover node for A is > > node2 and failover node for B is node1 > > > > B cant start wi

Re: [Linux-HA] Problem with an active/active NFS setup with exportfs RA

2011-04-04 Thread RaSca
On Sat 02 Apr 2011 19:04:08 CET, Alessandro Iurlano wrote: > On Fri, Apr 1, 2011 at 11:34 AM, RaSca wrote: >>> Then I tried to find a way to keep just the rmtab file synchronized on >>> both nodes. I cannot find a way to have pacemaker do this for me. Is >>> there one? >> As far as I k

Re: [Linux-HA] why Cluster "restarts" A, before starting B on surviving node.

2011-04-04 Thread Andrew Beekhof
On Thu, Mar 24, 2011 at 7:42 PM, Muhammad Sharfuddin wrote: > we have two resources A and B > Cluster starts A on node1, and B on node2, while failover node for A is > node2 and failover node for B is node1 > > B cant start without A, so I have following location rules: > >          order first_A_

Re: [Linux-HA] Stonith resource appears to be active on 2 nodes ...

2011-04-04 Thread Andrew Beekhof
On Mon, Apr 4, 2011 at 9:03 AM, Alain.Moulle wrote: > Hi, > I got this error : > 1301591983 2011 Mar 31 19:19:43 berlin5 daemon err crm_resource [36968]: > ERROR: native_add_running: Resource > stonith::fence_ipmilan:restofenceberlin4 appears to be active on 2 nodes. > 1301591983 2011 Mar 31 19:19

Re: [Linux-HA] NFS cluster after node crash

2011-04-04 Thread Andrew Beekhof
On Thu, Mar 24, 2011 at 9:58 PM, Christoph Bartoschek wrote: > It seems as if the g_nfs service is stopped on the surviving node when > the other one comes up again. To me it looks like the service gets stopped after it fails: p_exportfs_root:0_monitor_3 (node=laplace, call=12, rc=7, sta

Re: [Linux-HA] DRBD and pacemaker interaction

2011-04-04 Thread Andrew Beekhof
On Fri, Apr 1, 2011 at 8:13 PM, Christoph Bartoschek wrote: > On 01.04.2011 16:38, Lars Ellenberg wrote: >> On Fri, Apr 01, 2011 at 11:35:19AM +0200, Christoph Bartoschek wrote: >>> On 01.04.2011 11:27, Florian Haas wrote: On 2011-04-01 10:49, Christoph Bartoschek wrote: > On 01.04.20

[Linux-HA] Stonith resource appears to be active on 2 nodes ...

2011-04-04 Thread Alain.Moulle
Hi, I got this error : 1301591983 2011 Mar 31 19:19:43 berlin5 daemon err crm_resource [36968]: ERROR: native_add_running: Resource stonith::fence_ipmilan:restofenceberlin4 appears to be active on 2 nodes. 1301591983 2011 Mar 31 19:19:43 berlin5 daemon warning crm_resource [36968]: WARN: See ht

Re: [Linux-HA] 3+node clusters?

2011-04-04 Thread Stallmann, Andreas
Hi there, I asked the same question some time ago and received no suitable answer so far. DRBD [1] does no "proper" replication over three nodes; it's basically still a Two-Node-RAID-1 with a third node, which doesn't really take part in the cluster but receives replication data as kind of a "b