Re: [Linux-HA] Heartbeat dies AGAIN with SIGXCPU, cluster screwed up again

2011-01-04 Thread Serge Dubrouski
On Tue, Jan 4, 2011 at 1:29 PM, Dimitri Maziuk wrote: > Igor Chudov wrote: > >> At this point I feel rather desperate. Perhaps I should give "pacemaker" >> another go. I really have no idea and I am running out of options. > > If all you need is a 2-node active-passive cluster, most (all?) > pacem

Re: [Linux-HA] Heartbeat dies AGAIN with SIGXCPU, cluster screwed up again

2011-01-04 Thread Dimitri Maziuk
Igor Chudov wrote: > At this point I feel rather desperate. Perhaps I should give "pacemaker" > another go. I really have no idea and I am running out of options. If all you need is a 2-node active-passive cluster, most (all?) pacemaker features are useless for you. (Besides, one look at their

Re: [Linux-HA] Heartbeat dies AGAIN with SIGXCPU, cluster screwed up again

2011-01-04 Thread Serge Dubrouski
On Tue, Jan 4, 2011 at 9:14 AM, Igor Chudov wrote: > Serge, I am not sure of anything, but the self-communication is supposed to > be taking place on a single crossover cable between second network cards of > the servers. (eth1). Agree, yet something strange and pretty unique is going on with you

Re: [Linux-HA] pingd resource problem

2011-01-04 Thread Dejan Muhamedagic
Hi, On Thu, Dec 30, 2010 at 04:52:38PM +0100, Nico Faerber wrote: > Salute > > I have some troubles setting up a pingd clone resource. > I'm using pacemaker 1.0.8 with corosync 1.2.0 running on Ubuntu 10.04. > > after setting up the resource crm/configure/show gives this: > > primitive pingd

Re: [Linux-HA] Heartbeat dies AGAIN with SIGXCPU, cluster screwed up again

2011-01-04 Thread Igor Chudov
Serge, I am not sure of anything, but the self-communication is supposed to be taking place on a single crossover cable between second network cards of the servers. (eth1). Igor On Tue, Jan 4, 2011 at 10:06 AM, Serge Dubrouski wrote: > Are you sure that everything is all right with your network

Re: [Linux-HA] Admin of heartbeat 2.13 on Debian Lenny is a PITA

2011-01-04 Thread Ryan Kish
> Right now my only recourse is one of these options: > a) Install Ubuntu 8.04 in a VM and use heartbeat-gui > Have you installed the package heartbeat-2-gui? It provides /usr/lib/heartbeat-gui/haclient.py which is called heartbeat-gui in Ubuntu. -Ryan

Re: [Linux-HA] Config sanity check

2011-01-04 Thread Dejan Muhamedagic
Hi, On Thu, Dec 30, 2010 at 08:56:09PM +, James Smith wrote: > Hi, > > I've been hitting some problems with my drbd / iscsi-target clusters, > resources > dropping in to FAILED (unmanaged) states etc. I'm after a bit of a sanity > check > on the config below. > > Firstly, I know the times

Re: [Linux-HA] Heartbeat dies AGAIN with SIGXCPU, cluster screwed up again

2011-01-04 Thread Serge Dubrouski
Are you sure that everything is all right with your network? It looks like processes that are responsible for UDP communications are taking too much of CPU time. On Tue, Jan 4, 2011 at 8:47 AM, Igor Chudov wrote: > Steve, here's some data. > > The OS is Ubuntu 10.04. > > ~# apt-cache policy heart
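Serge's point that some heartbeat processes may be burning too much CPU can be checked per PID from procfs. A minimal sketch, assuming a Linux /proc layout where fields 14 and 15 of /proc/<pid>/stat are utime and stime in clock ticks (see proc(5)); the function names are hypothetical helpers, not part of heartbeat:

```python
import os


def cpu_seconds_from_stat(stat_text: str, clk_tck: int) -> float:
    """Return utime + stime in seconds from the text of /proc/<pid>/stat."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or ')', so split only after the LAST ')'.
    after_comm = stat_text[stat_text.rindex(")") + 1:].split()
    # after_comm[0] is field 3 (state), so field N lives at index N - 3.
    utime = int(after_comm[14 - 3])
    stime = int(after_comm[15 - 3])
    return (utime + stime) / clk_tck


def process_cpu_seconds(pid: int) -> float:
    """Read /proc/<pid>/stat for a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return cpu_seconds_from_stat(f.read(), os.sysconf("SC_CLK_TCK"))
```

Sampling `process_cpu_seconds()` for each heartbeat PID a few seconds apart would show which one is accumulating CPU time.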

Re: [Linux-HA] Heartbeat dies AGAIN with SIGXCPU, cluster screwed up again

2011-01-04 Thread Igor Chudov
On Tue, Jan 4, 2011 at 9:40 AM, Serge Dubrouski wrote: > Which OS? > > Ubuntu 10.04 Lucid. > Which version of Heartbeat? > > 3.0.3 ~# apt-cache policy heartbeat heartbeat: Installed: 1:3.0.3-1ubuntu1 Candidate: 1:3.0.3-1ubuntu1 Version table: *** 1:3.0.3-1ubuntu1 0 - PID of which of H

Re: [Linux-HA] Heartbeat dies AGAIN with SIGXCPU, cluster screwed up again

2011-01-04 Thread Igor Chudov
Steve, here's some data. The OS is Ubuntu 10.04. ~# apt-cache policy heartbeat heartbeat: Installed: 1:3.0.3-1ubuntu1 Candidate: 1:3.0.3-1ubuntu1 Version table: *** 1:3.0.3-1ubuntu1 0 500 http://us.archive.ubuntu.com/ubuntu/ lucid/universe Packages 100 /var/lib/dpkg/status

Re: [Linux-HA] Heartbeat dies AGAIN with SIGXCPU, cluster screwed up again

2011-01-04 Thread Dejan Muhamedagic
Hi, On Tue, Jan 04, 2011 at 07:47:10AM -0600, Igor Chudov wrote: > Further reading indicates that heartbeat itself sets a limit for itself > every so often. True. > Then it exceeds the limit (probably due to a bug). I am sure that that's why > whoever wrote heartbeat, set cpu limit, instead of fo

Re: [Linux-HA] Heartbeat dies AGAIN with SIGXCPU, cluster screwed up again

2011-01-04 Thread Serge Dubrouski
Which OS? Which version of Heartbeat? - PID of which of Heartbeat processes? It has several. On Tue, Jan 4, 2011 at 6:32 AM, Igor Chudov wrote: > A few weeks ago I reported that heartbeat died on one of the cluster machines, > due to SIGXCPU. > > Well, it happened again. Heartbeat died, now both

Re: [Linux-HA] Admin of heartbeat 2.13 on Debian Lenny is a PITA

2011-01-04 Thread Dejan Muhamedagic
On Tue, Jan 04, 2011 at 01:14:39PM +0100, Tobias Appel wrote: > On 01/04/2011 12:31 PM, Imran Chaudhry wrote: > > Hi List, > > > > Has anyone found a good solution to administering an established > > 2-node cluster running heartbeat 2.13 on Debian Lenny? > > > I have 2.1.4 on RHEL5 still running. I

Re: [Linux-HA] Heartbeat dies AGAIN with SIGXCPU, cluster screwed up again

2011-01-04 Thread Steve Davies
On 4 January 2011 13:47, Igor Chudov wrote: > Further reading indicates that heartbeat itself sets a limit for itself > every so often. > > Then it exceeds the limit (probably due to a bug). I am sure that that's why > whoever wrote heartbeat, set cpu limit, instead of fixing their bugs. > > Then i

Re: [Linux-HA] Heartbeat dies AGAIN with SIGXCPU, cluster screwed up again

2011-01-04 Thread Igor Chudov
Further reading indicates that heartbeat itself sets a limit for itself every so often. Then it exceeds the limit (probably due to a bug). I am sure that that's why whoever wrote heartbeat set a CPU limit instead of fixing their bugs. Then it dies with SIGXCPU, leaving everything in an extremely m
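The mechanism described here is the generic RLIMIT_CPU one: once a process's consumed CPU time passes its soft limit, the kernel sends SIGXCPU, whose default action terminates the process. The sketch below only illustrates that kernel behaviour on Linux/Unix; it is not heartbeat's code, and `trip_cpu_soft_limit` is a hypothetical name. It assumes the hard limit is unlimited or at least a few seconds above current usage (otherwise `setrlimit` raises `ValueError`).

```python
import resource
import signal
import time


def trip_cpu_soft_limit(extra_seconds: int = 1, timeout: float = 10.0) -> bool:
    """Set a CPU-time soft limit just above current usage, burn CPU,
    and report whether SIGXCPU was delivered."""
    received = []

    def on_sigxcpu(signum, frame):
        # Without a handler, SIGXCPU would kill this process (as it killed heartbeat).
        received.append(signum)

    old_handler = signal.signal(signal.SIGXCPU, on_sigxcpu)
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_CPU)
    usage = resource.getrusage(resource.RUSAGE_SELF)
    used = int(usage.ru_utime + usage.ru_stime)
    # RLIMIT_CPU counts total CPU seconds used by the process so far,
    # so the new soft limit has to sit above what is already consumed.
    new_soft = used + 1 + extra_seconds
    try:
        resource.setrlimit(resource.RLIMIT_CPU, (new_soft, old_hard))
        deadline = time.monotonic() + timeout
        while not received and time.monotonic() < deadline:
            pass  # busy loop: consume CPU until the kernel signals
    finally:
        # Put the original soft limit and signal handler back.
        resource.setrlimit(resource.RLIMIT_CPU, (old_soft, old_hard))
        signal.signal(signal.SIGXCPU, old_handler)
    return bool(received)
```

On an idle machine the call should return `True` after one to two seconds of CPU. A daemon that lowers its own soft limit this way has to either handle SIGXCPU or keep raising the limit before it is reached.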

[Linux-HA] Heartbeat dies AGAIN with SIGXCPU, cluster screwed up again

2011-01-04 Thread Igor Chudov
A few weeks ago I reported that heartbeat died on one of the cluster machines, due to SIGXCPU. Well, it happened again. Heartbeat died, now both machines had the shared IP address up, what a god-awful mess!!! Now they have split brain and the whole nine yards! I looked at /proc//limits and found:
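For checking the same thing on each node, the "Max cpu time" row of /proc/<pid>/limits can be pulled out programmatically. A minimal sketch, assuming the Linux format where each row is the limit name followed by soft limit, hard limit and units columns (see proc(5)); `parse_cpu_limit` and `read_cpu_limit` are hypothetical helpers:

```python
from typing import Optional, Tuple

ROW = "Max cpu time"


def _to_seconds(value: str) -> Optional[int]:
    # procfs prints "unlimited" for RLIM_INFINITY; map that to None.
    return None if value == "unlimited" else int(value)


def parse_cpu_limit(limits_text: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (soft, hard) CPU-time limits in seconds from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith(ROW):
            # The row name contains spaces, so split only what follows it.
            soft, hard = line[len(ROW):].split()[:2]
            return _to_seconds(soft), _to_seconds(hard)
    raise ValueError(f"no '{ROW}' row found")


def read_cpu_limit(pid: int) -> Tuple[Optional[int], Optional[int]]:
    """Read the CPU-time limits of a live process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_cpu_limit(f.read())
```

A finite soft value on a heartbeat PID would confirm that a CPU-time limit is in force for that process.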

Re: [Linux-HA] Admin of heartbeat 2.13 on Debian Lenny is a PITA

2011-01-04 Thread Tobias Appel
On 01/04/2011 12:31 PM, Imran Chaudhry wrote: > Hi List, > > Has anyone found a good solution to administering an established > 2-node cluster running heartbeat 2.13 on Debian Lenny? > I have 2.1.4 on RHEL5 still running. It also has the GUI (although it can be dangerous). > b) Save the CIB XML,

Re: [Linux-HA] Admin of heartbeat 2.13 on Debian Lenny is a PITA

2011-01-04 Thread Michael Schwartzkopff
On Tuesday 04 January 2011 12:31:14 Imran Chaudhry wrote: > Hi List, > > Has anyone found a good solution to administering an established > 2-node cluster running heartbeat 2.13 on Debian Lenny? No, since version 2.1.3 is extremely buggy. Please consider using pacemaker from the backports. -- D

[Linux-HA] Admin of heartbeat 2.13 on Debian Lenny is a PITA

2011-01-04 Thread Imran Chaudhry
Hi List, Has anyone found a good solution to administering an established 2-node cluster running heartbeat 2.13 on Debian Lenny? I want to rename a resource and add another virtual IP to the cluster with minimum disruption. I have tried various things including the DRBD-MC which is in Beta, not t

[Linux-HA] ha for 2 jboss instances in an active/passive cluster

2011-01-04 Thread Erik Dobák
Hi, I am a complete newbie, but it seems that I will have to use Heartbeat to solve my problem. I have to install 2 JBoss instances on 1 server (multihoming/vertical cluster) and unfortunately the application inside JBoss can't be run in an active/active cluster. Now I have found this: http://w

Re: [Linux-HA] Is 'resource_set' still experimental?

2011-01-04 Thread Tobias Appel
On 12/28/2010 06:46 PM, Dejan Muhamedagic wrote: > > 40 order constraints? A big cluster. > We currently have 40 VMs (Xen) on it. I can't put them in a group since they have to run independently and not necessarily on the same node(s). To make it worse I also have location constraints and addi