I have set up cron jobs on both servers. I restart heartbeat at 22 hours on
one box and at 23 hours on another. It's been 4 days and so far, so good. I
will report more results. This could be an ugly solution to an ugly problem,
but workable.
i
Igor Chudov wrote:
> My second question is, can heartbeat be configured to restart itself in case
> of such a failure.
Usually you can't have X restart itself after X dies. You need some kind
of Y.
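
As a rough illustration of the "some kind of Y" idea, here is a minimal external watchdog sketch that could be run periodically (for example from cron). It is not part of heartbeat. The function names, the prefix match on the process command line, and the restart command in the usage note are all assumptions made for the example.

```python
import os
import subprocess


def is_running(prefix, proc_root="/proc"):
    """Return True if some process's command line starts with `prefix`."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are PIDs
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                # Arguments are NUL-separated; join them with spaces.
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue  # process exited while we were looking at it
        if cmdline.startswith(prefix):
            return True
    return False


def watchdog(prefix, restart_cmd, proc_root="/proc", runner=subprocess.call):
    """Run `restart_cmd` if no matching process exists.

    Returns True if a restart was attempted, False otherwise.
    """
    if is_running(prefix, proc_root):
        return False
    runner(restart_cmd)
    return True
```

A call might look like `watchdog("heartbeat", ["/etc/init.d/heartbeat", "restart"])`, where the init script path is an assumption about the local setup. Matching on the start of the command line keeps the watchdog from matching its own invocation.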
If you're running snmpd, see if you can get "proc" to identify
"heartbeat: master control proces
On Tue, Jan 4, 2011 at 10:22 AM, Serge Dubrouski wrote:
> On Tue, Jan 4, 2011 at 9:14 AM, Igor Chudov wrote:
> > Serge, I am not sure of anything, but the self-communication is supposed to
> > be taking place on a single crossover cable between second network cards of
> > the servers. (eth1)
On Tue, Jan 4, 2011 at 1:29 PM, Dimitri Maziuk wrote:
> Igor Chudov wrote:
>
>> At this point I feel rather desperate. Perhaps I should give "pacemaker"
>> another go. I really have no idea and I am running out of options.
>
> If all you need is a 2-node active-passive cluster, most (all?)
> pacem
Igor Chudov wrote:
> At this point I feel rather desperate. Perhaps I should give "pacemaker"
> another go. I really have no idea and I am running out of options.
If all you need is a 2-node active-passive cluster, most (all?)
pacemaker features are useless for you. (Besides, one look at their
On Tue, Jan 4, 2011 at 9:14 AM, Igor Chudov wrote:
> Serge, I am not sure of anything, but the self-communication is supposed to
> be taking place on a single crossover cable between second network cards of
> the servers. (eth1).
Agree, yet something strange and pretty unique is going on with you
Serge, I am not sure of anything, but the self-communication is supposed to
be taking place on a single crossover cable between second network cards of
the servers. (eth1).
Igor
On Tue, Jan 4, 2011 at 10:06 AM, Serge Dubrouski wrote:
> Are you sure that everything is all right with your network
Are you sure that everything is all right with your network? It looks
like processes that are responsible for UDP communications are taking
too much of CPU time.
On Tue, Jan 4, 2011 at 8:47 AM, Igor Chudov wrote:
> Steve, here's some data.
>
> The OS is Ubuntu 10.04.
>
> ~# apt-cache policy heart
On Tue, Jan 4, 2011 at 9:40 AM, Serge Dubrouski wrote:
> Which OS?
>
>
Ubuntu 10.04 Lucid.
> Which version of Heartbeat?
>
>
3.0.3
~# apt-cache policy heartbeat
heartbeat:
Installed: 1:3.0.3-1ubuntu1
Candidate: 1:3.0.3-1ubuntu1
Version table:
*** 1:3.0.3-1ubuntu1 0
- PID of which of H
Steve, here's some data.
The OS is Ubuntu 10.04.
~# apt-cache policy heartbeat
heartbeat:
Installed: 1:3.0.3-1ubuntu1
Candidate: 1:3.0.3-1ubuntu1
Version table:
*** 1:3.0.3-1ubuntu1 0
500 http://us.archive.ubuntu.com/ubuntu/ lucid/universe Packages
100 /var/lib/dpkg/status
Hi,
On Tue, Jan 04, 2011 at 07:47:10AM -0600, Igor Chudov wrote:
> Further reading indicates that heartbeat itself sets a limit for itself
> every so often.
True.
> Then it exceeds the limit (probably due to a bug). I am sure that that's why
> whoever wrote heartbeat, set cpu limit, instead of fo
Which OS?
Which version of Heartbeat?
- PID of which of Heartbeat processes? It has several.
On Tue, Jan 4, 2011 at 6:32 AM, Igor Chudov wrote:
> A few weeks ago I reported that heartbeat died on one of the cluster machines,
> due to SIGXCPU.
>
> Well, it happened again. Heartbeat died, now both
On 4 January 2011 13:47, Igor Chudov wrote:
> Further reading indicates that heartbeat itself sets a limit for itself
> every so often.
>
> Then it exceeds the limit (probably due to a bug). I am sure that that's why
> whoever wrote heartbeat, set cpu limit, instead of fixing their bugs.
>
> Then i
Further reading indicates that heartbeat itself sets a limit for itself
every so often.
Then it exceeds the limit (probably due to a bug). I am sure that that's why
whoever wrote heartbeat, set cpu limit, instead of fixing their bugs.
Then it dies with SIGXCPU, leaving everything in an extremely m
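
For readers unfamiliar with the mechanism, the following is a small sketch of how a soft CPU-time limit leads to SIGXCPU on Linux. It does not reproduce heartbeat's own code, which is not shown in this thread. The helper name `run_with_cpu_limit` and the busy-loop child are made up for the demonstration.

```python
import resource
import signal
import subprocess
import sys


def run_with_cpu_limit(soft_seconds):
    """Run a busy-looping child under a soft CPU limit; return its exit status."""

    def set_limit():
        # Runs in the child just before exec: lower only the soft limit,
        # leaving the hard limit as inherited.
        _, hard = resource.getrlimit(resource.RLIMIT_CPU)
        resource.setrlimit(resource.RLIMIT_CPU, (soft_seconds, hard))

    child = subprocess.run(
        [sys.executable, "-c", "while True: pass"],
        preexec_fn=set_limit,
    )
    # subprocess reports death by signal N as a return code of -N.
    return child.returncode
```

Calling `run_with_cpu_limit(1)` and comparing the result with `-signal.SIGXCPU` shows whether the child was terminated by that signal once it had used about one second of CPU time. A process that does not catch SIGXCPU is simply killed, which matches the abrupt death described above.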
A few weeks ago I reported that heartbeat died on one of the cluster machines,
due to SIGXCPU.
Well, it happened again. Heartbeat died, and now both machines had the shared IP
address up, what a god-awful mess!!!
Now they have split brain and the whole nine yards!
I looked at /proc//limits and found: