Re: [Linux-cluster] A home-grown cluster

2009-10-30 Thread Branimir
Branimir wrote: Hi list ;) Well, here is my problem. I configured a few production clusters myself - mostly HP Proliant machines with ILO/ILO2. Now, I would like to do the same thing but with ordinary PC hardware (the fact is my wife wants me to reduce the number of my physical machines ;)). I

Re: [Linux-cluster] Xen network config -> Fence problem - More info

2009-10-30 Thread Madison Kelly
After sending this, I went back to debugging the problem. The machines had stopped fencing and the DRBD link was down. So first I stopped and then started 'xend' and this got the Xen-type networking up. I left the machines alone for about ten minutes to see if they would fence one another,

[Linux-cluster] Xen network config -> Fence problem

2009-10-30 Thread Madison Kelly
Hi all, I've got CentOS 5.3 installed on two nodes (simple two node cluster). On this, I've got a DRBD partition running cluster aware LVM. I use this to host VMs under Xen. I've got a problem where I am trying to use eth0 as a back channel for the VMs on either node via a firewall VM. T

Re: [Linux-cluster] GFS2 processes getting stuck in WCHAN=dlm_posix_lock

2009-10-30 Thread Dustin Henry Offutt
This sounds like a memory problem from the mail app or OS that runs into the cluster software. Trace running memory heaps in the dump. On Fri, Oct 30, 2009 at 6:27 PM, Allen Belletti wrote: > Hi All, > > As I've mentioned before, I'm running a two-node clustered mail server on > GFS2 (with RHEL 5

[Linux-cluster] GFS2 processes getting stuck in WCHAN=dlm_posix_lock

2009-10-30 Thread Allen Belletti
Hi All, As I've mentioned before, I'm running a two-node clustered mail server on GFS2 (with RHEL 5.4). Nearly all of the time, everything works great. However, going all the way back to GFS1 on RHEL 5.1 (I think it was), I've had occasional locking problems that force a reboot of one or bot

RE: [Linux-cluster] GS2 try_rgrp_unlink consuming lots of CPU

2009-10-30 Thread Miller, Gordon K
Hi, We are still struggling with the problem of try_rgrp_unlink consuming large amounts of CPU time over durations exceeding 15 minutes. We see several threads on the same node repeatedly calling try_rgrp_unlink with the same rgrp and the same group of inodes being returned over and over until 15

Re: [Linux-cluster] Progress OpenEdge and RHCS

2009-10-30 Thread Lon Hohberger
On Fri, 2009-10-30 at 14:33 -0400, Lon Hohberger wrote: > On Wed, 2009-10-28 at 10:17 +0100, Bohdan Sydor wrote: > > Hi all, > > > > My customer's goal is to run Progress OpenEdge 10.2A on RHEL 5.4. It > > should be a HA service with two nodes and FC-connected shared storage. > > I've reviewed the

Re: [Linux-cluster] Progress OpenEdge and RHCS

2009-10-30 Thread Lon Hohberger
On Wed, 2009-10-28 at 10:17 +0100, Bohdan Sydor wrote: > Hi all, > > My customer's goal is to run Progress OpenEdge 10.2A on RHEL 5.4. It > should be a HA service with two nodes and FC-connected shared storage. > I've reviewed the group archive and googled too, but didn't find > anything suitable.

Re: [Linux-cluster] service state unchanged when host crashes

2009-10-30 Thread Giacomo Bagnoli
On Fri, 2009-10-30 at 12:52 -0400, Lon Hohberger wrote: > FYI, this was fixed in March: > > http://git.fedorahosted.org/git/?p=cluster.git;a=commit;h=aee97b180e80c9f8b90b8fca63004afe3b289962 > > -- Lon Ups, didn't notice that, I did look at the git log before opening the bug but I must have mis

RE: [Linux-cluster] some questions about rgmanager

2009-10-30 Thread Lon Hohberger
On Mon, 2009-10-26 at 11:05 +, Martin Waite wrote: > Hi Brem, > > Thanks for the pointers. > > The link to "OCF RA API Draft" appears to answer my questions. It will take > a while to digest all that. Note that rgmanager doesn't implement 'monitor' (uses 'status' instead) as required by

RE: [Linux-cluster] service state unchanged when host crashes

2009-10-30 Thread Lon Hohberger
On Wed, 2009-10-28 at 10:24 +, Martin Waite wrote: > Hi, > > This does look like the same problem: > > mar...@clusternode27:~$ sudo /usr/sbin/cman_tool -f nodes > Node Sts Inc Joined Name > 27 M 44 2009-10-27 14:59:33 clusternode27 > 28 M 52 2009-10-27

Re: [Linux-cluster] service state unchanged when host crashes

2009-10-30 Thread Lon Hohberger
On Wed, 2009-10-28 at 14:54 +0100, Jakov Sosic wrote: > On Wed, 28 Oct 2009 11:10:07 - > "Martin Waite" wrote: > > > Given the nature of the bug, does this mean that the unpatched > > cluster code is unable to relocate services in the event of node > > failure ? > > Yes, that means exactly t

Re: [Linux-cluster] A home-grown cluster

2009-10-30 Thread Lon Hohberger
On Wed, 2009-10-28 at 19:44 -0400, Madison Kelly wrote: > You know, I've been wondering the same thing now for a little while and > had figured it just wasn't possible. I had seen a design a couple of > years ago for a serial to reset switch home-brew fence device but I've > not been able to fi

Re: [Linux-cluster] ccs_config_validate in cluster 3.0.X

2009-10-30 Thread Guido Günther
On Wed, Oct 28, 2009 at 11:36:30AM +0100, Fabio M. Di Nitto wrote: > Hi everybody, > > as briefly mentioned in 3.0.4 release note, a new system to validate the > configuration has been enabled in the code. > > What it does > > > The general idea is to be able to perform as many sani

Re: [Linux-cluster] post fail fencing

2009-10-30 Thread Kaloyan Kovachev
On Fri, 30 Oct 2009 10:03:41 -0500, David Teigland wrote > On Thu, Oct 29, 2009 at 07:13:04PM +0200, Kaloyan Kovachev wrote: > > Hello, > > i would like to have one specific node to always fence any other failed > > node > > and some nodes to never try to fence. For example in 4 or 5 nodes cluste

Re: [Linux-cluster] post fail fencing

2009-10-30 Thread David Teigland
On Thu, Oct 29, 2009 at 07:13:04PM +0200, Kaloyan Kovachev wrote: > Hello, > i would like to have one specific node to always fence any other failed node > and some nodes to never try to fence. For example in 4 or 5 nodes cluster: > Node1 is fencing any other failed, Node2 and Node3 will try fenci