Hi
In the RHEL5 cluster.conf doc :
http://sources.redhat.com/cluster/doc/cluster_schema_rhel5.html
it is written :
Tag: Per Node configuration
Parent Tag:
Attributes:
* name(Required): The hostname or IP Address of the node
likewise with CS4, but I tried to set an IP address instead of a hostname ...
Hi
I think I remember that with CS4, it was possible
to set IP addr instead of node name in cluster.conf
such as :
Hi Lon
and so ... ? ;-)
Regards
Alain Moullé
Date: Tue, 10 Jun 2008 14:37:19 -0400
From: Lon Hohberger <[EMAIL PROTECTED]>
>>Hi Lon,
>>> Whereas heart-beat interface was working fine.
>>> You can disable these by setting allow_kill="0" and/or reboot="0"
>>> (see qdisk(5)).
>>
>>
>> => ok but in
Hi
Is it supported to use a bonded IP address as the IP to
be failed over via CS5 ?
Thanks
Regards
Alain Moullé
--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
Hi Lon,
and thanks again for all your answers.
>>Whereas heart-beat interface was working fine.
>>You can disable these by setting allow_kill="0" and/or reboot="0"
>>(see qdisk(5)).
=> OK, but in the case of a heart-beat failure, it will then no longer
avoid the dual-fencing if allow_kill="0" and/or reboot="0" ...
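For reference, the options discussed here live on the quorumd element described in qdisk(5); a hypothetical fragment (the device path and timing values are invented for illustration):

```xml
<!-- Hypothetical cluster.conf fragment: disabling qdiskd's kill/reboot
     behaviour per qdisk(5); device and timings are invented values. -->
<quorumd interval="1" tko="10" votes="1"
         device="/dev/sdb1"
         allow_kill="0" reboot="0"/>
```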
Hi
One thing bothers me again :
I have this record in cluster.conf :
where 172.20.0.110 is a third machine not in my cluster pair node1/node2
My last understanding was that the quorum disk is NOT a redundancy of the heart-beat,
meaning that if the heart-beat interface fails, there is a failover but
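The stripped record mentioned above presumably resembled a quorumd block with a ping heuristic against the third machine; a sketch, with the score/interval values and device path assumed:

```xml
<!-- Sketch of a quorumd block whose heuristic pings a third machine
     outside the cluster pair; all values are assumptions. -->
<quorumd interval="1" tko="10" votes="1" device="/dev/sdb1">
    <heuristic program="ping -c 1 -w 1 172.20.0.110"
               score="1" interval="2" tko="3"/>
</quorumd>
```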
Hi
About my problem of node entering a loop :
Jun 3 15:54:49 [EMAIL PROTECTED] qdiskd[22256]: Writing eviction
notice for node 1
Jun 3 15:54:50 [EMAIL PROTECTED] qdiskd[22256]: Node 1 evicted
Jun 3 15:54:51 [EMAIL PROTECTED] qdiskd[22256]: Node 1 is undead.
I notice that just before entering ...
Hi
With CS5, when the status of a service returns failed, CS5 tries
to start the service three times, so we can see three start/stop
sequences if it does not start correctly each time. The following
start is always launched just after the stop;
is there a tunable timer between the three start
Hi
With CS5 :
Is there still a link to the value to set for :
DLM_LOCK_TIMEOUT
if the token default is modified in cluster.conf ?
(with CS4, a modification of deadnode_timer had
to be linked to a modification of DLM_LOCK_TIMEOUT)
Thanks
Regards
Alain Moullé
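In CS5 the cman heart-beat timeout is the openais totem token timeout, configured in milliseconds in cluster.conf; a hedged example (21000 ms, mirroring the old CS4 21 s default — the value is illustrative):

```xml
<!-- Raising the openais token timeout in cluster.conf (milliseconds).
     21000 ms mirrors the old CS4 21 s default; value is illustrative. -->
<totem token="21000"/>
```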
Hi
What can be the causes of this message during a relocate of service ?
#60: Mangled reply from member #1 during RG relocate
Consequence is that the service remains "starting" and never goes "started".
Thanks
Regards
Alain Moullé
Hi Lon
I've applied the patch (see resulting code below) but the patch
does not solve the problem.
Is there another patch linked to this problem ?
Thanks
Regards
Alain Moullé
>> when testing a two-nodes cluster with quorum disk, when
>> I poweroff the node1 , node 2 fences well the node 1 and
Hi
With CS5, is there still the possibility to set the cluster_id
in cluster.conf ?
Hi Lon
OK, it seems I missed some big evolutions in CS5 versus CS4 ...
Where can I find a short document (or the full documentation)
to understand all the evolutions of CS5, like openais, etc. ?
Thanks
Regards
Alain Moullé
On Tue, 2008-05-20 at 17:22 +0200, Alain Moulle wrote:
Hi Lon
Something bothers me about the CS5 default heart-beat timeout :
you wrote that it is now 5s by default, instead of 21s with CS4.
So : what is the new default period for HELLO messages ? Because
it was also 5s with CS4 ...
And a strange thing : I have already tested several times
the failover w
Hi Lon
Sorry Lon, but it is still not completely clear for me ... :
when you write that the default cman timeout on RHEL5 is 5 seconds, do you
mean that the heart-beat timeout is 5s, whereas each hello message is
also sent every 5s ?
And the totem in cluster.conf to modify it was in my understanding the
Hi
I don't remember the meaning of checkinterval value in
service record in cluster.conf with regard to the monitor
and status values in script.sh ?
Thanks
Regards
Alain Moullé
Hi
I'm facing a problem :
when testing a two-nodes cluster with quorum disk, when
I poweroff the node1 , node 2 fences well the node 1 and
failovers the service, but in log of node 2 I have before and after
the fence success messages many messages like this:
Apr 24 11:30:04 [EMAIL PROTECTED] qd
Hi
Something strange with clustat :
if CS5 is launched with a valid quorum disk (which we can see
with mkqdisk -L) and we break the quorum disk (i.e. mkfs on
the device, just to simulate a problem reaching the quorum disk),
the clustat command still displays the Quorum Disk "Online" :
#clustat
Membe
Hi Lon
Thanks again, but that's strange because in the man, the recommended
values are :
interval="1" tko="10", and so we have a result < 21s, which is the
default value of the heart-beat timer, so not a hair above like you
recommended in a previous email ...
extract of man qdisk :
interval="1"
of one cycle, so
that we can adjust it a hair more than cman's timeout ?
Thanks for these details.
Regards
Alain Moullé
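The timing rule being debated: qdiskd declares a node dead after roughly interval × tko seconds, and (per the advice quoted above) that product should land a hair above cman's own timeout. A sketch with assumed values:

```xml
<!-- With cman's token timeout left at 21 s, interval * tko for qdiskd
     should land just above it; e.g. 2 s * 11 = 22 s (values assumed,
     device path invented). -->
<quorumd interval="2" tko="11" votes="1" device="/dev/sdb1"/>
```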
On Mon, 2008-04-07 at 15:00 +0200, Alain Moulle wrote:
>> Hi
>>
>> Is there a similar rule with CS5 ? I mean if we
>> increase the heart-beat timeo
Hi
Is there a similar rule with CS5 ? I mean, if we
increase the heart-beat timeout, are there some
other parameters to adjust along with it ?
Thanks
Regards
Alain Moullé
Alain Moulle wrote:
>>>> Hi
>>>>
>>>> is there a rule to follow between the DLM lock_timeout
Hi
What is the best way to monitor eth networks other than the one
used for heart-beat ?
I don't think it's possible with heuristics if we have already
set two pings on the same network to check for and avoid dual-fencing.
Would it be to create a service with a status target which will
ping the eth netwo
Hi
In script.sh we can see these two lines :
what's the difference between "status" and "monitor" ?
I guess one is the periodic status call on services launched
by the Cluster Suite but which one ?
And what for the other ?
Thanks for your explanation.
Regards
Alain Moullé
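In the rgmanager agents, "status" and "monitor" are typically declared as equivalent actions in the script's XML meta-data, each with its own check interval; a sketch of that block (the interval/timeout values here are assumptions):

```xml
<!-- Sketch of the <actions> block found in rgmanager resource-agent
     meta-data; "monitor" is commonly an alias for "status".
     Interval/timeout values are assumptions. -->
<actions>
    <action name="status"  interval="30" timeout="20"/>
    <action name="monitor" interval="30" timeout="20"/>
</actions>
```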
Hi
Just to mention it :
with CS5, if the service name has more than 12 characters,
"clustat" truncates it at 12, whereas "clustat -x" displays
the full name correctly.
Is there an open bug on this ?
Thanks
Regards
Alain
Thanks Chrissie
And is there a way to change the dlm lock_timeout with
a parameter in cluster.conf ?
Thanks
Regards
Alain
Alain Moulle wrote:
>> Hi
>>
>> is there a rule to follow between the DLM lock_timeout
>> and the deadnode_timeout value ?
>> Meaning for exam
Hi
is there a rule to follow between the DLM lock_timeout
and the deadnode_timeout value ?
Meaning, for example, that the first one must always be less than
the second one ?
And if so, could we have a deadnode_timeout=60s and the
/proc/cluster/config/dlm/lock_timeout at 70s ? or are
there some up
Hi
It seems that there is no magma rpm anymore in RHEL5, and therefore
no magma_tool ... is there an equivalent ? Or a new rpm
to install to recover the magma_tool ?
Thanks
Regards
Alain Moullé
Hi
What is the policy for service status requests ?
It seems that there is systematically a status request
just after the start of the service, and then a periodic
status (depending on the monitor variable in script.sh), but the
first periodic status can be requested between 1 and xx seconds
just after the
Hi
Just for information, I wonder if this behavior is normal :
I have a two-nodes cluster with a quorum disk, and the
CS5 is started on both nodes with a service on each one.
Quorum is working fine when I break the quorum disk format
(with a mkfs on the device !) so that mkqdisk -L returns
none.
Th
Hi
recall : two-nodes cluster with Quorum Disk
Back to my problem of bad rgmanager behavior after a test of
ifdown eth0 (the heart-beat interface) : I recall that after the
reboot of the node, rgmanager does not respond, and clustat
does not display any information from rgmanager on this node,
and
Hi
My rpm releases are currently :
cman-2.0.73-1.el5
rgmanager-2.0.31-1.el5
system-config-cluster-1.0.50-1.3.noarch.rpm
and
perl-Net-Telnet-3.03-5.noarch.rpm
openais-0.80.3-7.el5.x86_64.rpm
I just wonder if I am up to date or if there
are already some new releases available ?
and if so what are t
On Fri, 2008-01-11 at 15:19 +0100, Alain Moulle wrote:
>> Hi
>>
>> On my two-nodes cluster with qdiskd :
>> when testing CS5 via a ifdown eth0 on node2(where is the heart-beat)
>> I have a strange behavior : the node2 is rebooted and service
>> is failovered
Hi
On my two-nodes cluster with qdiskd :
when testing CS5 via an ifdown eth0 on node2 (where the heart-beat is)
I have a strange behavior : the node2 is rebooted and service
is failed over by node1, fine. But after the reboot of node2 and
re-launch of the CS5 daemons, I can't see via clustat any informa
Hi
Testing the CS5 on a two-nodes cluster with quorum disk, when I did
the test ifdown on the heart-beat interface, I got a segfault in log :
Jan 9 09:45:16 [EMAIL PROTECTED] avahi-daemon[3106]: Interface eth0.IPv6 no
longer
relevant for mDNS.
Jan 9 09:45:18 [EMAIL PROTECTED] qdiskd[28265]: H
Hi
I thought there would still be dlm and dlm-kernel rpms as in CS4,
but it seems those rpms don't exist anymore ?
So no more kernel module at all with CS5 ?
Alain Moullé
Hi Lon,
Finally, I have adopted this quorum disk configuration :
I just wonder whether the interval value for the quorum disk, with regard to
the one for the heuristic, is the best choice or not ?
And what are the rules to pick good values for interval and tko
on the heuristic ? (I don't completely
Thanks again
Alain
From: Lon Hohberger <[EMAIL PROTECTED]>
Subject: Re: [Linux-cluster] Re: CS5 two-nodes with quorum disk
To: linux clustering
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain
On Mon, 2007-12-17 at 10:36 +0100, Alain Moulle wrote:
>> Hi Lon
>>
>>
Hi Lon
Me again, always fighting with my two node cluster with quorum disk ;-)
I've not yet received your response to my last email, if you have already
answered, because I'm in digest mode, so sorry if ...
After again re-reading lots of texts, FAQs, and emails from this ML about quorum
disk on tw
Hi Lon
I've carefully read your last detailed information. I've a
better understanding but something is again not clear for me :
in my two-node cluster node1/node2, with quorum disk, without any heuristic,
I would like to be sure that if there is a failure on the heart-beat
network, only one node
Hi Lon
I've carefully read your last detailed information. I've a
better understanding but something is again not clear for me :
in my two-node cluster node1/node2, with quorum disk, without any heuristic,
when I do on node2 ifdown on eth if of heart-beat, what is the
mechanism via the quorum dis
On Behalf Of Alain Moulle
Sent: Monday, December 10, 2007 10:06
To: linux-cluster@redhat.com
Subject: [Linux-cluster] CS5 can't stop service cman
Hi
I get this problem quite systematically when stopping the cman service :
# service cman stop
Stopping cluster:
Stopping fencing... done
Hi Lon
Thanks for your information about the votes values with quorumd.
Another question about my tests :
now that I have the quorum disk working correctly, I wanted
to do this test : ifdown on the heart-beat interface, to simulate
a heart-beat network breakdown. I expected the cluster NOT to failo
Hi
I get this problem quite systematically when stopping the cman service :
# service cman stop
Stopping cluster:
Stopping fencing... done
Stopping cman... failed
/usr/sbin/cman_tool: Error leaving cluster: Device or resource busy
[FAILED]
Hi
I'm fighting with all the possibilities between quorumd votes and
cman expected_votes values so that my two-node cluster works
fine. I've read lots of emails on this ML about that, and the
FAQ etc., but among some contradictions, and perhaps my misunderstanding
of the documentation (qdisk man etc.), I
Hi
OK, I've found the problem : there was an alias am2 on the localhost line
in /etc/hosts.
Sorry for the disturbance.
Alain
Hi
Some new stuff : I've added a "return" right at the
beginning of the check_xml function in CommandHandler.py,
and now the GUI works fine with my cluster.conf .
But even when saving it again as a new cluster.conf,
I still get the same error about the local node name
not being found in cluster.conf.
Alain
Hi
Thanks
Hi
Thanks Patrick for information.
And sorry about first message :
cman not started: CCS does not have a nodeid for this node, run 'ccs_tool
addnodeids' to fix /usr/sbin/cman_tool: aisexec daemon didn't start
It was effectively clear enough and I had fixed the problem, I just mixed up
the error
Hi
I'm stalled on :
# service cman start
Starting cluster:
Loading modules... done
Mounting configfs... done
Starting ccsd... done
Starting cman... failed
cman not started: Can't find local node name in cluster.conf
/usr/sbin/cman_tool: aisexec daemon didn't start
[FAILED]
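That error usually means the node's hostname (as returned by `uname -n`) does not match any clusternode name in cluster.conf; a hypothetical fragment where the names must agree (node names invented):

```xml
<!-- "Can't find local node name in cluster.conf": each name attribute
     must match what `uname -n` resolves to on that node.
     Names here are invented examples. -->
<clusternodes>
    <clusternode name="node1.example.com" nodeid="1"/>
    <clusternode name="node2.example.com" nodeid="2"/>
</clusternodes>
```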
Hi
My first steps on CS5 ...
When I run "service cman start", I get :
# service cman start
Starting cluster:
Loading modules... done
Mounting configfs... done
Starting ccsd... done
Starting cman... failed
cman not started: CCS does not have a nodeid for this node, run 'ccs_tool
addnodeids' to fix
Hi
We are currently trying to set up an HA NFS Server (CS4u4, active/active). As
described in the Cookbook, we use the "managed nfs service" functionality with the ext3 FS
type. During the failover process (relocate command) we have the following problem :
on the server side everything seems to be OK (exports, IP, mount
Hi
I don't really know if it is a bug, but the problem occurs when
a DNS is used and a node must be reachable from two
different networks but with the same name on both networks; in this
case, gethostbyname() returns (from DNS) more than one address for
a node name. So it seems that th
Hi Patrick
you mean like this in cluster.conf :
...
???
and if so, we should use "cman_tool join -d -n 192.168.1.2" instead
of "service cman start"
Is this right ?
Thanks
Regards
Alain
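If Patrick's suggestion was to bypass DNS by naming the nodes directly by their IP addresses, the clusternode entries might look like this (addresses are invented examples):

```xml
<!-- Sketch: using IP addresses as clusternode names to sidestep a DNS
     that returns multiple addresses per name; addresses are invented. -->
<clusternodes>
    <clusternode name="192.168.1.1" nodeid="1"/>
    <clusternode name="192.168.1.2" nodeid="2"/>
</clusternodes>
```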
Hi
CS/cman refuses to configure/start (cman_tool -w join) if gethostbyname() returns
(from DNS) more than one address for a node name.
Is there a workaround for this ?
Why not allow IP addresses to be specified in CS configuration files/command lines
instead of names, to work around the problem ?
Thanks
Hi,
I can't find the rpms on the RHEL5 U1 Gold DVD for EM64T, nor on the Supplementary DVD.
Where are CS5 rpms ?
Thanks
Regards
Alain Moullé
>>On Mon, Jul 16, 2007 at 03:03:24PM +0200, Alain Moulle wrote:
>>>> Is there a CS5 planned for RHEL5 ?
>>There is a cluster suite
Hi
In the NFS HA cookbook, the cluster.conf is with
force_unmount="0" . Is there a good reason not
to set it to 1 ?
Thanks
Regards
Alain Moullé
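For comparison, enabling force_unmount on the fs resource would look like this; with "1", rgmanager kills processes holding the mount busy before unmounting, whereas with "0" a busy mount makes the stop fail. Device, mountpoint, and names here are invented examples:

```xml
<!-- fs resource with force_unmount enabled; paths and names invented. -->
<fs name="nfsdata" device="/dev/sdc1" mountpoint="/export/data"
    fstype="ext3" force_unmount="1"/>
```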
Hi
In a HA pair node1 & node2, after node1 crashes, I have this
information in node2 syslog :
Oct 30 21:32:54 [EMAIL PROTECTED] clurgmgrd[14522]: #48: Unable to
obtain cluster lock: Connection timed out
I wonder what it means exactly ?
And what are the consequences ?
Thanks
Alain Moullé
Hi
Is the "deadnode_timer" field managed in CS4 U4 ? CS4 U5 ? all versions ?
Because I don't find this field described in the cluster.conf doc :
http://sources.redhat.com/cluster/doc/cluster_schema.html
Thanks
Regards
Alain Moullé
Hi
I don't remember whether the modification to include the
cluster_id value directly in cluster.conf is available in CS4 U4
or only in CS4 U5 ?
Thanks
Alain Moullé
Hi
When CS4 performs its status check on a service, is there a timeout value
such that if the status does not return any response, CS4 decides
that the status is KO ?
Thanks
Alain Moullé
Hi
I wonder if we can use heuristic functions without a real quorum disk working ?
Even if the quorumd record in cluster.conf is mandatory, perhaps using votes=0
for quorumd, just to have the benefit of the heuristics ...
The goal is to monitor 3 networks in addition to the heart-beat network
and to f
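The idea sketched here — a quorumd block kept only for its heuristics — might look like the fragment below. Whether qdiskd actually accepts a zero-vote configuration is exactly the open question, so everything in this sketch is an untested assumption (addresses and device invented):

```xml
<!-- Hypothetical: quorumd carrying no votes, present only so its
     heuristics monitor three extra networks; untested assumption. -->
<quorumd interval="1" tko="10" votes="0" device="/dev/sdb1">
    <heuristic program="ping -c 1 -w 1 10.0.1.254" score="1" interval="2"/>
    <heuristic program="ping -c 1 -w 1 10.0.2.254" score="1" interval="2"/>
    <heuristic program="ping -c 1 -w 1 10.0.3.254" score="1" interval="2"/>
</quorumd>
```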
Thanks Marc and Jos for your pieces of advice, but
it does not seem to work :
I tried your first suggestion with qdisk votes=2 and expected_votes=3 :
...
...
...
and I can't start cman on only one node; it needs cman to be started on the
second node as well, and I don't un
Hi
As said before, I'm trying for the first time to
add a quorum disk on my two nodes cluster.
Finally, I've set parameters as below :
and
and
and
With these parameters values for my two nodes cluster, I have
to launch the Cluster Suite on both nodes,
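A common layout for a two-node cluster with qdisk gives each node one vote and the quorum disk one vote, with expected_votes="3", so either node plus the disk keeps quorum. A sketch (the parameters stripped from the message above may well have differed; names and device are invented):

```xml
<!-- Two-node + qdisk voting sketch: node(1) + qdisk(1) = 2 votes,
     enough for quorum, so one node can run alone while it owns the
     quorum disk. Names and device path are invented examples. -->
<cman expected_votes="3" two_node="0"/>
<clusternodes>
    <clusternode name="node1" nodeid="1" votes="1"/>
    <clusternode name="node2" nodeid="2" votes="1"/>
</clusternodes>
<quorumd interval="1" tko="10" votes="1" device="/dev/sdb1"/>
```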
Hi
I don't remember who recommended, a few weeks ago, having
U5 in order to use the quorum disk, but ... which main problems will I have
using the quorum disk with CS4 U4 ?
Thanks
Alain Moullé
Hi
This is the first time I will try the quorum disk functionality ... which values
are recommended for the quorumd parameters in a two-node cluster ?
Is this correct ?
Thanks
Regards
Alain Moullé
Hi
I don't know exactly the meaning of "status" and "monitor"
(I mean, the difference between them ?),
but I would like to increase the interval between each call of
the "status" target of the service's script.
Is it possible to modify this parameter via cluster.conf ?
Thanks
Alain Moullé
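On later rgmanager versions the status interval can reportedly be overridden per resource in cluster.conf; on CS4 the usual route was editing the interval in the agent's own meta-data. A hedged sketch of the override form — support for this is version-dependent, and the names and the 120 s value are invented examples:

```xml
<!-- Hedged sketch: overriding the periodic status check interval for
     one resource in cluster.conf. Version-dependent; names and the
     120 s interval are invented examples. -->
<service name="myservice">
    <script name="myscript" file="/etc/init.d/myscript">
        <action name="status" interval="120"/>
    </script>
</service>
```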
Hi
Some questions about quorum disk :
1. is the quorum disk working correctly on CS4 Update 4 ?
Or is there any known issue which could lead to problems ?
2. when you have two or three shared disk arrays between two
HA nodes, is a quorum disk needed per disk array,
or is one q
Hi
Is there any incompatibility in rebuilding CS4 U5 for RHEL4 U4 ?
(Just for the benefit of all the patches)
Thanks
Alain