0 failed 0 seq 1,1
members 1 2
Now to determine why ls_slot is 0.
Neale
On 10/13/14, 5:10 PM, Neale Ferguson ne...@sinenomine.net wrote:
I put some debug code into the gfs2 module and I see it failing the mount
at this point:
/*
* If user space has failed to join the cluster
Yeah, I noted I was looking at the wrong lockspace. The gfs2 lockspace in
this cluster is vol1. Once I corrected what I was looking at, I think I
solved my problem: I believe the problem is an endian thing. In
set_rcom_status:
rs->rs_flags = cpu_to_le32(flags);
However, in
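Presumably the receive side tests rs_flags without converting it back from little-endian, which only bites on a big-endian machine such as s390x. A stand-alone sketch of that mismatch, using the userspace htole32()/le32toh() equivalents of cpu_to_le32()/le32_to_cpu() so it can be compiled and run anywhere (the DLM_RSF_NEED_SLOTS value is assumed here purely for illustration):

#include <endian.h>
#include <stdint.h>
#include <stdio.h>

#define DLM_RSF_NEED_SLOTS 0x00000001   /* value assumed for illustration */

int main(void)
{
        uint32_t flags = DLM_RSF_NEED_SLOTS;
        uint32_t rs_flags = htole32(flags);     /* what set_rcom_status stores */

        /* Buggy read: test the raw field with no conversion.  On a big-endian
           host such as s390x the stored value is 0x01000000, so the bit test
           fails and slot recovery never proceeds. */
        printf("raw test:       %s\n",
               (rs_flags & DLM_RSF_NEED_SLOTS) ? "NEED_SLOTS seen" : "NEED_SLOTS missed");

        /* Correct read: convert back to host order, as le32_to_cpu() would. */
        printf("converted test: %s\n",
               (le32toh(rs_flags) & DLM_RSF_NEED_SLOTS) ? "NEED_SLOTS seen" : "NEED_SLOTS missed");

        return 0;
}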
I reported last week that I was getting permission denied when pcs was
starting a gfs2 resource. I thought it was due to the resource being
defined incorrectly, but it doesn't appear to be the case. On rare
occasions the mount works but most of the time one node gets it mounted
but the other gets
Software levels:
pacemaker-1.1.10-29
pcs-0.9.115-32
dlm-4.0.2-4
corosync-2.3.3-2
lvm2-cluster-2.02.105-14
On 10/13/14, 11:20 AM, Neale Ferguson ne...@sinenomine.net wrote:
I reported last week that I was getting permission denied when pcs was
starting a gfs2 resource. I thought it was due
Yep:
# pcs stonith show ZVMPOWER
Resource: ZVMPOWER (class=stonith type=fence_zvm)
Attributes: ipaddr=VSMREQIU
pcmk_host_map=rh7cn1.devlab.sinenomine.net:RH7CN1;rh7cn2.devlab.sinenomine.net:RH7CN2
pcmk_host_list=rh7cn1.devlab.sinenomine.net;rh7cn2.devlab.sinenomine.net
Thanks Bob, answers inline...
On 10/13/14, 12:16 PM, Bob Peterson rpete...@redhat.com wrote:
- Original Message -
I would appreciate any debugging suggestions. I've straced
dlm_controld/corosync but not gained much clarity.
Neale
Hi Neale,
1. What does it say if you try to mount
On 10/13/14, 1:58 PM, Bob Peterson rpete...@redhat.com wrote:
- Original Message -
(snip)
3. What kernel is this?
Make sure both nodes are running the same kernel, at any rate.
Both running 3.10.0-123.8.1
It was made via:
mkfs.gfs2 -j 2 -J 16 -r 32 -t rh7cluster:vol1
On 10/13/14, 1:58 PM, Bob Peterson rpete...@redhat.com wrote:
- Original Message -
Can you try with the standard 128MB journal size just as an experiment
to see if it mounts more consistently or if you get the same error?
Maybe GFS2's recovery code is sending an error back for some reason
I put some debug code into the gfs2 module and I see it failing the mount
at this point:
/*
* If user space has failed to join the cluster or some similar
* failure has occurred, then the journal id will contain a
* negative (error) number. This will then be returned to
an existing configuration?
Neale
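For reference, the check that comment guards looks roughly like this (paraphrased from memory, not an exact quote of the gfs2 source; the label name is an assumption):

if (sdp->sd_lockstruct.ls_jid < 0) {
        /* a negative value written back here means the cluster join or
           recovery failed; it becomes the mount(2) error, so e.g. -EACCES
           shows up as the "permission denied" reported above */
        error = sdp->sd_lockstruct.ls_jid;
        sdp->sd_lockstruct.ls_jid = 0;
        goto fail_mount;        /* label name assumed for the sketch */
}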
On Oct 3, 2014, at 3:32 PM, Neale Ferguson ne...@sinenomine.net wrote:
Using the same two-node configuration I described in an earlier post to this
forum, I'm having problems getting a gfs2 resource started on one of the
nodes. The resource in question:
Resource
That was the problem! I applied a local patch, rebuilt, restarted, and we're up
fine and dandy!
Thanks very much... Neale
On Oct 3, 2014, at 3:34 AM, Christine Caulfield ccaul...@redhat.com wrote:
I think you're hitting this bug:
Using the same two-node configuration I described in an earlier post to this
forum, I'm having problems getting a gfs2 resource started on one of the nodes.
The resource in question:
Resource: clusterfs (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/vg_cluster/ha_lv
After creating a simple two-node cluster, one node is being fenced continually.
I'm running pacemaker (1.1.10-29) with two nodes and the following
corosync.conf:
totem {
    version: 2
    secauth: off
    cluster_name: rh7cluster
    transport: udpu
}
nodelist {
    node {
        ring0_addr:
Forgot to include cib.xml:
<cib epoch="17" num_updates="0" admin_epoch="0" validate-with="pacemaker-1.2"
     cib-last-written="Thu Oct 2 15:13:47 2014"
     update-origin="rh7cn1.devlab.sinenomine.net" update-client="cibadmin"
     crm_feature_set="3.0.7" have-quorum="1">
  <configuration>
    <crm_config>
Further to the problem described last week. What I'm seeing is that the node
(NODE2) that keeps going when NODE1 fails has many entries in dlm_tool
log_plocks output:
1410147734 lvclusdidiz0360 receive plock 10303 LK WR 0-7fff 1/8112/13d5000 w 0
1410147734 lvclusdidiz0360 receive
On Sep 8, 2014, at 11:17 AM, David Teigland teigl...@redhat.com wrote:
On Mon, Sep 08, 2014 at 02:44:49PM +, Neale Ferguson wrote:
1410147820 lvclusdidiz0360 store_plocks first 66307 last 88478 r_count 45 p_count 63 sig 5ab0
1410147820 lvclusdidiz0360 receive_plocks_stored 2:8 flags
wrote:
On Mon, Sep 08, 2014 at 03:35:05PM +, Neale Ferguson wrote:
The checkpoint data is sent to corosync/openais, which is responsible for
syncing that data to the other nodes, which should then be able to open
and read it. You'll also want to look for corosync/openais errors related
Will do. Having trouble accessing that system at the moment. I hope to get it
later today.
Neale
On Sep 3, 2014, at 12:32 PM, Tan Sri Dato' Eur Ing Adli white.he...@yahoo.com
wrote:
Well, gimme the other node log too.
Sent from my iPhone
On Sep 3, 2014, at 12:48 AM, Neale Ferguson
Hi,
In our two-node system, if one node fails, the other node takes over the
application and uses the shared gfs2 target successfully. However, after the
failed node comes back, any attempt to lock files on the gfs2 resource results
in -ENOSYS. The following test program exhibits the problem -
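The program referred to above is not included in this digest; a hypothetical minimal reproducer along the same lines would just take a POSIX lock on a file that lives on the gfs2 mount and report the result:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        struct flock fl;
        int fd;

        if (argc != 2) {
                fprintf(stderr, "usage: %s <file-on-gfs2>\n", argv[0]);
                return 1;
        }
        fd = open(argv[1], O_RDWR | O_CREAT, 0644);
        if (fd < 0) {
                perror("open");
                return 1;
        }
        memset(&fl, 0, sizeof(fl));
        fl.l_type = F_WRLCK;            /* whole-file write lock */
        fl.l_whence = SEEK_SET;         /* l_start = 0, l_len = 0 */
        if (fcntl(fd, F_SETLKW, &fl) < 0)
                perror("fcntl(F_SETLKW)");      /* reportedly ENOSYS here */
        else
                puts("lock acquired");
        close(fd);
        return 0;
}

Run from the node that kept running, against a path on the gfs2 mount, once the failed node has rejoined.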
Thanks Bob,
It's corosync - corosync-1.4.1-17, cman-3.0.12.1-60, fence-agents-3.1.5-26.
Neale
On Sep 2, 2014, at 11:04 AM, Bob Peterson rpete...@redhat.com wrote:
- Original Message -
Hi Neale,
For what it's worth: GFS2 just passes plock requests down to the cluster
Forget the snippet of code in my original posting as the code in 3.0.12-60
actually looks like this:
if (nodes_added(ls)) {
        store_plocks(ls, sig);
        ls->last_plock_sig = sig;
} else {
        sig = ls->last_plock_sig;
}
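For context, a paraphrased sketch of the receive-side counterpart being discussed (not the exact dlm_controld source; the field and variable names are assumptions): the signature carried in the plocks_stored message is compared with the locally computed one, and on mismatch plock handling is disabled for the lockspace, which is what later surfaces to applications as -ENOSYS from fcntl().

if (received_sig != our_sig) {
        /* the stored plock state cannot be trusted; stop handling plocks
           for this lockspace (field name assumed), so later lock requests
           from gfs2 are rejected */
        ls->disable_plock = 1;
        return;
}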
Thanks David,
That makes sense as there's this message that precedes the disable message in
the log:
retrieve_plocks ckpt open error 12 lvclusdidiz0360
Neale
On Sep 2, 2014, at 11:37 AM, David Teigland teigl...@redhat.com wrote:
On Tue, Sep 02, 2014 at 02:56:52PM +, Neale Ferguson wrote
, 2014, at 12:02 PM, Neale Ferguson ne...@sinenomine.net wrote:
Thanks David,
That makes sense as there's this message that precedes the disable message in
the log:
retrieve_plocks ckpt open error 12 lvclusdidiz0360
Neale
On Sep 2, 2014, at 11:37 AM, David Teigland teigl...@redhat.com
The logs from the recovering node are attached. If you need the same from the
other node I will get them tonight.
On Sep 2, 2014, at 12:42 PM, David Teigland teigl...@redhat.com wrote:
We need to sort out which nodes are sending/receiving plock data to/from
each other. The way it's supposed
Hi,
In a two-node cluster I shut down one of the nodes and the other node notices
the shutdown, but on rare occasions that node will then fence the node that is
shutting down. Is this a situation where setting post_fail_delay would
be useful, or setting the totem timeout to something
We have a sporadic situation where we are attempting to shut down/restart both
nodes of a two-node cluster. One shuts down completely but the other sometimes hangs
with:
[root@aude2mq036nabzi ~]# service cman stop
Stopping cluster:
Leaving fence domain... found dlm lockspace /sys/kernel/dlm/clvmd
Thanks Vinh. He is using IE8 (company policy!!). I've tried it with IE8, IE10,
Chrome, and Safari and all worked fine. He has cookies enabled so I'm at a loss
as to how that auth_stack_enabled setting is set/updated/cleared.
Neale
On Apr 25, 2014, at 5:37 PM, Cao, Vinh vinh@hp.com wrote:
luci-0.26.0-48 (tried -13 as well)
TurboGears2-2.0.3-4.
kernel-2.6.32-358.2.1
python-repoze-who-1.0.18-1 (I believe - am verifying)
On Apr 29, 2014, at 11:18 AM, Jan Pokorný jpoko...@redhat.com wrote:
could you be more specific as to which versions of luci, TurboGears
and repoze.who? In
to date but I'll
ask.
Neale
On Apr 29, 2014, at 1:17 PM, Jan Pokorný jpoko...@redhat.com wrote:
On 29/04/14 16:02 +, Neale Ferguson wrote:
luci-0.26.0-48 (tried -13 as well)
TurboGears2-2.0.3-4.
kernel-2.6.32-358.2.1
python-repoze-who-1.0.18-1 (I believe - am verifying)
Thanks
Name: python-genshi
Arch: s390x
Version : 0.5.1
Release : 7.1.el6
On Apr 29, 2014, at 1:17 PM, Jan Pokorný jpoko...@redhat.com wrote:
Just to be sure could you provide also your python-genshi version?
Thanks for the suggestions Jan. Your help is appreciated.
Neale
On Apr 29, 2014, at 2:27 PM, Jan Pokorný jpoko...@redhat.com wrote:
Sadly, having no direct access to IE8, I cannot track this further on my
own.
Hi,
One of the guys created a simple configuration and was attempting to use luci
to administer the cluster. It comes up fine, but the Admin ... Logout links at
the top left of the window that usually appear are not appearing. Looking at
the code in the header html I see the following:
span
What I'm really after is confirmation that it's luci which is propagating the
cluster.conf via the ricci service running on the nodes. If that's the case, I
want to determine why luci only opts to send to one node and not the other.
It pushed out correctly from cnode2.
Log into a node, manually edit cluster.conf to increase
config_version=x, save and exit, run 'ccs_config_validate' and if
there are no errors, run 'cman_tool version -r'. Enter the 'ricci'
password(s) if prompted. Then
Not yet, I'm trying to gather information and verify it's not a configuration
problem on my side.
That does seem to indicate an issue with luci then. Have you opened a
rhbz bug?
I created a simple two-node cluster (cnode1, cnode2). I also created a linux
instance from where I run luci (cmanager). I was able to create the two nodes
successfully. On both nodes /etc/cluster/cluster.conf is created. However, any
updates to the configuration are only being reflected on one
Couple of questions:
* What OS/cman version?
Linux / cman-3.0.12.1-5
* Are the modclusterd and ricci daemons running?
ricci 1439 1 0 11:18 ? 00:00:16 ricci -u ricci
root 1486 1 2 11:18 ? 00:02:03 modclusterd
* Did you set the local 'ricci' user password on
Sorry, what *distro* and version? Sounds like RHEL / CentOS 6.something
though.
CentOS 6.4
* Are the modclusterd and ricci daemons running?
ricci 1439 1 0 11:18 ? 00:00:16 ricci -u ricci
root 1486 1 2 11:18 ? 00:02:03 modclusterd
* Did you set the local