Re: [Linux-cluster] Permission denied

2014-10-14 Thread Neale Ferguson
0 failed 0 seq 1,1 members 1 2 Now to determine why ls_slot is 0. Neale On 10/13/14, 5:10 PM, Neale Ferguson ne...@sinenomine.net wrote: I put some debug code into the gfs2 module and I see it failing the mount at this point: /* * If user space has failed to join the cluster

Re: [Linux-cluster] Permission denied

2014-10-14 Thread Neale Ferguson
Yeah, I noted I was looking at the wrong lockspace. The gfs2 lockspace in this cluster is vol1. Once I corrected what I was looking at, I think I solved my problem: I believe the problem is an endian thing. In set_rcom_status: rs->rs_flags = cpu_to_le32(flags) However, in
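
A minimal, self-contained sketch of the byte-order pattern being described, using glibc's <endian.h> helpers in place of the kernel's cpu_to_le32()/le32_to_cpu(); the struct and flag names are illustrative, not the actual dlm source. A sender that packs flags little-endian only round-trips correctly on big-endian s390 if the receiver converts back before testing bits:

    #include <stdint.h>
    #include <stdio.h>
    #include <endian.h>

    #define NEED_SLOTS 0x00000001U       /* illustrative flag bit */

    struct status_msg {
            uint32_t rs_flags;           /* carried little-endian on the wire */
    };

    int main(void)
    {
            struct status_msg rs;

            rs.rs_flags = htole32(NEED_SLOTS);   /* sender: cpu_to_le32() */

            /* Buggy receiver: tests the raw wire value.  Works on x86
             * (little-endian), silently misses the flag on big-endian s390. */
            printf("raw test:       %s\n",
                   (rs.rs_flags & NEED_SLOTS) ? "flag seen" : "flag missed");

            /* Correct receiver: convert back to CPU byte order first. */
            printf("converted test: %s\n",
                   (le32toh(rs.rs_flags) & NEED_SLOTS) ? "flag seen" : "flag missed");
            return 0;
    }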

[Linux-cluster] Permission denied

2014-10-13 Thread Neale Ferguson
I reported last week that I was getting permission denied when pcs was starting a gfs2 resource. I thought it was due to the resource being defined incorrectly, but it doesn't appear to be the case. On rare occasions the mount works but most of the time one node gets it mounted but the other gets

Re: [Linux-cluster] Permission denied

2014-10-13 Thread Neale Ferguson
Software levels: pacemaker-1.1.10-29 pcs-0.9.115-32 dlm-4.0.2-4 corosync-2.3.3-2 lvm2-cluster-2.02.105-14 On 10/13/14, 11:20 AM, Neale Ferguson ne...@sinenomine.net wrote: I reported last week that I was getting permission denied when pcs was starting a gfs2 resource. I thought it was due

Re: [Linux-cluster] Permission denied

2014-10-13 Thread Neale Ferguson
Yep: # pcs stonith show ZVMPOWER Resource: ZVMPOWER (class=stonith type=fence_zvm) Attributes: ipaddr=VSMREQIU pcmk_host_map=rh7cn1.devlab.sinenomine.net:RH7CN1;rh7cn2.devlab.sinenomine.net:RH7CN2 pcmk_host_list=rh7cn1.devlab.sinenomine.net;rh7cn2.devlab.sinenomine.net
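
For reference, a stonith resource with the attributes shown above could be created with something along these lines (a sketch only; the hostnames and z/VM user IDs are copied from the output above, and the separators fence_zvm expects should be checked against its man page):

    pcs stonith create ZVMPOWER fence_zvm \
        ipaddr=VSMREQIU \
        pcmk_host_map="rh7cn1.devlab.sinenomine.net:RH7CN1;rh7cn2.devlab.sinenomine.net:RH7CN2" \
        pcmk_host_list="rh7cn1.devlab.sinenomine.net;rh7cn2.devlab.sinenomine.net"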

Re: [Linux-cluster] Permission denied

2014-10-13 Thread Neale Ferguson
Thanks Bob, answers inline... On 10/13/14, 12:16 PM, Bob Peterson rpete...@redhat.com wrote: - Original Message - I would appreciate any debugging suggestions. I've straced dlm_controld/corosync but not gained much clarity. Neale Hi Neale, 1. What does it say if you try to mount

Re: [Linux-cluster] Permission denied

2014-10-13 Thread Neale Ferguson
On 10/13/14, 1:58 PM, Bob Peterson rpete...@redhat.com wrote: - Original Message - (snip) 3. What kernel is this? Make sure both nodes are running the same kernel, at any rate. Both running 3.10.0-123.8.1 It was made via: mkfs.gfs2 -j 2 -J 16 -r 32 -t rh7cluster:vol1
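
For readers following along, the flags in that mkfs.gfs2 invocation are annotated below; the target device is an assumption taken from the Filesystem resource mentioned later in this archive, and mkfs.gfs2(8) is authoritative for the details:

    # -j 2                two journals, one per cluster node
    # -J 16               journal size in MB (the default is 128 MB)
    # -r 32               resource-group size in MB
    # -t rh7cluster:vol1  lock table name: <cluster name>:<filesystem name>
    mkfs.gfs2 -j 2 -J 16 -r 32 -t rh7cluster:vol1 /dev/vg_cluster/ha_lv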

Re: [Linux-cluster] Permission denied

2014-10-13 Thread Neale Ferguson
On 10/13/14, 1:58 PM, Bob Peterson rpete...@redhat.com wrote: - Original Message - Can you try with the standard 128MB journal size just as an experiment to see if it mounts more consistently or if you get the same error? Maybe GFS2's recovery code is sending an error back for some reason

Re: [Linux-cluster] Permission denied

2014-10-13 Thread Neale Ferguson
I put some debug code into the gfs2 module and I see it failing the mount at this point: /* * If user space has failed to join the cluster or some similar * failure has occurred, then the journal id will contain a * negative (error) number. This will then be returned to
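
The quoted comment sits above a check of roughly the following shape in the gfs2 mount path (a paraphrase from memory, not verbatim kernel source); the practical consequence is that the mount error seen here is whatever negative value user space handed back in the journal id, not something gfs2 decided on its own:

    /* paraphrase of the gfs2 mount-time check; field names from memory */
    if (sdp->sd_lockstruct.ls_jid < 0) {
            /* user-space join failed: the journal id carries the negative
             * errno, which is returned as the mount error */
            error = sdp->sd_lockstruct.ls_jid;
            sdp->sd_lockstruct.ls_jid = 0;
            goto fail_sb;
    }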

Re: [Linux-cluster] gfs2 resource not mounting

2014-10-05 Thread Neale Ferguson
an existing configuration? Neale On Oct 3, 2014, at 3:32 PM, Neale Ferguson ne...@sinenomine.net wrote: Using the same two-node configuration I described in an earlier post to this forum, I'm having problems getting a gfs2 resource started on one of the nodes. The resource in question: Resource

Re: [Linux-cluster] Fencing of node

2014-10-03 Thread Neale Ferguson
That was the problem! I applied a local patch, rebuilt, restarted, and we're up fine and dandy! Thanks very much... Neale On Oct 3, 2014, at 3:34 AM, Christine Caulfield ccaul...@redhat.com wrote: I think you're hitting this bug:

[Linux-cluster] gfs2 resource not mounting

2014-10-03 Thread Neale Ferguson
Using the same two-node configuration I described in an earlier post to this forum, I'm having problems getting a gfs2 resource started on one of the nodes. The resource in question: Resource: clusterfs (class=ocf provider=heartbeat type=Filesystem) Attributes: device=/dev/vg_cluster/ha_lv
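
For comparison, a Filesystem resource like the one shown is usually created with something like the command below; only the device comes from the post, while the mount point, fstype, and the clone (needed so gfs2 mounts on both nodes) are assumptions:

    pcs resource create clusterfs Filesystem \
        device="/dev/vg_cluster/ha_lv" directory="/mnt/gfs2" fstype="gfs2" \
        --clone interleave=true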

[Linux-cluster] Fencing of node

2014-10-02 Thread Neale Ferguson
After creating a simple two-node cluster, one node is being fenced continually. I'm running pacemaker (1.1.10-29) with two nodes and the following corosync.conf: totem { version: 2 secauth: off cluster_name: rh7cluster transport: udpu } nodelist { node { ring0_addr:
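
For reference, a complete minimal corosync.conf of that shape for two nodes generally also carries a nodelist and a quorum section with two_node set (which implicitly enables wait_for_all); the addresses below are assumed from the hostnames appearing elsewhere in these threads, and the fencing loop here turned out to be a bug rather than a configuration problem, per the follow-up above:

    totem {
        version: 2
        secauth: off
        cluster_name: rh7cluster
        transport: udpu
    }

    nodelist {
        node {
            ring0_addr: rh7cn1.devlab.sinenomine.net
            nodeid: 1
        }
        node {
            ring0_addr: rh7cn2.devlab.sinenomine.net
            nodeid: 2
        }
    }

    quorum {
        provider: corosync_votequorum
        two_node: 1
    }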

Re: [Linux-cluster] Fencing of node

2014-10-02 Thread Neale Ferguson
Forgot to include cib.xml: cib epoch=17 num_updates=0 admin_epoch=0 validate-with=pacemaker-1.2 cib-last-written=Thu Oct 2 15:13:47 2014 update-origin=rh7cn1.devlab.sinenomine.net update-client=cibadmin crm_feature_set=3.0.7 have-quorum=1 configuration crm_config

Re: [Linux-cluster] F_SETLK fails after recovery

2014-09-08 Thread Neale Ferguson
Further to the problem described last week. What I'm seeing is that the node (NODE2) that keeps going when NODE1 fails has many entries in dlm_tool log_plocks output: 1410147734 lvclusdidiz0360 receive plock 10303 LK WR 0-7fff 1/8112/13d5000 w 0 1410147734 lvclusdidiz0360 receive

Re: [Linux-cluster] F_SETLK fails after recovery

2014-09-08 Thread Neale Ferguson
On Sep 8, 2014, at 11:17 AM, David Teigland teigl...@redhat.com wrote: On Mon, Sep 08, 2014 at 02:44:49PM +, Neale Ferguson wrote: 1410147820 lvclusdidiz0360 store_plocks first 66307 last 88478 r_count 45 p_count 63 sig 5ab0 1410147820 lvclusdidiz0360 receive_plocks_stored 2:8 flags

Re: [Linux-cluster] F_SETLK fails after recovery

2014-09-08 Thread Neale Ferguson
wrote: On Mon, Sep 08, 2014 at 03:35:05PM +, Neale Ferguson wrote: The checkpoint data is sent to corosync/openais, which is responsible for syncing that data to the other nodes, which should then be able to open and read it. You'll also want to look for corosync/openais errors related

Re: [Linux-cluster] F_SETLK fails after recovery

2014-09-03 Thread Neale Ferguson
Will do. Having trouble accessing that system at the moment. I hope to get it later today. Neale On Sep 3, 2014, at 12:32 PM, Tan Sri Dato' Eur Ing Adli white.he...@yahoo.com wrote: Well, gimme the other node log too. Sent from my iPhone On Sep 3, 2014, at 12:48 AM, Neale Ferguson

[Linux-cluster] F_SETLK fails after recovery

2014-09-02 Thread Neale Ferguson
Hi, In our two-node system if one node fails, the other node takes over the application and uses the shared gfs2 target successfully. However, after the failed node comes back, any attempt to lock files on the gfs2 resource results in -ENOSYS. The following test program exhibits the problem -
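
The test program itself is cut off in this archive; a minimal stand-in that exercises the same fcntl(F_SETLK) path would look roughly like the following, with the path under the gfs2 mount being a placeholder rather than anything from the original program:

    /* Minimal stand-in for the elided test program: take and release a
     * write lock with fcntl(F_SETLK) on a file under the gfs2 mount. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
            const char *path = argc > 1 ? argv[1] : "/mnt/gfs2/lockfile";
            struct flock fl;
            int fd = open(path, O_RDWR | O_CREAT, 0644);

            if (fd < 0) {
                    perror("open");
                    return 1;
            }

            memset(&fl, 0, sizeof(fl));
            fl.l_type = F_WRLCK;            /* exclusive lock */
            fl.l_whence = SEEK_SET;
            fl.l_start = 0;
            fl.l_len = 0;                   /* whole file */

            /* Per the report above, this is the call that fails with
             * errno == ENOSYS on the surviving node after the peer rejoins. */
            if (fcntl(fd, F_SETLK, &fl) < 0) {
                    fprintf(stderr, "F_SETLK failed: %s\n", strerror(errno));
                    close(fd);
                    return 1;
            }

            fl.l_type = F_UNLCK;
            fcntl(fd, F_SETLK, &fl);
            close(fd);
            printf("F_SETLK/F_UNLCK succeeded on %s\n", path);
            return 0;
    }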

Re: [Linux-cluster] F_SETLK fails after recovery

2014-09-02 Thread Neale Ferguson
Thanks Bob, It's corosync - corosync-1.4.1-17, cman-3.0.12.1-60, fence-agents-3.1.5-26. Neale On Sep 2, 2014, at 11:04 AM, Bob Peterson rpete...@redhat.com wrote: - Original Message - Hi Neale, For what it's worth: GFS2 just passes plock requests down to the cluster

Re: [Linux-cluster] F_SETLK fails after recovery

2014-09-02 Thread Neale Ferguson
Forget the snippet of code in my original posting as the code in 3.0.12-60 actually looks like this: if (nodes_added(ls)) { store_plocks(ls, sig); ls->last_plock_sig = sig; } else { sig = ls->last_plock_sig; }

Re: [Linux-cluster] F_SETLK fails after recovery

2014-09-02 Thread Neale Ferguson
Thanks David, That makes sense as there's this message that precedes the disable message in the log: retrieve_plocks ckpt open error 12 lvclusdidiz0360 Neale On Sep 2, 2014, at 11:37 AM, David Teigland teigl...@redhat.com wrote: On Tue, Sep 02, 2014 at 02:56:52PM +, Neale Ferguson wrote

Re: [Linux-cluster] F_SETLK fails after recovery

2014-09-02 Thread Neale Ferguson
, 2014, at 12:02 PM, Neale Ferguson ne...@sinenomine.net wrote: Thanks David, That makes sense as there's this message that precedes the disable message in the log: retrieve_plocks ckpt open error 12 lvclusdidiz0360 Neale On Sep 2, 2014, at 11:37 AM, David Teigland teigl...@redhat.com

Re: [Linux-cluster] F_SETLK fails after recovery

2014-09-02 Thread Neale Ferguson
The logs from the recovering node are attached. If you need the same from the other node I will get them tonight. On Sep 2, 2014, at 12:42 PM, David Teigland teigl...@redhat.com wrote: We need to sort out which nodes are sending/receiving plock data to/from each other. The way it's supposed

[Linux-cluster] Delaying fencing during shutdown

2014-08-28 Thread Neale Ferguson
Hi, In a two-node cluster I shut down one of the nodes and the other node notices the shutdown, but on rare occasions that node will then fence the node that is shutting down. Is this a situation where setting post_fail_delay would be useful, or setting the totem timeout to something
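
For reference on where those knobs live on a cman-based cluster of this vintage: post_fail_delay is an attribute of the fence_daemon element in cluster.conf, and the totem token timeout can be set via a totem element there as well; the values below are placeholders, not tuning advice:

    <!-- cluster.conf fragments; values are placeholders, not tuning advice -->
    <fence_daemon post_fail_delay="30"/>
    <totem token="20000"/>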

[Linux-cluster] clvmd not terminating

2014-08-19 Thread Neale Ferguson
We have a sporadic situation where we are attempting to shutdown/restart both nodes of a two node cluster. One shuts down completely but the other sometimes hangs with: [root@aude2mq036nabzi ~]# service cman stop Stopping cluster: Leaving fence domain... found dlm lockspace /sys/kernel/dlm/clvmd

Re: [Linux-cluster] luci question

2014-04-29 Thread Neale Ferguson
Thanks Vinh. He is using IE8 (company policy!!). I've tried it with IE8, IE10, Chrome, and Safari and all worked fine. He has cookies enabled so I'm at a loss as to how that auth_stack_enabled setting is set/updated/cleared. Neale On Apr 25, 2014, at 5:37 PM, Cao, Vinh vinh@hp.com wrote:

Re: [Linux-cluster] luci question

2014-04-29 Thread Neale Ferguson
luci-0.26.0-48 (tried -13 as well) TurboGears2-2.0.3-4. kernel-2.6.32-358.2.1 python-repoze-who-1.0.18-1 (I believe - am verifying) On Apr 29, 2014, at 11:18 AM, Jan Pokorný jpoko...@redhat.com wrote: could you be more specific as to which versions of luci, TurboGears and repoze.who? In

Re: [Linux-cluster] luci question

2014-04-29 Thread Neale Ferguson
to date but I'll ask. Neale On Apr 29, 2014, at 1:17 PM, Jan Pokorný jpoko...@redhat.com wrote: On 29/04/14 16:02 +, Neale Ferguson wrote: luci-0.26.0-48 (tried -13 as well) TurboGears2-2.0.3-4. kernel-2.6.32-358.2.1 python-repoze-who-1.0.18-1 (I believe - am verifying) Thanks

Re: [Linux-cluster] luci question

2014-04-29 Thread Neale Ferguson
Name: python-genshi Arch: s390x Version : 0.5.1 Release : 7.1.el6 On Apr 29, 2014, at 1:17 PM, Jan Pokorný jpoko...@redhat.com wrote: Just to be sure could you provide also your python-genshi version?

Re: [Linux-cluster] luci question

2014-04-29 Thread Neale Ferguson
Thanks for the suggestions Jan. Your help is appreciated. Neale On Apr 29, 2014, at 2:27 PM, Jan Pokorný jpoko...@redhat.com wrote: Sadly, having no direct access to IE8, cannot track this further on my own.

[Linux-cluster] luci question

2014-04-25 Thread Neale Ferguson
Hi, One of the guys created a simple configuration and was attempting to use luci to administer the cluster. It comes up fine, but the links Admin ... Logout that usually appear at the top left of the window are not appearing. Looking at the code in the header html I see the following: span

Re: [Linux-cluster] cluster.conf not being propagated to all nodes

2013-12-24 Thread Neale Ferguson
What I'm really after is confirmation that it's luci which is propagating the cluster.conf via the ricci service running on the nodes. If that's the case, I want to determine why luci only opts to send to one node and not the other.

Re: [Linux-cluster] cluster.conf not being propagated to all nodes

2013-12-24 Thread Neale Ferguson
It pushed out correctly from cnode2. Log into a node, manually edit cluster.conf to increase config_version=x, save and exit, run 'ccs_config_validate' and if there are no errors, run 'cman_tool version -r'. Enter the 'ricci' password(s) if prompted. Then
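
Spelled out as commands, the manual-propagation test described in that reply is roughly as follows (the new config_version is simply the current value plus one):

    vi /etc/cluster/cluster.conf    # bump config_version="N" to "N+1"
    ccs_config_validate             # check the edited file for errors
    cman_tool version -r            # push the new config to all nodes
                                    # (enter the ricci password(s) if prompted)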

Re: [Linux-cluster] cluster.conf not being propagated to all nodes

2013-12-24 Thread Neale Ferguson
Not yet, I'm trying to gather information and verify it's not a configuration problem on my side. That does seem to indicate an issue with luci then. Have you opened a rhbz bug?

[Linux-cluster] cluster.conf not being propagated to all nodes

2013-12-23 Thread Neale Ferguson
I created a simple two-node cluster (cnode1, cnode2). I also created a Linux instance (cmanager) from which I run luci. I was able to create the two nodes successfully. On both nodes /etc/cluster/cluster.conf is created. However, any updates to the configuration are only being reflected on one

Re: [Linux-cluster] cluster.conf not being propagated to all nodes

2013-12-23 Thread Neale Ferguson
Couple of questions: * What OS/cman version? Linux / cman-3.0.12.1-5 * Are the modclusterd and ricci daemons running? ricci 1439 1 0 11:18 ? 00:00:16 ricci -u ricci root 1486 1 2 11:18 ? 00:02:03 modclusterd * Did you set the local 'ricci' user password on

Re: [Linux-cluster] cluster.conf not being propagated to all nodes

2013-12-23 Thread Neale Ferguson
Sorry, what *distro* and version? Sounds like RHEL / CentOS 6.something though. CentOS 6.4 * Are the modclusterd and ricci daemons running? ricci 1439 1 0 11:18 ? 00:00:16 ricci -u ricci root 1486 1 2 11:18 ? 00:02:03 modclusterd * Did you set the local