Re: [Linux-cluster] F_SETLK fails after recovery

2014-09-02 Thread Neale Ferguson
The logs from the recovering node are attached. If you need the same from the other node I will get them tonight.

On Sep 2, 2014, at 12:42 PM, David Teigland wrote:
> We need to sort out which nodes are sending/receiving plock data to/from
> each other. The way it's supposed to work, is an exi

Re: [Linux-cluster] F_SETLK fails after recovery

2014-09-02 Thread David Teigland
On Tue, Sep 02, 2014 at 04:24:07PM +, Neale Ferguson wrote:
> In retrieve_plocks_stored() there is the code:
>
>     retrieve_plocks(ls, &sig);
>
>     if ((hd->flags & DLM_MFLG_PLOCK_SIG) && (sig != hd->msgdata2)) {
>         log_error("lockspace %s plock disabled our sig %x

Re: [Linux-cluster] F_SETLK fails after recovery

2014-09-02 Thread Neale Ferguson
In retrieve_plocks_stored() there is the code:

    retrieve_plocks(ls, &sig);

    if ((hd->flags & DLM_MFLG_PLOCK_SIG) && (sig != hd->msgdata2)) {
        log_error("lockspace %s plock disabled our sig %x "
                  "nodeid %d sig %x", ls->name, sig, hd->nodeid,

Re: [Linux-cluster] F_SETLK fails after recovery

2014-09-02 Thread Neale Ferguson
Thanks David,

That makes sense as there's this message that precedes the disable message in the log:

    retrieve_plocks ckpt open error 12 lvclusdidiz0360

Neale

On Sep 2, 2014, at 11:37 AM, David Teigland wrote:
> On Tue, Sep 02, 2014 at 02:56:52PM +, Neale Ferguson wrote:
>
>> 1409631951

Re: [Linux-cluster] F_SETLK fails after recovery

2014-09-02 Thread David Teigland
On Tue, Sep 02, 2014 at 02:56:52PM +, Neale Ferguson wrote:
> 1409631951 lockspace lvclusdidiz0360
> plock disabled our sig 816fba01 nodeid 2 sig 2f6b

There is a difference in plock data signatures between the node that wrote the data and the node that read it (this one). This indicates that
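The signature check David describes can be illustrated with a toy model. This is a hypothetical sketch, not dlm_controld's code: the record layout, the sum-based signature, and the names `plock_rec`, `plock_sig()` and `plock_sig_matches()` are invented for illustration, and only the log message text is taken from the code quoted in this thread. The idea it shows is that the reading node recomputes a signature over the plock data it actually retrieved and, if that differs from the signature the writing node sent, plocks are disabled.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical plock record for illustration only; dlm_controld's real
 * records and checkpoint format are not shown in this thread. */
struct plock_rec {
    uint64_t number;  /* inode number of the locked file */
    uint64_t start;   /* first byte of the locked range */
    uint64_t end;     /* last byte of the locked range */
    uint32_t owner;   /* lock owner id */
    uint32_t ex;      /* 1 = exclusive, 0 = shared */
};

/* Hypothetical toy signature: a plain sum of every field. A missing,
 * extra, or altered record usually (not always) changes the result. */
static uint32_t plock_sig(const struct plock_rec *r, size_t n)
{
    uint32_t sig = 0;
    for (size_t i = 0; i < n; i++)
        sig += (uint32_t)(r[i].number + r[i].start + r[i].end +
                          r[i].owner + r[i].ex);
    return sig;
}

/* Reader side: recompute the signature over what was retrieved and compare
 * it with the one the writing node sent. Returns 1 if plocks can stay
 * enabled, 0 if they would be disabled (after logging the mismatch). */
static int plock_sig_matches(const struct plock_rec *retrieved, size_t n,
                             uint32_t sent_sig, int nodeid, const char *lsname)
{
    uint32_t sig = plock_sig(retrieved, n);
    if (sig != sent_sig) {
        fprintf(stderr, "lockspace %s plock disabled our sig %x nodeid %d sig %x\n",
                lsname, (unsigned)sig, nodeid, (unsigned)sent_sig);
        return 0;
    }
    return 1;
}
```

In this model, a reader that retrieves only part of the writer's data (for example, one of two records) computes a different sum and refuses to enable plocks, which is the kind of writer/reader disagreement the log line above reports.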

Re: [Linux-cluster] F_SETLK fails after recovery

2014-09-02 Thread Neale Ferguson
Forget the snippet of code in my original posting as the code in 3.0.12-60 actually looks like this:

    if (nodes_added(ls)) {
        store_plocks(ls, &sig);
        ls->last_plock_sig = sig;
    } else {
        sig = ls->last_plock_sig;
    }
    send_p

Re: [Linux-cluster] F_SETLK fails after recovery

2014-09-02 Thread Neale Ferguson
Thanks Bob,

It's corosync - corosync-1.4.1-17, cman-3.0.12.1-60, fence-agents-3.1.5-26.

Neale

On Sep 2, 2014, at 11:04 AM, Bob Peterson wrote:
> - Original Message -
>
> Hi Neale,
>
> For what it's worth: GFS2 just passes plock requests down to the cluster
> infrastructure. (Unlike f

Re: [Linux-cluster] F_SETLK fails after recovery

2014-09-02 Thread Bob Peterson
- Original Message -
> Hi,
> In our two node system if one node fails, the other node takes over the
> application and uses the shared gfs2 target successfully. However, after
> the failed node comes back any attempts to lock files on the gfs2 resource
> results in -ENOSYS. The followin

[Linux-cluster] F_SETLK fails after recovery

2014-09-02 Thread Neale Ferguson
Hi,

In our two-node system, if one node fails, the other node takes over the application and uses the shared gfs2 target successfully. However, after the failed node comes back, any attempt to lock files on the gfs2 resource results in -ENOSYS. The following test program exhibits the problem - i
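The original test program is cut off in this archive, so the following is only a minimal stand-in, not Neale's program. It assumes the symptom is visible as `fcntl(F_SETLK)` failing with `errno == ENOSYS` on a file on the GFS2 mount; the function name `try_setlk_whole_file()` is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: try to take (and then release) a non-blocking
 * exclusive POSIX lock on the whole of `path`. Returns 0 on success,
 * otherwise the errno value from open() or fcntl(F_SETLK). On the
 * recovered node described above, fcntl() is where ENOSYS would appear. */
int try_setlk_whole_file(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return errno;

    struct flock fl;
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;    /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;           /* length 0 means "through end of file" */

    if (fcntl(fd, F_SETLK, &fl) == -1) {
        int err = errno;    /* save before close() can overwrite errno */
        close(fd);
        return err;
    }

    fl.l_type = F_UNLCK;    /* release the lock again */
    fcntl(fd, F_SETLK, &fl);
    close(fd);
    return 0;
}
```

Calling this from a small `main()` with a path on the gfs2 mount, once on the surviving node and once on the node that came back, should show whether only the recovered node gets `ENOSYS`. F_SETLK is used rather than F_SETLKW so the call fails immediately instead of blocking when another process holds the lock.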