Yes, indeed, there was a problem with the underlying NFS connection, logged at
the OS level.  Funnily enough, this service wasn't even under load when the
timeout happened, so NFS is living up to my expectations already.

So I could either lower the client-side timeouts to fit within the 30-second
lease, or raise the server-side lease time to match the 180 seconds the
client will try for.
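For reference, the arithmetic behind that 180-second figure, sketched out under the usual interpretation of the mount options quoted below (timeo is in tenths of a second, and a hard mount makes the original call plus retrans retries before a major timeout; the exact behavior can vary by NFS client implementation):

```python
# Rough sketch of the NFS client retry window implied by the mount
# options timeo=600,retrans=2 (values taken from the thread below).

timeo_deciseconds = 600   # timeo is expressed in tenths of a second
retrans = 2               # number of retries after the initial attempt

attempt_seconds = timeo_deciseconds / 10          # 60 s per attempt
total_seconds = attempt_seconds * (1 + retrans)   # initial try + 2 retries

server_lease_seconds = 30  # server-side lease for locks, per the thread

print(total_seconds)                              # 180.0
print(total_seconds > server_lease_seconds)       # True
```

So the client can keep trying for 180 seconds while the server's lock lease expires after 30, which is the mismatch the two options above (shorter client timeouts, or a longer server lease) are each trying to close.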


On Fri, Mar 18, 2016 at 8:26 AM, James A. Robinson <j...@highwire.org>
wrote:

> Yes, the combination of settings in place right now could add up to 3
> minutes:
>
> On the client side:
>
> nfsvers=4,proto=tcp,
> hard,timeo=600,retrans=2,ac,acregmin=3,acregmax=60,acdirmin=30,fg,retry=120,
> sharecache,lookupcache=all,cto
>
> So right now it's got a 60-second timeo value (timeo=600, in tenths of a
> second), and it will retry up to 2 times.  I'll see if I can find any
> OS-level messages about the NFS server lock, or whether the NFS server
> reported anything.
>
> On the server side:
>
> read delegation, 30-second lease for locks, 45-second grace period
>
>
> On Fri, Mar 18, 2016 at 6:30 AM, Tim Bain <tb...@alumni.duke.edu> wrote:
>
>> I'd say it's more likely that either 1) NFS gave away the lock when it
>> shouldn't have, or 2) network conditions were such that your master lost
>> connectivity and NFS rightly allowed the slave to take it.  In either
>> case,
>> useful logging could only come from your NFS server.
>>
>> Separately from the question of why this happened, I'm concerned that it
>> took 3 minutes for the master to recognize it had lost the lock (during
>> which time you'd have had a dual-master situation).  Can that be explained
>> by your specific NFS settings?
>>
>> Tim
