RAID0, so there is no redundancy in the data?

And what kind of underlying hard disks?  Desktop drives will retry a
bad block for a long time (i.e. a minute or more).  Such a disk will
not report an error until either the default OS timeout is reached or
the drive's firmware finally gives up.
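You can check whether a drive supports a bounded error-recovery timeout
(SCT ERC) and look for failing sectors with smartctl.  A sketch, assuming
the sysstat/smartmontools packages are installed; /dev/sda is a
placeholder for whichever disk you want to inspect:

```shell
# Show the drive's SCT Error Recovery Control setting; desktop drives
# often report it disabled or unsupported (placeholder device /dev/sda).
smartctl -l scterc /dev/sda

# If supported, cap read/write recovery at 7 seconds (value is in
# tenths of a second) so errors surface instead of minute-long stalls.
smartctl -l scterc,70,70 /dev/sda

# Check the SMART attributes for pending or reallocated sectors,
# which are the usual sign of the slow-to-read blocks described above.
smartctl -A /dev/sda | grep -Ei 'pending|realloc'
```

Note the scterc setting is lost on power cycle on most drives, so it is
usually set from a boot-time script.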

The sar data will show if one of the disks is being slow on the server end.
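For an on-demand look (rather than waiting for the collected history),
sar can sample live.  A sketch, assuming sysstat is installed:

```shell
# Sample all block devices every 60 seconds, 10 samples; -p maps the
# dev8-48 style names to sdd etc. for readability.
sar -d -p 60 10

# Or read back today's already-collected block-device data
# (Fedora stores daily files as /var/log/sa/saDD).
sar -d -f /var/log/sa/sa$(date +%d)
```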

On the client end you are unlikely to get anything useful from any
samples, since it seems pretty likely the server is not responding to
NFS and/or the disks are not responding.

It could be as simple as this: on login it tries to read a marginal or
slow block, and that block takes a while to read successfully.  If that
is happening, the block will probably eventually become unreadable, and
if you really are using RAID0 then some data will be lost.

All of the NFSv4 issues I have run into involve it just breaking and
staying broken (usually when the server reboots).  I have never seen it
cause big sudden pauses, but using v3 won't hurt, and I still try to
avoid v4.
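Forcing v3 is just a client-side mount option.  A sketch; the server
name, export, and mount point are placeholders for your setup:

```shell
# One-off test mount forcing NFSv3 (placeholder server/export/mountpoint):
mount -t nfs -o vers=3 server:/home /mnt

# Or persistently, as an /etc/fstab line:
# server:/home  /home  nfs  vers=3,hard,timeo=600  0 0
```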

On Thu, Sep 30, 2021 at 11:55 AM Terry Barnaby <ter...@beam.ltd.uk> wrote:
>
> On 30/09/2021 11:42, Roger Heflin wrote:
>
> On mine when I first access the NFS volume it takes 5-10 seconds for the 
> disks to spin up.  Mine will spin down later in the day if little or nothing 
> is going on and I will get another delay.
>
> I have also seen delays if a disk gets bad blocks and corrects them.  About 
> half the time that does produce a message, but some of the time there are no 
> messages at all about it, and I have had to resort to using sar to figure out 
> which disk is causing the issue.
>
> So on my machine I see this (sar -d):
> 05:29:01 AM       DEV      tps    rkB/s   wkB/s  dkB/s  areq-sz  aqu-sz  await  %util
> 05:29:01 AM    dev8-0    36.16    94.01  683.65   0.00    21.51    0.03   0.67   1.11
> 05:29:01 AM   dev8-16     0.02     0.00    0.00   0.00     0.00    0.00   0.00   0.00
> 05:29:01 AM   dev8-32     0.02     0.00    0.00   0.00     0.00    0.00   1.00   0.00
> 05:29:01 AM   dev8-48   423.65 71239.92  198.64   0.00   168.63   12.73  29.72  86.07
> 05:29:01 AM   dev8-64     0.02     0.00    0.00   0.00     0.00    0.00   0.00   0.00
> 05:29:01 AM   dev8-80     0.02     0.00    0.00   0.00     0.00    0.00   0.00   0.00
> 05:29:01 AM  dev8-144  2071.22 71311.58  212.22   0.00    34.53   11.37   5.47  54.81
> 05:29:01 AM   dev8-96     0.02     0.00    0.00   0.00     0.00    0.00   0.00   0.00
> 05:29:01 AM  dev8-128  1630.99 71389.49  198.18   0.00    43.89   15.72   9.62  57.05
> 05:29:01 AM  dev8-112  2081.05 71426.01  182.48   0.00    34.41   11.32   5.42  55.68
>
> There is a 4 disk raid6 check going on.
>
> You will notice that dev8-48 is busier than the other three disks; in this 
> case that is because it is a 3TB disk, while the other three are newer 6TB 
> disks with more data per revolution.
>
> If you have sar set up with 60-second samples, the one disk that pauses 
> should stand out more obviously than this, since here the 3TB differs only 
> marginally from the 6TB drives.
>
>
>
> In my case the server's /home is on a partition of the two main RAID0 disks 
> that is shared with the OS, so they are active most of the time. No errors 
> reported.
>
> I will try setting up sar with a 60-second sample time on the client, thanks 
> for the idea.
>
>
> _______________________________________________
> users mailing list -- users@lists.fedoraproject.org
> To unsubscribe send an email to users-le...@lists.fedoraproject.org
> Fedora Code of Conduct: 
> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: 
> https://lists.fedoraproject.org/archives/list/users@lists.fedoraproject.org
> Do not reply to spam on the list, report it: 
> https://pagure.io/fedora-infrastructure