Since a Fedora 33 update in the last couple of weeks the problem has
gone away. I haven't changed anything as far as I am aware.
One change is that the kernel moved from 5.13.x to 5.14.x ...
Terry
On 21/10/2021 23:36, Reon Beon via users wrote:
https://release-monitoring.org/project/2081/
Well, it is a pre-release version: 2.5.5.rc3
Hi Roger,
Thanks for looking.
I will try NFS v3 with my latency tests running. I did try NFS v3 before
and I "think" there were still desktop lockups but for a much shorter
time. But this is just a feeling.
Current kernel on both systems is: 5.13.19-100.fc33.x86_64.
If I find the time, I will
That network looks fine to me.
I would try v3. I have had bad luck many times with v4 on a variety
of different kernels. If the code is recovering from something
related to a bug, 45 seconds might be about right for it to decide that
something that was working is no longer working.
I am not sure any amount of debu
sar -n EDEV reports all 0's all around then. There are some rxdrop/s of 0.02 occasionally on eno1 through the day (about 20 of these
with minute based sampling). Today ifconfig lists 39 dropped RX packets
out of 2357593. Not sure why there are some dropped packets. "ethtool -S
eno1" doesn't seem
Since it is recovering from it, maybe it is losing packets inside the
network. What do "sar -n DEV" and "sar -n EDEV" look like during
that time on both the client seeing the pause and the server?
EDEV is typically all zeros unless something is lost. If something is
being lost and it matches the ti
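A sketch of the sar queries being suggested (the times and the sa02 filename are placeholders; adjust them to the day and window of the pause):

```shell
# Per-interface traffic and error/drop counters, restricted to a
# window around the event:
sar -n DEV  -s 11:49:00 -e 11:54:00
sar -n EDEV -s 11:49:00 -e 11:54:00
# The same counters from a previous day's saved data file
# (sa02 = the 2nd of the month in Fedora's default sysstat layout):
sar -n EDEV -f /var/log/sa/sa02
```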
and iostats:
04/10/21 10:51:14
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           2.09   0.00     1.56     0.02    0.00  96.33
Device  r/s  rkB/s  rrqm/s  %rrqm  r_await  rareq-sz  w/s  wkB/s  wrqm/s  %wrqm  w_await  wareq-sz  d/s  dkB/s  drqm/s  %drqm  d
My disklatencytest showed a longish (14 secs) NFS file system
directory/stat lookup again today on a desktop:
2021-10-04T05:26:19 0.069486 0.069486 0.000570 /home/...
2021-10-04T05:28:19 0.269743 0.538000 0.001019 /home/...
2021-10-04T09:48:00 1.492158 0.003314
On 04/10/2021 00:51, Roger Heflin wrote:
With 10 minute samples anything that happened gets averaged enough that
even the worst event is almost impossible to see.
Sar will report the same as date, i.e. local time. And a 12:51 event would be
in the 13:00 sample (started at about 12:50 and ended at 13:00).
What I do see is that during that w
45 second event happened at 2021-10-02T11:51:02 UTC. Not sure what sar
time is based on (maybe local time, BST rather than UTC, so it would be
2021-10-02T12:51:02 BST).
Continuing info ...
sar -n NFSD on the server
11:00:01  24.16  0.00  24.16  0.00  24.16  0.00  0.00
"sar -d" on the server:
11:50:02  dev8-0  4.67  0.01  46.62  0.00  9.99  0.12  14.03  5.75
11:50:0
You might retest with NFSv3; the code handling v3 should be significantly
different, since v3 is stateless and does not maintain long-term connections.
And if the long-term connection had some sort of issue, then 45 seconds may
be how long it takes to figure that out and re-initiate the connection.
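A minimal sketch of forcing the v3 retest; the server and export names here are placeholders for your own setup:

```shell
# Remount /home with NFSv3 for the test (server:/export/home is a
# placeholder; proto=tcp matches the usual v4 transport):
umount /home
mount -t nfs -o vers=3,proto=tcp server:/export/home /home
# Or the equivalent /etc/fstab line:
#   server:/export/home  /home  nfs  vers=3,proto=tcp  0 0
# Confirm the negotiated version on the client:
nfsstat -m
```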
What did the sar -d look like for the 2 minutes before and 2 minutes
afterward?
Whether it is slow or not may depend on whether the directory/file fell
out of cache and had to be reread from the disk.
I have also seen really large dirs take a really long time to find, but
typically that takes thousands of
I am getting more sure this is an NFS/networking issue rather than an
issue with disks in the server.
I created a small test program that, given a directory, finds a random
file in a random directory three levels below, opens it, reads up to
a block (512 bytes) of data from it, and times how l
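The test program itself isn't shown; a minimal shell sketch of the same idea (pick a random file three levels below a directory, read up to 512 bytes, and report how long it took) might look like:

```shell
# Probe latency of a random file three directory levels below $1.
latency_probe() {
    local dir="$1"
    local file start end
    # Pick one random regular file exactly three levels down:
    file=$(find "$dir" -mindepth 3 -maxdepth 3 -type f 2>/dev/null | shuf -n 1)
    [ -n "$file" ] || { echo "no file found under $dir" >&2; return 1; }
    # Time an open + read of up to one 512-byte block:
    start=$(date +%s%N)
    dd if="$file" bs=512 count=1 of=/dev/null 2>/dev/null
    end=$(date +%s%N)
    printf '%s %s %d.%09d\n' "$(date -Is)" "$file" \
        $(( (end - start) / 1000000000 )) $(( (end - start) % 1000000000 ))
}
```

Run in a loop (e.g. `latency_probe /home` from cron or a `while sleep 60` loop) it produces timestamped lines similar to the disklatencytest output quoted earlier.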
On Fri, 1 Oct 2021 at 16:20, Terry Barnaby wrote:
>
> Thanks for the info, I am using MDraid. There are no "mddevice" messages
> in /var/log/messages and smartctl -a lists no errors on any of the
> disks. The disks are about 3 years old; I change them in servers when
> they are between 3 and 4 years old.
>
Wh
You need to replace mddevice with the name of your mddevice.
probably md0.
3-5 years is about when they start to go. I have 2-3TB wd-reds
sitting on the floor because their correctable/offline-uncorrectable
errors kept happening and blipping my storage (a few second pause). I even
removed the disks from the
On 01/10/2021 19:05, Roger Heflin wrote:
it will show latency. await is the average I/O time in ms, and %util is
calculated based on await and iops/sec. So long as you turn sar down to
1-minute samples it should tell you which of the 2 disks had higher
await/util%. With a 10 minute sample the 40sec pause may get spread
out across enough iops
On 01/10/2021 13:31, D. Hugh Redelmeier wrote:
On 30/09/2021 19:27, Roger Heflin wrote:
Trivial thoughts from reading this thread. Please don't take the
triviality as an insult.
Perhaps the best way to determine if the problem is from a software update
is to downgrade likely packages. In the case of the kernel, you can just
boot an older one (assuming that an old enough one is s
Raid0, so there is no redundancy on the data?
And what kind of underlying hard disks? The desktop drives will try
for a long time (i.e. a minute or more) to read any bad blocks. Those
disks will not report an error unless it gets to the default OS
timeout, or it hits the disk firmware timeout.
T
On Thu, 30 Sep 2021 17:50:01 +0100
Terry Barnaby wrote:
> Yes, problems often occur due to you having done something, but I am
> pretty sure nothing has changed apart from Fedora updates.
But hardware is sneaky. It waits for you to install software updates,
then breaks itself to make you think th
On 30/09/2021 11:42, Roger Heflin wrote:
On 30/09/2021 11:32, Ed Greshko wrote:
On 30/09/2021 16:35, Terry Barnaby wrote:
On mine when I first access the NFS volume it takes 5-10 seconds for the
disks to spin up. Mine will spin down later in the day if little or
nothing is going on and I will get another delay.
I have also seen delays if a disk gets bad blocks and corrects them. About
1/2 of the time that does have a m
On 30/09/2021 16:35, Terry Barnaby wrote:
Thanks for the feedback everyone.
This is a very lightly loaded system with just 3 users ATM and very
little going on across the network (just editing code files etc). The
problem occurred again yesterday. For about 10 minutes my KDE desktop
locked up in 20 second bursts and then the problem w
Make sure you have sar/sysstat enabled and changed to do 1-minute samples.
sar -d will show disk perf. If one of the disks "blips" at the
firmware level (working on a hard-to-read block maybe), the util% on
that device will be significantly higher than on all other disks, so it
will stand out. Then you
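On Fedora, sysstat collects every 10 minutes via a systemd timer by default; a sketch of switching it to 1-minute samples (assuming the stock sysstat-collect.timer unit shipped with Fedora 33):

```shell
# Override the stock 10-minute collection interval:
sudo systemctl edit sysstat-collect.timer
# ...and in the override file set:
#   [Timer]
#   OnCalendar=
#   OnCalendar=*:00/1
sudo systemctl restart sysstat-collect.timer
# Afterwards, per-disk latency around an event can be pulled with e.g.:
sar -d -s 11:49:00 -e 11:54:00
```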
Are there network switches under your control? It sounds similar to what
happens when the MTU settings on the systems do not match, or one system's
MTU is set above the value on the switch ports.
Next time the issue occurs use ping with the do-not-fragment flag,
e.g. $ ping -M do -s 8972 ip.address
This exampl
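The check above can be sketched as follows (the interface name eno1 and the 9000-byte jumbo MTU are assumptions; use -s 1472 for a standard 1500-byte MTU):

```shell
# Compare the configured MTU on each host:
ip link show eno1 | grep -o 'mtu [0-9]*'
# Probe with the do-not-fragment bit set; 8972 = 9000 minus 28 bytes
# of IP + ICMP headers, so it only succeeds if the whole path to
# ip.address carries 9000-byte frames:
ping -M do -s 8972 -c 3 ip.address
```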
On Sun, 26 Sep 2021 10:26:19 -0300
George N. White III wrote:
> If you have cron jobs that use a lot of network bandwidth it may work
> fine until some network issue causing lots of retransmits bogs it down.
Which is why you should check the dumb stuff first! Has a critter
chewed on the ethernet
On Sun, 26 Sept 2021 at 01:44, Tim via users
wrote:
On Sat, 2021-09-25 at 06:04 +0100, Terry Barnaby wrote:
> in the last month or so all of the client computers are getting KDE
> GUI lockups every few hours that last for around 40 secs.
Might one of them have a cron job that's scouring the network?
e.g. locate databasing
On Sat, 25 Sept 2021 at 02:04, Terry Barnaby wrote:
On 25/09/2021 09:00, Ed Greshko wrote:
On 25/09/2021 14:07, Terry Barnaby wrote:
A few questions.
1. Are you saying your NFS server HW is the same for the past 25 years?
Couldn't have been all Fedora, right?
No ( :) ), I was using previous Linux and Unix systems before then. Certainly OS
versions and hardware have changed over th
On 25/09/2021 06:42, Ed Greshko wrote:
On 25/09/2021 13:04, Terry Barnaby wrote:
Hi,
I use NFS mount (defaults, so V4) /home directories with a simple server
over Gigabit Ethernet, all running Fedora 33. This has been working fine
for 25+ years through various Fedora versions. However in the last month
or so all of the client computers are getting KDE GUI lockups every few
h