Public bug reported:
Note:
This is my first Ubuntu bug report and I want to be upfront: I am not certain
whether this is a kernel bug, an nfs-utils bug, or an interaction between the
two. I spent a considerable amount of time debugging this issue and the
evidence points toward a regression in the 7.0.0-14-generic kernel's NFSv4.1
client, but I may be missing something. I am filing this in good faith with as
much detail as I could gather. :)
Affected version (Ubuntu 26.04):
kernel: 7.0.0-14-generic
nfs-common: 2.8.5-1ubuntu1
Unaffected version (Ubuntu 24.04):
kernel: 6.8.0-110-generic
nfs-common: 2.6.4-3ubuntu5.1
Description:
When using NFSv4.2 mounts, the kernel NFS client intermittently returns
EREMOTEIO (-121) from nfs_revalidate_inode despite the NFS server returning
NFS4_OK with valid attributes for every operation.
The issue is reproducible with a single-threaded sequential workload, ruling
out concurrency as a factor. Rolling back to Ubuntu 24.04 (kernel 6.8.x) on the
same hardware eliminates the issue.
Environment:
- Harvester HCI cluster hosting multiple downstream (RKE2, Kubernetes v1.34.2)
clusters based on Ubuntu 26.04 nodes
- Longhorn v1.10.2 (NFS-Ganesha V7.3 share-manager)
- NFSv4.2 mounts with hard,fatal_neterrors=none,proto=tcp,timeo=600,retrans=2
- Two NFS sessions multiplexed over one TCP connection
Symptoms:
Applications using NFS-mounted directories intermittently receive EREMOTEIO
(Remote I/O error) during directory iteration (ls, stat, etc.). The errors are
transient: retrying the same operation seconds later succeeds.
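For anyone who wants to try reproducing this, the following is a minimal sketch
of a single-threaded sequential workload of the kind described above. It is not
the exact workload I ran; the function names and the retry delay are
illustrative assumptions. It lists and lstat()s every entry under a directory,
records any errno it sees, and retries once after a short delay to check
whether the error is transient.

```python
import errno
import os
import stat
import time


def try_lstat(path):
    """Return (0, stat_result) on success or (errno, None) on failure."""
    try:
        return 0, os.lstat(path)
    except OSError as exc:
        return exc.errno, None


def scan_tree(root, retry_delay=2.0):
    """Walk root sequentially in one thread and lstat every entry.

    Returns a list of (path, errno_name, retry_succeeded) tuples, one per
    failed listdir()/lstat() call. retry_delay is an arbitrary choice.
    """
    failures = []
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            names = sorted(os.listdir(current))
        except OSError as exc:
            # Directory iteration itself failed; retry once after a pause.
            time.sleep(retry_delay)
            retry_ok = try_lstat(current)[0] == 0
            name = errno.errorcode.get(exc.errno, str(exc.errno))
            failures.append((current, name, retry_ok))
            continue
        for entry in names:
            path = os.path.join(current, entry)
            err, st = try_lstat(path)
            if err:
                time.sleep(retry_delay)
                retry_ok = try_lstat(path)[0] == 0
                failures.append((path, errno.errorcode.get(err, str(err)), retry_ok))
            elif stat.S_ISDIR(st.st_mode):
                stack.append(path)
    return failures
```

Calling scan_tree() in a loop on the NFS mount point and printing any returned
tuples should show EREMOTEIO entries with retry_succeeded=True if the behaviour
matches what I observed.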
Kernel debug trace showing the bug:
With rpc_debug and nfs_debug set to 65535, the following pattern is observed at
the moment of failure:
[1827551.822381] NFS: permission(0:682/13107201), mask=0x81, res=-10
[1827551.822394] NFS: revalidating (0:682/13107201)
[1827551.822403] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64
[1827551.822406] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
[1827551.822422] encode_sequence: sessionid=1:1779709154:1:0 seqid=129303 slotid=0 max_slotid=0 cache_this=0
[1827551.822434] RPC: xs_tcp_send_request(244) = 0
[1827551.823073] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64
[1827551.823078] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
[1827551.823082] nfs4_free_slot: slotid 1 highest_used_slotid 0
[1827551.823084] nfs41_sequence_process: Error 0 free the slot
[1827551.823095] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
[1827551.823099] nfs_revalidate_inode: (0:682/13107201) getattr failed, error=-121
The GETATTR request is sent on slotid=0 (xs_tcp_send_request(244) = 0). A
response is received and slot 0 is freed (Error 0). However, no decode_attr_*
lines appear between the send and the failure — the response was received at
the transport layer but the attributes were never decoded. nfs_revalidate_inode
then returns error=-121 (EREMOTEIO).
In contrast, successful GETATTRs always show a full sequence of
decode_attr_type, decode_attr_fsid, decode_attr_fileid, etc. between send and
completion.
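The pattern above can be searched for mechanically. The following is a rough
sketch, not a tool I used for the report: it assumes one kernel message per
line in the format shown in the trace, and it assumes a sequential workload so
that messages from different GETATTRs do not interleave. The function name and
regular expressions are my own. It reports each failed revalidation for which
no decode_attr_* line was logged since the preceding "NFS: revalidating"
message.

```python
import re

# Message formats taken from the trace in this report.
REVALIDATING = re.compile(r"NFS: revalidating \(")
FAILED = re.compile(
    r"nfs_revalidate_inode: \((?P<inode>[^)]+)\) getattr failed, "
    r"error=(?P<err>-?\d+)"
)


def find_undecoded_failures(lines):
    """Return (inode, error) for failures with no decode_attr_* lines.

    Counts decode_attr_* messages seen since the most recent
    "NFS: revalidating" line and reports a failure only if that count is zero.
    """
    failures = []
    decoded = 0
    for line in lines:
        if REVALIDATING.search(line):
            decoded = 0  # a new revalidation starts here
        elif "decode_attr_" in line:
            decoded += 1
        else:
            match = FAILED.search(line)
            if match and decoded == 0:
                failures.append((match.group("inode"), int(match.group("err"))))
    return failures
```

The input can be any iterable of lines, for example an open file containing
saved dmesg output. With a concurrent workload the counts would be unreliable,
since decode lines from another request could fall inside the window.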
NFS server (Ganesha) logs show zero errors — every operation returns NFS4_OK.
This was verified with Ganesha debug logging at FULL_DEBUG level during
failures.
Two NFS sessions (sessionid=1 and sessionid=2) share one TCP connection. A
lease renewal on session 2 precedes failures on session 1 by ~1 second, but
they do not overlap.
The RPC statistics show bad_xid=1 over 565,000 RPC calls, i.e. a single
historical XID mismatch.
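For reference, one place such a counter can be read is the per-mount "xprt:"
line in /proc/self/mountstats. The sketch below assumes that source and the TCP
field order used by the kernel's xs_tcp_print_stats() (source port, bind count,
connect count, connect time, idle time, sends, recvs, bad_xids, ...); the
function name and returned keys are my own, and the layout should be checked
against the running kernel before relying on it.

```python
def parse_tcp_xprt_stats(text):
    """Extract sends, recvs and bad_xids from mountstats-style text.

    Returns one dict per "xprt: tcp" line, tagged with the mount point from
    the preceding "device ... mounted on ..." line.
    """
    rows = []
    mountpoint = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[0] == "device" and fields[2:4] == ["mounted", "on"]:
            mountpoint = fields[4]
        elif len(fields) >= 10 and fields[0] == "xprt:" and fields[1] == "tcp":
            # fields[2..6]: srcport, bind_count, connect_count,
            # connect_time, idle_time (assumed layout, see above)
            rows.append({
                "mountpoint": mountpoint,
                "sends": int(fields[7]),
                "recvs": int(fields[8]),
                "bad_xids": int(fields[9]),
            })
    return rows
```

Usage would be along the lines of
parse_tcp_xprt_stats(open("/proc/self/mountstats").read()) on the affected
client.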
Release:
Description: Ubuntu 26.04 LTS
Release: 26.04
Package version:
nfs-common:
  Installed: 1:2.8.5-1ubuntu1
  Candidate: 1:2.8.5-1ubuntu1
  Version table:
 *** 1:2.8.5-1ubuntu1 500
        500 http://archive.ubuntu.com/ubuntu resolute/main amd64 Packages
        100 /var/lib/dpkg/status
** Affects: ubuntu
Importance: Undecided
Status: New
https://bugs.launchpad.net/bugs/2154224
Title:
NFSv4.1 client generates EREMOTEIO (-121) during inode revalidation
despite server returning NFS4_OK — kernel 7.0.0-14-generic