Public bug reported:

Note:
This is my first Ubuntu bug report and I want to be upfront: I am not certain 
whether this is a kernel bug, an nfs-utils bug, or an interaction between the 
two. I spent a considerable amount of time debugging this issue and the 
evidence points toward a regression in the 7.0.0-14-generic kernel's NFSv4.1 
client, but I may be missing something. I am filing this in good faith with as 
much detail as I could gather. :)


Affected version (Ubuntu 26.04):
kernel: 7.0.0-14-generic
nfs-common: 2.8.5-1ubuntu1


Unaffected version (Ubuntu 24.04):
kernel: 6.8.0-110-generic
nfs-common: 2.6.4-3ubuntu5.1


Description:
When using NFSv4.2 mounts, the kernel NFS client intermittently returns 
EREMOTEIO (-121) from nfs_revalidate_inode despite the NFS server returning 
NFS4_OK with valid attributes for every operation.
The issue is reproducible with a single-threaded sequential workload, ruling 
out concurrency as a factor. Rolling back to Ubuntu 24.04 (kernel 6.8.x) on the 
same hardware eliminates the issue.
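
A single-threaded, sequential workload of the kind described above could be sketched as follows. This is a minimal illustration, not the exact reproducer used for this report: the function names and the idea of stat-ing every entry of one directory are assumptions, and the mount path has to be supplied by the caller.

```python
import errno
import os

# EREMOTEIO is 121 on Linux; fall back to that value if the constant is missing.
EREMOTEIO = getattr(errno, "EREMOTEIO", 121)


def scan_once(root):
    """List one directory and stat each entry sequentially.

    Returns a list of (path, errno) tuples, one per OSError seen.
    """
    try:
        with os.scandir(root) as entries:
            paths = sorted(entry.path for entry in entries)
    except OSError as exc:
        # The directory listing itself failed.
        return [(root, exc.errno)]

    failures = []
    for path in paths:
        try:
            os.stat(path)
        except OSError as exc:
            failures.append((path, exc.errno))
    return failures


def count_remote_io(failures):
    """Count how many recorded failures were EREMOTEIO."""
    return sum(1 for _, err in failures if err == EREMOTEIO)
```

Calling scan_once() on an NFS-mounted directory in a plain loop, with a short sleep between passes, keeps the workload strictly sequential, so any EREMOTEIO it records cannot be attributed to concurrent access from the same process.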


Environment:
- Harvester HCI cluster hosting multiple downstream (RKE2, Kubernetes v1.34.2) 
clusters based on Ubuntu 26.04 nodes
- Longhorn v1.10.2 (NFS-Ganesha V7.3 share-manager)
- NFSv4.2 mounts with hard,fatal_neterrors=none,proto=tcp,timeo=600,retrans=2
- Two NFS sessions multiplexed over one TCP connection


Symptoms:
Applications using NFS-mounted directories intermittently receive EREMOTEIO 
(Remote I/O error) on directory iteration (ls, stat, etc.). The errors are 
transient: retrying the same operation seconds later succeeds.


Kernel debug trace showing the bug:
With rpc_debug and nfs_debug set to 65535, the following pattern is observed at 
the moment of failure:
[1827551.822381] NFS: permission(0:682/13107201), mask=0x81, res=-10
[1827551.822394] NFS: revalidating (0:682/13107201)
[1827551.822403] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64
[1827551.822406] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
[1827551.822422] encode_sequence: sessionid=1:1779709154:1:0 seqid=129303 slotid=0 max_slotid=0 cache_this=0
[1827551.822434] RPC:       xs_tcp_send_request(244) = 0
[1827551.823073] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64
[1827551.823078] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
[1827551.823082] nfs4_free_slot: slotid 1 highest_used_slotid 0
[1827551.823084] nfs41_sequence_process: Error 0 free the slot
[1827551.823095] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
[1827551.823099] nfs_revalidate_inode: (0:682/13107201) getattr failed, error=-121


The GETATTR request is sent on slotid=0 (xs_tcp_send_request(244) = 0). A 
response is received and slot 0 is freed (Error 0). However, no decode_attr_* 
lines appear between the send and the failure — the response was received at 
the transport layer but the attributes were never decoded. nfs_revalidate_inode 
then returns error=-121 (EREMOTEIO).
In contrast, successful GETATTRs always show a full sequence of 
decode_attr_type, decode_attr_fsid, decode_attr_fileid, etc. between send and 
completion.
NFS server (Ganesha) logs show zero errors — every operation returns NFS4_OK. 
This was verified with Ganesha debug logging at FULL_DEBUG level during 
failures.
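
The check described above (a failed revalidation with no decode_attr_* lines since the preceding send) can be automated over a captured kernel log. The following is a rough sketch under stated assumptions: the regular expressions are derived only from the log lines quoted in this report, the function name is hypothetical, and attributing decode lines to the most recent send is only meaningful for a single-threaded trace like the one shown.

```python
import re

# One pattern with three alternatives: a transport send, an attribute
# decode line, or a failed revalidation. "\s*" after the comma tolerates
# log lines that were wrapped by a mail client.
EVENT = re.compile(
    r"(?P<send>xs_tcp_send_request\(\d+\) = \d+)"
    r"|(?P<decode>decode_attr_\w+)"
    r"|nfs_revalidate_inode: \((?P<inode>[^)]+)\) getattr failed,"
    r"\s*error=(?P<err>-?\d+)"
)


def failed_revalidations(log_text):
    """Return (inode, error, decode_lines_since_last_send) per failure."""
    results = []
    decoded = 0
    for match in EVENT.finditer(log_text):
        if match.group("send"):
            decoded = 0          # a new request starts a new window
        elif match.group("decode"):
            decoded += 1         # attributes were decoded in this window
        else:
            results.append(
                (match.group("inode"), int(match.group("err")), decoded)
            )
    return results
```

A failure reported with a decode count of 0 matches the pattern in this report; a non-zero count would suggest the attributes were decoded and the error arose later.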


Two NFS sessions (sessionid=1 and sessionid=2) share one TCP connection. A 
lease renewal on session 2 precedes failures on session 1 by ~1 second, but 
they do not overlap.
The client reports bad_xid=1 over 565,000 RPC calls, i.e. a single historical 
XID mismatch.
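
For anyone wanting to track the XID mismatch counter over time: one place the kernel exposes such a counter is the per-mount "xprt:" line in /proc/self/mountstats. Whether that is where the figure above was read is not stated here. The sketch below assumes the TCP variant of that line lists the source port, bind count, connect count, connect time, idle time, sends, receives and then bad XIDs, in that order; the field order is an assumption and should be checked against the running kernel. The function name is hypothetical.

```python
def bad_xid_counts(mountstats_text):
    """Return the bad-XID counter from every 'xprt: tcp' line.

    Assumed field layout after splitting on whitespace:
    xprt: tcp srcport bind connect connect_time idle sends recvs bad_xids ...
    """
    counts = []
    for line in mountstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == "xprt:" and fields[1] == "tcp":
            counts.append(int(fields[9]))
    return counts
```

Reading the file with open("/proc/self/mountstats").read() and passing the text to bad_xid_counts() before and after a failure would show whether the counter moves when EREMOTEIO occurs.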


Release:
Description:    Ubuntu 26.04 LTS
Release:        26.04


Package version:
nfs-common:
  Installed: 1:2.8.5-1ubuntu1
  Candidate: 1:2.8.5-1ubuntu1
  Version table:
 *** 1:2.8.5-1ubuntu1 500
        500 http://archive.ubuntu.com/ubuntu resolute/main amd64 Packages
        100 /var/lib/dpkg/status

** Affects: ubuntu
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2154224

Title:
  NFSv4.1 client generates EREMOTEIO (-121) during inode revalidation
  despite server returning NFS4_OK — kernel 7.0.0-14-generic

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+bug/2154224/+subscriptions

