Addition: I have now also been able to reproduce a crash with NFS4.1 after 47
minutes of stress testing.
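
For context: the hung python3 task in the client trace below is blocked in
nfs_file_write, i.e. the reproducer is essentially heavy parallel writing to
the NFS mount (/home in the setup below). The actual test script is not
attached, so the following is only a minimal, illustrative sketch of that
kind of load (path, block size and number of workers are assumptions):

    # run on an NFS client; /home is the NFS mount described in the setup below
    mkdir -p /home/stress-test
    for i in $(seq 1 16); do
        # each worker writes ~1 GiB in 1 MiB synchronous blocks
        dd if=/dev/urandom of=/home/stress-test/worker-$i.dat bs=1M count=1024 oflag=sync &
    done
    wait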

Server:
Oct 02 15:41:52 nfs-server.domain.de kernel: watchdog: BUG: soft lockup - 
CPU#10 stuck for 26s! [kworker/u130:0:33198]
Oct 02 15:41:52 nfs-server.domain.de kernel: Modules linked in: bonding tls 
binfmt_misc intel_rapl_msr intel_rapl_common sb_edac xfs x86_pkg_temp_thermal 
intel_powerclamp coretemp ipmi_ssif kvm_intel kvm ioa>
Oct 02 15:41:52 nfs-server.domain.de kernel: CPU: 10 PID: 33198 Comm: 
kworker/u130:0 Tainted: G          I       5.15.0-122-generic #132-Ubuntu
Oct 02 15:41:52 nfs-server.domain.de kernel: Hardware name: HP ProLiant DL380p 
Gen8, BIOS P70 01/22/2018
Oct 02 15:41:52 nfs-server.domain.de kernel: Workqueue: rpciod 
rpc_async_schedule [sunrpc]
Oct 02 15:41:52 nfs-server.domain.de kernel: RIP: 
0010:mod_delayed_work_on+0x92/0xa0
Oct 02 15:41:52 nfs-server.domain.de kernel: Code: c0 48 8b 55 d0 65 48 2b 14 
25 28 00 00 00 75 1c 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc 
fb 66 0f 1f 44 00 00 <eb> d0 e8 27 26 ce 00 0f 1f 8>
Oct 02 15:41:52 nfs-server.domain.de kernel: RSP: 0018:ffffbe8d8a367cf8 EFLAGS: 
00000202
Oct 02 15:41:52 nfs-server.domain.de kernel: RAX: 0000000000000000 RBX: 
0000000000000000 RCX: 0000000010e0000a
Oct 02 15:41:52 nfs-server.domain.de kernel: RDX: 0000000010c00000 RSI: 
0000000000000086 RDI: ffff9d3b1f6a07c0
Oct 02 15:41:52 nfs-server.domain.de kernel: RBP: ffffbe8d8a367d30 R08: 
ffff9d3b1f6a07c0 R09: ffffffffc0ab25c8
Oct 02 15:41:52 nfs-server.domain.de kernel: R10: 0000000000000003 R11: 
ffff9d334d34bb58 R12: ffffffffc0ab25a8
Oct 02 15:41:52 nfs-server.domain.de kernel: R13: ffff9d334e43be00 R14: 
00000000000001f4 R15: 0000000000002000
Oct 02 15:41:52 nfs-server.domain.de kernel: FS:  0000000000000000(0000) 
GS:ffff9d3b1f680000(0000) knlGS:0000000000000000
Oct 02 15:41:52 nfs-server.domain.de kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 
0000000080050033
Oct 02 15:41:52 nfs-server.domain.de kernel: CR2: 000055e2a838b020 CR3: 
0000000284e10004 CR4: 00000000000606e0
Oct 02 15:41:52 nfs-server.domain.de kernel: Call Trace:
Oct 02 15:41:52 nfs-server.domain.de kernel:  <IRQ>
Oct 02 15:41:52 nfs-server.domain.de kernel:  ? show_trace_log_lvl+0x1d6/0x2ea
Oct 02 15:41:52 nfs-server.domain.de kernel:  ? show_trace_log_lvl+0x1d6/0x2ea
Oct 02 15:41:52 nfs-server.domain.de kernel:  ? 
__rpc_sleep_on_priority_timeout+0xff/0x110 [sunrpc]
Oct 02 15:41:52 nfs-server.domain.de kernel:  ? show_regs.part.0+0x23/0x29
Oct 02 15:41:52 nfs-server.domain.de kernel:  ? show_regs.cold+0x8/0xd
Oct 02 15:41:52 nfs-server.domain.de kernel:  ? watchdog_timer_fn+0x1be/0x220
Oct 02 15:41:52 nfs-server.domain.de kernel:  ? 
lockup_detector_update_enable+0x60/0x60
Oct 02 15:41:52 nfs-server.domain.de kernel:  ? __hrtimer_run_queues+0x107/0x230
Oct 02 15:41:52 nfs-server.domain.de kernel:  ? 
clockevents_program_event+0xad/0x130
Oct 02 15:41:52 nfs-server.domain.de kernel:  ? hrtimer_interrupt+0x101/0x220
Oct 02 15:41:52 nfs-server.domain.de kernel:  ? 
__sysvec_apic_timer_interrupt+0x61/0xe0
Oct 02 15:41:52 nfs-server.domain.de kernel:  ? 
sysvec_apic_timer_interrupt+0x7b/0x90
Oct 02 15:41:52 nfs-server.domain.de kernel:  </IRQ>
Oct 02 15:41:52 nfs-server.domain.de kernel:  <TASK>

Client (error -110 = ETIMEDOUT):
Oct  2 15:40:50 nfs-client.domain.de kernel: [ 3406.129264] RPC: Could not send 
backchannel reply error: -110
Oct  2 15:40:59 nfs-client.domain.de kernel: [ 3415.341287] RPC: Could not send 
backchannel reply error: -110
Oct  2 15:40:59 nfs-client.domain.de kernel: [ 3415.699376] RPC: Could not send 
backchannel reply error: -110
Oct  2 15:41:00 nfs-client.domain.de kernel: [ 3416.526110] RPC: Could not send 
backchannel reply error: -110
Oct  2 15:41:16 nfs-client.domain.de kernel: [ 3432.085463] RPC: Could not send 
backchannel reply error: -110
Oct  2 15:41:22 nfs-client.domain.de kernel: [ 3437.866386] RPC: Could not send 
backchannel reply error: -110
Oct  2 15:42:01 nfs-client.domain.de sssd_nss[4188]: Shutting down (status = 0)
Oct  2 15:42:01 nfs-client.domain.de systemd[1]: sssd-nss.service: Deactivated 
successfully.
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131370] INFO: task 
python3:3299 blocked for more than 122 seconds.
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131393]       Not tainted 
6.8.0-40-generic #40~22.04.3-Ubuntu
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131401] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131405] task:python3        
 state:D stack:0     pid:3299  tgid:3299  ppid:3298   flags:0x00000002
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131420] Call Trace:
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131426]  <TASK>
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131434]  
__schedule+0x27c/0x6a0
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131452]  schedule+0x33/0x110
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131461]  
schedule_preempt_disabled+0x15/0x30
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131471]  
rwsem_down_write_slowpath+0x2a2/0x550
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131486]  
down_write+0x5c/0x80
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131498]  
nfs_start_io_write+0x19/0x60 [nfs]
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131610]  
nfs_file_write+0xb5/0x2a0 [nfs]
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131697]  
vfs_write+0x2a5/0x480
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131710]  
ksys_write+0x73/0x100
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131717]  
__x64_sys_write+0x19/0x30
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131725]  
x64_sys_call+0x23e1/0x24b0
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131733]  
do_syscall_64+0x81/0x170
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131744]  ? 
__update_load_avg_cfs_rq+0x34b/0x3c0
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131754]  ? 
update_curr+0x2b/0x210
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131763]  ? 
reweight_entity+0x160/0x270
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131773]  ? 
update_load_avg+0x82/0x850
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131782]  ? 
nohz_balancer_kick+0x10f/0x3c0
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131790]  ? 
trigger_load_balance+0x4d/0x70
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131798]  ? 
scheduler_tick+0x14e/0x370
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131827]  ? 
update_process_times+0x8e/0xb0
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131837]  ? 
hv_set_non_nested_register+0x37/0xa0
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131846]  ? 
hv_set_register+0x52/0x70
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131854]  ? 
hv_ce_set_next_event+0x27/0x40
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131865]  ? 
clockevents_program_event+0xb3/0x140
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131874]  ? 
tick_program_event+0x43/0xa0
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131884]  ? 
hrtimer_interrupt+0x11f/0x250
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131894]  ? 
irqentry_exit_to_user_mode+0x7e/0x260
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131906]  ? 
clear_bhb_loop+0x15/0x70
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131914]  ? 
clear_bhb_loop+0x15/0x70
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131919]  ? 
clear_bhb_loop+0x15/0x70
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131926]  
entry_SYSCALL_64_after_hwframe+0x78/0x80
Oct  2 15:43:29 nfs-client.domain.de kernel: [ 3565.131939] RIP: 
0033:0x7ebcb5114887


** Description changed:

  Since August 19th we have been struggling with irregular crashes on our
  NFS server.
  
  Our experiences with NFS server crashes are:
  - We were able to reproduce the crashes in our production and test 
environments with NFS4.2. Rarely after minutes, sometimes after hours or days, 
but sometimes not at all,
    as we often stopped the experiments after 12 to 24 hours.
  - We have not yet been able to reproduce a crash between a bare metal NFS
server and a bare metal NFS client, but we could between a bare metal NFS
server and a virtualized client with NFS4.2.
  - We have not been able to reproduce a crash with NFS vers=4.0 so far.
  - We are now running NFS vers=4.1 (for a few hours so far) to see whether
this keeps the system stable; the corresponding fstab change is sketched after
the setup list below.
  - The crashes happen with or without GSSPROXY.
  - Before Sept 15 the kernel trace on the NFS server always started with:
   watchdog: BUG: soft lockup - CPU#23 stuck for 26s! [kworker/u483:0:8805]
   and after Sept 15 with:
   rcu: INFO: rcu_sched self-detected stall on CPU
  - Changing the client kernel from 6.8.0-40-generic to the unofficial
6.5.0-46-generic from Mehmet Basaran (mehmetbasaran) only removed the
backchannel error on the client;
    the server still hangs with "rcu: INFO: rcu_sched self-detected stall on CPU".
  - Changing the client kernel from 6.8.0-40-generic back to 6.5.0-44-generic
did not solve the problem that users cannot log in after the crash:
   client:
    kernel: [29361.795714] INFO: task python3:107226 blocked for more than 120 seconds.
  - Also rolling the server kernel back from 5.15.0-122-generic to
5.15.0-112-generic only changed the error message, not the stability:
   server:
    kernel: [82884.774039] INFO: task split:8351 blocked for more than 120 seconds.
  
  Our setup:
  - virtualized NFS 4.2 server with Ubuntu 22.04.5 LTS and kernel 
5.15.0-122-generic
  - virtualized NFS clients with Ubuntu 22.04.5 LTS and kernel 6.8.0-40-generic 
or kernel 6.8.0-45-generic
  - /etc/exports :  /mnt/home  
nfsclient(sec=krb5,rw,root_squash,sync,no_subtree_check)
  - /etc/fstab :  nfsserver:/mnt/home /home   nfs    
vers=4.2,rw,soft,sec=krb5,proto=tcp  0  0
  - apt info nfs-common : Version: 1:2.6.1-1ubuntu1.2
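
  For the vers=4.0 and vers=4.1 tests mentioned above, presumably only the
  vers= mount option changes; a sketch of the corresponding client-side change,
  using the same anonymized names as in the fstab line above:

    /etc/fstab :  nfsserver:/mnt/home /home   nfs    vers=4.1,rw,soft,sec=krb5,proto=tcp  0  0
    # unmount and mount again so the new vers= option takes effect
    umount /home && mount /home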
  
  # NFS server error message with kernel 5.15.0-122-generic:
  Sep 30 01:15:51 nfs-server.domain.de kernel: rcu: INFO: rcu_sched 
self-detected stall on CPU
  Sep 30 01:15:51 nfs-server.domain.de kernel: rcu:         54-....: (14998 
ticks this GP) idle=2db/1/0x4000000000000000 softirq=32173387/32173387 fqs=7449
  Sep 30 01:15:51 nfs-server.domain.de kernel:         (t=15000 jiffies 
g=144775177 q=49782)
  Sep 30 01:15:51 nfs-server.domain.de kernel: NMI backtrace for cpu 54
  Sep 30 01:15:51 nfs-server.domain.de kernel: CPU: 54 PID: 153154 Comm: 
kworker/u480:36 Not tainted 5.15.0-122-generic #132-Ubuntu
  Sep 30 01:15:51 nfs-server.domain.de kernel: Hardware name: Microsoft 
Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.0 
12/17/2019
  Sep 30 01:15:51 nfs-server.domain.de kernel: Workqueue: rpciod rpc_async_schedule [sunrpc]
  Sep 30 01:15:51 nfs-server.domain.de kernel: Call Trace:
  Sep 30 01:15:51 nfs-server.domain.de kernel:  <IRQ>
  Sep 30 01:15:51 nfs-server.domain.de kernel:  show_stack+0x52/0x5c
  Sep 30 01:15:51 nfs-server.domain.de kernel:  dump_stack_lvl+0x4a/0x63
  Sep 30 01:15:51 nfs-server.domain.de kernel:  dump_stack+0x10/0x16
  Sep 30 01:15:51 nfs-server.domain.de kernel:  nmi_cpu_backtrace.cold+0x4d/0x93
  Sep 30 01:15:51 nfs-server.domain.de kernel:  ? lapic_can_unplug_cpu+0x90/0x90
  Sep 30 01:15:51 nfs-server.domain.de kernel:  
nmi_trigger_cpumask_backtrace+0xec/0x100
  Sep 30 01:15:51 nfs-server.domain.de kernel:  
arch_trigger_cpumask_backtrace+0x19/0x20
  Sep 30 01:15:51 nfs-server.domain.de kernel:  
trigger_single_cpu_backtrace+0x44/0x4f
  Sep 30 01:15:51 nfs-server.domain.de kernel:  rcu_dump_cpu_stacks+0x102/0x149
  Sep 30 01:15:51 nfs-server.domain.de kernel:  print_cpu_stall.cold+0x2f/0xe2
  Sep 30 01:15:51 nfs-server.domain.de kernel:  check_cpu_stall+0x1d8/0x270
  Sep 30 01:15:51 nfs-server.domain.de kernel:  rcu_sched_clock_irq+0x9a/0x250
  Sep 30 01:15:51 nfs-server.domain.de kernel:  update_process_times+0x94/0xd0
  Sep 30 01:15:51 nfs-server.domain.de kernel:  tick_sched_handle+0x29/0x70
  Sep 30 01:15:51 nfs-server.domain.de kernel:  tick_sched_timer+0x6f/0x90
  Sep 30 01:15:51 nfs-server.domain.de kernel:  ? tick_sched_do_timer+0xa0/0xa0
  Sep 30 01:15:51 nfs-server.domain.de kernel:  __hrtimer_run_queues+0x104/0x230
  Sep 30 01:15:51 nfs-server.domain.de kernel:  ? read_hv_clock_tsc_cs+0x9/0x30
  Sep 30 01:15:51 nfs-server.domain.de kernel:  hrtimer_interrupt+0x101/0x220
  Sep 30 01:15:51 nfs-server.domain.de kernel:  hv_stimer0_isr+0x1d/0x30
  Sep 30 01:15:51 nfs-server.domain.de kernel:  
__sysvec_hyperv_stimer0+0x2f/0x70
  Sep 30 01:15:51 nfs-server.domain.de kernel:  sysvec_hyperv_stimer0+0x7b/0x90
  Sep 30 01:15:51 nfs-server.domain.de kernel:  </IRQ>
  Sep 30 01:15:51 nfs-server.domain.de kernel:  <TASK>
  Sep 30 01:15:51 nfs-server.domain.de kernel:  
asm_sysvec_hyperv_stimer0+0x1b/0x20
  Sep 30 01:15:51 nfs-server.domain.de kernel: RIP: 0010:read_hv_clock_tsc+0x1b/0x60

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2083502

Title:
  NFS4.2 crashes with high IO load after some hours

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+bug/2083502/+subscriptions

