Public bug reported:
Environment:
Ubuntu 24.04.3 on HWE kernel 6.17.0-14-generic #14~24.04.1-Ubuntu
Host is joined to an AD domain with sssd; sssd is the cifs idmap provider
root has a Kerberos TGT issued via k5start using a static keytab for a service
account, to facilitate the CIFS mount at boot time
krb5 ccache is stored in the keyring
CIFS share is mounted with vers=3.1.1,sec=krb5i,multiuser,cifsacl; root is
mapped to the service account
Some user home folders are stored on the CIFS mount
This setup has been working, more or less. (There are other, unrelated races
in cifs.upcall that I have not yet reported, but they are not relevant here.)
After I recently added a frequently running task that syncs files between the
CIFS share and a tmpfs on the Linux host (using `rclone bisync`), the system
intermittently becomes unresponsive because systemd/pid1 hangs in an fstat
call on some file on the CIFS share. With systemd not responding and IO to the
CIFS share blocked, the vast majority of tools become unusable (no `strace`,
no `lsof`, not even `ps` works) for several minutes.
Hung task report:
INFO: task systemd:1 blocked for more than 245 seconds.
Not tainted 6.17.0-14-generic #14~24.04.1-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:systemd state:D stack:0 pid:1 tgid:1 ppid:0
task_flags:0x400100 flags:0x00004002
Call Trace:
<TASK>
__schedule+0x30d/0x7a0
schedule+0x27/0x90
schedule_timeout+0x104/0x110
__wait_for_common+0x98/0x180
? __pfx_schedule_timeout+0x10/0x10
wait_for_completion_state+0x21/0x50
call_usermodehelper_exec+0x181/0x1b0
call_sbin_request_key+0x343/0x500
construct_key_and_link+0x14e/0x1b0
request_key_and_link+0x1d5/0x200
? __pfx_key_default_cmp+0x10/0x10
? __pfx_keyring_search_iterator+0x10/0x10
request_key_tag+0x48/0xc0
sid_to_id+0xe4/0x350 [cifs]
parse_sec_desc+0x8e/0x320 [cifs]
cifs_acl_to_fattr+0x14b/0x1f0 [cifs]
cifs_get_fattr+0x35f/0x6d0 [cifs]
? rmqueue.isra.0+0x13bc/0x1a20
cifs_get_inode_info+0x60/0x140 [cifs]
cifs_revalidate_dentry_attr+0x1a9/0x3d0 [cifs]
cifs_getattr+0x16c/0x250 [cifs]
vfs_getattr_nosec+0xb9/0x110
vfs_fstat+0x4e/0xc0
__do_sys_newfstat+0x3d/0x80
__x64_sys_newfstat+0x15/0x20
x64_sys_call+0x219b/0x2680
do_syscall_64+0x80/0xa30
? __x64_sys_openat+0x54/0xa0
? arch_exit_to_user_mode_prepare.isra.0+0xd/0xe0
? do_syscall_64+0xb6/0xa30
? __fput+0x1a2/0x2d0
? kmem_cache_free+0x43a/0x470
? __fput+0x1a2/0x2d0
? fput_close_sync+0x3d/0xa0
? __x64_sys_close+0x3e/0x90
? arch_exit_to_user_mode_prepare.isra.0+0xd/0xe0
? do_syscall_64+0xb6/0xa30
? do_syscall_64+0xb6/0xa30
? do_wp_page+0x1d4/0x640
? handle_pte_fault+0x1ec/0x200
? __handle_mm_fault+0x5ba/0x740
? count_memcg_events+0xf0/0x1e0
? handle_mm_fault+0x237/0x370
? do_user_addr_fault+0x1d2/0x8d0
? arch_exit_to_user_mode_prepare.isra.0+0xd/0x100
? irqentry_exit_to_user_mode+0x2d/0x1d0
? irqentry_exit+0x43/0x50
? clear_bhb_loop+0x30/0x80
? clear_bhb_loop+0x30/0x80
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x745f751173bb
RSP: 002b:00007fffcf1da368 EFLAGS: 00000202 ORIG_RAX: 0000000000000005
RAX: ffffffffffffffda RBX: 000000000000004a RCX: 0000745f751173bb
RDX: 0000000000000000 RSI: 00007fffcf1da490 RDI: 000000000000004a
RBP: 00007fffcf1da680 R08: 000060542ce71010 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 00007fffcf1da400
R13: 0000000000000000 R14: 000060542d0ec9e0 R15: 0000000000000010
</TASK>
From the stack trace, it looks like the CIFS module is calling back to userland
to execute `/sbin/request-key`, which is normal and expected behaviour, but for
some reason the `request-key` invocation hangs for several minutes. I am unsure
whether it eventually completes or times out, but the system does return to
normal behaviour, with IO to the CIFS share working as expected. Manually
invoking `rclone bisync` does not reproduce the behaviour; I can stress the
CIFS mount fairly heavily without any problems whatsoever.
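The reading of the trace above can be checked mechanically. The following is a minimal Python sketch, not part of any existing tool: the function names `parse_frames` and `upcall_origin` are hypothetical, and the sample is a shortened copy of the trace in this report. It drops the `?` frames (which the unwinder marks as unreliable) and reports which module function led into the `call_usermodehelper_exec` wait.

```python
import re

# Matches a frame such as "sid_to_id+0xe4/0x350 [cifs]" or "schedule+0x27/0x90".
# A leading "? " marks a frame the unwinder considers unreliable.
FRAME_RE = re.compile(
    r"^\s*(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>[^\]]+)\])?\s*$"
)


def parse_frames(trace_text):
    """Return (symbol, module) tuples for reliable frames, innermost first."""
    frames = []
    for line in trace_text.splitlines():
        m = FRAME_RE.match(line)
        if m and not m.group("unreliable"):
            frames.append((m.group("symbol"), m.group("module")))
    return frames


def upcall_origin(frames):
    """If the task waits in a usermode-helper upcall, return the nearest
    caller frame that belongs to a module; otherwise return None."""
    symbols = [symbol for symbol, _ in frames]
    if "call_usermodehelper_exec" not in symbols:
        return None
    start = symbols.index("call_usermodehelper_exec")
    for symbol, module in frames[start:]:
        if module is not None:
            return symbol, module
    return None


# Shortened excerpt of the trace from this report.
SAMPLE = """\
schedule+0x27/0x90
call_usermodehelper_exec+0x181/0x1b0
call_sbin_request_key+0x343/0x500
? __pfx_key_default_cmp+0x10/0x10
request_key_tag+0x48/0xc0
sid_to_id+0xe4/0x350 [cifs]
parse_sec_desc+0x8e/0x320 [cifs]
"""

print(upcall_origin(parse_frames(SAMPLE)))  # ('sid_to_id', 'cifs')
```

For this trace the result points at `sid_to_id` in the cifs module, which matches the interpretation that the hang is in the SID-to-ID mapping upcall rather than in the SMB transport itself.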
Network issues between the CIFS host and the Linux host are unlikely, as
both are colocated on the same hypervisor in this case. I suspected that
sssd might be having issues communicating with AD, but that does not seem to
be the problem either.
Please let me know if there is any other diagnostic information that
could be useful to figure out what's going on here. I am unfortunately
at a loss without being able to run any system introspection while the
hang is ongoing.
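One possible way around the lack of introspection is a sampler started before the next hang that reads `/proc/<pid>/stat` directly instead of going through `ps`. The following is a minimal Python sketch under a loud assumption: that reading `/proc/<pid>/stat` does not itself block during the hang, which has not been verified on this system. The function names are hypothetical, and only the main thread of each process is inspected.

```python
import os


def parse_stat(stat_text):
    """Parse the start of /proc/<pid>/stat into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or
    parentheses, so split on the LAST ')' rather than on whitespace.
    """
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    pid = int(stat_text[:open_paren].strip())
    comm = stat_text[open_paren + 1:close_paren]
    state = stat_text[close_paren + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """Return sorted (pid, comm) pairs for processes in state 'D'
    (uninterruptible sleep), as seen under proc_root."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited, or the file was not in the expected form
        if state == "D":
            blocked.append((pid, comm))
    return sorted(blocked)
```

Calling `blocked_tasks()` in a loop and writing the result somewhere off the CIFS share (for example the tmpfs) would at least show which processes are stuck in D state alongside pid 1 while the hang is in progress.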
** Affects: linux (Ubuntu)
Importance: Undecided
Status: New
--
https://bugs.launchpad.net/bugs/2142726
Title:
systemd/pid1 hangs while calling fstat on file in cifs mount