Since some point in the current 5.8 cycle, iscsid sometimes crashes
and fails to come up after a reboot.

Here is a sample BUG:

 kernel: BUG: scheduling while atomic: iscsid/763/0x00000200

 kernel: Call Trace:
 kernel:  dump_stack+0x6b/0x88
 kernel:  __schedule_bug.cold+0x4c/0x58
 kernel:  __schedule+0x646/0x800
 kernel:  schedule+0x4a/0xb0
 kernel:  __lock_sock+0x69/0x90
 kernel:  ? finish_wait+0x80/0x80
 kernel:  lock_sock_nested+0x4f/0x60
 kernel:  inet_getname+0x98/0xd0
 kernel:  iscsi_sw_tcp_conn_get_param+0x9b/0x110 [iscsi_tcp]
 kernel:  show_conn_ep_param_ISCSI_PARAM_CONN_ADDRESS+0x6d/0x90 [scsi_transport_iscsi]
 kernel:  dev_attr_show+0x16/0x40
 kernel:  sysfs_kf_seq_show+0x98/0xf0
 kernel:  seq_read+0xa8/0x420
 kernel:  vfs_read+0x9d/0x180
 kernel:  ksys_read+0x4f/0xc0
 kernel:  do_syscall_64+0x47/0x80
 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9

These are not areas I'm familiar with, but
iscsi_sw_tcp_conn_get_param() uses this pattern:

    spin_lock_bh(&conn->session->frwd_lock)
    kernel_getsockname()/kernel_getpeername()
    spin_unlock_bh(&conn->session->frwd_lock)

.. and since 1b66d253610c7 ("bpf: Add get{peer, sock}name attach types
for sock_addr"), inet_getname() has a new BPF_CGROUP_RUN_SA_PROG_LOCK()
call that will try to lock_sock() the socket while the spinlock is still
held.  I suspect that when lock_sock() needs to sleep, it triggers the
BUG, which matches the __lock_sock -> schedule frames in the trace.
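
To make the suspected sequence concrete, here is a toy userspace model
of it.  This is only a sketch: every toy_* name is made up for
illustration, none of it is kernel code, and the counter merely stands
in for "a spinlock is held, so we are in atomic context".  Unlike the
real kernel, which only hits the BUG when lock_sock() actually has to
schedule, the toy flags every call made in atomic context.

```c
#include <assert.h>
#include <stdio.h>

/* Toy model only: all names are hypothetical, none is a kernel API. */

static int atomic_depth;   /* > 0 while a toy "spinlock" is held */
static int bug_reports;    /* sleeping-in-atomic violations seen so far */

static void toy_spin_lock_bh(void)
{
    atomic_depth++;
}

static void toy_spin_unlock_bh(void)
{
    assert(atomic_depth > 0);   /* unlock without a matching lock */
    atomic_depth--;
}

/* Stand-in for lock_sock(): it may sleep, so it is not allowed to run
 * while a spinlock is held. */
static void toy_lock_sock(void)
{
    if (atomic_depth > 0) {
        bug_reports++;
        printf("BUG: scheduling while atomic (depth %d)\n", atomic_depth);
    }
}

/* Stand-in for inet_getname() once it takes the socket lock. */
static void toy_getname(void)
{
    toy_lock_sock();
    /* ... the address would be copied out here ... */
}

/* Mirrors the lock / getname / unlock pattern quoted above. */
static void toy_get_param_under_spinlock(void)
{
    toy_spin_lock_bh();
    toy_getname();
    toy_spin_unlock_bh();
}

/* The same call made with no spinlock held, for contrast. */
static void toy_get_param_unlocked(void)
{
    toy_getname();
}
```

Calling toy_get_param_unlocked() leaves bug_reports untouched, while
toy_get_param_under_spinlock() bumps it by one.  That is the shape of
what I think is happening, not a proposed patch.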

I've cc'ed the author of that commit; I'm not sure whether any other
lists would be useful to cc.

Marc
