I have the same problem on my two machines, which have identical setups.

ZFS on root and data (two pools), compression enabled, some subvolumes
are encrypted.

Ubuntu 21.10
Kernel 5.13.0-20-generic

$ zfs --version
zfs-2.0.6-1ubuntu2
zfs-kmod-2.0.6-1ubuntu2

The first panic happens after I log in with my OS user, which triggers
decryption of the ZFS subvolumes via PAM, and voila:

VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed
PANIC at zfs_znode.c:339:zfs_znode_sa_init()
Showing stack for process 8821
CPU: 6 PID: 8821 Comm: Cache2 I/O Tainted: P           O      5.13.0-20-generic #20-Ubuntu
Hardware name: ASUS System Product Name/PRO H410T, BIOS 1401 07/27/2020
Call Trace:
 show_stack+0x52/0x58
 dump_stack+0x7d/0x9c
 spl_dumpstack+0x29/0x2b [spl]
 spl_panic+0xd4/0xfc [spl]
 ? queued_spin_unlock+0x9/0x10 [zfs]
 ? do_raw_spin_unlock+0x9/0x10 [zfs]
 ? __raw_spin_unlock+0x9/0x10 [zfs]
 ? dmu_buf_replace_user+0x65/0x80 [zfs]
 ? dmu_buf_set_user+0x13/0x20 [zfs]
 ? dmu_buf_set_user_ie+0x15/0x20 [zfs]
 zfs_znode_sa_init+0xd9/0xe0 [zfs]
…
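Because the first panic lines up with PAM unlocking the encrypted subvolumes at login, it may help to record which datasets actually have their keys loaded when it fires. The Python sketch below is one way to do that. It assumes the `zfs` CLI is on PATH and relies on the scripted output of `zfs get -H -o name,value keystatus` (one tab-separated name/value pair per line, with `-` for unencrypted datasets). `parse_keystatus` and `unlocked_datasets` are hypothetical helper names, not part of any ZFS tooling.

```python
import subprocess
from typing import Dict, List


def parse_keystatus(output: str) -> Dict[str, str]:
    """Map dataset name -> keystatus from `zfs get -H -o name,value keystatus`.

    With -H each line is "<name>\t<value>"; unencrypted datasets report "-"
    and are skipped.
    """
    statuses: Dict[str, str] = {}
    for line in output.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name, value = line.split("\t", 1)
        if value != "-":
            statuses[name] = value
    return statuses


def unlocked_datasets() -> List[str]:
    """Return encrypted datasets whose key is currently loaded ("available")."""
    result = subprocess.run(
        ["zfs", "get", "-H", "-o", "name,value", "keystatus"],
        check=True,
        capture_output=True,
        text=True,
    )
    statuses = parse_keystatus(result.stdout)
    return sorted(name for name, status in statuses.items() if status == "available")
```

Running `unlocked_datasets()` right after login and comparing the result with the timestamp of the first VERIFY message might show whether the panic follows a particular key load.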

The system itself is still usable but becomes unresponsive now and then,
on an irregular basis. And then dmesg keeps filling up with messages like
this:

INFO: task Cache2 I/O:8821 blocked for more than 1208 seconds.
      Tainted: P           O      5.13.0-20-generic #20-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:Cache2 I/O      state:D stack:    0 pid: 8821 ppid:  4247 flags:0x00000000
Call Trace:
 __schedule+0x268/0x680
 schedule+0x4f/0xc0
 spl_panic+0xfa/0xfc [spl]
 ? queued_spin_unlock+0x9/0x10 [zfs]
 ? do_raw_spin_unlock+0x9/0x10 [zfs]
 ? __raw_spin_unlock+0x9/0x10 [zfs]
 ? dmu_buf_replace_user+0x65/0x80 [zfs]
 ? dmu_buf_set_user+0x13/0x20 [zfs]
 ? dmu_buf_set_user_ie+0x15/0x20 [zfs]
 zfs_znode_sa_init+0xd9/0xe0 [zfs]
…

Processes that get stuck forever in the 'D' state (second column of the ps
output) are usually firefox and gsd-housekeeping, sometimes gnome-shell
and, as in the case above, find. I believe gnome-shell causes nautilus to
misbehave. What also sucks is that this seems to make my laptop abort
entering sleep mode with a "resource busy" error, over and over: it would
try to enter sleep, abort (there's a message in syslog) and try again,
until the battery was completely depleted.
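Instead of checking the ps output by eye, a small Python sketch can scan /proc for tasks in state `D` (uninterruptible sleep). It walks /proc/<pid>/task/<tid>/stat rather than only /proc/<pid>/stat, because names like `Cache2 I/O` in the dmesg output above typically belong to individual threads rather than to whole processes. The comm field also needs careful parsing, since such names contain spaces and slashes. `parse_stat` and `d_state_tasks` are hypothetical helper names; only the /proc layout is standard Linux.

```python
import os
from typing import List, Tuple


def parse_stat(stat_line: str) -> Tuple[int, str, str]:
    """Split a /proc/.../stat line into (id, comm, state).

    comm sits in parentheses and may itself contain spaces or ')'
    (e.g. "Cache2 I/O"), so everything up to the LAST ')' is the name.
    """
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    task_id = int(stat_line[:open_paren])
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return task_id, comm, state


def d_state_tasks(proc_root: str = "/proc") -> List[Tuple[int, int, str]]:
    """Return sorted (pid, tid, comm) for every thread currently in state 'D'."""
    stuck = []
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue  # skip /proc/self, /proc/meminfo, ...
        task_dir = os.path.join(proc_root, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited or is not accessible
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    task_id, comm, state = parse_stat(f.read())
            except OSError:
                continue  # thread exited while we were scanning
            if state == "D":
                stuck.append((int(pid), task_id, comm))
    return sorted(stuck)
```

For example, `for pid, tid, comm in d_state_tasks(): print(pid, tid, comm)` lists the stuck threads together with the process they belong to. Running it periodically could show which tasks pile up before suspend fails.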

Adding `zfs.zfs_recover=1` to the kernel boot parameter list may help
(thank you https://launchpad.net/~jawn-smith). At least it prevented the
first zfs_znode panic message from appearing in dmesg after login, but
this needs longer and more detailed observation under different loads.
It also remains an open question whether such a kernel parameter is
appropriate for regular use.
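To confirm that the parameter actually reached the module after a reboot, you can compare the kernel command line with the value the loaded module reports. The Python sketch below assumes the usual sysfs location for module parameters, /sys/module/zfs/parameters/zfs_recover. `cmdline_value` and `zfs_recover_status` are hypothetical helper names.

```python
from pathlib import Path
from typing import Dict, Optional


def cmdline_value(cmdline: str, key: str) -> Optional[str]:
    """Return the value of `key=value` on a kernel command line, or None.

    Naive whitespace split: quoted values containing spaces are not handled.
    If the key appears more than once, the last occurrence is returned.
    """
    value = None
    for token in cmdline.split():
        name, sep, rest = token.partition("=")
        if name == key:
            value = rest if sep else ""
    return value


def zfs_recover_status(proc_root: str = "/proc", sys_root: str = "/sys") -> Dict[str, Optional[str]]:
    """Compare zfs.zfs_recover on the boot command line with the loaded module's value."""
    cmdline = Path(proc_root, "cmdline").read_text()
    param = Path(sys_root, "module", "zfs", "parameters", "zfs_recover")
    # None if the parameter is not exposed (e.g. the zfs module is not loaded).
    module_value = param.read_text().strip() if param.exists() else None
    return {
        "cmdline": cmdline_value(cmdline, "zfs.zfs_recover"),
        "module": module_value,
    }
```

If the two values disagree, the parameter was probably not applied when the module loaded, so any observation about its effect would not be meaningful.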

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1906476

Title:
  PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 ==
  sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED,
  &zp->z_sa_hdl)) failed

Status in Native ZFS for Linux:
  New
Status in linux package in Ubuntu:
  Invalid
Status in ubuntu-release-upgrader package in Ubuntu:
  Confirmed
Status in zfs-linux package in Ubuntu:
  Fix Released
Status in linux source package in Impish:
  Fix Released
Status in ubuntu-release-upgrader source package in Impish:
  Confirmed
Status in zfs-linux source package in Impish:
  Fix Released

Bug description:
  Since today, while running Ubuntu 21.04 Hirsute, I started getting a ZFS
  panic in the kernel log that was also hanging disk I/O for all
  Chrome/Electron apps.

  While narrowing it down, I noted a few important points:
  - It does not happen with module version 0.8.4-1ubuntu11, built and
  shipped with 5.8.0-29-generic.

  - It was happening when using zfs-dkms 0.8.4-1ubuntu16 built with DKMS
  on the same kernel and also on 5.8.18-acso (a custom kernel).

  - For whatever reason, multiple Chrome/Electron apps were affected,
  specifically Discord, Chrome and Mattermost. In all cases they seemed
  to hang trying to open files in their 'Cache' directory, e.g.
  ~/.cache/google-chrome/Default/Cache and ~/.config/Mattermost/Cache.
  I was unable to strace the processes, so this was hard to confirm
  100%, but that is my deduction from /proc/PID/fd and the hanging ls:
  while the issue was going on I could not list that directory either,
  "ls" would just hang.

  - Once I removed zfs-dkms to revert to the kernel's built-in
  version, it immediately worked, without changing anything else
  (removing files, etc.).

  - It happened every time, across multiple reboots and kernels: none
  of my Chrome apps were working, but for whatever reason nothing else
  seemed affected.

  - It would log a series of spl_panic dumps into kern.log that look like this:
  Dec  2 12:36:42 optane kernel: [   72.857033] VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed
  Dec  2 12:36:42 optane kernel: [   72.857036] PANIC at zfs_znode.c:335:zfs_znode_sa_init()

  I could only find one other Google reference to this issue, with two other
  users reporting the same error, but on 20.04:
  https://github.com/openzfs/zfs/issues/10971

  - I was not experiencing the issue on 0.8.4-1ubuntu14, and I'm fairly
  sure it was working on 0.8.4-1ubuntu15, but it broke after upgrading to
  0.8.4-1ubuntu16. I will reinstall those zfs-dkms versions to verify
  that.
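The deduction from /proc/PID/fd described in the notes above can be scripted roughly as follows. This sketch assumes you run it as the same user as the hung process (or as root). `open_files` and `files_under` are hypothetical helper names. It only lists files the process already has open, not the file that a stuck `openat` call is still waiting on.

```python
import os
from typing import Iterable, List


def open_files(pid: int, proc_root: str = "/proc") -> List[str]:
    """Resolve the /proc/<pid>/fd/* symlinks to the paths the process has open."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    targets = []
    for fd in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, fd)))
        except OSError:
            continue  # fd was closed between listdir() and readlink()
    return sorted(targets)


def files_under(targets: Iterable[str], directory: str) -> List[str]:
    """Keep only the targets that live inside `directory`."""
    prefix = os.path.join(directory, "")  # force a trailing separator
    return [t for t in targets if t.startswith(prefix)]
```

For example, `files_under(open_files(pid), os.path.expanduser("~/.config/Mattermost/Cache"))` shows which cache files a given Mattermost process is holding open.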

  There were a few originating call stacks, but the first one I hit was:

  Call Trace:
   dump_stack+0x74/0x95
   spl_dumpstack+0x29/0x2b [spl]
   spl_panic+0xd4/0xfc [spl]
   ? sa_cache_constructor+0x27/0x50 [zfs]
   ? _cond_resched+0x19/0x40
   ? mutex_lock+0x12/0x40
   ? dmu_buf_set_user_ie+0x54/0x80 [zfs]
   zfs_znode_sa_init+0xe0/0xf0 [zfs]
   zfs_znode_alloc+0x101/0x700 [zfs]
   ? arc_buf_fill+0x270/0xd30 [zfs]
   ? __cv_init+0x42/0x60 [spl]
   ? dnode_cons+0x28f/0x2a0 [zfs]
   ? _cond_resched+0x19/0x40
   ? _cond_resched+0x19/0x40
   ? mutex_lock+0x12/0x40
   ? aggsum_add+0x153/0x170 [zfs]
   ? spl_kmem_alloc_impl+0xd8/0x110 [spl]
   ? arc_space_consume+0x54/0xe0 [zfs]
   ? dbuf_read+0x4a0/0xb50 [zfs]
   ? _cond_resched+0x19/0x40
   ? mutex_lock+0x12/0x40
   ? dnode_rele_and_unlock+0x5a/0xc0 [zfs]
   ? _cond_resched+0x19/0x40
   ? mutex_lock+0x12/0x40
   ? dmu_object_info_from_dnode+0x84/0xb0 [zfs]
   zfs_zget+0x1c3/0x270 [zfs]
   ? dmu_buf_rele+0x3a/0x40 [zfs]
   zfs_dirent_lock+0x349/0x680 [zfs]
   zfs_dirlook+0x90/0x2a0 [zfs]
   ? zfs_zaccess+0x10c/0x480 [zfs]
   zfs_lookup+0x202/0x3b0 [zfs]
   zpl_lookup+0xca/0x1e0 [zfs]
   path_openat+0x6a2/0xfe0
   do_filp_open+0x9b/0x110
   ? __check_object_size+0xdb/0x1b0
   ? __alloc_fd+0x46/0x170
   do_sys_openat2+0x217/0x2d0
   ? do_sys_openat2+0x217/0x2d0
   do_sys_open+0x59/0x80
   __x64_sys_openat+0x20/0x30
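The panic site differs between reports: zfs_znode.c:335 in this kern.log excerpt and zfs_znode.c:339 in the 5.13 report above. When you collect logs from several machines, it may therefore help to count panics by site. Here is a Python sketch, assuming the `PANIC at file:line:function()` format shown in these logs. `panic_sites` is a hypothetical helper name.

```python
import re
from collections import Counter
from typing import Iterable

# Matches e.g. "PANIC at zfs_znode.c:335:zfs_znode_sa_init()"
PANIC_RE = re.compile(r"PANIC at (?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)")


def panic_sites(log_lines: Iterable[str]) -> Counter:
    """Count SPL panic sites as (file, line, function) in kern.log/dmesg lines."""
    sites: Counter = Counter()
    for line in log_lines:
        match = PANIC_RE.search(line)
        if match:
            sites[(match["file"], int(match["line"]), match["func"])] += 1
    return sites
```

For example, `with open("/var/log/kern.log") as f: print(panic_sites(f).most_common())` lists the panic sites, most frequent first.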
