Colin,

I'm afraid this is an ongoing issue. I have faced this in 4.4 and 4.13
kernels:

[  240.568146] INFO: task txg_sync:2065 blocked for more than 120 seconds.
[  240.568218]       Tainted: P           O    4.4.0-116-generic #140-Ubuntu
[  240.568268] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[  240.568290] txg_sync        D ffff880739e63aa8     0  2065      2 0x00000000
[  240.568294]  ffff880739e63aa8 ffff880739e63a88 ffff88081af75400 
ffff8800ac635400
[  240.568296]  ffff880739e64000 ffff88083ed972c0 7fffffffffffffff 
ffff88009fffc968
[  240.568297]  0000000000000001 ffff880739e63ac0 ffffffff8184ae45 
0000000000000000
[  240.568300] Call Trace:
[  240.568305]  [<ffffffff8184ae45>] schedule+0x35/0x80
[  240.568307]  [<ffffffff8184df98>] schedule_timeout+0x1b8/0x270
[  240.568310]  [<ffffffff810aec12>] ? default_wake_function+0x12/0x20
[  240.568313]  [<ffffffff810c6318>] ? __wake_up_common+0x58/0x90
[  240.568315]  [<ffffffff810f8cce>] ? ktime_get+0x3e/0xb0
[  240.568317]  [<ffffffff8184a5d4>] io_schedule_timeout+0xa4/0x110
[  240.568325]  [<ffffffffc0395c2c>] cv_wait_common+0xbc/0x140 [spl]
[  240.568327]  [<ffffffff810c69d0>] ? wake_atomic_t_function+0x60/0x60
[  240.568332]  [<ffffffffc0395d08>] __cv_wait_io+0x18/0x20 [spl]
[  240.568373]  [<ffffffffc0957634>] zio_wait+0x114/0x200 [zfs]
[  240.568397]  [<ffffffffc08e0d88>] dsl_pool_sync+0xb8/0x430 [zfs]
[  240.568424]  [<ffffffffc08fc7b6>] spa_sync+0x366/0xb30 [zfs]
[  240.568426]  [<ffffffff810aec12>] ? default_wake_function+0x12/0x20
[  240.568453]  [<ffffffffc090dc4a>] txg_sync_thread+0x3ba/0x630 [zfs]
[  240.568481]  [<ffffffffc090d890>] ? txg_delay+0x180/0x180 [zfs]
[  240.568486]  [<ffffffffc0390dc0>] ? __thread_exit+0x20/0x20 [spl]
[  240.568490]  [<ffffffffc0390e33>] thread_generic_wrapper+0x73/0x80 [spl]
[  240.568492]  [<ffffffff810a30f7>] kthread+0xe7/0x100
[  240.568494]  [<ffffffff810a3010>] ? kthread_create_on_node+0x1e0/0x1e0
[  240.568495]  [<ffffffff8184f422>] ret_from_fork+0x42/0x80
[  240.568497]  [<ffffffff810a3010>] ? kthread_create_on_node+0x1e0/0x1e0
[  240.568505] INFO: task lxd:2562 blocked for more than 120 seconds.
[  240.568524]       Tainted: P           O    4.4.0-116-generic #140-Ubuntu
[  240.568543] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[  240.568565] lxd             D ffff880734773ba0     0  2562      1 0x00000000
[  240.568567]  ffff880734773ba0 0000000000000246 ffff88081af71c00 
ffff88080d18f000
[  240.568569]  ffff880734774000 ffff8808144dba20 ffff8808144dba48 
ffff8808144dbae0
[  240.568571]  0000000000000000 ffff880734773bb8 ffffffff8184ae45 
ffff8808144dbad8
[  240.568573] Call Trace:
[  240.568575]  [<ffffffff8184ae45>] schedule+0x35/0x80
[  240.568579]  [<ffffffffc0395c7b>] cv_wait_common+0x10b/0x140 [spl]
[  240.568581]  [<ffffffff810c69d0>] ? wake_atomic_t_function+0x60/0x60
[  240.568586]  [<ffffffffc0395cc5>] __cv_wait+0x15/0x20 [spl]
[  240.568614]  [<ffffffffc090d4b5>] txg_wait_synced+0xe5/0x130 [zfs]
[  240.568643]  [<ffffffffc0952e4b>] zil_replay+0xdb/0x120 [zfs]
[  240.568673]  [<ffffffffc093fcef>] zfs_sb_setup+0x16f/0x180 [zfs]
[  240.568702]  [<ffffffffc0940b38>] zfs_domount+0x2c8/0x350 [zfs]
[  240.568731]  [<ffffffffc095ecf0>] ? zpl_kill_sb+0x20/0x20 [zfs]
[  240.568761]  [<ffffffffc095ed1c>] zpl_fill_super+0x2c/0x40 [zfs]
[  240.568763]  [<ffffffff8121714f>] mount_nodev+0x4f/0xa0
[  240.568792]  [<ffffffffc095f18f>] zpl_mount+0x4f/0x80 [zfs]
[  240.568794]  [<ffffffff81217e4d>] mount_fs+0x3d/0x170
[  240.568796]  [<ffffffff81234647>] vfs_kern_mount+0x67/0x110
[  240.568798]  [<ffffffff81236cff>] do_mount+0x25f/0xda0
[  240.568801]  [<ffffffff81237b7f>] SyS_mount+0x9f/0x100
[  240.568803]  [<ffffffff8184efc8>] entry_SYSCALL_64_fastpath+0x1c/0xbb
[  240.568810] INFO: task sync:7760 blocked for more than 120 seconds.
[  240.568828]       Tainted: P           O    4.4.0-116-generic #140-Ubuntu
[  240.568848] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.

And I am still facing this with latest kernel and latest SPL and ZFS
code

inaddy@workstation:~$ sudo cat /proc/2832/stack
[<0>] __cv_timedwait_common+0x12c/0x170 [spl]
[<0>] __cv_timedwait_io+0x19/0x20 [spl]
[<0>] zio_wait+0x119/0x270 [zfs]
[<0>] dsl_pool_sync+0xb8/0x420 [zfs]
[<0>] spa_sync+0x43e/0xd30 [zfs]
[<0>] txg_sync_thread+0x271/0x440 [zfs]
[<0>] thread_generic_wrapper+0x74/0x90 [spl]
[<0>] kthread+0x121/0x140
[<0>] ret_from_fork+0x22/0x40
[<0>] 0xffffffffffffffff

Linux workstation 4.16.0-rc7 #1 SMP Wed Mar 28 09:21:33 UTC 2018 x86_64
x86_64 x86_64 GNU/Linux

ii  kmod-spl-4.16.0-rc7                        0.7.0-31                         
                          amd64        spl kernel module(s) for 4.16.0-rc7
ii  kmod-spl-devel                             0.7.0-31                         
                          amd64        spl kernel module(s) devel common
ii  kmod-spl-devel-4.16.0-rc7                  0.7.0-31                         
                          amd64        spl kernel module(s) devel for 4.16.0-rc7

ii  kmod-zfs-4.16.0-rc7                        0.7.0-390                        
                          amd64        zfs kernel module(s) for 4.16.0-rc7
ii  kmod-zfs-devel                             0.7.0-390                        
                          amd64        zfs kernel module(s) devel common
ii  kmod-zfs-devel-4.16.0-rc7                  0.7.0-390                        
                          amd64        zfs kernel module(s) devel for 4.16.0-rc7
ii  libzfs2                                    0.7.0-390                        
                          amd64        Native ZFS filesystem library for Linux
ii  libzfs2-devel                              0.7.0-390                        
                          amd64        Development headers
ii  zfs                                        0.7.0-390                        
                          amd64        Commands to control the kernel modules 
and libraries

I could map upstream bugs:

https://github.com/zfsonlinux/zfs/issues/5867
https://github.com/zfsonlinux/zfs/pull/6133
https://github.com/zfsonlinux/zfs/issues/6127

And they all look, somehow, related to the issue according to the stack
trace. Bug 5867, the closest, indicates there are 2 or 3 races that
would have been fixed by some commits already included in the upstream
version I'm testing.

I'll try to get a kdump from latest upstream kernel + latest upstream
zfs/spl modules to check if I can visualize what other task is racing
with "cv_wait_common" in all hung tasks.

With that, I'm re-opening this bug and will try to provide better
feedback soon.



** Bug watch added: Github Issue Tracker for ZFS #5867
   https://github.com/zfsonlinux/zfs/issues/5867

** Bug watch added: Github Issue Tracker for ZFS #6127
   https://github.com/zfsonlinux/zfs/issues/6127

** Changed in: zfs-linux (Ubuntu)
       Status: Won't Fix => Confirmed

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1654517

Title:
  ZFS I/O hangs for minutes

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1654517/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

Reply via email to