Some additional info about the deadlock:

crash> bt 16588
PID: 16588  TASK: ffff9ffd7f332b00  CPU: 1   COMMAND: "bcache_allocato"
    [exception RIP: bch_crc64+57]
    RIP: ffffffffc093b2c9  RSP: ffffab9585767e28  RFLAGS: 00000286
    RAX: f1f51403756de2bd  RBX: 0000000000000000  RCX: 0000000000000065
    RDX: 0000000000000065  RSI: ffff9ffd63980000  RDI: ffff9ffd63925346
    RBP: ffffab9585767e28   R8: ffffffffc093db60   R9: ffffab9585739000
    R10: 000000000000007f  R11: 000000001ffef001  R12: 0000000000000000
    R13: 0000000000000008  R14: ffff9ffd63900000  R15: ffff9ffd683d0000
    CS: 0010  SS: 0018
 #0 [ffffab9585767e30] bch_prio_write at ffffffffc09325c0 [bcache]
 #1 [ffffab9585767eb0] bch_allocator_thread at ffffffffc091bdc5 [bcache]
 #2 [ffffab9585767f08] kthread at ffffffffa80b2481
 #3 [ffffab9585767f50] ret_from_fork at ffffffffa8a00205

crash> bt 14658
PID: 14658  TASK: ffff9ffd7a9f0000  CPU: 0   COMMAND: "python3"
 #0 [ffffab958380bb48] __schedule at ffffffffa89ae441
 #1 [ffffab958380bbe8] schedule at ffffffffa89aea7c
 #2 [ffffab958380bbf8] bch_bucket_alloc at ffffffffc091c370 [bcache]
 #3 [ffffab958380bc68] __bch_bucket_alloc_set at ffffffffc091c5ce [bcache]
 #4 [ffffab958380bcb8] bch_bucket_alloc_set at ffffffffc091c66e [bcache]
 #5 [ffffab958380bcf8] __uuid_write at ffffffffc0931b69 [bcache]
 #6 [ffffab958380bda0] bch_uuid_write at ffffffffc0931f76 [bcache]
 #7 [ffffab958380bdc0] __cached_dev_store at ffffffffc0937c08 [bcache]
 #8 [ffffab958380be20] bch_cached_dev_store at ffffffffc0938309 [bcache]
 #9 [ffffab958380be50] sysfs_kf_write at ffffffffa830c97c
#10 [ffffab958380be60] kernfs_fop_write at ffffffffa830c3e5
#11 [ffffab958380bea0] __vfs_write at ffffffffa827e5bb
#12 [ffffab958380beb0] vfs_write at ffffffffa827e781
#13 [ffffab958380bee8] sys_write at ffffffffa827e9fc
#14 [ffffab958380bf30] do_syscall_64 at ffffffffa8003b03
#15 [ffffab958380bf50] entry_SYSCALL_64_after_hwframe at ffffffffa8a00081
    RIP: 00007faffc7bd154  RSP: 00007ffe307cbc88  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000000008  RCX: 00007faffc7bd154
    RDX: 0000000000000008  RSI: 00000000011ce7f0  RDI: 0000000000000003
    RBP: 00007faffccb86c0   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000100  R11: 0000000000000246  R12: 0000000000000003
    R13: 0000000000000000  R14: 00000000011ce7f0  R15: 0000000000f33e60
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

In this case the task "python3" (pid 14658) gets stuck in
bch_bucket_alloc(), in a wait that never completes. The task that
should wake up "python3" from this wait is "bcache_allocator" (pid
16588), but the wakeup never happens, because bcache_allocator is stuck
in this "retry_invalidate" busy loop:

static int bch_allocator_thread(void *arg)
{
...
retry_invalidate:
                allocator_wait(ca, ca->set->gc_mark_valid &&
                               !ca->invalidate_needs_gc);
                invalidate_buckets(ca);

                /*
                 * Now, we write their new gens to disk so we can start writing
                 * new stuff to them:
                 */
                allocator_wait(ca, !atomic_read(&ca->set->prio_blocked));
                if (CACHE_SYNC(&ca->set->sb)) {
                        /*
                         * This could deadlock if an allocation with a btree
                         * node locked ever blocked - having the btree node
                         * locked would block garbage collection, but here we're
                         * waiting on garbage collection before we invalidate
                         * and free anything.
                         *
                         * But this should be safe since the btree code always
                         * uses btree_check_reserve() before allocating now, and
                         * if it fails it blocks without btree nodes locked.
                         */
                        if (!fifo_full(&ca->free_inc))
                                goto retry_invalidate;

                        if (bch_prio_write(ca, false) < 0) {
                                ca->invalidate_needs_gc = 1;
                                wake_up_gc(ca->set);
                                goto retry_invalidate;
                        }
                }
        }
...

The exact code path is this: bch_prio_write() fails because the
bch_bucket_alloc() call inside it fails (we are out of free buckets);
the allocator then wakes up the garbage collector (trying to free up
some buckets) and jumps back to retry_invalidate, but apparently that
is not enough: bch_prio_write() keeps failing over and over again (no
buckets ever become available), and the thread never breaks out of the
busy loop => deadlock.
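
To make the dependency explicit: bch_prio_write() needs a free bucket
itself, because the updated prios/gens are written out to buckets too.
Roughly, the failing part looks like this (heavily simplified sketch,
not the literal code from drivers/md/bcache/super.c; the loop over all
the prio buckets and most of the error handling are omitted):

static int bch_prio_write(struct cache *ca, bool wait)
{
	...
	/* the prios/gens are themselves stored in buckets, so writing
	 * them out requires allocating a bucket for them first */
	bucket = bch_bucket_alloc(ca, RESERVE_PRIO, wait);
	if (bucket == -1)
		return -ENOMEM;
	...
}

So the allocator needs a free bucket to make progress, but producing
free buckets is exactly what it is stuck on, and "python3" stays
blocked behind it.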

Looking more closely at the code, it seems safe to resume the
bcache_allocator main loop when bch_prio_write() fails (still keeping
the wake_up event to the garbage collector), instead of jumping back to
retry_invalidate. This should give the allocator a better chance to
free up some buckets, possibly preventing the "out of buckets" deadlock
condition.
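
In other words, something along these lines (just a sketch of the idea,
applied on top of the code quoted above):

                        if (bch_prio_write(ca, false) < 0) {
                                ca->invalidate_needs_gc = 1;
                                wake_up_gc(ca->set);
                                /* no "goto retry_invalidate" here: fall
                                 * through and resume the main allocator
                                 * loop, so the allocator keeps running
                                 * while gc tries to free some buckets */
                        }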

I'm currently testing a kernel with this change; if I can't trigger any
deadlock in the next hour or so, I'll upload a new test kernel.

https://bugs.launchpad.net/bugs/1796292

Title:
  Tight timeout for bcache removal causes spurious failures
