PROBLEM It's possible for a parent zio to complete even though it has children which have not completed. This can result in the following panic:
> $C ffffff01809128c0 vpanic() ffffff01809128e0 mutex_panic+0x58(fffffffffb94c904, ffffff597dde7f80) ffffff0180912950 mutex_vector_enter+0x347(ffffff597dde7f80) ffffff01809129b0 zio_remove_child+0x50(ffffff597dde7c58, ffffff32bd901ac0, ffffff3373370908) ffffff0180912a40 zio_done+0x390(ffffff32bd901ac0) ffffff0180912a70 zio_execute+0x78(ffffff32bd901ac0) ffffff0180912b30 taskq_thread+0x2d0(ffffff33bae44140) ffffff0180912b40 thread_start+8() > ::status debugging crash dump vmcore.2 (64-bit) from batfs0390 operating system: 5.11 joyent_20170911T171900Z (i86pc) image uuid: (not set) panic message: mutex_enter: bad mutex, lp=ffffff597dde7f80 owner=ffffff3c59b39480 thread=ffffff0180912c40 dump content: kernel pages only The problem is that dbuf_prefetch along with l2arc can create a zio tree which confuses the parent zio and allows it to complete with while children still exist. Here's the scenario: zio tree: pio |--- lio The parent zio, pio, has entered the zio_done stage and begins to check its children to see there are still some that have not completed. In zio_done(), the children are checked in the following order: zio_wait_for_children(zio, ZIO_CHILD_VDEV, ZIO_WAIT_DONE) zio_wait_for_children(zio, ZIO_CHILD_GANG, ZIO_WAIT_DONE) zio_wait_for_children(zio, ZIO_CHILD_DDT, ZIO_WAIT_DONE) zio_wait_for_children(zio, ZIO_CHILD_LOGICAL, ZIO_WAIT_DONE) If pio, finds any child which has not completed then it stops executing and goes to sleep. Each call to zio_wait_for_children() will grab the io_lock while checking the particular child. In this scenario, the pio has completed the first call to zio_wait_for_children() to check for any ZIO_CHILD_VDEV children. Since the only zio in the zio tree right now is the logical zio, lio, then it completes that call and prepares to check the next child type. In the meantime, the lio completes and in its callback creates a child vdev zio, cio. The zio tree looks like this: zio tree: pio |--- lio |--- cio The lio then grabs the parent's io_lock and removes itself. zio tree: pio |--- cio The pio continues to run but has already completed its check for ZIO_CHILD_VDEV and will erroneously complete. When the child zio, cio, completes it will panic the system trying to reference the parent zio which has been destroyed. SOLUTION The fix is to rework the zio_wait_for_children() logic to accept an enum bitmask for all the children types that it's interested in checking. The io_lock will be held the entire time we check all the children types. Since the function now accepts an enum, a simple ZIO_CHILD() macro is provided to allow the conversion between the enum and the array index used by the io children logic. You can view, comment on, or merge this pull request online at: https://github.com/openzfs/openzfs/pull/505 -- Commit Summary -- * 8857 zio_remove_child() panic due to already destroyed parent zio -- File Changes -- M usr/src/uts/common/fs/zfs/sys/zio.h (11) M usr/src/uts/common/fs/zfs/zio.c (77) -- Patch Links -- https://github.com/openzfs/openzfs/pull/505.patch https://github.com/openzfs/openzfs/pull/505.diff -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/openzfs/openzfs/pull/505 ------------------------------------------ openzfs-developer Archives: https://openzfs.topicbox.com/groups/developer/discussions/Td0adf1bfacf39ae0-M415e285ea79a91f21ea0c4e1 Powered by Topicbox: https://topicbox.com