Hi.

I'm new to the boards, and I couldn't find my question already answered, so I 
hope I'm not duplicating. Can someone explain to me how i/o operations that 
are flagged as "must succeed" don't cause deadlocks when there are errors in 
the i/o path?

As far as I can tell, the ZIO only broadcasts a wakeup to "zio->io_cv" in the 
zio_done stage. But the zio_assess stage will return ZIO_PIPELINE_STOP if 
you have an error on a "must succeed" operation, so zio_done is never reached 
and whoever is waiting on io_cv is never woken.
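
To make the concern concrete, here is a toy C model of the pipeline as 
described above. It is only a sketch: the toy_* names and the struct are made 
up for illustration, it is not the real ZFS code, and it assumes the wakeup 
happens only in the last stage.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: names echo the post, not the real ZFS source. */
enum { TOY_PIPELINE_CONTINUE, TOY_PIPELINE_STOP };

typedef struct toy_zio {
    int  error;        /* nonzero if the i/o failed */
    bool must_succeed; /* models the "must succeed" flag */
    bool waiter_woken; /* models a broadcast on io_cv */
} toy_zio_t;

/* Models the assess stage: stop the pipeline on a failed
 * "must succeed" i/o, otherwise let it continue. */
static int toy_assess(toy_zio_t *zio) {
    if (zio->error != 0 && zio->must_succeed)
        return TOY_PIPELINE_STOP;
    return TOY_PIPELINE_CONTINUE;
}

/* Models the done stage: the only place the waiter is woken. */
static int toy_done(toy_zio_t *zio) {
    zio->waiter_woken = true;
    return TOY_PIPELINE_STOP;
}

/* Runs the stages in order; returns true if the waiter was woken. */
bool toy_execute(toy_zio_t *zio) {
    int (*stages[])(toy_zio_t *) = { toy_assess, toy_done };
    for (size_t i = 0; i < sizeof stages / sizeof stages[0]; i++) {
        if (stages[i](zio) == TOY_PIPELINE_STOP)
            break;
    }
    return zio->waiter_woken;
}
```

In this model, toy_execute returns false for a zio with error != 0 and 
must_succeed set, which corresponds to a thread blocked on io_cv forever. 
Whether the real pipeline has another path that wakes the waiter is exactly 
the open question.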

My specific situation is an experimental redundancy vdev that deadlocks when I 
offline one of the leaf vdevs underneath it. In my test case, there are four 
leaf vdevs of which one is parity. One of the leaves has errors, so when I 
offline a good one, it should report that offlining can't be done under the 
circumstances. For some reason, the erroneous vdev is just reporting checksum 
errors, and zfs tries to go ahead with the offlining.
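
The check that should refuse the offline can be sketched roughly as below. 
This is a hypothetical helper, not a ZFS function, and it assumes the vdev 
tolerates at most as many unusable leaves as it has parity leaves.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of ZFS: returns true if one more leaf
 * can be taken offline without exceeding the parity budget.
 * unhealthy_leaves counts leaves that are already unusable. */
bool toy_can_offline(int nparity, int unhealthy_leaves) {
    return unhealthy_leaves + 1 <= nparity;
}
```

With one parity leaf and one leaf already bad, toy_can_offline(1, 1) is 
false, so the offline should be rejected. That only works if the bad leaf is 
actually counted as unhealthy, which is not the case if it is merely 
reporting checksum errors.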

If you look at vdev_offline, it tries to offline the (leaf) vdev, close the 
vdev_top, then reopen the vdev_top. My code succeeds at this point, so it 
thinks the offlining worked.

But then, at the end of vdev_offline, it tries to do a spa_vdev_exit which has 
an nvlist_sync that does a read with the MUSTSUCCEED flag. At this point, it 
tries to read from the vdev, discovers that it cannot, and just hangs.

Obviously I need to improve my error reporting so it never gets to the 
spa_vdev_exit without an error, but I just cannot believe that zfs is supposed 
to hang if you try to do a "must succeed" call that fails. I'm certain that 
there is some other thing I'm doing wrong here, but I cannot seem to figure out 
what it is.

SJN
-- 
This message posted from opensolaris.org
