On Tue, Dec 04, 2012 at 05:31:59PM +0100, J. Hannken-Illjes wrote:
> >> A suspended fs has the guarantee that no other thread will be inside
> >> fstrans_suspend / fstrans_done of any vnode operation.
> >>
> >> Threads stuck permanently as in (c) are impossible to catch.
> >
> > ...doesn't that mean the (c) threads are going to be holding read
> > locks, so the suspend will hang forever?
>
> Yes.
>
> > We have to assume there will be at least one such thread, as people
> > don't generally attempt umount -f unless something's wedged.
>
> If forced unmounts have to work with vnode operations that will not
> leave a fs, the only solution will be to disable forced unmounts and
> leave reboot as the last option. How could we safely determine whether
> a vnode operation is just taking some time or is stuck forever? What
> should we do with threads that are stuck forever?

You can't. That's why umount -f is currently unsafe. We can probably
make it more safe (and, as just pointed out elsewhere, Linux-style lazy
unmounts are safe and can often be used as a substitute) but making it
fully safe will be difficult.
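For concreteness, the bracketing at issue looks roughly like this. This
is a sketch only, written against the fstrans(9) interface as it
existed at the time; foo_read() is a made-up example operation:

	/*
	 * A suspendable vnode operation takes a shared hold on the
	 * per-mount transaction lock for the duration of the call.
	 */
	int
	foo_read(void *v)
	{
		struct vop_read_args *ap = v;
		struct mount *mp = ap->a_vp->v_mount;

		fstrans_start(mp, FSTRANS_SHARED);	/* shared hold */

		/*
		 * ... the actual operation; a thread wedged in here,
		 * as in case (c), never reaches fstrans_done() ...
		 */

		fstrans_done(mp);			/* drop the hold */
		return 0;
	}

Suspending the fs amounts to taking the exclusive side of that lock,
which must wait for every shared holder to drain; that is why a single
permanently stuck thread makes the suspend hang forever.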
The basic use case for umount -f is when you have processes hanging
uninterruptibly waiting for a hard-mounted NFS server that's gone away
and won't be coming back anytime soon.

The other common case is when you accidentally ejected a removable
device without unmounting it, and didn't notice until reinserting it
was no longer an option. Usually what causes you to notice is accessing
the device and failing, and that often results in a stuck process. If
there isn't a process hanging, you can generally clear away any
processes holding the volume busy and do an ordinary umount.

The third case, I guess, is when vnode refcount bugs have left you with
a "busy" fs that nothing's actually using. In this case there won't be
processes hanging, but by the time you've confirmed this, there also
won't be processes using the fs, so the current scheme is more or less
safe enough...

> Looks like this thread is dead. No one besides David Holland is
> interested and David objects. I take back my proposal.

I think fstrans may not be the mechanism we want, but I don't think we
should give up.

> >> - Both VOP_GETPAGES() and VOP_PUTPAGES() cannot block or sleep here
> >>   as they run part or all of the operation with vnode interlock held.
> >
> > Ugh. But this doesn't, for example, preclude atomically incrementing
> > a per-cpu counter or setting a flag in the lwp structure.
>
> Sure, but we need a point in time where it is safe to mark a vnode dead,
> and that means to reclaim its inode or other fs-private state. This is
> not just accounting -- we must be sure no thread is using fs-private
> data, and I don't see a way without suspending vnode operations.

The way to do this is to divide the "dead" operation into two parts:
first change the ops vector (or set a dead flag) so no new threads can
enter; then, when the threads already in the fs are gone or otherwise
neutralized, delete the private data. Or we could just leave the
private data.

One of the advantages of setting a dead flag (and testing it in
vnode_if.c) instead of changing the ops vector is that the vnode
doesn't lose its attachment to the fs and can be cleaned up properly
later if it ever gets released.

As for "neutralized", one possible scheme is to mark the vnodes dead,
wait a few seconds, and then do something evil to the blocked threads
still using the fs; that requires knowing which threads they are, which
is the hard part. Any simple scheme will end up being a hotspot for
cache contention. See the "kicking everybody out of the softc" thread
from a couple of years ago for one possible approach to that.
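To make the two-part scheme concrete, here is a sketch of what a dead
flag tested at the vnode_if.c level might look like. All the names
(v_dead, v_opcount, v_opdrain, vnode_kill()) are invented for
illustration, not existing interfaces. Also, as noted above, a
per-vnode counter under the interlock is exactly the kind of contention
hotspot a real version would have to avoid (per-cpu counters, say), and
ops entered with the interlock already held, like VOP_PUTPAGES(), would
need different accounting entirely:

	/* Hypothetical generated wrapper in vnode_if.c. */
	int
	VOP_READ(struct vnode *vp, struct uio *uio, int ioflag,
	    kauth_cred_t cred)
	{
		struct vop_read_args a;
		int error;

		mutex_enter(vp->v_interlock);
		if (vp->v_dead) {
			/* Part 1 took effect: no new threads enter. */
			mutex_exit(vp->v_interlock);
			return ENXIO;	/* or dispatch to deadfs */
		}
		vp->v_opcount++;	/* this thread is now "in" the fs */
		mutex_exit(vp->v_interlock);

		a.a_desc = VDESC(vop_read);
		a.a_vp = vp;
		a.a_uio = uio;
		a.a_ioflag = ioflag;
		a.a_cred = cred;

		/*
		 * Until this returns, the thread counts as inside the
		 * fs -- which also covers ops like ufs_spec_read()
		 * that forward to specfs via VOCALL.
		 */
		error = VOCALL(vp->v_op, VOFFSET(vop_read), &a);

		mutex_enter(vp->v_interlock);
		if (--vp->v_opcount == 0 && vp->v_dead)
			cv_broadcast(&vp->v_opdrain);
		mutex_exit(vp->v_interlock);
		return error;
	}

	/* Hypothetical teardown on the forced-unmount path. */
	void
	vnode_kill(struct vnode *vp)
	{
		mutex_enter(vp->v_interlock);
		vp->v_dead = true;	/* part 1: close the gate */
		while (vp->v_opcount > 0) {
			/*
			 * Part 2 must wait for (or neutralize) the
			 * threads already inside; a permanently stuck
			 * thread wedges here, which is the hard part.
			 */
			cv_wait(&vp->v_opdrain, vp->v_interlock);
		}
		mutex_exit(vp->v_interlock);
		/* Now it is safe to reclaim fs-private state. */
	}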
> >> - Accessing devices and fifos through a file system cannot be tracked
> >>   at the vnode_if.c level. Take ufs_spec_read() for example:
> >>
> >>	fstrans_start(...);
> >>	VTOI(vp)->i_flag |= IN_ACCESS;
> >>	fstrans_done(...);
> >>	return VOCALL(spec_vnodeop_p, ...)
> >>
> >> Here the VOCALL may sleep forever if the device has no data.
> >
> > This I don't understand. To the extent it's "in" the fs while it's
> > doing this, vnode_if.c can know that, because it called ufs_spec_read
> > and ufs_spec_read hasn't returned yet. To the extent that it's in
> > specfs, specfs won't itself ever be umount -f'd, since it isn't
> > mounted, so there's no danger.
>
> Ok, we would have to use VOCALL rather than vnode_if.c and add the
> accounting here.

No, I don't see why. vnode_if.c calls ufs_spec_read(); until
ufs_spec_read() returns, as far as the tracking done in vnode_if.c is
concerned, the thread doing it is still in the fs.

> > If umount -f currently blats over the data that specfs uses, then it's
> > currently not safe, but I don't see how it's different from any other
> > similar case with ufs data.
> >
> > I expect I'm missing something.
>
> So another approach is to serialize vclean(), at least when cleaning
> active vnodes, and have VOCALL() take care of the vnode currently being
> cleaned. We would need just per-thread state then. This would still
> not work for vnode operations stuck in the fs.
>
> Is this the direction you have in mind?

No, because I don't understand. :(

-- 
David A. Holland
dholl...@netbsd.org