Date: Fri, 28 Feb 2014 17:22:11 +0100
From: "J. Hannken-Illjes" <hann...@eis.cs.tu-bs.de>
   On Feb 28, 2014, at 4:41 PM, Taylor R Campbell
   <campbell+netbsd-tech-k...@mumble.net> wrote:

   > Actually, I think I may want to change that, so that vget can be
   > done under a pserialized read -- i.e., atomics only, no locks or
   > waits.  I took a look at dholland's message from September and
   > started to write a generic inode->vnode cache with pserialize, and
   > realized vget needs to happen under it.  But at least the explicit
   > states will help to clarify the model for now, and will be a good
   > fallback if I give up before making a pserialized inode->vnode
   > cache.

   I don't see a many reader / rare writer case here.  Pserialize is
   very expensive for the writer case (at the moment you even get a
   minimum of 2 HZ for a pserialize_perform -- but this is a bug).

The many-reader case is lookups of cached vnodes; the writer case is
fetching a vnode you don't already have and must read from disk.  It
would be nice to make the whole namei path for lookups lock-free.
Currently we can't do that because VOP_LOOKUP requires the vnode lock,
but we can dream!

Anyway, as I said, this pserialization might be a failed experiment.
We'll see.  So far I think I can simplify the draft further by forcing
it to work with pserialize, though.  (There is a rough sketch of the
read side I have in mind at the end of this message.)

   > I think this is actually OK.  If you're a file system, and you're
   > trying to grab one of your own vnodes, it's no good to get one
   > whose identity will soon be changed to deadfs -- you need to make
   > a new vnode for the inode you were looking for.  Nor is it any
   > good to try to work with one that vfs has chosen to destroy --
   > it's too late to recover that one.  So in both cases immediately
   > returning ENOENT is the right thing.

   It is better not to have two vnodes (one dying, one creating)
   representing the same inode in parallel.

Hmm, good point.  I wonder whether we can fix this by creating a
concept of inode number reservations, which I think we'll want anyway
to avoid the unlock/lookup-`..'/lock race.  tmpfs has this already:
it's the tmpfs_node tn_vlock.  The idea is that you take this lock
when you want to get or create a vnode for a given `inode' (tmpfs
node), and that excludes anyone else from getting it.

Of course, for inode numbers, we'd need a somewhat hairier data
structure, because we don't have a nice struct to store a mutex into.
But if we had that, perhaps we could make vnode destruction reserve
the inode number until VOP_RECLAIM is done and it is safe to create a
new vnode for that inode number.
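For concreteness, the `hairier data structure' could be little more
than a table of reserved (mount, inode number) pairs guarded by a
mutex and a condvar.  This is only an illustrative sketch -- every
name in it (ino_reservation, inores_*) is made up, and a real version
would want a hash table rather than a single list:

#include <sys/types.h>
#include <sys/condvar.h>
#include <sys/mount.h>
#include <sys/mutex.h>
#include <sys/queue.h>

struct ino_reservation {
	LIST_ENTRY(ino_reservation)	ir_entry;
	struct mount			*ir_mount;
	ino_t				ir_ino;
};

static LIST_HEAD(, ino_reservation)	inores_table =
    LIST_HEAD_INITIALIZER(inores_table);
static kmutex_t				inores_lock;
static kcondvar_t			inores_cv;

void
inores_init(void)
{

	mutex_init(&inores_lock, MUTEX_DEFAULT, IPL_NONE);
	cv_init(&inores_cv, "inores");
}

/*
 * Reserve (mp, ino), sleeping until nobody else holds it.  The caller
 * provides the storage for the reservation.
 */
void
inores_reserve(struct mount *mp, ino_t ino, struct ino_reservation *ir)
{
	struct ino_reservation *p;

	mutex_enter(&inores_lock);
retry:
	LIST_FOREACH(p, &inores_table, ir_entry) {
		if (p->ir_mount == mp && p->ir_ino == ino) {
			/* Someone else holds it; wait and rescan. */
			cv_wait(&inores_cv, &inores_lock);
			goto retry;
		}
	}
	ir->ir_mount = mp;
	ir->ir_ino = ino;
	LIST_INSERT_HEAD(&inores_table, ir, ir_entry);
	mutex_exit(&inores_lock);
}

/* Release a reservation and wake anyone waiting for it. */
void
inores_release(struct ino_reservation *ir)
{

	mutex_enter(&inores_lock);
	LIST_REMOVE(ir, ir_entry);
	cv_broadcast(&inores_cv);
	mutex_exit(&inores_lock);
}

Vnode destruction would then take the reservation before it starts
tearing the vnode down and release it only after VOP_RECLAIM has
completed, so a lookup that loses the race simply waits instead of
creating a second vnode for the same inode.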
   > Hmm.  This vn_lock code may be broken.  But revocation is
   > currently also broken because it doesn't exclude other vnode
   > operations, and if we fixed it in the obvious way so that
   > VOP_REVOKE or vrevoke took the vnode lock, then xyzfs and deadfs
   > would have to use genfs_lock anyway in order for it to all work
   > out.  So even if we use the LK_RETRY hack I believe there's more
   > that we'll need to fix to enable revocation to mix with
   > non-genfs_lock VOP_LOCKs.

   Remember, VOP_LOCK is a file system operation.  Every file system
   is free to use its own locking -- all file systems using
   genfs_lock() is just the status quo.

Right, that was my point.  Even if we use the LK_RETRY hack, there is
more we'll need to do to get non-genfs_lock VOP_LOCKs working with
revoke -- if for no other reason than that revoke must exclude other
vnode operations, we have no way to do that other than VOP_LOCK, and
deadfs's VOP_UNLOCK has to wake the VOP_LOCKs waiting on that vnode.

   > Vrecycle() has to change -- I'm currently working on it.
   >
   > What is it supposed to do?  I don't understand from the man page
   > description or the uses in-tree.

   While the name is bad, it gets rid of inactive vnodes while cleaning
   up a file system during unmount or reload after fsck.

Why do we do vrecycle and not just vrevoke?
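Finally, to make the pserialize point at the top concrete, here is
roughly the shape of the lock-free read side I have in mind for the
inode->vnode cache.  This is only a sketch, not the actual draft: the
icache_* and ic_* names are invented, and vtryget() is a stand-in for
whatever the atomics-only flavour of vget turns out to be.  The
insert/remove side (not shown) would take a mutex, publish fully
initialized entries with the appropriate memory barriers, and pay for
pserialize_perform() on removal -- expensive, but rare.

#include <sys/types.h>
#include <sys/pserialize.h>
#include <sys/queue.h>
#include <sys/vnode.h>

/* Hypothetical cache entry: maps an inode number to its vnode. */
struct icache_entry {
	SLIST_ENTRY(icache_entry)	ic_list;
	ino_t				ic_ino;
	struct vnode			*ic_vnode;
};

SLIST_HEAD(icache_head, icache_entry);

bool	vtryget(struct vnode *);	/* stand-in: atomics-only vget */

/*
 * Look up a cached vnode by inode number without taking any locks.
 * The pserialize read section keeps the entry from being freed under
 * us; vtryget() acquires a reference with atomics only, failing if
 * the vnode is already being destroyed.
 */
struct vnode *
icache_lookup(struct icache_head *head, ino_t ino)
{
	struct icache_entry *ic;
	struct vnode *vp = NULL;
	int s;

	s = pserialize_read_enter();
	SLIST_FOREACH(ic, head, ic_list) {
		if (ic->ic_ino != ino)
			continue;
		vp = ic->ic_vnode;
		if (!vtryget(vp))
			vp = NULL;	/* dying; caller makes a new one */
		break;
	}
	pserialize_read_exit(s);

	return vp;	/* NULL => take the slow path (mutex + disk) */
}

If the atomics-only reference acquisition can't be made to work, the
explicit vnode states above remain the fallback.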