Date: Fri, 28 Feb 2014 17:22:11 +0100
From: "J. Hannken-Illjes" <hann...@eis.cs.tu-bs.de>
   On Feb 28, 2014, at 4:41 PM, Taylor R Campbell
   <campbell+netbsd-tech-k...@mumble.net> wrote:

   > Actually, I think I may want to change that, so that vget can be
   > done under a pserialized read -- i.e., atomics only, no locks or
   > waits.  I took a look at dholland's message from September and
   > started to write a generic inode->vnode cache with pserialize, and
   > realized vget needs to happen under it.  But at least the explicit
   > states will help to clarify the model for now, and will be a good
   > fallback if I give up before making a pserialized inode->vnode
   > cache.

   I don't see a many reader / rare writer case here.  Pserialize is
   very expensive for the writer case (at the moment you even get a
   minimum of 2 HZ for a pserialize_perform -- but this is a bug).

The many-reader case is lookups of cached vnodes; the writer case is
fetching a vnode you don't already have and must read from disk.  It
would be nice to make the whole namei path for lookups lock-free.
Currently we can't do that because VOP_LOOKUP requires the vnode lock,
but we can dream!

Anyway, as I said, this pserialization might be a failed experiment.
We'll see.  So far I think I can simplify the draft further by forcing
it to work with pserialize, though.  (There is a rough sketch of the
read side I have in mind at the end of this message.)

   > I think this is actually OK.  If you're a file system, and you're
   > trying to grab one of your own vnodes, it's no good to get one
   > whose identity will soon be changed to deadfs -- you need to make
   > a new vnode for the inode you were looking for.  Nor is it any
   > good to try to work with one that vfs has chosen to destroy --
   > it's too late to recover that one.  So in both cases immediately
   > returning ENOENT is the right thing.

   It is better not to have two vnodes (one dying, one creating)
   representing the same inode in parallel.

Hmm, good point.  I wonder whether we can fix this by creating a
concept of inode number reservations, which I think we'll want anyway
to avoid the unlock/lookup-`..'/lock race.  tmpfs has this already:
it's the tmpfs_node tn_vlock.  The idea is that you take this lock
when you want to get or create a vnode for a given `inode' (tmpfs
node), and that excludes anyone else from getting it.

Of course, for inode numbers, we'd need a somewhat hairier data
structure, because we don't have a nice struct to store a mutex into.
But if we had that, perhaps we could make vnode destruction reserve
the inode number until VOP_RECLAIM is done and it is safe to create a
new vnode for that inode number.
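For concreteness, the `hairier data structure' could be little more
than a table of reserved (mount, inode number) pairs guarded by a
mutex and a condvar.  This is only an illustrative sketch -- every
name in it (ino_reservation, inores_*) is made up, and a real version
would want a hash table rather than a single list:

#include <sys/types.h>
#include <sys/condvar.h>
#include <sys/mount.h>
#include <sys/mutex.h>
#include <sys/queue.h>

struct ino_reservation {
	LIST_ENTRY(ino_reservation)	ir_entry;
	struct mount			*ir_mount;
	ino_t				ir_ino;
};

static LIST_HEAD(, ino_reservation)	inores_table =
    LIST_HEAD_INITIALIZER(inores_table);
static kmutex_t				inores_lock;
static kcondvar_t			inores_cv;

void
inores_init(void)
{

	mutex_init(&inores_lock, MUTEX_DEFAULT, IPL_NONE);
	cv_init(&inores_cv, "inores");
}

/*
 * Reserve (mp, ino), sleeping until nobody else holds it.  The caller
 * provides the storage for the reservation.
 */
void
inores_reserve(struct mount *mp, ino_t ino, struct ino_reservation *ir)
{
	struct ino_reservation *p;

	mutex_enter(&inores_lock);
retry:
	LIST_FOREACH(p, &inores_table, ir_entry) {
		if (p->ir_mount == mp && p->ir_ino == ino) {
			/* Someone else holds it; wait and rescan. */
			cv_wait(&inores_cv, &inores_lock);
			goto retry;
		}
	}
	ir->ir_mount = mp;
	ir->ir_ino = ino;
	LIST_INSERT_HEAD(&inores_table, ir, ir_entry);
	mutex_exit(&inores_lock);
}

/* Release a reservation and wake anyone waiting for it. */
void
inores_release(struct ino_reservation *ir)
{

	mutex_enter(&inores_lock);
	LIST_REMOVE(ir, ir_entry);
	cv_broadcast(&inores_cv);
	mutex_exit(&inores_lock);
}

Vnode destruction would then take the reservation before it starts
tearing the vnode down and release it only after VOP_RECLAIM has
completed, so a lookup that loses the race simply waits instead of
creating a second vnode for the same inode.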
   > Hmm.  This vn_lock code may be broken.  But revocation is
   > currently also broken because it doesn't exclude other vnode
   > operations, and if we fixed it in the obvious way so that
   > VOP_REVOKE or vrevoke took the vnode lock, then xyzfs and deadfs
   > would have to use genfs_lock anyway in order for it to all work
   > out.  So even if we use the LK_RETRY hack I believe there's more
   > that we'll need to fix to enable revocation to mix with
   > non-genfs_lock VOP_LOCKs.

   Remember, VOP_LOCK is a file system operation.  Every file system
   is free to use its own locking -- all file systems using
   genfs_lock() is just the status quo.

Right, that was my point.  Even if we use the LK_RETRY hack, there is
more we'll need to do to get non-genfs_lock VOP_LOCKs working with
revoke -- if for no other reason than that revoke must exclude other
vnode operations, we have no way to do that other than VOP_LOCK, and
deadfs's VOP_UNLOCK has to wake the VOP_LOCKs waiting on that vnode.

   > Vrecycle() has to change -- I'm currently working on it.
   >
   > What is it supposed to do?  I don't understand from the man page
   > description or the uses in-tree.

   While the name is bad, it gets rid of inactive vnodes while cleaning
   up a file system during unmount or reload after fsck.

Why do we do vrecycle and not just vrevoke?
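Finally, to make the pserialize point at the top concrete, here is
roughly the shape of the lock-free read side I have in mind for the
inode->vnode cache.  This is only a sketch, not the actual draft: the
icache_* and ic_* names are invented, and vtryget() is a stand-in for
whatever the atomics-only flavour of vget turns out to be.  The
insert/remove side (not shown) would take a mutex, publish fully
initialized entries with the appropriate memory barriers, and pay for
pserialize_perform() on removal -- expensive, but rare.

#include <sys/types.h>
#include <sys/pserialize.h>
#include <sys/queue.h>
#include <sys/vnode.h>

/* Hypothetical cache entry: maps an inode number to its vnode. */
struct icache_entry {
	SLIST_ENTRY(icache_entry)	ic_list;
	ino_t				ic_ino;
	struct vnode			*ic_vnode;
};

SLIST_HEAD(icache_head, icache_entry);

bool	vtryget(struct vnode *);	/* stand-in: atomics-only vget */

/*
 * Look up a cached vnode by inode number without taking any locks.
 * The pserialize read section keeps the entry from being freed under
 * us; vtryget() acquires a reference with atomics only, failing if
 * the vnode is already being destroyed.
 */
struct vnode *
icache_lookup(struct icache_head *head, ino_t ino)
{
	struct icache_entry *ic;
	struct vnode *vp = NULL;
	int s;

	s = pserialize_read_enter();
	SLIST_FOREACH(ic, head, ic_list) {
		if (ic->ic_ino != ino)
			continue;
		vp = ic->ic_vnode;
		if (!vtryget(vp))
			vp = NULL;	/* dying; caller makes a new one */
		break;
	}
	pserialize_read_exit(s);

	return vp;	/* NULL => take the slow path (mutex + disk) */
}

If the atomics-only reference acquisition can't be made to work, the
explicit vnode states above remain the fallback.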