On 26 Apr 2014, at 21:22, David Holland <dholland-t...@netbsd.org> wrote:
> On Tue, Apr 15, 2014 at 04:11:58PM +0000, Taylor R Campbell wrote:
>> New diff at http://www.netbsd.org/~hannken/vnode-pass6-3.diff
>>
>> Plan to commit early wednesday ...
>>
>> I still don't think this approach is right.  It makes a long-term copy
>> of logic in getnewvnode (because this vcache(9) will not be applicable
>> everywhere), there's no reason to use kpause or any new condvars when
>> we already have one inside each vnode which we'll be using anyway in
>> vget, and it still increases the overhead of every vnode using it.
>>
>> I don't object to the principle of VFS_LOAD_NODE, or VOP_LOAD, but at
>> the moment I think it will cause greater divergence between file
>> systems and maintenance headaches as a result.
>>
>> As an alternative, I propose the set of patches at
>> <https://www.NetBSD.org/~riastradh/tmp/20140415/>, to do the
>> following:
>> [...]
>
> Wading into this after the fact, I see the following points:
>
> - Duplicating the getnewvnode logic is definitely bad; we have enough
>   cruft without adding new redundant but not quite equivalent code
>   paths.
>
> - It seems to me that in the long run, all of this baloney should be
>   hidden away inside the vfs layer; filesystems that use the vnode
>   cache should only need to call vcache_get, and the only thing that
>   should ever see a partially initialized vnode is VOP_LOAD and
>   nothing outside the vnode cache should ever see a vnode with
>   refcount zero.

Agreed.

> This in particular means that all the various forms of vget() should
> be hidden away, as should getnewvnode.  It seems to me like both of
> these patch sets are steps in this direction, but we're far enough
> away from the destination that it's hard to choose between them.
>
> - A while back (one or two rounds ago) I floated an idea that had
>   separate vnode key objects for the same reason they appear here: to
>   make insertion atomic.
> Riastradh convinced me at the time that this wasn't necessary, so I
> dropped it and never got around to posting it on tech-kern.  However,
> it had a characteristic that I don't see here: locks in the vnode
> keys.  The idea of that was to introduce a level of indirection so
> that nothing like VI_CHANGING was needed: while loading a vnode, you
> unlock the table but leave the key locked, and that takes care, in an
> orderly way, of making sure nobody else sees it.  If we're going to
> end up with vnode key objects, I think they should contain locks in
> favor of having exposed or semi-exposed state flags.

While I agree that atomic insertion is a mandatory property, I don't
see the need for a per-key lock.  As collisions are very rare, waiting
and retrying under the hash table lock, with key->vnode == NULL marking
a node still being loaded, or with a per-table cv as I did before, is
sufficient (at least for now).

> - There are still filesystems that do not use the vnode cache.  Half
>   the problem with the legacy scheme we have is that it tries to
>   provide the expire half of the cache to all filesystems, even those
>   that don't use or need the lookup half of the cache.  This is
>   fundamentally wrong.  I think rather than trying to continue with
>   this folly we should do something different that keeps these
>   filesystems separate.
>
> - Question 1: How much overhead would it really cause if we didn't
>   bother to cycle tmpfs vnodes "in" and "out" but just materialized
>   them along with the tmpfs_node and kept them around until someone
>   destroys the tmpfs_node?  vnodes have a fair amount of slop in
>   them, but they aren't that big and if you're using tmpfs you
>   presumably have enough memory to keep whole files in RAM anyway.
>   In this case we could just provide a constructor and destructor for
>   non-cache vnodes, decouple that from the lifecycle logic (and
>   firewall it off with plenty of assertions and such) and a lot of
>   the silly goes away.
Once we reach a point where this decision has to be made, it may be
better to use the vnode cache for tmpfs than to make tmpfs the only fs
that is different.

> - Question 2: The other obvious candidates for not using the vnode
>   cache are virtual constructs like kernfs and procfs.  ISTM that
>   "loading" and "unloading" their vnodes is equally pointless and
>   that the same solution applies.  The question is: are there any
>   other fses that wouldn't use the vnode cache for some other reason,
>   such that this reasoning wouldn't apply?  I can't think of any, and
>   it seems like anything where there are genuine load and unload
>   operations for vnodes should be using the cache, but that doesn't
>   prove anything.

Kernfs and procfs already use a hash table and are candidates for the
vnode cache.

> Also, it should be vcache_get(), not intern; with all "intern"
> operations I can think of, you pass in the object to be interned,
> not just a key for it.  If we had vcache_intern(), the model would be
> that you consed up a vnode, then passed it (along with the key) to
> vcache_intern(), which would then either insert it into the cache and
> return it, or destroy it and return the existing equivalent one.
> This is not really what we want.

Agreed.  And we may need vcache_lookup() to look up, but not load, a
node too.

> and FWIW, I favor VOP_LOAD over VFS_LOADVNODE.

I prefer a fs operation over a vnode operation to initialise a vnode.
With a vnode operation we would either have to pass the vnode
operations vector to vcache_get(), which is ugly, or set it from
"*mp->mnt_op->vfs_opv_descs[0]->opv_desc_vector_p" before calling
VOP_LOAD().  To me it looks better to have the fs set up the
operations vector (which may not be the default one).

Could we finally agree on this interface:

vcache_get(mp, key, key_len, vpp) to lookup and possibly load a vnode.

vcache_lookup(mp, key, key_len, vpp) to lookup a vnode.

vcache_remove(mp, key, key_len) to remove a vnode from the cache.
VFS_LOAD_NODE(mp, vp, key, key_len, new_key) to initialise a vnode.

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)