On 01 May 2014, at 18:06, David Holland <dholland-t...@netbsd.org> wrote:
> On Mon, Apr 28, 2014 at 10:56:44AM +0200, J. Hannken-Illjes wrote:
>>> Wading into this after the fact, I see the following points:
>>>
>>> - Duplicating the getnewvnode logic is definitely bad; we have enough
>>> cruft without adding new redundant but not quite equivalent code
>>> paths.
>>>
>>> - It seems to me that in the long run, all of this baloney should be
>>> hidden away inside the vfs layer; filesystems that use the vnode cache
>>> should only need to call vcache_get, and the only thing that should
>>> ever see a partially initialized vnode is VOP_LOAD and nothing outside
>>> the vnode cache should ever see a vnode with refcount zero.
>>
>> Agreed.
>
> ok, in that case I think the only issue with the patch I looked at
> Sunday (was it that long ago already?) is that it does duplicate the
> getnewvnode logic -- if you've since fixed that I don't think I have
> further substantive objections.

What do you mean by "duplicate the getnewvnode logic" here?
Vfs_busy()/vfs_unbusy() and vfs_insmntque()?

>>> - A while back (one or two rounds ago) I floated an idea that had
>>> separate vnode key objects for the same reason they appear here: to
>>> make insertion atomic. Riastradh convinced me at the time that this
>>> wasn't necessary, so I dropped it and never got around to posting it
>>> on tech-kern. However, it had a characteristic that I don't see here:
>>> locks in the vnode keys. The idea of that was to introduce a level of
>>> indirection so that nothing like VI_CHANGING was needed: while loading
>>> a vnode, you unlock the table but leave the key locked, and that takes
>>> care, in an orderly way, of making sure nobody else sees it. If we're
>>> going to end up with vnode key objects, I think they should contain
>>> locks rather than exposed or semi-exposed state flags.
>>
>> While I agree that atomic insertion is a mandatory property, I
>> don't see the need for a per-key lock. As collisions are very rare,
>> wait and retry using the hash table lock and key->vnode being NULL,
>> or a per-table cv like I did before, is sufficient (at least for now).
>
> Having a lock and waiting on a cv and state bit are exactly
> equivalent, except that if you have a lock,
> - neither the state nor the mechanism is exposed.

What state is exposed by "wait and retry"?

> I would consider that an advantage, but it's not a killer.
>
> I don't think holding the table lock (or even a lock on part of the
> table, like one hash chain) while loading a vnode is a good idea,
> because a layer vnode is likely to come back to the table while
> loading.

I never presented a patch where the node got loaded with the table
lock held.

> Hmm. I think one of the reasons I had a separate lock (and a refcount
> on the key structure) in my scheme rather than just setting the vnode
> pointer to null is to avoid a race on reclaim or revoke: to reclaim
> you want to lock the key, unlock the table, do reclaim, and only then
> null the vnode pointer and unlock the key. (Then you relock the table,
> look up the key again, and take it out if the vnode pointer is still
> null.) If you only null the vnode pointer before running reclaim you
> either have to hold the table lock during reclaim (almost certainly
> not a good idea -- just yesterday in another context I was dealing
> with a fs that needed to load a vnode when reclaiming another) or you
> have a race where the vnode can be reloaded while reclaim is still in
> progress.

What is the difference between a locked cache node and a cache node
with a NULL vnode pointer (where the vnode pointer gets examined or
changed with the table lock held)?
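
To have something concrete to compare against, here is the sequence you
describe, transcribed into a rough sketch. All names here (vcache_key,
table_lookup(), table_remove(), fs_reclaim()) are invented for
illustration, the refcount on the key that you mention is omitted, and
none of this is taken from any actual patch:

#include <sys/types.h>
#include <sys/mutex.h>

struct vnode;

struct vcache_key {
	kmutex_t	 vk_lock;	/* serialises load and reclaim */
	struct vnode	*vk_vnode;	/* NULL: not (or no longer) loaded */
	const void	*vk_key;
	size_t		 vk_key_len;
	/* ... hash chain linkage, refcount ... */
};

static kmutex_t table_lock;		/* initialised elsewhere */

/* Assumed helpers, not shown. */
struct vcache_key *table_lookup(const void *, size_t);
void table_remove(struct vcache_key *);
void fs_reclaim(struct vnode *);	/* may itself use the cache */

void
vcache_reclaim_sketch(const void *key, size_t key_len)
{
	struct vcache_key *vk;

	/* Lock the key, then drop the table lock. */
	mutex_enter(&table_lock);
	vk = table_lookup(key, key_len);
	mutex_enter(&vk->vk_lock);
	mutex_exit(&table_lock);

	/*
	 * Reclaim runs without the table lock held -- the fs may load
	 * another vnode here -- but with the key still locked, so the
	 * vnode cannot be reloaded while reclaim is in progress.
	 */
	fs_reclaim(vk->vk_vnode);

	/* Only now clear the vnode pointer and unlock the key. */
	vk->vk_vnode = NULL;
	mutex_exit(&vk->vk_lock);

	/*
	 * Relock the table, look the key up again and take it out if
	 * nobody has started reloading it in the meantime.
	 */
	mutex_enter(&table_lock);
	vk = table_lookup(key, key_len);
	if (vk != NULL && vk->vk_vnode == NULL)
		table_remove(vk);
	mutex_exit(&table_lock);
}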
>>> - Question 1: How much overhead would it really cause if we didn't
>>> bother to cycle tmpfs vnodes "in" and "out" but just materialized them
>>> along with the tmpfs_node and kept them around until someone destroys
>>> the tmpfs_node? vnodes have a fair amount of slop in them, but they
>>> aren't that big, and if you're using tmpfs you presumably have enough
>>> memory to keep whole files in RAM anyway. In this case we could just
>>> provide a constructor and destructor for non-cache vnodes, decouple
>>> that from the lifecycle logic (and firewall it off with plenty of
>>> assertions and such) and a lot of the silliness goes away.
>>
>> Once we reach a point where this decision has to be made it may be
>> better to use the vnode cache for tmpfs than to make tmpfs the only
>> fs that is different.
>
> I don't think that's necessary. For vnodes belonging to virtual
> objects, there's very little to be gained by counting them under
> maxvnodes or cycling them "out of memory". The actual underlying
> objects can't go away and the vnode itself is just a fairly small
> administrative structure.

Sounds reasonable -- but we are not yet at the point where we have to
make this decision.

> (Also, with tmpfs files, given that the vnode is the uvm object
> holding the file data, presumably there's some hack in place already
> so it can't go away and thereby lose said file data? Or is the vnode a
> uvm object that maps another uvm object that's the real file data? If
> the latter, simplifying this would buy back the overhead of making
> tmpfs vnodes permanently "resident".)
>
>>> - Question 2: The other obvious candidates for not using the vnode
>>> cache are virtual constructs like kernfs and procfs. ISTM that
>>> "loading" and "unloading" their vnodes is equally pointless and that
>>> the same solution applies. The question is: are there any other fses
>>> that wouldn't use the vnode cache for some other reason, such that
>>> this reasoning wouldn't apply? I can't think of any, and it seems like
>>> anything where there are genuine load and unload operations for
>>> vnodes should be using the cache, but that doesn't prove anything.
>>
>> Kernfs and procfs already use a hash table and are candidates for
>> the vnode cache.
>
> ...is there a reason for this or is it just because that's what vfs
> demands/demanded, and/or copypaste from other fses? I can't think of
> any reason why it would be meaningful to "unload" a kernfs vnode. The
> closest approximation to a reason I can think of is that it might make
> the device vnode for /kern/rootdev permanently resident; but that's
> pretty much true anyway.
>
>>> Also, it should be vcache_get(), not intern; with all "intern"
>>> operations I can think of, you pass in the object to be interned,
>>> not just a key for it. If we had vcache_intern(), the model would be
>>> that you consed up a vnode, then passed it (along with the key) to
>>> vcache_intern(), which would then either insert it into the cache and
>>> return it, or destroy it and return the existing equivalent one. This
>>> is not really what we want.
>>
>> Agreed. And we may need vcache_lookup to look up, but not load, a
>> node too.
>
> When would one do that? I suppose there are probably reasons, but I
> can't think of any offhand. It's going to inherently be a race, also,
> since if it returns NULL or ENOENT or whatever it does if the vnode
> isn't loaded, the vnode could get loaded by the time the caller sees
> this. Consequently I suspect that most of the uses that someone might
> think they want would actually be wrong. :-/

At least union will need something like this. The key of a union node
may change, and there must be a way to "test" a key. The file system
will have to serialise to prevent races. But it may be better not to
expose vcache_lookup() before we have a need for it.

>>> and FWIW, I favor VOP_LOAD over VFS_LOADVNODE.
>>
>> I prefer a fs operation over a vnode operation to initialise a
>> vnode. With a vnode operation we would either have to pass the vnode
>> operations vector to vcache_get(), which is ugly, or set it from
>> "*mp->mnt_op->vfs_opv_descs[0]->opv_desc_vector_p" before calling
>> VOP_LOAD(). To me it looks better to have the fs set up the
>> operations vector (which may not be the default one).
>
> Yes, you're right.
>
>> Could we finally agree on this interface:
>>
>> vcache_get(mp, key, key_len, vpp) to look up and possibly load a vnode.
>> vcache_lookup(mp, key, key_len, vpp) to look up a vnode.
>> vcache_remove(mp, key, key_len) to remove a vnode from the cache.
>> VFS_LOAD_NODE(mp, vp, key, key_len, new_key) to initialise a vnode.
>
> I would prefer VFS_LOADVNODE rather than VFS_LOAD_NODE, for the
> following not-quite-cosmetic reasons:
> - it should be "vnode" rather than "node" as a matter of standard
> usage;
> - when all the op names have one underscore it's easier to process
> tables of them with editor macros. I know the renamelock ones
> broke this property, and maybe they should be changed...

Ok.
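
In prototype form, the agreed interface would then look roughly like
this (the types are a sketch and still to be settled, and VFS_LOADVNODE
is written as a plain prototype here rather than as the usual vfs op
macro plumbing):

#include <sys/types.h>

struct mount;
struct vnode;

/* Look up a vnode by key and, if necessary, load it. */
int	vcache_get(struct mount *mp, const void *key, size_t key_len,
	    struct vnode **vpp);

/* Look up only, never load.  Inherently racy, as discussed above. */
int	vcache_lookup(struct mount *mp, const void *key, size_t key_len,
	    struct vnode **vpp);

/* Remove a vnode from the cache. */
void	vcache_remove(struct mount *mp, const void *key, size_t key_len);

/*
 * Fs operation called by the cache with a new, partially initialised
 * vnode.  The fs initialises it, sets up the operations vector, and
 * returns the final key through new_key.
 */
int	VFS_LOADVNODE(struct mount *mp, struct vnode *vp,
	    const void *key, size_t key_len, const void **new_key);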
> What calls vcache_remove? Is this the external interface for revoke?
> (Reclaim should be triggered from inside the cache, at least in the
> long run.) We need a way for the fs to mark vnodes as not worth
> caching any more (e.g. because they've been unlinked), so they get
> reclaimed as soon as the refcount hits zero, but I don't think
> "remove" gives the right semantics.

The cache in its current form does not change the vnode life cycle,
so vcache_remove() is called from the file system's reclaim operation.
This will likely change once the vnode cache becomes responsible for
the life cycle -- but one step at a time, please.

> If it's the interface for revoke, we should probably call it
> vcache_revoke. If it's also the current path for getting reclaim
> called, we should probably make a separate entry point for that (even
> if it goes to the same logic) and hide it away later.
>
> If it's something else, please tell me what I've forgotten :-)
>
> --
> David A. Holland
> dholl...@netbsd.org

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)