On 01 May 2014, at 18:06, David Holland <dholland-t...@netbsd.org> wrote:
> On Mon, Apr 28, 2014 at 10:56:44AM +0200, J. Hannken-Illjes wrote:
>>> Wading into this after the fact, I see the following points:
>>>
>>> - Duplicating the getnewvnode logic is definitely bad; we have enough
>>> cruft without adding new redundant but not quite equivalent code
>>> paths.
>>>
>>> - It seems to me that in the long run, all of this baloney should be
>>> hidden away inside the vfs layer; filesystems that use the vnode cache
>>> should only need to call vcache_get, and the only thing that should
>>> ever see a partially initialized vnode is VOP_LOAD and nothing outside
>>> the vnode cache should ever see a vnode with refcount zero.
>>
>> Agreed.
>
> ok, in that case I think the only issue with the patch I looked at
> Sunday (was it that long ago already?) is that it does duplicate the
> getnewvnode logic -- if you've since fixed that I don't think I have
> further substantive objections.

What do you mean by "duplicate the getnewvnode logic" here?
Vfs_busy()/vfs_unbusy() and vfs_insmntque()?

>>> - A while back (one or two rounds ago) I floated an idea that had
>>> separate vnode key objects for the same reason they appear here: to
>>> make insertion atomic. Riastradh convinced me at the time that this
>>> wasn't necessary, so I dropped it and never got around to posting it
>>> on tech-kern. However, it had a characteristic that I don't see here:
>>> locks in the vnode keys. The idea of that was to introduce a level of
>>> indirection so that nothing like VI_CHANGING was needed: while loading
>>> a vnode, you unlock the table but leave the key locked, and that takes
>>> care, in an orderly way, of making sure nobody else sees it. If we're
>>> going to end up with vnode key objects, I think they should contain
>>> locks rather than exposed or semi-exposed state flags.
>>
>> While I agree that atomic insertion is a mandatory property, I
>> don't see the need for a per-key lock. As collisions are very rare,
>> wait and retry using the hash table lock and key->vnode being NULL,
>> or a per-table cv like I did before, is sufficient (at least for now).
>
> Having a lock and waiting on a cv and state bit are exactly
> equivalent, except that if you have a lock,
> - neither the state nor the mechanism is exposed.

What state is exposed by "wait and retry"?

> I would consider that an advantage, but it's not a killer.
>
> I don't think holding the table lock (or even a lock on part of the
> table, like one hash chain) while loading a vnode is a good idea,
> because a layer vnode is likely to come back to the table while
> loading.

I never presented a patch where the node got loaded with the table
lock held.

> Hmm. I think one of the reasons I had a separate lock (and a refcount
> on the key structure) in my scheme rather than just setting the vnode
> pointer to null is to avoid a race on reclaim or revoke: to reclaim
> you want to lock the key, unlock the table, do reclaim, and only then
> null the vnode pointer and unlock the key. (Then you relock the table,
> look up the key again, and take it out if the vnode pointer is still
> null.) If you only null the vnode pointer before running reclaim you
> either have to hold the table lock during reclaim (almost certainly
> not a good idea -- just yesterday in another context I was dealing
> with a fs that needed to load a vnode when reclaiming another) or you
> have a race where the vnode can be reloaded while reclaim is still in
> progress.

What is the difference between a locked cache node and a cache node
with a NULL vnode pointer (where the vnode pointer gets examined or
changed with the table lock held)?
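
To have something concrete to compare against, here is the sequence you
describe, transcribed into a rough sketch. All names here (vcache_key,
table_lookup(), table_remove(), fs_reclaim()) are invented for
illustration, the refcount on the key that you mention is omitted, and
none of this is taken from any actual patch:

#include <sys/types.h>
#include <sys/mutex.h>

struct vnode;

struct vcache_key {
	kmutex_t	 vk_lock;	/* serialises load and reclaim */
	struct vnode	*vk_vnode;	/* NULL: not (or no longer) loaded */
	const void	*vk_key;
	size_t		 vk_key_len;
	/* ... hash chain linkage, refcount ... */
};

static kmutex_t table_lock;		/* initialised elsewhere */

/* Assumed helpers, not shown. */
struct vcache_key *table_lookup(const void *, size_t);
void table_remove(struct vcache_key *);
void fs_reclaim(struct vnode *);	/* may itself use the cache */

void
vcache_reclaim_sketch(const void *key, size_t key_len)
{
	struct vcache_key *vk;

	/* Lock the key, then drop the table lock. */
	mutex_enter(&table_lock);
	vk = table_lookup(key, key_len);
	mutex_enter(&vk->vk_lock);
	mutex_exit(&table_lock);

	/*
	 * Reclaim runs without the table lock held -- the fs may load
	 * another vnode here -- but with the key still locked, so the
	 * vnode cannot be reloaded while reclaim is in progress.
	 */
	fs_reclaim(vk->vk_vnode);

	/* Only now clear the vnode pointer and unlock the key. */
	vk->vk_vnode = NULL;
	mutex_exit(&vk->vk_lock);

	/*
	 * Relock the table, look the key up again and take it out if
	 * nobody has started reloading it in the meantime.
	 */
	mutex_enter(&table_lock);
	vk = table_lookup(key, key_len);
	if (vk != NULL && vk->vk_vnode == NULL)
		table_remove(vk);
	mutex_exit(&table_lock);
}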
>>> - Question 1: How much overhead would it really cause if we didn't
>>> bother to cycle tmpfs vnodes "in" and "out" but just materialized them
>>> along with the tmpfs_node and kept them around until someone destroys
>>> the tmpfs_node? vnodes have a fair amount of slop in them, but they
>>> aren't that big, and if you're using tmpfs you presumably have enough
>>> memory to keep whole files in RAM anyway. In this case we could just
>>> provide a constructor and destructor for non-cache vnodes, decouple
>>> that from the lifecycle logic (and firewall it off with plenty of
>>> assertions and such) and a lot of the silliness goes away.
>>
>> Once we reach a point where this decision has to be made it may be
>> better to use the vnode cache for tmpfs than to make tmpfs the only
>> fs that is different.
>
> I don't think that's necessary. For vnodes belonging to virtual
> objects, there's very little to be gained by counting them under
> maxvnodes or cycling them "out of memory". The actual underlying
> objects can't go away and the vnode itself is just a fairly small
> administrative structure.

Sounds reasonable -- but we are not yet at the point where we have to
make this decision.

> (Also, with tmpfs files, given that the vnode is the uvm object
> holding the file data, presumably there's some hack in place already
> so it can't go away and thereby lose said file data? Or is the vnode a
> uvm object that maps another uvm object that's the real file data? If
> the latter, simplifying this would buy back the overhead of making
> tmpfs vnodes permanently "resident".)
>
>>> - Question 2: The other obvious candidates for not using the vnode
>>> cache are virtual constructs like kernfs and procfs. ISTM that
>>> "loading" and "unloading" their vnodes is equally pointless and that
>>> the same solution applies. The question is: are there any other fses
>>> that wouldn't use the vnode cache for some other reason, such that
>>> this reasoning wouldn't apply? I can't think of any, and it seems like
>>> anything where there are genuine load and unload operations for
>>> vnodes should be using the cache, but that doesn't prove anything.
>>
>> Kernfs and procfs already use a hash table and are candidates for
>> the vnode cache.
>
> ...is there a reason for this or is it just because that's what vfs
> demands/demanded, and/or copypaste from other fses? I can't think of
> any reason why it would be meaningful to "unload" a kernfs vnode. The
> closest approximation to a reason I can think of is that it might make
> the device vnode for /kern/rootdev permanently resident; but that's
> pretty much true anyway.
>
>>> Also, it should be vcache_get(), not intern; with all "intern"
>>> operations I can think of, you pass in the object to be interned,
>>> not just a key for it. If we had vcache_intern(), the model would be
>>> that you consed up a vnode, then passed it (along with the key) to
>>> vcache_intern(), which would then either insert it into the cache and
>>> return it, or destroy it and return the existing equivalent one. This
>>> is not really what we want.
>>
>> Agreed. And we may need vcache_lookup to look up, but not load, a
>> node too.
>
> When would one do that? I suppose there are probably reasons, but I
> can't think of any offhand. It's going to inherently be a race, also,
> since if it returns NULL or ENOENT or whatever it does if the vnode
> isn't loaded, the vnode could get loaded by the time the caller sees
> this. Consequently I suspect that most of the uses that someone might
> think they want would actually be wrong. :-/

At least union will need something like this. The key of a union node
may change, and there must be a way to "test" a key. The file system
will have to serialise to prevent races. But it may be better not to
expose vcache_lookup() before we have a need for it.

>>> and FWIW, I favor VOP_LOAD over VFS_LOADVNODE.
>>
>> I prefer a fs operation over a vnode operation to initialise a
>> vnode. With a vnode operation we would either have to pass the vnode
>> operations vector to vcache_get(), which is ugly, or set it from
>> "*mp->mnt_op->vfs_opv_descs[0]->opv_desc_vector_p" before calling
>> VOP_LOAD(). To me it looks better to have the fs set up the
>> operations vector (which may not be the default one).
>
> Yes, you're right.
>
>> Could we finally agree on this interface:
>>
>> vcache_get(mp, key, key_len, vpp) to look up and possibly load a vnode.
>> vcache_lookup(mp, key, key_len, vpp) to look up a vnode.
>> vcache_remove(mp, key, key_len) to remove a vnode from the cache.
>> VFS_LOAD_NODE(mp, vp, key, key_len, new_key) to initialise a vnode.
>
> I would prefer VFS_LOADVNODE rather than VFS_LOAD_NODE, for the
> following not-quite-cosmetic reasons:
> - it should be "vnode" rather than "node" as a matter of standard
> usage;
> - when all the op names have one underscore it's easier to process
> tables of them with editor macros. I know the renamelock ones
> broke this property, and maybe they should be changed...

Ok.
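
In prototype form, the agreed interface would then look roughly like
this (the types are a sketch and still to be settled, and VFS_LOADVNODE
is written as a plain prototype here rather than as the usual vfs op
macro plumbing):

#include <sys/types.h>

struct mount;
struct vnode;

/* Look up a vnode by key and, if necessary, load it. */
int	vcache_get(struct mount *mp, const void *key, size_t key_len,
	    struct vnode **vpp);

/* Look up only, never load.  Inherently racy, as discussed above. */
int	vcache_lookup(struct mount *mp, const void *key, size_t key_len,
	    struct vnode **vpp);

/* Remove a vnode from the cache. */
void	vcache_remove(struct mount *mp, const void *key, size_t key_len);

/*
 * Fs operation called by the cache with a new, partially initialised
 * vnode.  The fs initialises it, sets up the operations vector, and
 * returns the final key through new_key.
 */
int	VFS_LOADVNODE(struct mount *mp, struct vnode *vp,
	    const void *key, size_t key_len, const void **new_key);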
> What calls vcache_remove? Is this the external interface for revoke?
> (Reclaim should be triggered from inside the cache, at least in the
> long run.) We need a way for the fs to mark vnodes as not worth
> caching any more (e.g. because they've been unlinked), so they get
> reclaimed as soon as the refcount hits zero, but I don't think
> "remove" gives the right semantics.

The cache in its current form does not change the vnode life cycle,
so vcache_remove() is called from the file system's reclaim operation.
This will likely change once the vnode cache becomes responsible for
the life cycle -- but one step at a time, please.

> If it's the interface for revoke, we should probably call it
> vcache_revoke. If it's also the current path for getting reclaim
> called, we should probably make a separate entry point for that (even
> if it goes to the same logic) and hide it away later.
>
> If it's something else, please tell me what I've forgotten :-)
>
> --
> David A. Holland
> dholl...@netbsd.org

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)