Despite repeatedly messing around with the hash function over the last few months, and despite the table being ~64MB, bufhash often has long chains on my test system.
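For anyone not staring at vfs_bio.c, the lookup is roughly this (paraphrased,
not line-for-line; the hash expression shown is the classic 4.4BSD-style one,
not the latest tweak, but the shape is the same - one global table keyed on
vnode pointer and block number):

#include <sys/param.h>
#include <sys/buf.h>
#include <sys/vnode.h>

/* One global table; every cached buffer in the system lands here. */
#define BUFHASH(dvp, lbn) \
        (&bufhashtbl[(((long)(dvp) >> 8) + (int)(lbn)) & bufhash])

buf_t *
incore(struct vnode *vp, daddr_t blkno)
{
        buf_t *bp;

        KASSERT(mutex_owned(&bufcache_lock));

        /* Walk one chain of the system-wide table. */
        LIST_FOREACH(bp, BUFHASH(vp, blkno), b_hash) {
                if (bp->b_lblkno == blkno && bp->b_vp == vp &&
                    !ISSET(bp->b_cflags, BC_INVAL))
                        return bp;
        }
        return NULL;
}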
Whatever is done to the hash function, though, it will always throw away the valuable natural partitioning that's there to begin with, which is that buffers are cached against specific vnodes:

# vmstat -H
                    total     used   util      num  average  maximum
hash table        buckets  buckets      %    items    chain    chain
bufhash           8388608    24222   0.29    36958     1.53      370
in_ifaddrhash         512        2   0.39        2     1.00        1
uihash               1024        4   0.39        4     1.00        1
vcache_hashmask   8388608   247582   2.95   252082     1.02        2

Changing this to use a per-vnode index makes use of that partitioning, and moves things closer to the point where bufcache_lock can be replaced by v_interlock in most places (I have not made that replacement yet - it's a decent chunk of work).

Despite the best efforts of all involved, the buffer cache code is a bit of a minefield because of the things its users do. For example, LFS and WAPBL want some combination of I/O buffers for COW-type stuff, pre-allocated memory, and inclusion on the vnode buf lists (presumably for vflushbuf()), but those buffers shouldn't be returned via incore(). To handle that case I added a new flag, BC_IOBUF; there's a sketch of how it fits into the lookup at the end of this mail.

I don't have performance measurements, but from lockstat I see about a 5-10x reduction in contention on bufcache_lock in my tests, which suggests to me that less time is being spent in incore().

Changes here: http://www.netbsd.org/~ad/2020/bufcache.diff

Comments welcome.

Thanks,
Andrew
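P.S. To make the per-vnode idea concrete, here's a rough sketch of what the
lookup side could look like. It's illustrative only - v_buftree is a made-up
name and the diff may well organize this differently - but it shows why chain
length stops depending on the system-wide buffer count, and where the
BC_IOBUF check fits:

#include <sys/param.h>
#include <sys/buf.h>
#include <sys/rbtree.h>
#include <sys/vnode.h>

buf_t *
incore(struct vnode *vp, daddr_t blkno)
{
        buf_t *bp;

        /* Still bufcache_lock for now; eventually v_interlock. */
        KASSERT(mutex_owned(&bufcache_lock));

        /*
         * Hypothetical per-vnode index: an rbtree hung off the
         * vnode, keyed on b_lblkno, so the search only ever sees
         * this vnode's own buffers.
         */
        bp = rb_tree_find_node(&vp->v_buftree, &blkno);
        if (bp == NULL)
                return NULL;

        /*
         * LFS/WAPBL I/O buffers sit on the vnode buf lists for
         * vflushbuf() but must never be returned from incore().
         */
        if (ISSET(bp->b_cflags, BC_INVAL | BC_IOBUF))
                return NULL;
        return bp;
}

The lookup cost is then bounded by how many buffers a single vnode has
cached, and once everything touching the index holds the owning vnode's
lock, bufcache_lock stops being a system-wide choke point for lookups.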