Despite repeatedly messing around with the hash function over the last few months, and despite the table being ~64MB, bufhash often has long chains on my test system.
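For anyone not staring at vfs_bio.c, the lookup is roughly this (paraphrased,
not line-for-line; the hash expression shown is the classic 4.4BSD-style one,
not the latest tweak, but the shape is the same - one global table keyed on
vnode pointer and block number):

#include <sys/param.h>
#include <sys/buf.h>
#include <sys/vnode.h>

/* One global table; every cached buffer in the system lands here. */
#define BUFHASH(dvp, lbn) \
        (&bufhashtbl[(((long)(dvp) >> 8) + (int)(lbn)) & bufhash])

buf_t *
incore(struct vnode *vp, daddr_t blkno)
{
        buf_t *bp;

        KASSERT(mutex_owned(&bufcache_lock));

        /* Walk one chain of the system-wide table. */
        LIST_FOREACH(bp, BUFHASH(vp, blkno), b_hash) {
                if (bp->b_lblkno == blkno && bp->b_vp == vp &&
                    !ISSET(bp->b_cflags, BC_INVAL))
                        return bp;
        }
        return NULL;
}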
Whatever is done to the hash function, though, it will always throw away the valuable natural partitioning that's there to begin with, which is that buffers are cached against specific vnodes:

# vmstat -H
                    total     used   util      num  average  maximum
hash table        buckets  buckets      %    items    chain    chain
bufhash           8388608    24222   0.29    36958     1.53      370
in_ifaddrhash         512        2   0.39        2     1.00        1
uihash               1024        4   0.39        4     1.00        1
vcache_hashmask   8388608   247582   2.95   252082     1.02        2

Changing this to use a per-vnode index makes use of that partitioning, and moves things closer to the point where bufcache_lock can be replaced by v_interlock in most places (I have not made that replacement yet - it's a decent chunk of work).

Despite the best efforts of all involved, the buffer cache code is a bit of a minefield because of the things its users do. For example, LFS and WAPBL want some combination of I/O buffers for COW-type stuff, pre-allocated memory, and inclusion on the vnode buf lists (presumably for vflushbuf()), but those buffers shouldn't be returned via incore(). To handle that case I added a new flag, BC_IOBUF; there's a sketch of how it fits into the lookup at the end of this mail.

I don't have performance measurements, but from lockstat I see about a 5-10x reduction in contention on bufcache_lock in my tests, which suggests to me that less time is being spent in incore().

Changes here: http://www.netbsd.org/~ad/2020/bufcache.diff

Comments welcome.

Thanks,
Andrew
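P.S. To make the per-vnode idea concrete, here's a rough sketch of what the
lookup side could look like. It's illustrative only - v_buftree is a made-up
name and the diff may well organize this differently - but it shows why chain
length stops depending on the system-wide buffer count, and where the
BC_IOBUF check fits:

#include <sys/param.h>
#include <sys/buf.h>
#include <sys/rbtree.h>
#include <sys/vnode.h>

buf_t *
incore(struct vnode *vp, daddr_t blkno)
{
        buf_t *bp;

        /* Still bufcache_lock for now; eventually v_interlock. */
        KASSERT(mutex_owned(&bufcache_lock));

        /*
         * Hypothetical per-vnode index: an rbtree hung off the
         * vnode, keyed on b_lblkno, so the search only ever sees
         * this vnode's own buffers.
         */
        bp = rb_tree_find_node(&vp->v_buftree, &blkno);
        if (bp == NULL)
                return NULL;

        /*
         * LFS/WAPBL I/O buffers sit on the vnode buf lists for
         * vflushbuf() but must never be returned from incore().
         */
        if (ISSET(bp->b_cflags, BC_INVAL | BC_IOBUF))
                return NULL;
        return bp;
}

The lookup cost is then bounded by how many buffers a single vnode has
cached, and once everything touching the index holds the owning vnode's
lock, bufcache_lock stops being a system-wide choke point for lookups.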