An idea for a feature similar to KMEM_GUARD (which I recently removed because it was too weak to be useful), but this time at the pool layer, covering certain specific pools, with no memory consumption or performance cost, and enabled by default at least on amd64. Note that this is hardening and exploit mitigation, not bug detection, so it will be of little interest in the context of fuzzing. Note also that it targets 64bit arches, because they have nearly unlimited VA.
The idea is that we can use special guard allocators on certain pools to prevent important kernel data from sitting close to untrusted data in the VA space. Suppose the kernel is parsing a packet received from the network, and a buffer overflow causes it to write beyond the mbuf. The data is in an mbuf cluster of size 2K (on amd64). This mbuf cluster sits on a 4K page allocated with the default pool allocator. Directly after that 4K page in memory, there could be critical kernel data, which an attacker could overwrite.

             overflow
 --------------------------->
 +------------+------------+----------------------+
 | 2K Cluster | 2K Cluster | Critical Kernel Data |
 +------------+------------+----------------------+
 <- usual 4K pool page  --> <- another 4K page -->

This is a scenario I have already encountered while working on NetBSD's network stack.

Now, we switch the mcl pool to use the new uvm_km_guard API (simple wrappers to allocate buffers with unmapped pages at the beginning and the end). The pool layer sees pages of size 128K, and packs 64 2K clusters in them.

                  overflow
 ------------~~~~~>
 +------------+-------+-------+-------+-------+-------+------------+
 |  Unmapped  | 2K C. | 2K C. | [...] | 2K C. | 2K C. |  Unmapped  |
 +------------+-------+-------+-------+-------+-------+------------+
 <-- 64K ---> <-- 128K pool page with 64 clusters --> <-- 64K --->

The pool page header is off-page and bitmapped, so there is strictly no kernel data in the 128K pool page. The overflow still occurs, but this time the critical kernel data is far away, beyond the unmapped pages at the end. At worst, only other clusters get overwritten; at best, we are close to the end and hit a page fault, which stops the overflow. 64K is chosen as the maximum of uint16_t.

There is no performance cost, because these guarded buffers are allocated only when the pools grow, a rare operation that occurs almost exclusively at boot time.
No actual memory consumption either, because unmapped areas consume no physical memory, only virtual, and on 64bit arches we have plenty of that (e.g. 32TB on amd64, far beyond what we will ever need), so consuming VA is not a problem. The code is here [1] for mcl; it is simple and works fine. It is not perfect, but it can already prevent a lot of trouble. The same principle could be applied to other pools.

[1] https://m00nbsd.net/garbage/pool/guard.diff