I wrote a small blogpost[1] about this series, and was told[2] that it would be interesting to share it on this thread, so here it is, copied verbatim:
Ruiqi Gong and Xiu Jianfeng got their [Randomized slab caches for kmalloc()](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3c6152940584290668b35fa0800026f6a1ae05fe) patch series merged upstream, and I've had enough discussions about it to warrant summarising them into a small blogpost.

The main idea is to have multiple slab caches, and to pick one at random based on the address of the code calling `kmalloc()` and a per-boot seed, to make heap spraying harder. It's a great idea, but it comes with some shortcomings for now:

- Objects allocated via wrappers around `kmalloc()`, like `sock_kmalloc`, `f2fs_kmalloc`, `aligned_kmalloc`, … will end up in the same slab cache, since the caller address used to pick the cache is the wrapper's, not the wrapper's callers'.
- The slabs need to be pinned, otherwise an attacker could [feng-shui](https://en.wikipedia.org/wiki/Heap_feng_shui) their way into having the whole slab freed, garbage-collected, and a slab for another type allocated at the same virtual address. [Jann Horn](https://thejh.net/) and [Matteo Rizzo](https://infosec.exchange/@nspace) have a [nice set of patches](https://github.com/torvalds/linux/compare/master...thejh:linux:slub-virtual-upstream), discussed a bit in [this Project Zero blogpost](https://googleprojectzero.blogspot.com/2021/10/how-simple-linux-kernel-memory.html), for a feature called [`SLAB_VIRTUAL`](https://github.com/torvalds/linux/commit/f3afd3a2152353be355b90f5fd4367adbf6a955e), implementing precisely this.
- There are 16 slab caches by default, so there is one chance out of 16 to end up in the same cache as the target.
- There are no guard pages between caches, so inter-cache overflows are possible.
- As pointed out by [andreyknvl](https://twitter.com/andreyknvl/status/1700267669336080678) and [minipli](https://infosec.exchange/@minipli/111045336853055793), the fewer allocations hit a given cache, the less noise there is, so the randomisation might even help with some heap feng-shui.
- minipli also pointed out that "randomized caches still freely mix kernel allocations with user controlled ones (`xattr`, `keyctl`, `msg_msg`, …). So even though merging is disabled for these caches, i.e. no direct overlap with `cred_jar` etc., other object types can still be targeted (`struct pipe_buffer`, BPF maps, its verifier state objects, …). It’s just a matter of probing which allocation index the targeted object falls into.", but I considered this out of scope, since it's much more involved; although something like [`CONFIG_KMALLOC_SPLIT_VARSIZE`](https://github.com/thejh/linux/blob/slub-virtual/MITIGATION_README) wouldn't significantly increase complexity.

Also, while code addresses as a source of entropy have historically been a great way to provide [KASLR](https://lwn.net/Articles/569635/) bypasses, `hash_64(caller ^ random_kmalloc_seed, ilog2(RANDOM_KMALLOC_CACHES_NR + 1))` shouldn't trivially leak offsets; see the small sketch of this selection logic below.

The segregation technique is a bit like a weaker version of grsecurity's [AUTOSLAB](https://grsecurity.net/how_autoslab_changes_the_memory_unsafety_game), or a weaker kernel-land version of [PartitionAlloc](https://chromium.googlesource.com/chromium/src/+/master/base/allocator/partition_allocator/PartitionAlloc.md). But to be fair, making use-after-free exploitation harder, and significantly harder once pinning lands, with only ~150 lines of code and negligible performance impact, is amazing and should be praised. Moreover, I wouldn't be surprised if this was backported to [Google's KernelCTF](https://google.github.io/security-research/kernelctf/rules.html) soon, so we should see if my analysis is correct.
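To make the cache-selection step concrete, here is a minimal user-space sketch, not the kernel code, of how an allocation site gets mapped to one of the 16 cache copies: the caller's return address is XORed with a per-boot seed and hashed down to 4 bits. The golden-ratio multiplicative hash stands in for the kernel's `hash_64()`, and the seed and call-site addresses are placeholders.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* "16 slabs by default": RANDOM_KMALLOC_CACHES_NR + 1 == 16 copies. */
#define RANDOM_KMALLOC_CACHES_NR 15

/* Multiplicative hash in the spirit of the kernel's hash_64(). */
static inline uint64_t hash_64(uint64_t val, unsigned int bits)
{
	return (val * 0x61C8864680B583EBULL) >> (64 - bits);
}

/* Per-boot seed; a stand-in for the kernel's random_kmalloc_seed. */
static uint64_t random_kmalloc_seed;

/* Which of the 16 kmalloc cache copies a given call site ends up in. */
static unsigned int cache_index(uintptr_t caller)
{
	/* ilog2(RANDOM_KMALLOC_CACHES_NR + 1) == 4 bits -> index 0..15 */
	return (unsigned int)hash_64((uint64_t)caller ^ random_kmalloc_seed, 4);
}

int main(void)
{
	srand((unsigned int)time(NULL));
	random_kmalloc_seed = ((uint64_t)rand() << 32) | (uint64_t)rand();

	/* Two distinct call sites (arbitrary code addresses here) usually land
	 * in different caches, and the mapping changes on every "boot". */
	printf("call site A -> cache %u\n", cache_index((uintptr_t)&main));
	printf("call site B -> cache %u\n", cache_index((uintptr_t)&cache_index));
	return 0;
}
```

Since the hash folds the 64-bit caller address and seed down to 4 bits, observing which cache an allocation lands in reveals at most 4 bits per call site, which is why it shouldn't trivially turn into a KASLR bypass.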
1. https://dustri.org/b/some-notes-on-randomized-slab-caches-for-kmalloc.html
2. https://infosec.exchange/@vba...@social.kernel.org/111046740392510260

--
Julien (jvoisin) Voisin
GPG: 04D041E8171901CC
dustri.org