Over the past 10-15 years, the NetBSD kernel has been slowly migrating from the traditional BSD malloc(9) API to the Solaris-inspired kmem(9) API.
The main differences between the interfaces are:

	malloc(9)
	---------
	. attribution by malloc tags, which match in alloc and free
	. size is stored, not passed to free
	. allows zero-size allocs

	kmem(9)
	-------
	. attribution only by size -- can try to use the return address,
	  but it's not matched between alloc and free
	. size is stored only for diagnostics; must be passed to free
	. forbids zero-size allocs (even though they are allowed in
	  Solaris)

I'm not too concerned about stored- vs passed-size, and zero-size allocations don't seem like a big deal either way (although it strikes me as silly to have adopted a Solaris API, except incompatibly, like we did with condvar(9)).

But attribution is a different story. The attribution by malloc tags used to make it clear which subsystem was responsible for memory usage, which was helpful for chasing down leaks. With kmem(9), such attribution requires heuristic search based on the size and return address. I added some dtrace probes recently to help monitor allocations by requested size, but it still takes much more work to attribute leaks to the code responsible for them.

It took a lot of effort, for instance, even to recognize that major leaks from radeon and nouveau came from fence allocations, because we had to:

1. start from which _sizes_ of allocations appeared to be leaking, then
2. monitor stack traces with dtrace to find where many of those allocations were happening, and then
3. guess which ones were _leaks_, because we can't match up the alloc and free except by size.

I tried to search for discussion about this but haven't found anything substantive, just commit logs recording the transition happening. So I wonder:

- Was the rationale for migrating to kmem(9) written down or discussed publicly anywhere?
- What's the benefit of using kmem(9) over malloc(9)?
- Is it even worthwhile to complete this transition?
- What would the cost of restoring attribution be, other than the obvious O(ntag*nsizebuckets) memory cost to record it and the effort to annotate allocations?

Note: I'm not addressing the implementation here. Right now both interfaces are backed by the same array of pool caches, indexed by mostly power-of-two-granularity sizes. I'm only asking about the interface used across the kernel and drivers.