This is a diff against a tree containing the allocator patch I posted previously:

http://www.netbsd.org/~ad/2019/pdpol.diff

The idea here is to buffer updates to the page's status (active,
inactive, dequeued) and then sync those to the pdpolicy / pagedaemon
queues regularly, a bit like the way the file system syncer works.

Notes:

- Since uvm_pageqlock was replaced with pg->interlock & a private lock
  for the pdpolicy code, pages can occasionally appear on a pdpolicy
  queue when they shouldn't be considered for pageout & reclaim (if the
  pagedaemon and object owner race), but it's not a problem because the
  pagedaemon can take pg->interlock and determine that a page is wired
  or in a state of flux or whatever, and so should be ignored because
  it'll be gone from the queues soon.

- This patch takes it a little further.  The pdpolicy code gets a
  dedicated TAILQ_ENTRY in struct vm_page so it doesn't need to share
  with the page allocator.  A page can be PG_FREE and still on a
  pdpolicy queue (but not for long).  We set an intended state for the
  page on pg->pqflags using atomics (active, inactive, dequeued) and
  then those pages are queued in a per-CPU buffer for their status
  updates to be purged and made real in the pdpol code's global state
  at some point in the near future.

- The pagedaemon can also see those updates in real time by inspecting
  pg->pqflags, and make real the page's status.

So basically what I'm doing is batching the updates, trying to not let
the global state fall too far behind, and always giving the pagedaemon
enough information to know the true picture for individual pages when
it does its laborious scan of the queues, even if viewed globally the
queues are a little bit behind.

This seems to work really well, I think because a page can have
multiple state transitions while it's in a queue waiting for its
intended status change to be purged and made global.

Shortly before composing this e-mail it occurred to me that FreeBSD may
do something similar, but to be honest I didn't dig into their code.
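
To make the batching idea above a bit more concrete, here is a minimal
userland sketch in C11.  It is not taken from pdpol.diff: every name in
it (PQ_INTENT_*, struct cpubuf, intent_set, flush, realize) is made up
for illustration, the "lock" is only a counter, and there is a single
buffer standing in for the per-CPU ones.  It only tries to show why
several state transitions on one page can collapse into one queue
update under one lock acquisition.

```c
/* Userland sketch of batched page-queue updates; all names are hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define PQ_INTENT_A      0x1u   /* page wants to be on the active queue */
#define PQ_INTENT_I      0x2u   /* page wants to be on the inactive queue */
#define PQ_INTENT_D      0x3u   /* page wants to be off all queues */
#define PQ_INTENT_MASK   0x3u
#define PQ_INTENT_QUEUED 0x4u   /* page already sits in a pending buffer */

enum realq { Q_NONE, Q_ACTIVE, Q_INACTIVE };

struct page {
    _Atomic unsigned pqflags;   /* intended state, updated without the lock */
    enum realq       realq;     /* "global" state, guarded by the policy lock */
};

#define BUFSZ 8
struct cpubuf {                 /* stands in for one per-CPU buffer */
    struct page *pg[BUFSZ];
    size_t       n;
};

static unsigned long lock_acquisitions;  /* simulated global lock takes */

/* Make one page's latest intent real in the global state (lock "held"). */
static void realize(struct page *pg)
{
    unsigned f = atomic_fetch_and(&pg->pqflags, ~PQ_INTENT_QUEUED);

    switch (f & PQ_INTENT_MASK) {
    case PQ_INTENT_A: pg->realq = Q_ACTIVE;   break;
    case PQ_INTENT_I: pg->realq = Q_INACTIVE; break;
    case PQ_INTENT_D: pg->realq = Q_NONE;     break;
    }
}

/* Purge a buffer: one lock acquisition covers the whole batch. */
static void flush(struct cpubuf *b)
{
    assert(b->n <= BUFSZ);
    lock_acquisitions++;        /* a real kernel would take the mutex here */
    for (size_t i = 0; i < b->n; i++)
        realize(b->pg[i]);
    b->n = 0;
}

/* Record a new intent; buffer the page only if it is not already pending. */
static void intent_set(struct cpubuf *b, struct page *pg, unsigned intent)
{
    unsigned o = atomic_load(&pg->pqflags), n;

    do {
        n = (o & ~PQ_INTENT_MASK) | intent | PQ_INTENT_QUEUED;
    } while (!atomic_compare_exchange_weak(&pg->pqflags, &o, n));

    if (o & PQ_INTENT_QUEUED)
        return;                 /* the pending flush will see the new intent */
    if (b->n == BUFSZ)
        flush(b);               /* buffer full: purge before adding */
    b->pg[b->n++] = pg;
}

/* Demo: 12 transitions over 4 pages cost a single lock acquisition. */
unsigned long demo_batching(void)
{
    static struct page pages[4];
    struct cpubuf buf = { .n = 0 };

    lock_acquisitions = 0;
    for (int i = 0; i < 4; i++) {
        intent_set(&buf, &pages[i], PQ_INTENT_A);
        intent_set(&buf, &pages[i], PQ_INTENT_I);
        intent_set(&buf, &pages[i], PQ_INTENT_A);
    }
    flush(&buf);
    return lock_acquisitions;
}
```

In this sketch demo_batching() returns 1: each page is buffered once,
and only its last intent is applied when the buffer is purged.  A
scanner that wants the true picture for a single page before the purge
could read pqflags itself and call something like realize() while
holding the lock, which is roughly the role the pagedaemon plays in the
description above.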

I need to tweak this to allocate a smaller buffer for uniprocessor
systems and maybe consider using prefetching instructions when purging,
and I want to re-run the tests because I changed a couple of things,
but I'm basically happy with it.

Results on my kernel build test:

    72.66 real  1653.86 user  593.19 sys   new allocator
    71.26 real  1671.13 user  502.94 sys   new allocator + pdpol.diff

Lock contention before and after:

    Total%    Count   Time/ms Lock                   Caller
    ------ -------- --------- ---------------------- ------------------------------
     28.86 44056935  77553.77 pdpol_state            <all>
     15.62 22177251  41978.93 pdpol_state            uvmpdpol_pageactivate+36
     13.12 21656129  35251.99 pdpol_state            uvmpdpol_pagedequeue+18
      0.12   223482    322.77 pdpol_state            uvmpdpol_pagedeactivate+18
      0.00       73      0.07 pdpol_state            uvmpdpol_pageenqueue+18

    Total%    Count   Time/ms Lock                   Caller
    ------ -------- --------- ---------------------- ------------------------------
      0.23    11301    362.35 pdpol_state            uvmpdpol_pageintent_set+b9

Andrew