Hi,

I have some straightforward changes for uvm to improve the situation on many-CPU machines. I'm going to break them into pieces to make them easier to review (only this first piece and what's already in CVS are ready).
I have carefully measured the impact of these changes over hundreds of kernel builds, using lockstat, tprof and some custom instrumentation, so I'm confident that each one is at least of some value. Anyway, I'd be grateful if someone could take a look.

This one is about reducing pressure on uvm_pageqlock, and cache misses on struct vm_page.

Cheers,
Andrew

http://www.netbsd.org/~ad/2019/uvm1.diff

vm_page: cluster largely static fields used during page lookup in the first 64 bytes. Increase wire_count to 32 bits, and add a field for use by the page replacement policy. This leaves 2 bytes spare in a word locked by uvm_fpageqlock/uvm_pageqlock, which I want to use later for changes to the page allocator. It also brings vm_page up to 128 bytes on amd64.

New functions:

=> uvmpdpol_pageactivate_p()

For the page replacement policy. Returns true if pdpol thinks activation info would be useful enough to justify disturbing the page queues, vm_page and uvm_fpageqlock. For CLOCK this returns true if the page is not active, or was not activated within the last second.

=> uvm_pageenqueue1()

Call without uvm_pageqlock. Acquires the lock and enqueues the page only if it is not already enqueued.

=> uvm_pageactivate1()

Call without uvm_pageqlock. Acquires the lock and activates the page only if uvmpdpol_pageactivate_p() says yes. There is no similar change for deactivate or dequeue, as those are much more definite.

=> uvm_pagefree1()

First part of uvm_pagefree(): strip the page of its identity. Requires uvm_pageqlock if the page is associated with an object.

=> uvm_pagefree2()

Second part of uvm_pagefree(): send the page to the free list. No locks required.