Here is a patch [1] that removes pmap_kernel from userland. The idea may be a bit difficult to grasp, so I'll just draw the big picture.
Unmapping pmap_kernel from userland is a complicated business, for two main reasons:

(1) The kernel stack needs to be mapped in userland, yet it contains secret data that should not be made available to userland.

(2) In order to switch to the kernel page tables during a user->kernel transition, we need to have a TLS; but the TLS too contains secret data that should not be made available to userland.

These two issues are solved as follows:

(1) A fake, one-page-sized stack is added in pcpu_entry. The VA of this stack is dynamically kentered into the last physical page of the LWP's kernel stack. The stacks are laid out in such a way that their last page can only ever contain a trapframe structure. During a user->kernel transition, a trapframe is pushed onto the fake stack; we then switch %rsp to the real stack and continue execution as usual. We don't need to copy the content of the fake stack into the new stack, since the two VAs point to the same physical page. See this drawing [2], kindergarten style. With this design, the part of the kernel stack that contains secrets is actually *unmapped* from userland.

(2) A User Thread Local Storage (UTLS) page is added in pcpu_area. Each CPU stores there the address of its kernel pdir. A particular rsp0 is stored there too, because the syscall entry point is special and needs a different mechanism.

In this implementation everything is optimized to reduce the overhead. In the end SVS_ENTER becomes:

	movq	SVS_UTLS+UTLS_KPDIRPA,%rax
	movq	%rax,%cr3
	movq	CPUVAR(KRSP0),%rsp

which is pretty fast to execute, given the total separation it provides. The place where we kenter the fake stack into the real one is svs_lwp_switch, and it is organized in such a way that we don't even need to flush the VA from the TLB.
The only drawback is that we need to add a bunch of redundant values in cpu_info; but these values are computed and saved at boot time, so they don't need to be recomputed on each context switch or kernel<->user transition.

After this change, only the kernel image itself will still need to be unmapped, and that can be solved quickly.

Note: our handling of double faults (and, to a lesser extent, NMIs) has always been wrong, and is even more wrong with SVS; this is on my todo list.

I will probably commit this patch soon. Of course, it is compatible with KASLR.

Maxime

[1] http://m00nbsd.net/garbage/svs/stack.diff
[2] http://m00nbsd.net/garbage/svs/stack.png