On Fri, May 25, 2018 at 10:01:15PM +0200, Jaromír Doleček wrote:
> 2018-05-21 21:49 GMT+02:00 Jaromír Doleček <jaromir.dole...@gmail.com>:
> > It turned out uvm_loan() incurs most of the overhead. I'm still on my
> > way to figure what it is exactly which makes it so much slower than
> > uiomove().
>
> I've now pinned the problem down to the pmap_page_protect(...,
> VM_PROT_READ), that code does page table manipulations and triggers
> synchronous IPIs. So basically the same problem as the UBC code in
> uvm_bio.c.

There's always going to be some critical size beneath which the cost of
the MMU manipulations (or, these days, the interprocessor communication
to cause other CPUs to do _their_ MMU manipulations) outweighs the
benefit of avoiding copies. This problem's been known as far back as
Mach on the VAX, where they discovered that for typical message sizes
to/from the microkernel, mapping instead of copying was definitely a
lose.

In this case, could we do better by gathering the IPIs so the cost is
amortized over many pages?

-- 
Thor Lancelot Simon t...@panix.com
 "The two most common variations translate as follows:
  illegitimi non carborundum = the unlawful are not silicon carbide
  illegitimis non carborundum = the unlawful don't have silicon carbide."
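
To put rough shape on the break-even argument and the IPI-gathering question above, here is a minimal back-of-envelope cost model. Every number in it is an assumption chosen for illustration, not a measurement from NetBSD or any hardware, and the function names (copy_cost, loan_cost, break_even_pages) are hypothetical. It only shows how the critical size moves when one synchronous IPI round covers a batch of pages instead of a single page.

```python
import math

# All figures are illustrative assumptions, not measurements.
COPY_NS_PER_PAGE = 400   # assumed cost to copy one page
PTE_NS_PER_PAGE = 50     # assumed local page-table update per page
IPI_NS = 5000            # assumed cost of one synchronous IPI round


def copy_cost(pages):
    """Cost of simply copying `pages` pages."""
    return pages * COPY_NS_PER_PAGE


def loan_cost(pages, batch=1):
    """Cost of remapping `pages` pages with one IPI round per `batch` pages."""
    ipi_rounds = math.ceil(pages / batch)
    return pages * PTE_NS_PER_PAGE + ipi_rounds * IPI_NS


def break_even_pages(batch=None, limit=1024):
    """Smallest transfer (in pages) where loaning beats copying.

    batch=None means the whole transfer shares a single IPI round.
    Returns None if loaning never wins within `limit` pages.
    """
    for n in range(1, limit + 1):
        per_round = n if batch is None else batch
        if loan_cost(n, per_round) < copy_cost(n):
            return n
    return None


if __name__ == "__main__":
    print("one IPI per page:    ", break_even_pages(batch=1))
    print("one IPI per transfer:", break_even_pages())
```

Under these assumed costs, paying an IPI round for every page means mapping never beats copying, while a single round for the whole transfer puts the break-even at a few tens of kilobytes. The real threshold depends entirely on the measured costs on a given machine.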