2018-05-21 21:49 GMT+02:00 Jaromír Doleček <jaromir.dole...@gmail.com>:
> It turned out uvm_loan() incurs most of the overhead. I'm still on my
> way to figure what it is exactly which makes it so much slower than
> uiomove().

I've now pinned the problem down to the
pmap_page_protect(..., VM_PROT_READ) call: that code manipulates page
tables and triggers synchronous IPIs. So it is basically the same
problem as in the UBC code in uvm_bio.c.

If I comment out the pmap_page_protect() in uvm_loan.c, and hence do
not change the vm_page attributes, the uvm_loan() + direct map pipe
variant manages about 13 GB/s, compared to about 12 GB/s for the
regular pipe. An 8% speedup is not much, but as a bonus it removes all
the KVA limits. Since it should scale well, it should be possible to
reduce the direct threshold, and to reduce the size of the fixed
in-kernel pipe buffer to save kernel memory.

So, I'm actually thinking of changing uvm_loan() to not enforce R/O
mappings and to leave the page attributes unchanged. It would require
the caller to deal with possible COW or PG_RDONLY if they need to do
writes. In other words, allow using the 'loan' mechanics for writes as
well, and eventually use this for UBC writes too, to replace the global
PG_BUSY lock there.

WYT?

Jaromir