2018-05-21 21:49 GMT+02:00 Jaromír Doleček <jaromir.dole...@gmail.com>:
> It turned out uvm_loan() incurs most of the overhead. I'm still on my
> way to figure what it is exactly which makes it so much slower than
> uiomove().

I've now pinned the problem down to the pmap_page_protect(...,
VM_PROT_READ) call: that code manipulates the page tables and triggers
synchronous IPIs (TLB shootdowns). So it is basically the same problem
as the UBC code in uvm_bio.c.
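
For reference, the step in question looks roughly like this (a
simplified sketch of the loan path in uvm_loan.c, not the literal
code):

	/* simplified sketch of the loan path, not the literal code */
	if (pg->loan_count == 0) {
		/*
		 * First loan of this page: downgrade all of its
		 * existing mappings to read-only, so that a later
		 * write by the owner faults and goes through COW.
		 * On x86 this mapping change is what broadcasts the
		 * synchronous TLB shootdown IPIs showing up in the
		 * profile.
		 */
		pmap_page_protect(pg, VM_PROT_READ);
	}
	pg->loan_count++;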

If I comment out the pmap_page_protect() call in uvm_loan.c, and hence
do not change the vm_page attributes, the uvm_loan() + direct map pipe
variant manages about 13 GB/s, compared to about 12 GB/s for the
regular pipe. An 8% speedup is not much, but as an extra benefit it
removes all the KVA limits. Since the loan path should scale well, it
should then be possible to lower the direct threshold and shrink the
fixed in-kernel pipe buffer to save kernel memory.
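
If that works out, the tuning should just be a matter of adjusting the
pipe constants; assuming PIPE_SIZE and PIPE_MINDIRECT in sys/sys/pipe.h
are still the knobs (if I have the names right), something like this,
with the values purely illustrative and not a proposal for the actual
numbers:

	#define PIPE_SIZE	8192	/* smaller fixed in-kernel buffer */
	#define PIPE_MINDIRECT	4096	/* take the direct/loan path sooner */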

So I'm actually thinking of changing uvm_loan() to not enforce R/O
mappings and to leave the page attributes unchanged. That would require
the callers to deal with possible COW or PG_RDONLY themselves if they
need to write. In other words, allow the 'loan' mechanics to be used
for writes as well, and eventually use this for UBC writes too, to
replace the global PG_BUSY lock there.
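
To make the write case concrete, a writing caller would need to do
something along these lines (pure sketch, the helper name is made up):

	/*
	 * Sketch only: what a caller that wants to write through a
	 * loaned page would have to check if uvm_loan() no longer
	 * downgraded the mappings for it.
	 */
	static int
	loan_prepare_for_write(struct vm_page *pg)
	{

		/*
		 * If the object marked the page read-only, or the page
		 * is also loaned out elsewhere, we cannot write it in
		 * place and have to break the loan / take a private
		 * copy first.
		 */
		if ((pg->flags & PG_RDONLY) != 0 || pg->loan_count > 1)
			return uvm_loanbreak_for_write(pg); /* hypothetical helper */

		/* otherwise the caller may write to the page directly */
		return 0;
	}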

WYT?

Jaromir
