hi, sorry for being so late to the party on this thread, I should have replied long ago.
On Fri, May 11, 2018 at 05:28:18PM +0200, Jaromír Doleček wrote:
> I've modified the uvm_bio.c code to use the direct map for amd64. The
> change seems to be stable. I've also run 'build.sh tools' to test out
> the system; the build succeeded and I didn't observe any problems with
> stability. The test was on a 6-core Intel CPU with HT on (i.e. "12"
> cores visible to the OS), NVMe disk, ca. 2GB file read done via:
>
> dd if=foo of=/dev/null bs=64k msgfmt=human
>
> Because of the proper formatting, the speed is now reported in proper
> gibibytes.
>
> 1. non-cached read: old: 1.7 GiB/s, new: 1.9 GiB/s
> 2. cached read: old: 2.2 GiB/s, new: 5.6 GiB/s
>
> It seems the 1.9 GiB/s is the device limit.
>
> The patch for this is at:
> http://www.netbsd.org/~jdolecek/uvm_bio_direct.diff

the problem with keeping the pages locked (ie. PG_BUSY) while accessing
the user address space is that it can lead to deadlock if the page we
are accessing via the kernel mapping is the same page that we are
accessing via the user mapping.  if the user space access faults
(eg. because the page has not been accessed via that mapping yet) then
the fault handler will try to lock the same page again.  even if it's
not the same page, it is generally not safe for a thread to wait to
lock a page while that thread has already locked another page.
however, if you don't lock the object page while accessing user space,
then the object page can change identity out from under you,
corrupting whatever the page is being used for in its new identity.

my plan for this in the longer term was to add a mechanism in UVM for
preventing a page from changing identity, which would be somewhat
similar to the existing loan mechanism, except that the page could
still be written to without making a copy of it.  linux and freebsd
(and solaris) each have something like this, though they are all a bit
different in the details.
using this new mechanism, we could keep the page being accessed via
the direct map from changing identity while still allowing normal page
faults to happen on the user mapping.  there are probably less
intrusive ways to attack the problem at hand, but the above approach
seemed like the best way to go in the long run.  I was hoping the emap
thing would work for the shorter term, but if that's not practical to
make work then we can look for something else.

> During testing I noticed that the read-ahead code slows down the
> cached case quite a lot:
>
> no read-ahead:
> 1. non-cached read: old: 851 MiB/s, new: 1.1 GiB/s
> 2. cached read: old: 2.3 GiB/s, new: 6.8 GiB/s
>
> I've implemented a tweak to the read-ahead code to skip the full
> read-ahead if the last page of the range is already in cache; this
> improved things a lot:
>
> smarter read-ahead:
> 1. non-cached read: old: 1.7 GiB/s, new: 1.9 GiB/s (no change
> compared to old read-ahead)
> 2. cached read: old: 2.2 GiB/s, new: 6.6 GiB/s
>
> Patch for this is at:
> http://www.netbsd.org/~jdolecek/uvm_readahead_skip.diff

the concept is good, but would using PGO_NOWAIT work about as well as
adding a new PGO_CACHEONLY flag?  if any pages are not in memory then
the pgo_get call would fail, since that would require waiting for them
to be read.  the suggestion from others to just use uvm_pagelookup()
would be fine too.  it would also be nice to put this new logic for
checking if an object page is in memory into a separate function, in
case it's useful in other contexts (and to isolate the implementation
a bit).

> For the direct map, I've implemented a new helper pmap function, which
> in turn calls a callback with the appropriate virtual address.  This
> way the arch pmap is more free to choose an alternative
> implementation for this, e.g. one without actually having a direct
> map, like sparc64.
>
> The code right now has some instrumentation for testing, but the
> general idea should be visible.
>
> Opinions?
> Particularly, I'm not quite sure if it is safe to avoid
> triggering a write fault in ubc_uiomove(), and whether the new
> heuristics for read-ahead are good enough for the general case.

is there a specific case about triggering a write fault in
ubc_uiomove() that you are concerned about?  the general rule is that
a page being modified must be marked dirty either as part of the
modification (if it is being written via a regular ref/mod-tracking
pmap entry) or after the modifications are complete (if it is being
written via a non-ref/mod-tracking mechanism such as the direct map).

the read-ahead logic in post-UBC netbsd has never been very
sophisticated, I'd be impressed if you managed to make it worse.  :-)

-Chuck