On Mon, Nov 01, 2010 at 02:32:23AM +0900, Masao Uebayashi wrote: > Hmmmmm. I may be missing something seriously. I think I've > understood the following description you wrote, and I got hit this > in the very early development stage ... >1 year ago. It was > uvm_fault() -> uvmfault_promote() -> amap_add() -> pmap_page_protect() > Since then I've assumed that shared amap is a pretty much common > thing... Now I realize it should not be, as you describe. Worse, > I can't reproduce that code path... > > Now I have to *really* understand how this works...
I still have no idea how I was hit by this. Strange. Considering how AMAP_SHARED is used, especially it's backed by vnode, this is only for highly tuned server/client programs which share initialized data (.data). While such a use case may have some value, I'd say this is a rare feature. I append a test program. > > (I'll respond about this topic again later.) > > Masao > > On Tue, Oct 26, 2010 at 02:06:38PM -0700, Chuck Silvers wrote: > (snip) > > now here's the explanation I promised for how to treat XIP pages > > as unmanaged instead of managed. > > > > first, some background for other people who don't know all this: > > the only reason that treating XIP pages as managed pages is > > relevant at all is because of the AMAP_SHARED flag in UVM, > > which allows anonymous memory to be shared between processes > > such that the changes made by one process are seen by the other. > > this impacts XIP pages (which are not anonymous) because a > > PROT_WRITE, MAP_PRIVATE mapping of an XIP vnode should point to > > the XIP pages as long as all access to the mapping is for reads, > > but when the mapping is written to then the XIP page should be > > copied to an anonymous page (the normal COW operation) but that > > new anonymous page should still be shared between all processes > > that are sharing the AMAP_SHARED mapping. to force those other > > processes to take another page fault the next time they access > > their copy of the mapping (which we need to do so that they will > > start accessing the new anonymous page instead of the XIP page), > > we must invalidate all the other pmap entries for the XIP page, > > which we do by calling pmap_page_protect() on it. the pmap layer > > tracks all the mappings of the page and thus it can find them all. > > > > there are several ways that the AMAP_SHARED functionality is used, > > and unfortunately they would need to changed in different ways to > > make this work for XIP pages without needing to track mappings: > > > > - uvm_io(), which copies data between the kernel or current process > > and an arbitrary other process address space. > > currently this works by sharing the other address space with > > the kernel via uvm_map_extract() and then just using uiomove() > > to transfer the data. this could be done instead by using part > > of the uvm_fault() code to find the physical page in the other > > address space that we want to access and lock it (ie. set PG_BUSY), > > map the page into the kernel (perhaps using uvm_pager_mapin()), > > transfer the data, then unmap the page and unlock it. > > > > - uvm_mremap(), which resizes an existing mapping. > > this uses uvm_map_extract() internally, which uses AMAP_SHARED, > > but the mremap operation doesn't actually need the semantics of > > AMAP_SHARED since as mremap doesn't create any additional mappings > > as far as applications are concerned. the usage of AMAP_SHARED > > is just a side-effect of the current implementation, which bends > > over backward to call a bunch of existing higher-level functions > > rather than doing something more direct (which would be simpler > > and more efficient as well). > > > > - MAP_INHERIT_SHARE, used to implement minherit(). > > this is the one that is the most trouble, since it's what AMAP_SHARED > > was invented for. however, it's also of least importance since > > some searching with google finds absolutely no evidence of > > any application actually using it, just lots of copies of > > the implementations and manpages for various operating systems. > > > > with that in mind, there are several ways this could be handled. > > (1) just drop support for minherit() entirely. > > (2) reject attempts to set MAP_INHERIT_SHARE on XIP mappings. > > (3) copy XIP pages into normal anon pages when setting > > MAP_INHERIT_SHARE. > > (4) copy XIP pages into normal vnode pages when setting > > MAP_INHERIT_SHARE. this would mean that the getpages > > code would need to look in the normal page cache > > before using XIP pages. I think this option would also > > need getpages to know about the inherit flag to > > correctly handle later faults on XIP mappings, > > and there are probably other sublte complications. > > > > of these choices, (2) sounds like the best compromise to me. > > > > > > this approach would also bring back some issues where our previous > > discussion went around in circles, such as callers of VOP_GETPAGES() > > wanting vm_page pointers but XIP pages not having them, but > > I'll leave that additional discussion for future email if necessary. So you're OK to address this AMAP_SHARED mess after the merge. I'll come back these interesting topics later. Masao > > > > I'm just going over all this now so that it's clear to everyone > > that this kind of approach is possible if the memory overhead of > > the full vm_page structures for XIP pages is deemed too high. > > > > > > -Chuck > > -- > Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635 -- Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635
/* $Id$ */ #include <err.h> #include <stdio.h> #include <unistd.h> #include <sys/cdefs.h> #include <sys/mman.h> #define SIZE 4096 /* PAGE_SIZE */ char buf_a[SIZE] __aligned(SIZE) = { [0] = 1 }; char buf_b[SIZE] __aligned(SIZE) = { [0] = 1 }; int main(int ac, char *av[]) { int error = 0; pid_t pid; int status; error = minherit(buf_b, SIZE, MAP_INHERIT_SHARE); if (error != 0) err(1, "minherit(MAP_INHERIT_SHARE)"); if ((pid = fork()) < 0) err(1, "fork"); switch (pid) { case 0: sleep(1); printf("buf_a[0] = %d\n", buf_a[0]); printf("buf_b[0] = %d\n", buf_b[0]); break; default: buf_a[0] = -1; buf_b[0] = -1; (void)wait(&status); break; } return 0; }