On Mon, Feb 01, 2010 at 11:30:06PM -0500, Ted Unangst wrote: > I think this fixes the problem with sleeping and holding pseg_lck. > > Index: uvm_extern.h > =================================================================== > RCS file: /home/tedu/cvs/src/sys/uvm/uvm_extern.h,v > retrieving revision 1.82 > diff -u -r1.82 uvm_extern.h > --- uvm_extern.h 11 Aug 2009 18:43:33 -0000 1.82 > +++ uvm_extern.h 2 Feb 2010 04:28:09 -0000 > @@ -517,8 +517,9 @@ > vaddr_t *, vsize_t, int, > boolean_t, vm_map_t); > vaddr_t uvm_km_valloc(vm_map_t, vsize_t); > -vaddr_t uvm_km_valloc_align(vm_map_t, vsize_t, vsize_t); > +vaddr_t uvm_km_valloc_try(vm_map_t, vsize_t); > vaddr_t uvm_km_valloc_wait(vm_map_t, vsize_t); > +vaddr_t uvm_km_valloc_align(struct vm_map *, vsize_t, > vsize_t, int); > vaddr_t uvm_km_valloc_prefer_wait(vm_map_t, vsize_t, > voff_t); > void *uvm_km_getpage(boolean_t, int *); > Index: uvm_km.c > =================================================================== > RCS file: /home/tedu/cvs/src/sys/uvm/uvm_km.c,v > retrieving revision 1.75 > diff -u -r1.75 uvm_km.c > --- uvm_km.c 25 Jul 2009 12:55:40 -0000 1.75 > +++ uvm_km.c 2 Feb 2010 04:27:35 -0000 > @@ -571,11 +571,17 @@ > vaddr_t > uvm_km_valloc(struct vm_map *map, vsize_t size) > { > - return(uvm_km_valloc_align(map, size, 0)); > + return(uvm_km_valloc_align(map, size, 0, 0)); > } > > vaddr_t > -uvm_km_valloc_align(struct vm_map *map, vsize_t size, vsize_t align) > +uvm_km_valloc_try(struct vm_map *map, vsize_t size) > +{ > + return(uvm_km_valloc_align(map, size, 0, UVM_FLAG_TRYLOCK)); > +} > + > +vaddr_t > +uvm_km_valloc_align(struct vm_map *map, vsize_t size, vsize_t align, int > flags) > { > vaddr_t kva; > UVMHIST_FUNC("uvm_km_valloc"); UVMHIST_CALLED(maphist); > @@ -592,7 +598,7 @@ > > if (__predict_false(uvm_map(map, &kva, size, uvm.kernel_object, > UVM_UNKNOWN_OFFSET, align, UVM_MAPFLAG(UVM_PROT_ALL, UVM_PROT_ALL, > - UVM_INH_NONE, UVM_ADV_RANDOM, 0)) != 0)) { > + UVM_INH_NONE, UVM_ADV_RANDOM, flags)) != 0)) { > UVMHIST_LOG(maphist, "<- done (no VM)", 0,0,0,0); > return(0); > } > Index: uvm_pager.c > =================================================================== > RCS file: /home/tedu/cvs/src/sys/uvm/uvm_pager.c,v > retrieving revision 1.54 > diff -u -r1.54 uvm_pager.c > --- uvm_pager.c 22 Jul 2009 21:05:37 -0000 1.54 > +++ uvm_pager.c 2 Feb 2010 04:23:57 -0000 > @@ -138,7 +138,7 @@ > { > KASSERT(pseg->start == 0); > KASSERT(pseg->use == 0); > - pseg->start = uvm_km_valloc(kernel_map, MAX_PAGER_SEGS * MAXBSIZE); > + pseg->start = uvm_km_valloc_try(kernel_map, MAX_PAGER_SEGS * MAXBSIZE);
I'd prefer uvm_km_kmemalloc(kernel_map, NULL, MAX_PAGER_SEGS * MAXBSIZE, UVM_KMF_TRYLOCK|UVM_KMF_VALLOC) instead of introducing a new uvm_km_valloc_* function. There are plenty of those already and most of them can be rewritten to uvm_km_kmemalloc with suitable flags. Apart from that, the diff looks fine to me. > } > > /* > I think it'll only fix the problem partially though: uvm_km_free* doesn't have a trylock variant and may sleep just as well on the lock. I think what happens is that another uvm_km_pgremove is sleeping on the wanted flag, holding the rwlock. I.e. another uvm_km_free is in progress somewhere else, waiting for a busy kernelpage (which hopefully is busy because of a 3rd kernel thread, possibly the pagedaemon). To summarize: [1] aiodone needs pseg mutex [2] mutex is held by pagedaemon thread which illegally sleeps [3] because the pagedaemon wants access to the uvm_map [4] which is held by another kernel thread sleeping while locking uvm_map [5] because another thread has its page marked busy The thread in [5] could be the pagedaemon, in which case we have a deadlock. However, making the allocation in [2] try-lock instead may indeed remove the deadlock (assuming no other process would try to mark pages busy that don't belong to it). But I think the deadlock would still resurface, simply because this is a kernel page and some other kernel thread may try to free its memory between the pagedaemon starting a page-out and the aiodone daemon cleaning up. To make matters worse, it could be that a uvm object decided it needed to be synced to disk: if memory serves me right any uvm_object io goes through this code. The only fix I can think of at the moment is not to use uvm_km_alloc and uvm_km_free at all for the pseg code. Ciao, -- Ariane