On Mon, Feb 01, 2010 at 11:30:06PM -0500, Ted Unangst wrote:
> I think this fixes the problem with sleeping and holding pseg_lck.
> 
> Index: uvm_extern.h
> ===================================================================
> RCS file: /home/tedu/cvs/src/sys/uvm/uvm_extern.h,v
> retrieving revision 1.82
> diff -u -r1.82 uvm_extern.h
> --- uvm_extern.h      11 Aug 2009 18:43:33 -0000      1.82
> +++ uvm_extern.h      2 Feb 2010 04:28:09 -0000
> @@ -517,8 +517,9 @@
>                               vaddr_t *, vsize_t, int,
>                               boolean_t, vm_map_t);
>  vaddr_t                      uvm_km_valloc(vm_map_t, vsize_t);
> -vaddr_t                      uvm_km_valloc_align(vm_map_t, vsize_t, vsize_t);
> +vaddr_t                      uvm_km_valloc_try(vm_map_t, vsize_t);
>  vaddr_t                      uvm_km_valloc_wait(vm_map_t, vsize_t);
> +vaddr_t                      uvm_km_valloc_align(struct vm_map *, vsize_t, 
> vsize_t, int);
>  vaddr_t                      uvm_km_valloc_prefer_wait(vm_map_t, vsize_t,
>                                       voff_t);
>  void                 *uvm_km_getpage(boolean_t, int *);
> Index: uvm_km.c
> ===================================================================
> RCS file: /home/tedu/cvs/src/sys/uvm/uvm_km.c,v
> retrieving revision 1.75
> diff -u -r1.75 uvm_km.c
> --- uvm_km.c  25 Jul 2009 12:55:40 -0000      1.75
> +++ uvm_km.c  2 Feb 2010 04:27:35 -0000
> @@ -571,11 +571,17 @@
>  vaddr_t
>  uvm_km_valloc(struct vm_map *map, vsize_t size)
>  {
> -     return(uvm_km_valloc_align(map, size, 0));
> +     return(uvm_km_valloc_align(map, size, 0, 0));
>  }
>  
>  vaddr_t
> -uvm_km_valloc_align(struct vm_map *map, vsize_t size, vsize_t align)
> +uvm_km_valloc_try(struct vm_map *map, vsize_t size)
> +{
> +     return(uvm_km_valloc_align(map, size, 0, UVM_FLAG_TRYLOCK));
> +}
> +
> +vaddr_t
> +uvm_km_valloc_align(struct vm_map *map, vsize_t size, vsize_t align, int 
> flags)
>  {
>       vaddr_t kva;
>       UVMHIST_FUNC("uvm_km_valloc"); UVMHIST_CALLED(maphist);
> @@ -592,7 +598,7 @@
>  
>       if (__predict_false(uvm_map(map, &kva, size, uvm.kernel_object,
>           UVM_UNKNOWN_OFFSET, align, UVM_MAPFLAG(UVM_PROT_ALL, UVM_PROT_ALL,
> -         UVM_INH_NONE, UVM_ADV_RANDOM, 0)) != 0)) {
> +         UVM_INH_NONE, UVM_ADV_RANDOM, flags)) != 0)) {
>               UVMHIST_LOG(maphist, "<- done (no VM)", 0,0,0,0);
>               return(0);
>       }
> Index: uvm_pager.c
> ===================================================================
> RCS file: /home/tedu/cvs/src/sys/uvm/uvm_pager.c,v
> retrieving revision 1.54
> diff -u -r1.54 uvm_pager.c
> --- uvm_pager.c       22 Jul 2009 21:05:37 -0000      1.54
> +++ uvm_pager.c       2 Feb 2010 04:23:57 -0000
> @@ -138,7 +138,7 @@
>  {
>       KASSERT(pseg->start == 0);
>       KASSERT(pseg->use == 0);
> -     pseg->start = uvm_km_valloc(kernel_map, MAX_PAGER_SEGS * MAXBSIZE);
> +     pseg->start = uvm_km_valloc_try(kernel_map, MAX_PAGER_SEGS * MAXBSIZE);

I'd prefer
 uvm_km_kmemalloc(kernel_map, NULL, MAX_PAGER_SEGS * MAXBSIZE,
    UVM_KMF_TRYLOCK|UVM_KMF_VALLOC)
instead of introducing a new uvm_km_valloc_* function. There are plenty
of those already and most of them can be rewritten to uvm_km_kmemalloc
with suitable flags.

Apart from that, the diff looks fine to me.

>  }
>  
>  /*
> 

I think it'll only fix the problem partially though: uvm_km_free*
doesn't have a trylock variant and may sleep just as well on the lock.

I think what happens is that another uvm_km_pgremove is sleeping on
the wanted flag, holding the rwlock. I.e. another uvm_km_free is in
progress somewhere else, waiting for a busy kernelpage (which hopefully
is busy because of a 3rd kernel thread, possibly the pagedaemon).

To summarize:
[1] aiodone needs pseg mutex
[2] mutex is held by pagedaemon thread which illegally sleeps
[3] because the pagedaemon wants access to the uvm_map
[4] which is held by another kernel thread sleeping while locking uvm_map
[5] because another thread has its page marked busy

The thread in [5] could be the pagedaemon, in which case we have a
deadlock. However, making the allocation in [2] try-lock instead may
indeed remove the deadlock (assuming no other process would try to mark
pages busy that don't belong to it).
But I think the deadlock would still resurface, simply because this is a
kernel page and some other kernel thread may try to free its memory
between the pagedaemon starting a page-out and the aiodone daemon
cleaning up.
To make matters worse, it could be that a uvm object decided it needed
to be synced to disk: if memory serves me right any uvm_object io goes
through this code.

The only fix I can think of at the moment is not to use uvm_km_alloc and
uvm_km_free at all for the pseg code.

Ciao,
-- 
Ariane

Reply via email to