On Fri, Dec 11, 2015 at 12:16:19PM -0500, Christos Zoulas wrote: > On Dec 11, 8:54am, c...@chuq.com (Chuck Silvers) wrote: > -- Subject: Re: kernel memory allocation failures > > | I haven't looked see exactly what the internal conditions are that lead to > | kmem_alloc(KM_SLEEP) returning NULL, so it's not clear whether having > | kmem_alloc() retry or panic in that situation would be better. > | but either of those would be better than adding code to all the callers. > > http://nxr.netbsd.org/xref/src/sys/kern/subr_vmem.c#256 > > Where we are calling the pool code with PR_NOWAIT and the flags are not paid > attention to at all (except in the recursive calls).
the attached patch makes bt_alloc() retry the bt_refill() when called with VM_SLEEP, which I think removes all the ways that vmem_alloc() with VM_SLEEP (and thus kmem_alloc() with KM_SLEEP) can fail. > And there is also: > > http://nxr.netbsd.org/xref/src/sys/kern/subr_vmem.c#272 > > where the 3 bt_refill() calls are not even being checked (but at least > they use flags :-) the error is not checked because these are just a precaution, the function will not do anything different if these fail. I'll cast them to void. I'll commit this patch in a few days if there are no complaints. -Chuck
Index: src/sys/kern/subr_vmem.c =================================================================== RCS file: /home/chs/netbsd/cvs/src/sys/kern/subr_vmem.c,v retrieving revision 1.93 diff -u -p -r1.93 subr_vmem.c --- src/sys/kern/subr_vmem.c 24 Aug 2015 22:50:32 -0000 1.93 +++ src/sys/kern/subr_vmem.c 15 Feb 2016 20:05:36 -0000 @@ -194,6 +194,16 @@ static LIST_HEAD(, vmem_btag) vmem_btag_ static size_t vmem_btag_freelist_count = 0; static struct pool vmem_btag_pool; +static void +vmem_kick_pdaemon(void) +{ +#if defined(_KERNEL) + mutex_spin_enter(&uvm_fpageqlock); + uvm_kick_pdaemon(); + mutex_spin_exit(&uvm_fpageqlock); +#endif +} + /* ---- boundary tag */ static int bt_refill(vmem_t *vm, vm_flag_t flags); @@ -270,11 +280,11 @@ bt_refill(vmem_t *vm, vm_flag_t flags) VMEM_UNLOCK(vm); if (kmem_meta_arena != NULL) { - bt_refill(kmem_arena, (flags & ~VM_FITMASK) + (void)bt_refill(kmem_arena, (flags & ~VM_FITMASK) | VM_INSTANTFIT | VM_POPULATING); - bt_refill(kmem_va_meta_arena, (flags & ~VM_FITMASK) + (void)bt_refill(kmem_va_meta_arena, (flags & ~VM_FITMASK) | VM_INSTANTFIT | VM_POPULATING); - bt_refill(kmem_meta_arena, (flags & ~VM_FITMASK) + (void)bt_refill(kmem_meta_arena, (flags & ~VM_FITMASK) | VM_INSTANTFIT | VM_POPULATING); } @@ -289,7 +299,21 @@ bt_alloc(vmem_t *vm, vm_flag_t flags) while (vm->vm_nfreetags <= BT_MINRESERVE && (flags & VM_POPULATING) == 0) { VMEM_UNLOCK(vm); if (bt_refill(vm, VM_NOSLEEP | VM_INSTANTFIT)) { - return NULL; + if ((flags & VM_NOSLEEP) != 0) { + return NULL; + } + + /* + * It would be nice to wait for something specific here + * but there are multiple ways that a retry could + * succeed and we can't wait for multiple things + * simultaneously. So we'll just sleep for an arbitrary + * short period of time and retry regardless. + * This should be a very rare case. + */ + + vmem_kick_pdaemon(); + kpause("btalloc", false, 1, NULL); } VMEM_LOCK(vm); } @@ -1177,11 +1201,7 @@ retry: /* XXX */ if ((flags & VM_SLEEP) != 0) { -#if defined(_KERNEL) - mutex_spin_enter(&uvm_fpageqlock); - uvm_kick_pdaemon(); - mutex_spin_exit(&uvm_fpageqlock); -#endif + vmem_kick_pdaemon(); VMEM_LOCK(vm); VMEM_CONDVAR_WAIT(vm); VMEM_UNLOCK(vm); Index: src/sys/kern/subr_kmem.c =================================================================== RCS file: /home/chs/netbsd/cvs/src/sys/kern/subr_kmem.c,v retrieving revision 1.61 diff -u -p -r1.61 subr_kmem.c --- src/sys/kern/subr_kmem.c 27 Jul 2015 09:24:28 -0000 1.61 +++ src/sys/kern/subr_kmem.c 14 Feb 2016 18:13:48 -0000 @@ -383,9 +383,13 @@ kmem_intr_free(void *p, size_t requested void * kmem_alloc(size_t size, km_flag_t kmflags) { + void *v; + KASSERTMSG((!cpu_intr_p() && !cpu_softintr_p()), "kmem(9) should not be used from the interrupt context"); - return kmem_intr_alloc(size, kmflags); + v = kmem_intr_alloc(size, kmflags); + KASSERT(v || (kmflags & KM_NOSLEEP) != 0); + return v; } /* @@ -396,9 +400,13 @@ kmem_alloc(size_t size, km_flag_t kmflag void * kmem_zalloc(size_t size, km_flag_t kmflags) { + void *v; + KASSERTMSG((!cpu_intr_p() && !cpu_softintr_p()), "kmem(9) should not be used from the interrupt context"); - return kmem_intr_zalloc(size, kmflags); + v = kmem_intr_zalloc(size, kmflags); + KASSERT(v || (kmflags & KM_NOSLEEP) != 0); + return v; } /* Index: src/share/man/man9/vmem.9 =================================================================== RCS file: /home/chs/netbsd/cvs/src/share/man/man9/vmem.9,v retrieving revision 1.15 diff -u -p -r1.15 vmem.9 --- src/share/man/man9/vmem.9 29 Jan 2013 22:02:17 -0000 1.15 +++ src/share/man/man9/vmem.9 15 Feb 2016 19:39:43 -0000 @@ -83,7 +83,7 @@ other than virtual memory. .Fn vmem_create creates a new vmem arena. .Pp -.Bl -tag -width qcache_max +.Bl -tag -offset indent -width qcache_max .It Fa name The string to describe the vmem. .It Fa base @@ -118,7 +118,7 @@ calls to import a span of size at least .Fa size . .Fa allocfn -should accept the same +must accept the same .Fa flags as .Fn vmem_alloc . @@ -169,7 +169,8 @@ It is merely a hint and can be ignored. Either of: .Bl -tag -width VM_NOSLEEP .It Dv VM_SLEEP -Can sleep until enough resources are available. +If the allocation cannot be satisfied immediately, sleep until enough +resources are available. .It Dv VM_NOSLEEP Don't sleep. Immediately return @@ -184,7 +185,7 @@ Interrupt level to be blocked for alloca .Fn vmem_xcreate creates a new vmem arena. .Pp -.Bl -tag -width qcache_max +.Bl -tag -offset indent -width qcache_max .It Fa name The string to describe the vmem. .It Fa base @@ -220,7 +221,7 @@ calls to import a span of size at least .Fa size . .Fa allocfn -should accept the same +must accept the same .Fa flags as .Fn vmem_alloc . @@ -274,7 +275,8 @@ It is merely a hint and can be ignored. Either of: .Bl -tag -width VM_NOSLEEP .It Dv VM_SLEEP -Can sleep until enough resources are available. +If the allocation cannot be satisfied immediately, sleep until enough +resources are available. .It Dv VM_NOSLEEP Don't sleep. Immediately return @@ -297,23 +299,26 @@ Returns on success, .Dv ENOMEM on failure. -.Fa flags -should be one of: +.Bl -tag -offset indent -width flags +.It Fa flags +Either of: .Bl -tag -width VM_NOSLEEP .It Dv VM_SLEEP -Can sleep until enough resources are available. +If the allocation cannot be satisfied immediately, sleep until enough +resources are available. .It Dv VM_NOSLEEP Don't sleep. Immediately return .Dv ENOMEM if there are not enough resources available. .El +.El .Pp .\" - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - .Fn vmem_xalloc allocates a resource from the arena. .Pp -.Bl -tag -width nocross +.Bl -tag -offset indent -width nocross .It Fa vm The arena which we allocate from. .It Fa size @@ -333,10 +338,10 @@ If .Fa align is zero, .Fa phase -should be zero. +must be zero. Otherwise, .Fa phase -should be smaller than +must be smaller than .Fa align . .It Fa nocross Request a resource which doesn't cross @@ -353,7 +358,7 @@ if the caller does not care. .It Fa flags A bitwise OR of an allocation strategy and a sleep flag. .Pp -The allocation strategy is one of: +The allocation strategy must be one of: .Bl -tag -width VM_INSTANTFIT .It Dv VM_BESTFIT Prefer space efficiency. @@ -361,10 +366,11 @@ Prefer space efficiency. Prefer performance. .El .Pp -The sleep flag should be one of: +The sleep flag must be one of: .Bl -tag -width VM_NOSLEEP .It Dv VM_SLEEP -Can sleep until enough resources are available. +If the allocation cannot be satisfied immediately, sleep until enough +resources are available. .It Dv VM_NOSLEEP Don't sleep. Immediately return @@ -386,14 +392,14 @@ frees resource allocated by .Fn vmem_xalloc to the arena. .Pp -.Bl -tag -width addr +.Bl -tag -offset indent -width addr .It Fa vm The arena which we free to. .It Fa addr The resource being freed. -It must be the one returned by +It must have been allocated via .Fn vmem_xalloc . -Notably, it must not be the one from +Notably, it must not have been allocated via .Fn vmem_alloc . Otherwise, the behaviour is undefined. .It Fa size @@ -408,7 +414,7 @@ argument used for .Fn vmem_alloc allocates a resource from the arena. .Pp -.Bl -tag -width flags +.Bl -tag -offset indent -width flags .It Fa vm The arena which we allocate from. .It Fa size @@ -416,7 +422,7 @@ Specify the size of the allocation. .It Fa flags A bitwise OR of an allocation strategy and a sleep flag. .Pp -The allocation strategy is one of: +The allocation strategy must be one of: .Bl -tag -width VM_INSTANTFIT .It Dv VM_BESTFIT Prefer space efficiency. @@ -424,10 +430,11 @@ Prefer space efficiency. Prefer performance. .El .Pp -The sleep flag should be one of: +The sleep flag must be one of: .Bl -tag -width VM_NOSLEEP .It Dv VM_SLEEP -Can sleep until enough resources are available. +If the allocation cannot be satisfied immediately, sleep until enough +resources are available. .It Dv VM_NOSLEEP Don't sleep. Immediately return @@ -449,14 +456,14 @@ frees resource allocated by .Fn vmem_alloc to the arena. .Pp -.Bl -tag -width addr +.Bl -tag -offset indent -width addr .It Fa vm The arena which we free to. .It Fa addr The resource being freed. -It must be the one returned by +It must have been allocated via .Fn vmem_alloc . -Notably, it must not be the one from +Notably, it must not have been allocated via .Fn vmem_xalloc . Otherwise, the behaviour is undefined. .It Fa size @@ -471,10 +478,10 @@ argument used for .Fn vmem_destroy destroys a vmem arena. .Pp -.Bl -tag -width vm +.Bl -tag -offset indent -width vm .It Fa vm The vmem arena being destroyed. -The caller should ensure that no one will use it anymore. +The caller must ensure that no one will use it anymore. .El .\" ------------------------------------------------------------ .Sh RETURN VALUES Index: src/share/man/man9/kmem.9 =================================================================== RCS file: /home/chs/netbsd/cvs/src/share/man/man9/kmem.9,v retrieving revision 1.19 diff -u -p -r1.19 kmem.9 --- src/share/man/man9/kmem.9 11 Dec 2015 10:05:17 -0000 1.19 +++ src/share/man/man9/kmem.9 14 Feb 2016 18:22:31 -0000 @@ -77,15 +77,9 @@ Either of the following: .It Dv KM_SLEEP If the allocation cannot be satisfied immediately, sleep until enough memory is available. -Note that this does not mean that if +If .Dv KM_SLEEP is specified, then the allocation cannot fail. -Under resource stress conditions, the allocation can fail and the -function will return -.Dv NULL . -One such scenario is when the allocation size is larger than it can ever -be allocated; another is when the system memory resources are exhausted -to even allocate pools of pages. .It Dv KM_NOSLEEP Don't sleep. Immediately return @@ -143,9 +137,6 @@ It must be the one returned by .Fn kmem_alloc or .Fn kmem_zalloc . -One such scenario is when the allocation size is larger than it can ever -be allocated; another is when the system memory resources are exhausted -to even allocate pools of pages. .It Fa size The size of the memory being freed, in bytes. It must be the same as the @@ -185,10 +176,6 @@ It should be noted that .Fn kmem_free may also block. .Pp -Always check the return value of the allocators, even when -.Dv KM_SLEEP -is specified to avoid kernel crashes during resource stress conditions. -.Pp For some locks this is permissible or even unavoidable. For others, particularly locks that may be taken from soft interrupt context, it is a serious problem.