Re: [RFC v2 2/2] mm: SLUB Freelist randomization
On Wed, May 25, 2016 at 6:49 PM, Joonsoo Kim wrote:
> 2016-05-25 6:15 GMT+09:00 Thomas Garnier:
>> Implements Freelist randomization for the SLUB allocator. It was
>> previously implemented for the SLAB allocator. Both use the same
>> configuration option (CONFIG_SLAB_FREELIST_RANDOM).
>>
>> The list is randomized during initialization of a new set of pages. The
>> order on different freelist sizes is pre-computed at boot for
>> performance. Each kmem_cache has its own randomized freelist. This
>> security feature reduces the predictability of the kernel SLUB allocator
>> against heap overflows, rendering attacks much less stable.
>>
>> For example, these attacks exploit the predictability of the heap:
>> - Linux Kernel CAN SLUB overflow (https://goo.gl/oMNWkU)
>> - Exploiting Linux Kernel Heap corruptions (http://goo.gl/EXLn95)
>>
>> Performance results:
>>
>> slab_test impact is between 3% to 4% on average:
>>
>> Before:
>>
>> Single thread testing
>> =====================
>> 1. Kmalloc: Repeatedly allocate then free test
>> 10000 times kmalloc(8) -> 49 cycles kfree -> 77 cycles
>> 10000 times kmalloc(16) -> 51 cycles kfree -> 79 cycles
>> 10000 times kmalloc(32) -> 53 cycles kfree -> 83 cycles
>> 10000 times kmalloc(64) -> 62 cycles kfree -> 90 cycles
>> 10000 times kmalloc(128) -> 81 cycles kfree -> 97 cycles
>> 10000 times kmalloc(256) -> 98 cycles kfree -> 121 cycles
>> 10000 times kmalloc(512) -> 95 cycles kfree -> 122 cycles
>> 10000 times kmalloc(1024) -> 96 cycles kfree -> 126 cycles
>> 10000 times kmalloc(2048) -> 115 cycles kfree -> 140 cycles
>> 10000 times kmalloc(4096) -> 149 cycles kfree -> 171 cycles
>> 2. Kmalloc: alloc/free test
>> 10000 times kmalloc(8)/kfree -> 70 cycles
>> 10000 times kmalloc(16)/kfree -> 70 cycles
>> 10000 times kmalloc(32)/kfree -> 70 cycles
>> 10000 times kmalloc(64)/kfree -> 70 cycles
>> 10000 times kmalloc(128)/kfree -> 70 cycles
>> 10000 times kmalloc(256)/kfree -> 69 cycles
>> 10000 times kmalloc(512)/kfree -> 70 cycles
>> 10000 times kmalloc(1024)/kfree -> 73 cycles
>> 10000 times kmalloc(2048)/kfree -> 72 cycles
>> 10000 times kmalloc(4096)/kfree -> 71 cycles
>>
>> After:
>>
>> Single thread testing
>> =====================
>> 1. Kmalloc: Repeatedly allocate then free test
>> 10000 times kmalloc(8) -> 57 cycles kfree -> 78 cycles
>> 10000 times kmalloc(16) -> 61 cycles kfree -> 81 cycles
>> 10000 times kmalloc(32) -> 76 cycles kfree -> 93 cycles
>> 10000 times kmalloc(64) -> 83 cycles kfree -> 94 cycles
>> 10000 times kmalloc(128) -> 106 cycles kfree -> 107 cycles
>> 10000 times kmalloc(256) -> 118 cycles kfree -> 117 cycles
>> 10000 times kmalloc(512) -> 114 cycles kfree -> 116 cycles
>> 10000 times kmalloc(1024) -> 115 cycles kfree -> 118 cycles
>> 10000 times kmalloc(2048) -> 147 cycles kfree -> 131 cycles
>> 10000 times kmalloc(4096) -> 214 cycles kfree -> 161 cycles
>> 2. Kmalloc: alloc/free test
>> 10000 times kmalloc(8)/kfree -> 66 cycles
>> 10000 times kmalloc(16)/kfree -> 66 cycles
>> 10000 times kmalloc(32)/kfree -> 66 cycles
>> 10000 times kmalloc(64)/kfree -> 66 cycles
>> 10000 times kmalloc(128)/kfree -> 65 cycles
>> 10000 times kmalloc(256)/kfree -> 67 cycles
>> 10000 times kmalloc(512)/kfree -> 67 cycles
>> 10000 times kmalloc(1024)/kfree -> 64 cycles
>> 10000 times kmalloc(2048)/kfree -> 67 cycles
>> 10000 times kmalloc(4096)/kfree -> 67 cycles
>>
>> Kernbench, before:
>>
>> Average Optimal load -j 12 Run (std deviation):
>> Elapsed Time 101.873 (1.16069)
>> User Time 1045.22 (1.60447)
>> System Time 88.969 (0.559195)
>> Percent CPU 1112.9 (13.8279)
>> Context Switches 189140 (2282.15)
>> Sleeps 99008.6 (768.091)
>>
>> After:
>>
>> Average Optimal load -j 12 Run (std deviation):
>> Elapsed Time 102.47 (0.562732)
>> User Time 1045.3 (1.34263)
>> System Time 88.311 (0.342554)
>> Percent CPU 1105.8 (6.49444)
>> Context Switches 189081 (2355.78)
>> Sleeps 99231.5 (800.358)
>>
>> Signed-off-by: Thomas Garnier
>> ---
>> Based on 0e01df100b6bf22a1de61b66657502a6454153c5
>> ---
>>  include/linux/slub_def.h |   8 +++
>>  init/Kconfig             |   4 +-
>>  mm/slub.c                | 133 ++++++++++++++++++++++++++++++++++++++++---
>>  3 files changed, 136 insertions(+), 9 deletions(-)
>>
>> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
>> index 665cd0c..22d487e 100644
>> --- a/include/linux/slub_def.h
>> +++ b/include/linux/slub_def.h
>> @@ -56,6 +56,9 @@ struct kmem_cache_order_objects {
>>         unsigned long x;
>>  };
>>
>> +/* Index used for freelist randomization */
>> +typedef unsigned int freelist_idx_t;
>> +
>>  /*
>>   * Slab cache management.
>>   */
>> @@ -99,6 +102,11 @@ struct kmem_cache {
>>          */
>>         int remote_node_defrag_ratio;
>>  #endif
>> +
>> +#ifdef CONFIG_SLAB_FREELIST_RANDOM
>> +       freelist_idx_t *random_seq;
>> +#endif
>> +
>>         struct kmem_cache_node *node[MAX_NUMNODES];
>>  };
>>
>> diff --git a/init/Kconfig b/init/Kconfig
>> index [...]
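The "pre-computed at boot" ordering described above boils down to a Fisher-Yates shuffle of the identity sequence 0..count-1, done once per freelist size and then reused for every new page. Below is a minimal userspace sketch of that idea; the function name cache_random_seq_create and the use of rand()/srand() are illustrative assumptions, not the patch's kernel code, which draws on the kernel's own PRNG.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

typedef unsigned int freelist_idx_t;    /* same typedef as in the hunk above */

/* Build the identity sequence 0..count-1, then Fisher-Yates shuffle it. */
static freelist_idx_t *cache_random_seq_create(unsigned int count)
{
	freelist_idx_t *seq = malloc(count * sizeof(*seq));
	unsigned int i;

	if (!seq)
		return NULL;
	for (i = 0; i < count; i++)
		seq[i] = i;
	for (i = count - 1; i > 0; i--) {
		unsigned int j = rand() % (i + 1);
		freelist_idx_t tmp = seq[i];

		seq[i] = seq[j];
		seq[j] = tmp;
	}
	return seq;
}

int main(void)
{
	unsigned int i, count = 16;     /* e.g. objects in one slab page */
	freelist_idx_t *seq;

	srand((unsigned int)time(NULL));
	seq = cache_random_seq_create(count);
	if (!seq)
		return 1;
	/* This order is computed once, then reused for every new page. */
	for (i = 0; i < count; i++)
		printf("%u ", seq[i]);
	printf("\n");
	free(seq);
	return 0;
}

Running it prints one shuffled index order; in the kernel, the equivalent table is what the per-cache random_seq pointer would reference.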
Re: [RFC v2 2/2] mm: SLUB Freelist randomization
On Wed, May 25, 2016 at 3:25 PM, Kees Cook wrote:
> On Tue, May 24, 2016 at 2:15 PM, Thomas Garnier wrote:
>> Implements Freelist randomization for the SLUB allocator. It was
>> previously implemented for the SLAB allocator. Both use the same
>> configuration option (CONFIG_SLAB_FREELIST_RANDOM).
>>
>> [...]
>>
>> Performance results:
>>
>> slab_test impact is between 3% to 4% on average:
>
> Seems like slab_test is pretty intensive (so the impact appears
> higher). On a more "regular" load like kernbench, the impact seems to
> be almost 0. Is that accurate?
>

Yes: the slab_test run is more intensive on a single thread, so it shows
a higher perf impact than a global test would. The overall impact on the
system is of course much smaller. I will detail that in the performance
details.

> Regardless, please consider both patches:
>
> Reviewed-by: Kees Cook
>
> -Kees
>
>> [...]
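For a sense of scale behind "almost 0": taking the kernbench elapsed times quoted in this thread,

\[ \frac{102.47 - 101.873}{101.873} \approx 0.59\% \]

which sits well inside the before-run's standard deviation of 1.16 (about 1.1% of the mean), while the single-thread slab_test deltas are in the stated 3-4% range.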
Re: [RFC v2 2/2] mm: SLUB Freelist randomization
2016-05-25 6:15 GMT+09:00 Thomas Garnier:
> Implements Freelist randomization for the SLUB allocator. It was
> previously implemented for the SLAB allocator. Both use the same
> configuration option (CONFIG_SLAB_FREELIST_RANDOM).
>
> [...]
>
> diff --git a/init/Kconfig b/init/Kconfig
> index a9c4aefd..fbb6678 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1771,10 +1771,10 @@ endchoice
>
>  config SLAB_FREELIST_RANDOM
>         default n
> -       depends on SLAB
> +
[...]
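The random_seq pointer added to struct kmem_cache is consumed when a new slab page is set up: instead of linking objects in address order, the freelist is chained following the pre-shuffled indices. A self-contained sketch of that walk follows; the object layout, the fixed example sequence, and the first-word free pointer are assumptions for illustration (the real patch also starts at a random offset into the sequence for each new page).

#include <stdio.h>
#include <string.h>

#define OBJ_SIZE 64                 /* object size in this fake cache */
#define NR_OBJS  8                  /* objects per fake slab page */

static unsigned char slab[NR_OBJS * OBJ_SIZE];

/* Stand-in for the per-cache random_seq: a pre-shuffled index table. */
static const unsigned int random_seq[NR_OBJS] = { 5, 2, 7, 0, 4, 1, 6, 3 };

int main(void)
{
	void *freelist = NULL;
	unsigned int i;

	/*
	 * Chain the objects following the shuffled table: each object's
	 * first word acts as the embedded free pointer, as in SLUB.
	 */
	for (i = 0; i < NR_OBJS; i++) {
		void *obj = slab + random_seq[i] * OBJ_SIZE;

		memcpy(obj, &freelist, sizeof(freelist));
		freelist = obj;
	}

	/* Allocation now pops objects in an unpredictable address order. */
	while (freelist) {
		printf("alloc -> object at offset %zu\n",
		       (size_t)((unsigned char *)freelist - slab));
		memcpy(&freelist, freelist, sizeof(freelist));
	}
	return 0;
}

Popping the list returns objects at unpredictable offsets within the page, which is what makes adjacent-object heap-overflow exploits much less stable.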
Re: [RFC v2 2/2] mm: SLUB Freelist randomization
On Tue, May 24, 2016 at 2:15 PM, Thomas Garnier wrote:
> Implements Freelist randomization for the SLUB allocator. It was
> previously implemented for the SLAB allocator. Both use the same
> configuration option (CONFIG_SLAB_FREELIST_RANDOM).
>
> [...]
>
> Performance results:
>
> slab_test impact is between 3% to 4% on average:

Seems like slab_test is pretty intensive (so the impact appears
higher). On a more "regular" load like kernbench, the impact seems to
be almost 0. Is that accurate?

Regardless, please consider both patches:

Reviewed-by: Kees Cook

-Kees

> [...]
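Since both allocators share the one Kconfig symbol, trying the feature once this series is applied is only a config change. An illustrative .config fragment (assuming the series applied, with SLUB as the chosen allocator):

# .config fragment: freelist randomization with the SLUB allocator
CONFIG_SLUB=y
CONFIG_SLAB_FREELIST_RANDOM=y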