[PATCH v5 3/4] x86, boot: Implement ASLR for kernel memory sections (x86_64)

From: Thomas Garnier
Date: 2016-05-12
Randomizes the virtual address space of kernel memory sections (physical
memory mapping, vmalloc & vmemmap) for x86_64. This security feature
mitigates exploits that rely on predictable kernel addresses. Such
addresses can be used to disclose the base addresses of kernel modules
or to corrupt specific structures to elevate privileges, bypassing the
current implementation of KASLR. This feature can be enabled with the
CONFIG_RANDOMIZE_MEMORY option.

The physical memory mapping holds most allocations from the boot and
heap allocators. Knowing the base address and the physical memory size,
an attacker can deduce the PDE virtual address of the vDSO memory page.
This attack was demonstrated at CanSecWest 2016 in the "Getting
Physical Extreme Abuse of Intel Based Paged Systems" presentation
(https://goo.gl/ANpWdV, see the second part). The exploits used against
Linux worked successfully against 4.6+ but fail with memory KASLR
enabled (https://goo.gl/iTtXMJ). Similar research was done at Google,
leading to this patch proposal. Variants exist that overwrite the ACLs
of /proc or /sys objects, leading to elevation of privileges. These
variants were tested against 4.6+.
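
To illustrate the predictability this patch removes, here is a minimal
sketch (not part of the patch): with a static base, the direct-mapping
translation needs no secret at all. The constant is the fixed x86_64
base used without CONFIG_RANDOMIZE_MEMORY.

/* Sketch only: 0xffff880000000000 is the historical static base of
 * the x86_64 physical memory mapping. */
#define STATIC_PAGE_OFFSET	0xffff880000000000UL

static inline unsigned long phys_to_direct_virt(unsigned long phys)
{
	/* Knowing a physical address is enough to know its kernel
	 * virtual alias in the direct mapping. */
	return STATIC_PAGE_OFFSET + phys;
}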

The vmalloc memory section contains the allocations made through the
vmalloc API. The allocations are done sequentially to prevent
fragmentation, so each allocation address can easily be deduced,
especially from boot, as the sketch below illustrates.
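
(Illustrative only, not from the patch: consecutive allocations land at
predictable offsets from VMALLOC_START when the base is static.)

#include <linux/vmalloc.h>

static void vmalloc_layout_sketch(void)
{
	/* Sequential carving from VMALLOC_START: both addresses are
	 * predictable from boot ordering alone (each allocation is
	 * followed by a guard page). */
	void *first = vmalloc(PAGE_SIZE);	/* near VMALLOC_START */
	void *second = vmalloc(PAGE_SIZE);	/* just past first */

	vfree(second);
	vfree(first);
}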

The vmemmap section holds a representation of physical memory (as an
array of struct page entries). An attacker could use this section to
disclose the kernel memory layout (by walking the page linked lists).
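
This is what makes the lookup trivial: vmemmap is a flat array of
struct page indexed by page frame number, as in this sketch (mirroring
the usual pfn_to_page arithmetic, not code from this patch):

/* Sketch only: with a static vmemmap base, the struct page for any
 * physical frame sits at base + pfn * sizeof(struct page). */
static struct page *pfn_to_page_sketch(unsigned long pfn)
{
	return vmemmap + pfn;
}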

The order of the memory sections is not changed. The feature computes
the space available to the sections based on the different
configuration options and randomizes both the base of each section and
the padding between them. The size of the physical memory mapping is
the available physical memory. No performance impact was detected
while testing the feature.
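
A simplified sketch of that walk follows (names and layout are
illustrative, not the exact code in arch/x86/mm/kaslr.c): the spare
entropy is what remains of the virtual range once every section's size
is reserved, and it is split across the sections as a random,
PUD-aligned pad before each base.

struct kaslr_region {
	unsigned long *base;	/* e.g. &page_offset_base */
	unsigned long size;	/* space reserved for the section */
};

static void __init sketch_memory_randomization(void)
{
	unsigned long vaddr = vaddr_start;
	unsigned long remain_entropy = vaddr_end - vaddr_start;
	unsigned long entropy;
	int i;

	for (i = 0; i < nr_regions; i++)
		remain_entropy -= regions[i].size;

	for (i = 0; i < nr_regions; i++) {
		/* Fair share of the remaining entropy, then a random,
		 * PUD-aligned pad before this section's base. */
		entropy = remain_entropy / (nr_regions - i);
		entropy = (kaslr_get_random_long() % (entropy + 1)) &
			  PUD_MASK;
		vaddr += entropy;
		*regions[i].base = vaddr;

		/* Reserve the section itself, keeping PUD alignment. */
		vaddr += regions[i].size;
		vaddr = ALIGN(vaddr, PUD_SIZE);
		remain_entropy -= entropy;
	}
}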

Entropy is generated using the KASLR early boot functions now shared in
the lib directory (originally written by Kees Cook). Randomization is
done at the PGD & PUD page table levels to increase the number of
possible addresses. The physical memory mapping code was adapted to
support PUD-level virtual addresses. On the best configuration, this
implementation provides about 30,000 possible virtual addresses on
average for each memory section. An additional low memory page is used
to ensure each CPU can start with a PGD-aligned virtual address (for
realmode).
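
As a rough worked example of where that figure comes from (numbers
assumed for illustration): PUD-level granularity spaces candidate bases
1GB apart, so the candidate count is the spare virtual space divided by
PUD_SIZE; about 30TB of spare space yields roughly 30,000 possible
bases.

/* Sketch only: PUD_SHIFT is 30 on x86_64, i.e. PUD_SIZE == 1GB.
 * ~30TB of spare virtual space >> 30 gives ~30,000 candidates. */
static unsigned long possible_bases(unsigned long spare_vspace)
{
	return spare_vspace >> PUD_SHIFT;
}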

x86/dump_pagetables was updated to correctly display each section.

The KASLR identity mapping in the compressed boot code was adapted to
use the static base.

Updated documentation on x86_64 memory layout accordingly.

Performance data:

Kernbench shows almost no difference (+/- less than 1%):

Before:

Average Optimal load -j 12 Run (std deviation):
Elapsed Time 102.63 (1.2695)
User Time 1034.89 (1.18115)
System Time 87.056 (0.456416)
Percent CPU 1092.9 (13.892)
Context Switches 199805 (3455.33)
Sleeps 97907.8 (900.636)

After:

Average Optimal load -j 12 Run (std deviation):
Elapsed Time 102.489 (1.10636)
User Time 1034.86 (1.36053)
System Time 87.764 (0.49345)
Percent CPU 1095 (12.7715)
Context Switches 199036 (4298.1)
Sleeps 97681.6 (1031.11)

Hackbench shows 0% difference on average (hackbench 90
repeated 10 times):

attempt,before,after
1,0.076,0.069
2,0.072,0.069
3,0.066,0.066
4,0.066,0.068
5,0.066,0.067
6,0.066,0.069
7,0.067,0.066
8,0.063,0.067
9,0.067,0.065
10,0.068,0.071
average,0.0677,0.0677

Signed-off-by: Thomas Garnier 
---
Based on next-20160511
---
 Documentation/x86/x86_64/mm.txt         |   4 +
 arch/x86/Kconfig                        |  17
 arch/x86/boot/compressed/pagetable.c    |   3 +
 arch/x86/include/asm/kaslr.h            |  12 +++
 arch/x86/include/asm/page_64_types.h    |  11 ++-
 arch/x86/include/asm/pgtable_64.h       |   1 +
 arch/x86/include/asm/pgtable_64_types.h |  15 +++-
 arch/x86/kernel/head_64.S               |   2 +-
 arch/x86/kernel/setup.c                 |   3 +
 arch/x86/mm/Makefile                    |   1 +
 arch/x86/mm/dump_pagetables.c           |  16 +++-
 arch/x86/mm/init.c                      |   6 ++
 arch/x86/mm/kaslr.c                     | 146
 arch/x86/realmode/init.c                |   5 +-
 14 files changed, 232 insertions(+), 10 deletions(-)
 create mode 100644 arch/x86/mm/kaslr.c

diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index 5aa7383..8c7dd59 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -39,4 +39,8 @@ memory window (this size is arbitrary, it can be raised later 
if needed).
 The mappings are not part of any other kernel PGD and are only available
 during EFI runtime calls.
 
+Note that if CONFIG_RANDOMIZE_MEMORY is enabled, the direct mapping of all
+physical memory, vmalloc/ioremap space and virtual memory map are randomized.
+Their order is preserved but their base will be offset early at boot time.
+
 -Andi Kleen, Jul 2004
diff --git a/arch/x86/Kconfig