Re: [Xen-devel] [PATCH v6 14/27] x86/percpu: Adapt percpu for PIE support

2019-04-08 Thread Thomas Garnier
On Mon, Apr 8, 2019 at 10:56 AM Christopher Lameter  wrote:
>
> On Mon, 8 Apr 2019, Thomas Garnier wrote:
>
> > > It didn't work originally but I will revisit to see if I missed something.
> >
> > I revisited and couldn't find a way to prevent relocations to the
> > percpu section. Without PIE, you can reference an absolute address, which
> > was convenient for percpu.
>
> Can you switch PIE off for the percpu section? If not maybe the linker
> needs to have an additional option?

I don't think so; at least, I didn't find any option to do that. Changing
the linker might be a bit too much if we have a software solution which
doesn't impact performance.

>
> Cannot imagine that this is not possible. You need to be able to
> reference registers that are in fixed memory locations.
>
>
> > Christopher: Did you have something specific in mind?
>
> I thought that we just leave it as is.

I would like to as well. I will try a couple of things at the assembly
level instead of the linker and come back to this thread.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v6 14/27] x86/percpu: Adapt percpu for PIE support

2019-04-08 Thread Thomas Garnier
On Fri, Feb 1, 2019 at 9:13 AM Thomas Garnier  wrote:
>
> On Thu, Jan 31, 2019 at 6:31 PM Christopher Lameter  wrote:
> >
> > On Thu, 31 Jan 2019, Thomas Garnier wrote:
> >
> > > The per-cpu symbols are in a section that is zero based to create
> > > offsets. The compiler doesn't see them as offsets but as relative
> > > symbols and tries to relocate them. Given the distance between zero and
> > > the mapped kernel is much larger than the instruction offset range, it
> > > fails to do it.
> >
> > We switch that off in the linker. If that does not work with your
> > modifications then you need to figure out how to update the link
> > configuration.
> >
>
> It didn't work originally but I will revisit to see if I missed something.

I revisited and couldn't find a way to prevent relocations to the
percpu section. Without PIE, you can reference an absolute address, which
was convenient for percpu.

Christopher: Did you have something specific in mind?

I checked the following:
 - Changing the FLAGS() on the PHDRS.
 - Using -z noreloc-overflow, which doesn't actually seem to apply to
   PC32 relocations.
 - Looking at all linker options and the linker script format for anything
   related.


Re: [Xen-devel] [PATCH v6 14/27] x86/percpu: Adapt percpu for PIE support

2019-02-01 Thread Thomas Garnier
On Thu, Jan 31, 2019 at 6:31 PM Christopher Lameter  wrote:
>
> On Thu, 31 Jan 2019, Thomas Garnier wrote:
>
> > The per-cpu symbols are in a section that is zero based to create
> > offsets. The compiler doesn't see them as offsets but as relative
> > symbols and tries to relocate them. Given the distance between zero and
> > the mapped kernel is much larger than the instruction offset range, it
> > fails to do it.
>
> We switch that off in the linker. If that does not work with your
> modifications then you need to figure out how to update the link
> configuration.
>

It didn't work originally but I will revisit to see if I missed something.


Re: [Xen-devel] [PATCH v6 00/27] x86: PIE support and option to extend KASLR randomization

2019-01-31 Thread Thomas Garnier
On Thu, Jan 31, 2019 at 1:41 PM Konrad Rzeszutek Wilk
 wrote:
>
> On Thu, Jan 31, 2019 at 11:24:07AM -0800, Thomas Garnier wrote:
> > There has been no major concern in the latest iterations. I am interested in
> > what would be the best way to slowly integrate this patchset upstream.
>
> One question that I somehow expected to see addressed in this cover letter:
> what about all those lovely speculative bugs? Say someone hasn't
> updated their machine with the Spectre v3a microcode, wouldn't they
> be able to get the kernel virtual address space?

Yes they would be.

>
> In effect rendering all this hard work unnecessary?

Only if we think Spectre bugs will never be fixed.


Re: [Xen-devel] [PATCH v6 14/27] x86/percpu: Adapt percpu for PIE support

2019-01-31 Thread Thomas Garnier
On Thu, Jan 31, 2019 at 12:57 PM Christopher Lameter  wrote:
>
> On Thu, 31 Jan 2019, Thomas Garnier wrote:
>
> > Percpu uses a clever design where the .percpu ELF section has a virtual
> > address of zero and the custom Linux relocation code avoids relocating
> > specific symbols. It makes the code simple and easily adaptable with or
> > without SMP support.
>
> We usually talk about this as offsets rather than addresses. The intent
> here is to give every processor its own address that is unique for this
> processor. Operations are always relative to a segment register and the
> whole area can be relocated at will by simply changing the segment
> register.
>
> > This design is incompatible with PIE. While creating a PIE binary, the
> > compiler tries to make everything relative. The compiler will attempt to
>
> This is very compatible with PIE because it is already relative.

The per-cpu symbols are in a section that is zero based to create
offsets. The compiler doesn't see them as offsets but as relative
symbols and tries to relocate them. Given the distance between zero and
the mapped kernel is much larger than the instruction offset range, it
fails to do it.
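
To make the range problem concrete, here is a minimal C sketch (not part of
the patch) of the test a RIP-relative access has to pass: the distance from
the next instruction to the symbol must fit in a signed 32-bit displacement.
The helper name and the example addresses are assumptions chosen for
illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the kernel: returns true when the
 * distance from the end of the instruction (next_ip) to the target
 * symbol fits in the signed 32-bit displacement of a RIP-relative operand.
 */
static bool fits_rel32(uint64_t target, uint64_t next_ip)
{
	/* Wrap-around subtraction, then reinterpret as a signed distance. */
	int64_t disp = (int64_t)(target - next_ip);

	return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

With a zero-based percpu offset such as 0x1000 and code at an assumed PIE
address of 0xffffffff00000000, fits_rel32() returns false; for two addresses
inside the same kernel image it returns true.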

>
> > generate instructions with the distance between zero and any 64-bit
> > virtual address. It will fail as the relocation range cannot fit within
> > the possible instructions accessing a segment register.
>
> Leave the offsets alone and just change the segment register if you need
> to relocate the area of a specific processor?
>
> > The assembly and PER_CPU macros are changed to use relative references
> > when PIE is enabled.
>
> They already use relative reference. What is the point here?
>
> > --- a/arch/x86/include/asm/percpu.h
> > +++ b/arch/x86/include/asm/percpu.h
> > @@ -5,9 +5,11 @@
> >  #ifdef CONFIG_X86_64
> >  #define __percpu_seg gs
> >  #define __percpu_mov_op  movq
> > +#define __percpu_rel (%rip)
>
> The percpu section cannot be IP relative since we need to have separate
> address spaces per cpu.
>


[Xen-devel] [PATCH v6 00/27] x86: PIE support and option to extend KASLR randomization

2019-01-31 Thread Thomas Garnier
There has been no major concern in the latest iterations. I am interested in
what would be the best way to slowly integrate this patchset upstream.

Changes:
 - patch v6:
   - Rebase on latest changes in jump tables and crypto.
   - Fix wording on a couple of commits.
   - Revisit checkpatch warnings.
   - Moving to @chromium.org.
 - patch v5:
   - Adapt new crypto modules for PIE.
   - Improve per-cpu commit message.
   - Fix xen 32-bit build error with .quad.
   - Remove extra code for ftrace.
 - patch v4:
   - Simplify early boot by removing global variables.
   - Modify the mcount location script for __mcount_loc instead of the address
     read in the ftrace implementation.
   - Edit commit description to explain better where the kernel can be located.
   - Streamlined the testing done on each patch proposal. Always testing
     hibernation, suspend, ftrace and kprobe to ensure no regressions.
 - patch v3:
   - Update on message to describe longer term PIE goal.
   - Minor change on ftrace if condition.
   - Changed code using xchgq.
 - patch v2:
   - Adapt patch to work post KPTI and compiler changes
   - Redo all performance testing with latest configs and compilers
   - Simplify mov macro on PIE (MOVABS now)
   - Reduce GOT footprint
 - patch v1:
   - Simplify ftrace implementation.
   - Use gcc mstack-protector-guard-reg=%gs with PIE when possible.
 - rfc v3:
   - Use --emit-relocs instead of -pie to reduce dynamic relocation space on
     mapped memory. It also simplifies the relocation process.
   - Move the start of the module section next to the kernel. Remove the need
     for -mcmodel=large on modules. Extends module space from 1 to 2G maximum.
   - Support for XEN PVH as 32-bit relocations can be ignored with
     --emit-relocs.
   - Support for GOT relocations previously done automatically with -pie.
   - Remove need for dynamic PLT in modules.
   - Support dynamic GOT for modules.
 - rfc v2:
   - Add support for a global stack cookie, as the compiler defaults to fs
     without mcmodel=kernel.
   - Change patch 7 to correctly jump out of the identity mapping on kexec load
     preserve.

These patches make the changes necessary to build the kernel as a Position
Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below
the top 2G of the virtual address space, which makes it possible to optionally
extend the KASLR randomization range from 1G to 3G. The chosen range is the one
currently available; future changes will allow the kernel module to have a
wider randomization range.
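
As a rough illustration of what going from 1G to 3G means, the C sketch below
counts how many aligned load addresses fit in a randomization window. The
helper name, the 2 MiB alignment and the 32 MiB image size used in the note
are assumptions for the example, not values taken from this patchset, and the
real KASLR slot selection has more constraints than this.

```c
#include <stdint.h>

#define MiB (1024ULL * 1024)
#define GiB (1024ULL * MiB)

/*
 * Hypothetical helper: number of aligned start addresses in a window of
 * `range` bytes at which an image of `image_size` bytes still fits
 * entirely inside the window.
 */
static uint64_t kaslr_slots(uint64_t range, uint64_t image_size, uint64_t align)
{
	if (align == 0 || image_size > range)
		return 0;
	return (range - image_size) / align + 1;
}
```

Under those assumptions, kaslr_slots(1 * GiB, 32 * MiB, 2 * MiB) gives 497
candidate positions and kaslr_slots(3 * GiB, 32 * MiB, 2 * MiB) gives 1521.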

Thanks a lot to Ard Biesheuvel & Kees Cook on their feedback on compiler
changes, PIE support and KASLR in general. Thanks to Roland McGrath on his
feedback for using -pie versus --emit-relocs and details on compiler code
generation.

The patches:
 - 1-2, 4-13, 18-19: Change in assembly code to be PIE compliant.
 - 3: Add a new _ASM_MOVABS macro to fetch a symbol address generically.
 - 14: Adapt percpu design to work correctly when PIE is enabled.
 - 15: Provide an option to default visibility to hidden except for key symbols.
   It removes errors between compilation units.
 - 16: Add PROVIDE_HIDDEN replacement on the linker script for weak symbols to
   reduce GOT footprint.
 - 17: Adapt relocation tool to handle PIE binary correctly.
 - 20: Add support for global cookie.
 - 21: Support ftrace with PIE (used on Ubuntu config).
 - 22: Add option to move the module section just after the kernel.
 - 23: Adapt module loading to support PIE with dynamic GOT.
 - 24: Make the GOT read-only.
 - 25: Add the CONFIG_X86_PIE option (off by default).
 - 26: Adapt relocation tool to generate a 64-bit relocation table.
 - 27: Add the CONFIG_RANDOMIZE_BASE_LARGE option to increase relocation range
   from 1G to 3G (off by default).

Performance/Size impact:

Size of vmlinux (Default configuration):
 File size:
 - PIE disabled: +0.18%
 - PIE enabled: -1.977% (less relocations)
 .text section:
 - PIE disabled: same
 - PIE enabled: same

Size of vmlinux (Ubuntu configuration):
 File size:
 - PIE disabled: +0.21%
 - PIE enabled: +10%
 .text section:
 - PIE disabled: same
 - PIE enabled: +0.001%

The size increase is mainly due to not having access to the 32-bit signed
relocation that can be used with mcmodel=kernel. A small part is due to reduced
optimization for PIE code. This bug [1] was opened with gcc to provide a better
code generation for kernel PIE.

Hackbench (50% and 1600% on thread/process for pipe/sockets):
 - PIE disabled: no significant change (avg -/+ 0.5% on latest test).
 - PIE enabled: between -1% to +1% in average (default and Ubuntu config).

Kernbench (average of 10 Half and Optimal runs):
 Elapsed Time:
 - PIE disabled: no significant change (avg -0.5%)
 - PIE enabled: average -0.5% to +0.5%
 System Time:
 - PIE disabled: no significant change (avg -0.1%)
 - PIE enabled: average -0.4% to +0.4%.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

diffstat:
 Documentation/x86/x86_64/mm.txt  |3 
 

[Xen-devel] [PATCH v6 18/27] xen: Adapt assembly for PIE support

2019-01-31 Thread Thomas Garnier
Change the assembly code to use the new _ASM_MOVABS macro, which gets a
symbol reference while being PIE compatible. Adapt the relocation tool
to ignore 32-bit Xen code.
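
The relocation tool decision described above can be sketched in standalone C
as follows. This is a simplified illustration, not the relocs.c code: the
struct, enum and function names are hypothetical, and the real tool looks the
symbol up in the ELF symbol table instead of taking it as a parameter.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the ELF symbol fields the check needs. */
struct sym_range {
	uint64_t st_value;	/* start address of the symbol */
	uint64_t st_size;	/* size of the symbol in bytes */
};

enum reloc_action { RELOC_KEEP, RELOC_SKIP, RELOC_FATAL };

/* True when offset survives truncation to a sign-extended 32-bit value. */
static int fits_in_s32(uint64_t offset)
{
	return (int64_t)(int32_t)offset == (int64_t)offset;
}

/* True when offset lies inside [st_value, st_value + st_size). */
static int in_sym_range(const struct sym_range *sym, uint64_t offset)
{
	return sym != NULL && offset >= sym->st_value &&
	       offset < sym->st_value + sym->st_size;
}

/* Decide what to do with a 32-bit relocation at the given offset. */
static enum reloc_action classify_reloc32(const struct sym_range *pvh,
					  uint64_t offset)
{
	if (fits_in_s32(offset))
		return RELOC_KEEP;
	/* Out of range: tolerated only inside the 32-bit PVH entry code. */
	return in_sym_range(pvh, offset) ? RELOC_SKIP : RELOC_FATAL;
}
```

An offset that fits in 32 bits is kept, an out-of-range offset inside the
PVH entry symbol is skipped, and any other out-of-range offset is treated as
fatal, matching the die() path in the tool.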

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below 0xffffffff80000000.

Signed-off-by: Thomas Garnier 
Reviewed-by: Juergen Gross 
---
 arch/x86/platform/pvh/head.S | 14 ++
 arch/x86/tools/relocs.c  | 16 +++-
 arch/x86/xen/xen-head.S  | 11 ++-
 3 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
index 1f8825bbaffb..e52d8b31e01d 100644
--- a/arch/x86/platform/pvh/head.S
+++ b/arch/x86/platform/pvh/head.S
@@ -103,8 +103,8 @@ ENTRY(pvh_start_xen)
call xen_prepare_pvh
 
/* startup_64 expects boot_params in %rsi. */
-   mov $_pa(pvh_bootparams), %rsi
-   mov $_pa(startup_64), %rax
+   movabs $_pa(pvh_bootparams), %rsi
+   movabs $_pa(startup_64), %rax
jmp *%rax
 
 #else /* CONFIG_X86_64 */
@@ -150,10 +150,16 @@ END(pvh_start_xen)
 
.section ".init.data","aw"
.balign 8
+   /*
+* Use an ASM_PTR (quad on x64) for _pa(gdt_start) because PIE requires
+* a pointer size storage value before applying the relocation. On
+* 32-bit _ASM_PTR will be a long which is aligned the space needed for
+* relocation.
+*/
 gdt:
.word gdt_end - gdt_start
-   .long _pa(gdt_start)
-   .word 0
+   _ASM_PTR _pa(gdt_start)
+   .balign 8
 gdt_start:
    .quad 0x0000000000000000 /* NULL descriptor */
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index 2a3c703218cc..1b5ee38446b6 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -837,6 +837,16 @@ static int is_percpu_sym(ElfW(Sym) *sym, const char *symname)
strncmp(symname, "init_per_cpu_", 13);
 }
 
+/*
+ * Check if the 32-bit relocation is within the xenpvh 32-bit code.
+ * If so, ignores it.
+ */
+static int is_in_xenpvh_assembly(Elf_Addr offset)
+{
+   Elf_Sym *sym = sym_lookup("pvh_start_xen");
+   return sym && (offset >= sym->st_value) &&
+   (offset < (sym->st_value + sym->st_size));
+}
 
 static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
  const char *symname)
@@ -909,8 +919,12 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 * the relocations are processed.
 * Make sure that the offset will fit.
 */
-   if (r_type != R_X86_64_64 && (int32_t)offset != (int64_t)offset)
+   if (r_type != R_X86_64_64 &&
+   (int32_t)offset != (int64_t)offset) {
+   if (is_in_xenpvh_assembly(offset))
+   break;
die("Relocation offset doesn't fit in 32 bits\n");
+   }
 
if (r_type == R_X86_64_64)
        add_reloc(&relocs64, offset);
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 5077ead5e59c..4418ff0a1d96 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -28,14 +28,15 @@ ENTRY(startup_xen)
 
/* Clear .bss */
xor %eax,%eax
-   mov $__bss_start, %_ASM_DI
-   mov $__bss_stop, %_ASM_CX
+   _ASM_MOVABS $__bss_start, %_ASM_DI
+   _ASM_MOVABS $__bss_stop, %_ASM_CX
sub %_ASM_DI, %_ASM_CX
shr $__ASM_SEL(2, 3), %_ASM_CX
rep __ASM_SIZE(stos)
 
-   mov %_ASM_SI, xen_start_info
-   mov $init_thread_union+THREAD_SIZE, %_ASM_SP
+   _ASM_MOVABS $xen_start_info, %_ASM_AX
+   _ASM_MOV %_ASM_SI, (%_ASM_AX)
+   _ASM_MOVABS $init_thread_union+THREAD_SIZE, %_ASM_SP
 
 #ifdef CONFIG_X86_64
/* Set up %gs.
@@ -46,7 +47,7 @@ ENTRY(startup_xen)
 * init data section till per cpu areas are set up.
 */
    movl    $MSR_GS_BASE,%ecx
-   movq    $INIT_PER_CPU_VAR(irq_stack_union),%rax
+   movabsq $INIT_PER_CPU_VAR(irq_stack_union),%rax
cdq
wrmsr
 #endif
-- 
2.20.1.495.gaa96b0ce6b-goog



[Xen-devel] [PATCH v6 14/27] x86/percpu: Adapt percpu for PIE support

2019-01-31 Thread Thomas Garnier
Percpu uses a clever design where the .percpu ELF section has a virtual
address of zero and the custom Linux relocation code avoids relocating
specific symbols. It makes the code simple and easily adaptable with or
without SMP support.

This design is incompatible with PIE. While creating a PIE binary, the
compiler tries to make everything relative. The compiler will attempt to
generate instructions with the distance between zero and any 64-bit
virtual address. It will fail as the relocation range cannot fit within
the possible instructions accessing a segment register.

This patch solves this problem by removing the zero mapping. The .percpu
symbols are now close to the base of the kernel and the compiler
generates appropriate relocations. To accommodate this change, the GS base
is adapted to be the difference between zero and the .percpu section
address. These changes are done only when PIE is enabled. The original
implementation is kept as-is by default.
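
The GS base adjustment can be checked with plain address arithmetic. The C
sketch below is an illustration only (the setup_percpu.c hunk is not shown in
this excerpt): the function names and the example addresses are hypothetical,
and it simply shows that biasing the GS base by the .percpu section address
yields the same effective address as the zero-based layout.

```c
#include <stdint.h>

/*
 * Classic layout: percpu symbols are offsets from zero, so the GS base
 * is simply the start of this CPU's percpu area.
 */
static uint64_t gs_base_zero_based(uint64_t cpu_area)
{
	return cpu_area;
}

/*
 * PIE layout as described above: symbols carry the address of the
 * .percpu section, so the GS base is biased by that address
 * (the unsigned wrap-around is intentional).
 */
static uint64_t gs_base_pie(uint64_t cpu_area, uint64_t percpu_start)
{
	return cpu_area - percpu_start;
}

/* Address the CPU computes for a %gs-prefixed access to a symbol. */
static uint64_t percpu_effective_addr(uint64_t gs_base, uint64_t sym_value)
{
	return gs_base + sym_value;
}
```

For a variable at offset 0x40 in the section, both layouts resolve to
cpu_area + 0x40: the zero-based symbol value is 0x40, while the PIE symbol
value is percpu_start + 0x40 and the bias in the GS base cancels it out.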

The assembly and PER_CPU macros are changed to use relative references
when PIE is enabled.
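
To see why offsets have to move inside the macro argument once a relative
suffix is appended, the small C sketch below builds the operand text that a
PER_CPU_VAR(expr)-style macro would produce in the SMP case. The function is
a hypothetical stand-in written for illustration; the real macro is a
preprocessor definition in percpu.h.

```c
#include <stdio.h>

/*
 * Hypothetical model of the SMP PER_CPU_VAR() expansion: segment prefix,
 * parenthesized expression, then the relative suffix (only on x86_64).
 * Returns the snprintf() result.
 */
static int percpu_operand(char *buf, size_t len, const char *expr, int x86_64)
{
	return snprintf(buf, len, "%%%s:(%s)%s",
			x86_64 ? "gs" : "fs", expr,
			x86_64 ? "(%rip)" : "");
}
```

With the offset inside the argument, the x86_64 result is
%gs:(cpu_tlbstate + TLB_STATE_user_pcid_flush_mask)(%rip). Writing
PER_CPU_VAR(cpu_tlbstate) + TLB_STATE_user_pcid_flush_mask instead would put
the addition after the (%rip) suffix rather than inside the displacement.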

The KALLSYMS_ABSOLUTE_PERCPU configuration is disabled with PIE given
percpu symbols are not absolute in this case.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below 0xffffffff80000000.

Signed-off-by: Thomas Garnier 
---
 arch/x86/entry/calling.h |  2 +-
 arch/x86/entry/entry_64.S|  4 ++--
 arch/x86/include/asm/percpu.h| 25 +++--
 arch/x86/include/asm/processor.h |  4 +++-
 arch/x86/kernel/head_64.S|  4 
 arch/x86/kernel/setup_percpu.c   |  5 -
 arch/x86/kernel/vmlinux.lds.S| 13 +++--
 arch/x86/lib/cmpxchg16b_emu.S|  8 
 arch/x86/xen/xen-asm.S   | 12 ++--
 init/Kconfig |  2 +-
 10 files changed, 55 insertions(+), 24 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index efb0d1b1f15f..d5a6d3a0c24b 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -218,7 +218,7 @@ For 32-bit we have the following conventions - kernel is built with
 .endm
 
 #define THIS_CPU_user_pcid_flush_mask   \
-   PER_CPU_VAR(cpu_tlbstate) + TLB_STATE_user_pcid_flush_mask
+   PER_CPU_VAR(cpu_tlbstate + TLB_STATE_user_pcid_flush_mask)
 
 .macro SWITCH_TO_USER_CR3_NOSTACK scratch_reg:req scratch_reg2:req
ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 16a93eb4c11f..fc15fe058d3c 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -298,7 +298,7 @@ ENTRY(__switch_to_asm)
 
 #ifdef CONFIG_STACKPROTECTOR
    movq    TASK_stack_canary(%rsi), %rbx
-   movq    %rbx, PER_CPU_VAR(irq_stack_union)+stack_canary_offset
+   movq    %rbx, PER_CPU_VAR(irq_stack_union + stack_canary_offset)
 #endif
 
 #ifdef CONFIG_RETPOLINE
@@ -841,7 +841,7 @@ apicinterrupt IRQ_WORK_VECTOR irq_work_interrupt smp_irq_work_interrupt
 /*
  * Exception entry points.
  */
-#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss_rw) + (TSS_ist + ((x) - 1) * 8)
+#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss_rw + (TSS_ist + ((x) - 1) * 8))
 
 /**
  * idtentry - Generate an IDT entry stub
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 1a19d11cfbbd..608c15751f29 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -5,9 +5,11 @@
 #ifdef CONFIG_X86_64
 #define __percpu_seg   gs
 #define __percpu_mov_op    movq
+#define __percpu_rel   (%rip)
 #else
 #define __percpu_seg   fs
 #define __percpu_mov_op    movl
+#define __percpu_rel
 #endif
 
 #ifdef __ASSEMBLY__
@@ -28,10 +30,14 @@
 #define PER_CPU(var, reg)  \
__percpu_mov_op %__percpu_seg:this_cpu_off, reg;\
lea var(reg), reg
-#define PER_CPU_VAR(var)   %__percpu_seg:var
+/* Compatible with Position Independent Code */
+#define PER_CPU_VAR(var)   %__percpu_seg:(var)##__percpu_rel
+/* Rare absolute reference */
+#define PER_CPU_VAR_ABS(var)   %__percpu_seg:var
 #else /* ! SMP */
 #define PER_CPU(var, reg)  __percpu_mov_op $var, reg
-#define PER_CPU_VAR(var)   var
+#define PER_CPU_VAR(var)   (var)##__percpu_rel
+#define PER_CPU_VAR_ABS(var)   var
 #endif /* SMP */
 
 #ifdef CONFIG_X86_64_SMP
@@ -209,27 +215,34 @@ do {  
\
pfo_ret__;  \
 })
 
+/* Position Independent code uses relative addresses only */
+#ifdef CONFIG_X86_PIE
+#define __percpu_stable_arg __percpu_arg(a1)
+#else
+#define __percpu_stable_arg __percpu_arg(P1)
+#endif
+
 #define percpu_stable_op(op, var)  \
 ({ \
t

[Xen-devel] [PATCH v5 14/27] x86/percpu: Adapt percpu for PIE support

2018-06-25 Thread Thomas Garnier
Percpu uses a clever design where the .percpu ELF section has a virtual
address of zero and the custom Linux relocation code avoids relocating
specific symbols. It makes the code simple and easily adaptable with or
without SMP support.

This design is incompatible with PIE. While creating a PIE binary, the
compiler tries to make everything relative. The compiler will attempt to
generate instructions with the distance between zero and any 64-bit
virtual address. It will fail as the relocation range cannot fit within
the possible instructions accessing a segment register.

This patch solves this problem by removing the zero mapping. The .percpu
symbols are now close to the base of the kernel and the compiler
generates appropriate relocations. To accommodate this change, the GS base
is adapted to be the difference between zero and the .percpu section
address. These changes are done only when PIE is enabled. The original
implementation is kept as-is by default.

The assembly and PER_CPU macros are changed to use relative references
when PIE is enabled.

The KALLSYMS_ABSOLUTE_PERCPU configuration is disabled with PIE given
percpu symbols are not absolute in this case.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below 0xffffffff80000000.

Signed-off-by: Thomas Garnier 
---
 arch/x86/entry/calling.h |  2 +-
 arch/x86/entry/entry_64.S|  4 ++--
 arch/x86/include/asm/percpu.h| 25 +++--
 arch/x86/include/asm/processor.h |  4 +++-
 arch/x86/kernel/head_64.S|  4 
 arch/x86/kernel/setup_percpu.c   |  5 -
 arch/x86/kernel/vmlinux.lds.S| 13 +++--
 arch/x86/lib/cmpxchg16b_emu.S|  8 
 arch/x86/xen/xen-asm.S   | 12 ++--
 init/Kconfig |  2 +-
 10 files changed, 55 insertions(+), 24 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 352e70cd33e8..d6c60e6b598f 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -218,7 +218,7 @@ For 32-bit we have the following conventions - kernel is built with
 .endm
 
 #define THIS_CPU_user_pcid_flush_mask   \
-   PER_CPU_VAR(cpu_tlbstate) + TLB_STATE_user_pcid_flush_mask
+   PER_CPU_VAR(cpu_tlbstate + TLB_STATE_user_pcid_flush_mask)
 
 .macro SWITCH_TO_USER_CR3_NOSTACK scratch_reg:req scratch_reg2:req
ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index e738d8d0e308..2afd2e2a86db 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -359,7 +359,7 @@ ENTRY(__switch_to_asm)
 
 #ifdef CONFIG_STACKPROTECTOR
    movq    TASK_stack_canary(%rsi), %rbx
-   movq    %rbx, PER_CPU_VAR(irq_stack_union)+stack_canary_offset
+   movq    %rbx, PER_CPU_VAR(irq_stack_union + stack_canary_offset)
 #endif
 
 #ifdef CONFIG_RETPOLINE
@@ -898,7 +898,7 @@ apicinterrupt IRQ_WORK_VECTOR irq_work_interrupt smp_irq_work_interrupt
 /*
  * Exception entry points.
  */
-#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss_rw) + (TSS_ist + ((x) - 1) * 8)
+#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss_rw + (TSS_ist + ((x) - 1) * 8))
 
 .macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1
 ENTRY(\sym)
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index e9202a0de8f0..e1f05ae6dd21 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -5,9 +5,11 @@
 #ifdef CONFIG_X86_64
 #define __percpu_seg   gs
 #define __percpu_mov_op    movq
+#define __percpu_rel   (%rip)
 #else
 #define __percpu_seg   fs
 #define __percpu_mov_op    movl
+#define __percpu_rel
 #endif
 
 #ifdef __ASSEMBLY__
@@ -28,10 +30,14 @@
 #define PER_CPU(var, reg)  \
__percpu_mov_op %__percpu_seg:this_cpu_off, reg;\
lea var(reg), reg
-#define PER_CPU_VAR(var)   %__percpu_seg:var
+/* Compatible with Position Independent Code */
+#define PER_CPU_VAR(var)   %__percpu_seg:(var)##__percpu_rel
+/* Rare absolute reference */
+#define PER_CPU_VAR_ABS(var)   %__percpu_seg:var
 #else /* ! SMP */
 #define PER_CPU(var, reg)  __percpu_mov_op $var, reg
-#define PER_CPU_VAR(var)   var
+#define PER_CPU_VAR(var)   (var)##__percpu_rel
+#define PER_CPU_VAR_ABS(var)   var
 #endif /* SMP */
 
 #ifdef CONFIG_X86_64_SMP
@@ -209,27 +215,34 @@ do {  
\
pfo_ret__;  \
 })
 
+/* Position Independent code uses relative addresses only */
+#ifdef CONFIG_X86_PIE
+#define __percpu_stable_arg __percpu_arg(a1)
+#else
+#define __percpu_stable_arg __percpu_arg(P1)
+#endif
+
 #define percpu_stable_op(op, var)  \
 ({ \
t

[Xen-devel] [PATCH v5 18/27] xen: Adapt assembly for PIE support

2018-06-25 Thread Thomas Garnier
Change the assembly code to use the new _ASM_MOVABS macro, which gets a
symbol reference while being PIE compatible. Adapt the relocation tool
to ignore 32-bit Xen code.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below 0xffffffff80000000.

Signed-off-by: Thomas Garnier 
Reviewed-by: Juergen Gross 
---
 arch/x86/tools/relocs.c | 16 +++-
 arch/x86/xen/xen-head.S | 11 ++-
 arch/x86/xen/xen-pvh.S  | 14 ++
 3 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index a35cc337f883..29283ad3950f 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -832,6 +832,16 @@ static int is_percpu_sym(ElfW(Sym) *sym, const char *symname)
strncmp(symname, "init_per_cpu_", 13);
 }
 
+/*
+ * Check if the 32-bit relocation is within the xenpvh 32-bit code.
+ * If so, ignores it.
+ */
+static int is_in_xenpvh_assembly(ElfW(Addr) offset)
+{
+   ElfW(Sym) *sym = sym_lookup("pvh_start_xen");
+   return sym && (offset >= sym->st_value) &&
+   (offset < (sym->st_value + sym->st_size));
+}
 
 static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
  const char *symname)
@@ -895,8 +905,12 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 * the relocations are processed.
 * Make sure that the offset will fit.
 */
-   if (r_type != R_X86_64_64 && (int32_t)offset != (int64_t)offset)
+   if (r_type != R_X86_64_64 &&
+   (int32_t)offset != (int64_t)offset) {
+   if (is_in_xenpvh_assembly(offset))
+   break;
die("Relocation offset doesn't fit in 32 bits\n");
+   }
 
if (r_type == R_X86_64_64)
        add_reloc(&relocs64, offset);
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 5077ead5e59c..4418ff0a1d96 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -28,14 +28,15 @@ ENTRY(startup_xen)
 
/* Clear .bss */
xor %eax,%eax
-   mov $__bss_start, %_ASM_DI
-   mov $__bss_stop, %_ASM_CX
+   _ASM_MOVABS $__bss_start, %_ASM_DI
+   _ASM_MOVABS $__bss_stop, %_ASM_CX
sub %_ASM_DI, %_ASM_CX
shr $__ASM_SEL(2, 3), %_ASM_CX
rep __ASM_SIZE(stos)
 
-   mov %_ASM_SI, xen_start_info
-   mov $init_thread_union+THREAD_SIZE, %_ASM_SP
+   _ASM_MOVABS $xen_start_info, %_ASM_AX
+   _ASM_MOV %_ASM_SI, (%_ASM_AX)
+   _ASM_MOVABS $init_thread_union+THREAD_SIZE, %_ASM_SP
 
 #ifdef CONFIG_X86_64
/* Set up %gs.
@@ -46,7 +47,7 @@ ENTRY(startup_xen)
 * init data section till per cpu areas are set up.
 */
    movl    $MSR_GS_BASE,%ecx
-   movq    $INIT_PER_CPU_VAR(irq_stack_union),%rax
+   movabsq $INIT_PER_CPU_VAR(irq_stack_union),%rax
cdq
wrmsr
 #endif
diff --git a/arch/x86/xen/xen-pvh.S b/arch/x86/xen/xen-pvh.S
index ca2d3b2bf2af..4b83f861b655 100644
--- a/arch/x86/xen/xen-pvh.S
+++ b/arch/x86/xen/xen-pvh.S
@@ -114,8 +114,8 @@ ENTRY(pvh_start_xen)
call xen_prepare_pvh
 
/* startup_64 expects boot_params in %rsi. */
-   mov $_pa(pvh_bootparams), %rsi
-   mov $_pa(startup_64), %rax
+   movabs $_pa(pvh_bootparams), %rsi
+   movabs $_pa(startup_64), %rax
jmp *%rax
 
 #else /* CONFIG_X86_64 */
@@ -161,10 +161,16 @@ END(pvh_start_xen)
 
.section ".init.data","aw"
.balign 8
+   /*
+* Use an ASM_PTR (quad on x64) for _pa(gdt_start) because PIE requires
+* a pointer size storage value before applying the relocation. On
+* 32-bit _ASM_PTR will be a long which is aligned the space needed for
+* relocation.
+*/
 gdt:
.word gdt_end - gdt_start
-   .long _pa(gdt_start)
-   .word 0
+   _ASM_PTR _pa(gdt_start)
+   .balign 8
 gdt_start:
    .quad 0x0000000000000000 /* NULL descriptor */
 #ifdef CONFIG_X86_64
-- 
2.18.0.rc2.346.g013aa6912e-goog



[Xen-devel] [PATCH v5 00/27] x86: PIE support and option to extend KASLR randomization

2018-06-25 Thread Thomas Garnier
Changes:
 - patch v5:
   - Adapt new crypto modules for PIE.
   - Improve per-cpu commit message.
   - Fix xen 32-bit build error with .quad.
   - Remove extra code for ftrace.
 - patch v4:
   - Simplify early boot by removing global variables.
   - Modify the mcount location script for __mcount_loc instead of the address
     read in the ftrace implementation.
   - Edit commit description to explain better where the kernel can be located.
   - Streamlined the testing done on each patch proposal. Always testing
     hibernation, suspend, ftrace and kprobe to ensure no regressions.
 - patch v3:
   - Update on message to describe longer term PIE goal.
   - Minor change on ftrace if condition.
   - Changed code using xchgq.
 - patch v2:
   - Adapt patch to work post KPTI and compiler changes
   - Redo all performance testing with latest configs and compilers
   - Simplify mov macro on PIE (MOVABS now)
   - Reduce GOT footprint
 - patch v1:
   - Simplify ftrace implementation.
   - Use gcc mstack-protector-guard-reg=%gs with PIE when possible.
 - rfc v3:
   - Use --emit-relocs instead of -pie to reduce dynamic relocation space on
     mapped memory. It also simplifies the relocation process.
   - Move the start of the module section next to the kernel. Remove the need
     for -mcmodel=large on modules. Extends module space from 1 to 2G maximum.
   - Support for XEN PVH as 32-bit relocations can be ignored with
     --emit-relocs.
   - Support for GOT relocations previously done automatically with -pie.
   - Remove need for dynamic PLT in modules.
   - Support dynamic GOT for modules.
 - rfc v2:
   - Add support for a global stack cookie, as the compiler defaults to fs
     without mcmodel=kernel.
   - Change patch 7 to correctly jump out of the identity mapping on kexec load
     preserve.

These patches make the changes necessary to build the kernel as a Position
Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below
the top 2G of the virtual address space, which makes it possible to optionally
extend the KASLR randomization range from 1G to 3G. The chosen range is the one
currently available; future changes will allow the kernel module to have a
wider randomization range.

Thanks a lot to Ard Biesheuvel & Kees Cook on their feedback on compiler
changes, PIE support and KASLR in general. Thanks to Roland McGrath on his
feedback for using -pie versus --emit-relocs and details on compiler code
generation.

The patches:
 - 1-3, 5-13, 18-19: Change in assembly code to be PIE compliant.
 - 4: Add a new _ASM_MOVABS macro to fetch a symbol address generically.
 - 14: Adapt percpu design to work correctly when PIE is enabled.
 - 15: Provide an option to default visibility to hidden except for key symbols.
   It removes errors between compilation units.
 - 16: Add PROVIDE_HIDDEN replacement on the linker script for weak symbols to
   reduce GOT footprint.
 - 17: Adapt relocation tool to handle PIE binary correctly.
 - 20: Add support for global cookie.
 - 21: Support ftrace with PIE (used on Ubuntu config).
 - 22: Add option to move the module section just after the kernel.
 - 23: Adapt module loading to support PIE with dynamic GOT.
 - 24: Make the GOT read-only.
 - 25: Add the CONFIG_X86_PIE option (off by default).
 - 26: Adapt relocation tool to generate a 64-bit relocation table.
 - 27: Add the CONFIG_RANDOMIZE_BASE_LARGE option to increase relocation range
   from 1G to 3G (off by default).

Performance/Size impact:

Size of vmlinux (Default configuration):
 File size:
 - PIE disabled: +0.18%
 - PIE enabled: -1.977% (less relocations)
 .text section:
 - PIE disabled: same
 - PIE enabled: same

Size of vmlinux (Ubuntu configuration):
 File size:
 - PIE disabled: +0.21%
 - PIE enabled: +10%
 .text section:
 - PIE disabled: same
 - PIE enabled: +0.001%

The size increase is mainly due to not having access to the 32-bit signed
relocation that can be used with mcmodel=kernel. A small part is due to reduced
optimization for PIE code. This bug [1] was opened with gcc to provide a better
code generation for kernel PIE.

Hackbench (50% and 1600% on thread/process for pipe/sockets):
 - PIE disabled: no significant change (avg -/+ 0.5% on latest test).
 - PIE enabled: between -1% and +1% on average (default and Ubuntu config).

Kernbench (average of 10 Half and Optimal runs):
 Elapsed Time:
 - PIE disabled: no significant change (avg -0.5%)
 - PIE enabled: average -0.5% to +0.5%
 System Time:
 - PIE disabled: no significant change (avg -0.1%)
 - PIE enabled: average -0.4% to +0.4%.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

diffstat:
 Documentation/x86/x86_64/mm.txt  |3 
 arch/x86/Kconfig |   45 ++
 arch/x86/Makefile|   58 
 arch/x86/boot/boot.h |2 
 arch/x86/boot/compressed/Makefile|5 
 arch/x86/boot/compressed/misc.c  |   10 +
 

Re: [Xen-devel] [PATCH v4 18/27] xen: Adapt assembly for PIE support

2018-06-01 Thread Thomas Garnier
On Fri, Jun 1, 2018 at 8:40 AM Boris Ostrovsky
 wrote:
>
> On 05/29/2018 06:15 PM, Thomas Garnier wrote:
> > diff --git a/arch/x86/xen/xen-pvh.S b/arch/x86/xen/xen-pvh.S
> > index ca2d3b2bf2af..82ba89ba8bb3 100644
> > --- a/arch/x86/xen/xen-pvh.S
> > +++ b/arch/x86/xen/xen-pvh.S
> > @@ -114,8 +114,8 @@ ENTRY(pvh_start_xen)
> >   call xen_prepare_pvh
> >
> >   /* startup_64 expects boot_params in %rsi. */
> > - mov $_pa(pvh_bootparams), %rsi
> > - mov $_pa(startup_64), %rax
> > + movabs $_pa(pvh_bootparams), %rsi
> > + movabs $_pa(startup_64), %rax
> >   jmp *%rax
> >
> >  #else /* CONFIG_X86_64 */
> > @@ -161,10 +161,15 @@ END(pvh_start_xen)
> >
> >   .section ".init.data","aw"
> >   .balign 8
> > + /*
> > +  * Use a quad for _pa(gdt_start) because PIE does not understand a
> > +  * long is enough. The resulting value will still be in the lower long
> > +  * part.
> > +  */
> >  gdt:
> >   .word gdt_end - gdt_start
> > - .long _pa(gdt_start)
> > - .word 0
> > + .quad _pa(gdt_start)
>
>
> With this becoming .quad 32-bit compilation fails:
>
> /data/root/linux/arch/x86/xen/xen-pvh.S: Assembler messages:
> /data/root/linux/arch/x86/xen/xen-pvh.S:147: Error: cannot represent
> relocation type BFD_RELOC_64

Thanks, I will look to fix this in the next patch set and run a full
32-bit build.

>
>
> -boris
>
>
> > + .balign 8
> >  gdt_start:
> > >   .quad 0x0000000000000000 /* NULL descriptor */
> >  #ifdef CONFIG_X86_64
>


-- 
Thomas


[Xen-devel] [PATCH v4 18/27] xen: Adapt assembly for PIE support

2018-05-29 Thread Thomas Garnier
Change the assembly code to use the new _ASM_MOVABS macro, which gets a
symbol reference while being PIE compatible. Adapt the relocation tool
to ignore 32-bit Xen code.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below 0xffffffff80000000.

Signed-off-by: Thomas Garnier 
Reviewed-by: Juergen Gross 
---
 arch/x86/tools/relocs.c | 16 +++-
 arch/x86/xen/xen-head.S | 11 ++-
 arch/x86/xen/xen-pvh.S  | 13 +
 3 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index a35cc337f883..29283ad3950f 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -832,6 +832,16 @@ static int is_percpu_sym(ElfW(Sym) *sym, const char 
*symname)
strncmp(symname, "init_per_cpu_", 13);
 }
 
+/*
+ * Check if the 32-bit relocation is within the xenpvh 32-bit code.
+ * If so, ignores it.
+ */
+static int is_in_xenpvh_assembly(ElfW(Addr) offset)
+{
+   ElfW(Sym) *sym = sym_lookup("pvh_start_xen");
+   return sym && (offset >= sym->st_value) &&
+   (offset < (sym->st_value + sym->st_size));
+}
 
 static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
  const char *symname)
@@ -895,8 +905,12 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, 
ElfW(Sym) *sym,
 * the relocations are processed.
 * Make sure that the offset will fit.
 */
-   if (r_type != R_X86_64_64 && (int32_t)offset != (int64_t)offset)
+   if (r_type != R_X86_64_64 &&
+   (int32_t)offset != (int64_t)offset) {
+   if (is_in_xenpvh_assembly(offset))
+   break;
die("Relocation offset doesn't fit in 32 bits\n");
+   }
 
if (r_type == R_X86_64_64)
add_reloc(&relocs64, offset);
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 5077ead5e59c..4418ff0a1d96 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -28,14 +28,15 @@ ENTRY(startup_xen)
 
/* Clear .bss */
xor %eax,%eax
-   mov $__bss_start, %_ASM_DI
-   mov $__bss_stop, %_ASM_CX
+   _ASM_MOVABS $__bss_start, %_ASM_DI
+   _ASM_MOVABS $__bss_stop, %_ASM_CX
sub %_ASM_DI, %_ASM_CX
shr $__ASM_SEL(2, 3), %_ASM_CX
rep __ASM_SIZE(stos)
 
-   mov %_ASM_SI, xen_start_info
-   mov $init_thread_union+THREAD_SIZE, %_ASM_SP
+   _ASM_MOVABS $xen_start_info, %_ASM_AX
+   _ASM_MOV %_ASM_SI, (%_ASM_AX)
+   _ASM_MOVABS $init_thread_union+THREAD_SIZE, %_ASM_SP
 
 #ifdef CONFIG_X86_64
/* Set up %gs.
@@ -46,7 +47,7 @@ ENTRY(startup_xen)
 * init data section till per cpu areas are set up.
 */
movl $MSR_GS_BASE,%ecx
-   movq $INIT_PER_CPU_VAR(irq_stack_union),%rax
+   movabsq $INIT_PER_CPU_VAR(irq_stack_union),%rax
cdq
wrmsr
 #endif
diff --git a/arch/x86/xen/xen-pvh.S b/arch/x86/xen/xen-pvh.S
index ca2d3b2bf2af..82ba89ba8bb3 100644
--- a/arch/x86/xen/xen-pvh.S
+++ b/arch/x86/xen/xen-pvh.S
@@ -114,8 +114,8 @@ ENTRY(pvh_start_xen)
call xen_prepare_pvh
 
/* startup_64 expects boot_params in %rsi. */
-   mov $_pa(pvh_bootparams), %rsi
-   mov $_pa(startup_64), %rax
+   movabs $_pa(pvh_bootparams), %rsi
+   movabs $_pa(startup_64), %rax
jmp *%rax
 
 #else /* CONFIG_X86_64 */
@@ -161,10 +161,15 @@ END(pvh_start_xen)
 
.section ".init.data","aw"
.balign 8
+   /*
+* Use a quad for _pa(gdt_start) because PIE does not understand a
+* long is enough. The resulting value will still be in the lower long
+* part.
+*/
 gdt:
.word gdt_end - gdt_start
-   .long _pa(gdt_start)
-   .word 0
+   .quad _pa(gdt_start)
+   .balign 8
 gdt_start:
.quad 0x0000000000000000 /* NULL descriptor */
 #ifdef CONFIG_X86_64
-- 
2.17.0.921.gf22659ad46-goog



[Xen-devel] [PATCH v4 14/27] x86/percpu: Adapt percpu for PIE support

2018-05-29 Thread Thomas Garnier
Percpu uses a clever design where the .percpu ELF section has a virtual
address of zero and the relocation code avoids relocating specific
symbols. It makes the code simple and easily adaptable with or without
SMP support.

This design is incompatible with PIE because generated code always tries to
access the zero virtual address relative to the default mapping address.
This becomes impossible when KASLR is configured to go below -2G. This
patch solves the problem by removing the zero mapping and adapting the GS
base to be relative to the expected address. These changes are made only
when PIE is enabled. The original implementation is kept as-is
by default.

The assembly and PER_CPU macros are changed to use relative references
when PIE is enabled.

The KALLSYMS_ABSOLUTE_PERCPU configuration is disabled with PIE given
percpu symbols are not absolute in this case.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below 0xffffffff80000000.
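
The addressing change described above can be modeled in user-space C.
This is only an illustrative sketch of the arithmetic, assuming a
%gs-relative access resolves to "GS base + symbol value"; the function
names and the example addresses are hypothetical and do not come from the
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Address reached by a %gs-relative access: segment base plus symbol value. */
static uint64_t gs_access(uint64_t gs_base, uint64_t sym_value)
{
	return gs_base + sym_value;	/* unsigned arithmetic wraps modulo 2^64 */
}

/* Zero-based section: symbol values are offsets, so GS base is the CPU area. */
static uint64_t gs_base_zero_based(uint64_t cpu_area)
{
	return cpu_area;
}

/* Section linked at section_start: bias GS base so the sum lands in the area. */
static uint64_t gs_base_relative(uint64_t cpu_area, uint64_t section_start)
{
	return cpu_area - section_start;
}

/* Offset of a symbol inside the percpu section. */
static uint64_t percpu_offset(uint64_t sym_addr, uint64_t section_start)
{
	assert(sym_addr >= section_start);
	return sym_addr - section_start;
}
```

Both layouts reach the same per-CPU slot: with a zero-based section the
symbol value is the offset, and with a section at its real link address
the biased GS base cancels the section start.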

Signed-off-by: Thomas Garnier 
---
 arch/x86/entry/calling.h |  2 +-
 arch/x86/entry/entry_64.S|  4 ++--
 arch/x86/include/asm/percpu.h| 25 +++--
 arch/x86/include/asm/processor.h |  4 +++-
 arch/x86/kernel/head_64.S|  4 
 arch/x86/kernel/setup_percpu.c   |  5 -
 arch/x86/kernel/vmlinux.lds.S| 13 +++--
 arch/x86/lib/cmpxchg16b_emu.S|  8 
 arch/x86/xen/xen-asm.S   | 12 ++--
 init/Kconfig |  2 +-
 10 files changed, 55 insertions(+), 24 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 352e70cd33e8..d6c60e6b598f 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -218,7 +218,7 @@ For 32-bit we have the following conventions - kernel is 
built with
 .endm
 
 #define THIS_CPU_user_pcid_flush_mask   \
-   PER_CPU_VAR(cpu_tlbstate) + TLB_STATE_user_pcid_flush_mask
+   PER_CPU_VAR(cpu_tlbstate + TLB_STATE_user_pcid_flush_mask)
 
 .macro SWITCH_TO_USER_CR3_NOSTACK scratch_reg:req scratch_reg2:req
ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 1cbf4c3616a8..f9b42ca4bf60 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -359,7 +359,7 @@ ENTRY(__switch_to_asm)
 
 #ifdef CONFIG_CC_STACKPROTECTOR
movq TASK_stack_canary(%rsi), %rbx
-   movq %rbx, PER_CPU_VAR(irq_stack_union)+stack_canary_offset
+   movq %rbx, PER_CPU_VAR(irq_stack_union + stack_canary_offset)
 #endif
 
 #ifdef CONFIG_RETPOLINE
@@ -897,7 +897,7 @@ apicinterrupt IRQ_WORK_VECTOR   
irq_work_interrupt  smp_irq_work_interrupt
 /*
  * Exception entry points.
  */
-#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss_rw) + (TSS_ist + ((x) - 1) * 8)
+#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss_rw + (TSS_ist + ((x) - 1) * 8))
 
 .macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1
 ENTRY(\sym)
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index a06b07399d17..7d1271b536ea 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -5,9 +5,11 @@
 #ifdef CONFIG_X86_64
 #define __percpu_seg   gs
 #define __percpu_mov_op movq
+#define __percpu_rel   (%rip)
 #else
 #define __percpu_seg   fs
 #define __percpu_mov_op movl
+#define __percpu_rel
 #endif
 
 #ifdef __ASSEMBLY__
@@ -28,10 +30,14 @@
 #define PER_CPU(var, reg)  \
__percpu_mov_op %__percpu_seg:this_cpu_off, reg;\
lea var(reg), reg
-#define PER_CPU_VAR(var)   %__percpu_seg:var
+/* Compatible with Position Independent Code */
+#define PER_CPU_VAR(var)   %__percpu_seg:(var)##__percpu_rel
+/* Rare absolute reference */
+#define PER_CPU_VAR_ABS(var)   %__percpu_seg:var
 #else /* ! SMP */
 #define PER_CPU(var, reg)  __percpu_mov_op $var, reg
-#define PER_CPU_VAR(var)   var
+#define PER_CPU_VAR(var)   (var)##__percpu_rel
+#define PER_CPU_VAR_ABS(var)   var
 #endif /* SMP */
 
 #ifdef CONFIG_X86_64_SMP
@@ -209,27 +215,34 @@ do {  
\
pfo_ret__;  \
 })
 
+/* Position Independent code uses relative addresses only */
+#ifdef CONFIG_X86_PIE
+#define __percpu_stable_arg __percpu_arg(a1)
+#else
+#define __percpu_stable_arg __percpu_arg(P1)
+#endif
+
 #define percpu_stable_op(op, var)  \
 ({ \
typeof(var) pfo_ret__;  \
switch (sizeof(var)) {  \
case 1: \
-   asm(op "b "__percpu_arg(P1)",%0"\
+   asm(op "b "__percpu_stable_ar

[Xen-devel] [PATCH v4 00/27] x86: PIE support and option to extend KASLR randomization

2018-05-29 Thread Thomas Garnier
Changes:
 - patch v4:
   - Simplify early boot by removing global variables.
   - Modify the mcount location script for __mcount_loc instead of the address
 read in the ftrace implementation.
   - Edit commit description to explain better where the kernel can be located.
   - Streamlined the testing done on each patch proposal. Always testing
 hibernation, suspend, ftrace and kprobe to ensure no regressions.
 - patch v3:
   - Update on message to describe longer term PIE goal.
   - Minor change on ftrace if condition.
   - Changed code using xchgq.
 - patch v2:
   - Adapt patch to work post KPTI and compiler changes
   - Redo all performance testing with latest configs and compilers
   - Simplify mov macro on PIE (MOVABS now)
   - Reduce GOT footprint
 - patch v1:
   - Simplify ftrace implementation.
   - Use gcc mstack-protector-guard-reg=%gs with PIE when possible.
 - rfc v3:
   - Use --emit-relocs instead of -pie to reduce dynamic relocation space on
 mapped memory. It also simplifies the relocation process.
   - Move the start of the module section next to the kernel. Remove the need for
 -mcmodel=large on modules. Extends module space from 1 to 2G maximum.
   - Support for XEN PVH as 32-bit relocations can be ignored with
 --emit-relocs.
   - Support for GOT relocations previously done automatically with -pie.
   - Remove need for dynamic PLT in modules.
   - Support dynamic GOT for modules.
 - rfc v2:
   - Add support for global stack cookie while compiler default to fs without
 mcmodel=kernel
   - Change patch 7 to correctly jump out of the identity mapping on kexec load
 preserve.

These patches make the changes necessary to build the kernel as a Position
Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below
the top 2G of the virtual address space, which optionally allows extending the
KASLR randomization range from 1G to 3G. The chosen range is the one currently
available; future changes will allow kernel modules to have a wider
randomization range.
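
As a rough sense of scale for the 1G to 3G change, the sketch below counts
the aligned base addresses available in a given range. It is a
back-of-the-envelope model only: the 2 MiB alignment is an assumption
(the usual x86_64 CONFIG_PHYSICAL_ALIGN default), the kernel image size
is ignored, and kaslr_slots() is a hypothetical helper, not a kernel
function.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_2M (1ULL << 21)
#define SZ_1G (1ULL << 30)

/* Number of aligned base addresses that fit in a randomization range. */
static uint64_t kaslr_slots(uint64_t range, uint64_t align)
{
	assert(align != 0);
	return range / align;
}
```

Under these assumptions a 1G range gives 512 candidate bases and a 3G
range gives 1536.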

Thanks a lot to Ard Biesheuvel & Kees Cook on their feedback on compiler
changes, PIE support and KASLR in general. Thanks to Roland McGrath on his
feedback for using -pie versus --emit-relocs and details on compiler code
generation.

The patches:
 - 1-3, 5-13, 18-19: Change in assembly code to be PIE compliant.
 - 4: Add a new _ASM_MOVABS macro to fetch a symbol address generically.
 - 14: Adapt percpu design to work correctly when PIE is enabled.
 - 15: Provide an option to default visibility to hidden except for key symbols.
   It removes errors between compilation units.
 - 16: Add PROVIDE_HIDDEN replacement on the linker script for weak symbols to
   reduce GOT footprint.
 - 17: Adapt relocation tool to handle PIE binary correctly.
 - 20: Add support for global cookie.
 - 21: Support ftrace with PIE (used on Ubuntu config).
 - 22: Add option to move the module section just after the kernel.
 - 23: Adapt module loading to support PIE with dynamic GOT.
 - 24: Make the GOT read-only.
 - 25: Add the CONFIG_X86_PIE option (off by default).
 - 26: Adapt relocation tool to generate a 64-bit relocation table.
 - 27: Add the CONFIG_RANDOMIZE_BASE_LARGE option to increase relocation range
   from 1G to 3G (off by default).

Performance/Size impact:

Size of vmlinux (Default configuration):
 File size:
 - PIE disabled: +0.18%
 - PIE enabled: -1.977% (less relocations)
 .text section:
 - PIE disabled: same
 - PIE enabled: same

Size of vmlinux (Ubuntu configuration):
 File size:
 - PIE disabled: +0.21%
 - PIE enabled: +10%
 .text section:
 - PIE disabled: same
 - PIE enabled: +0.001%

The size increase is mainly due to not having access to the 32-bit signed
relocation that can be used with mcmodel=kernel. A small part is due to reduced
optimization for PIE code. This bug [1] was opened with gcc to provide a better
code generation for kernel PIE.

Hackbench (50% and 1600% on thread/process for pipe/sockets):
 - PIE disabled: no significant change (avg -/+ 0.5% on latest test).
 - PIE enabled: between -1% and +1% on average (default and Ubuntu config).

Kernbench (average of 10 Half and Optimal runs):
 Elapsed Time:
 - PIE disabled: no significant change (avg -0.5%)
 - PIE enabled: average -0.5% to +0.5%
 System Time:
 - PIE disabled: no significant change (avg -0.1%)
 - PIE enabled: average -0.4% to +0.4%.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

diffstat:
 Documentation/x86/x86_64/mm.txt  |3 
 arch/x86/Kconfig |   45 ++
 arch/x86/Makefile|   58 
 arch/x86/boot/boot.h |2 
 arch/x86/boot/compressed/Makefile|5 
 arch/x86/boot/compressed/misc.c  |   10 +
 arch/x86/crypto/aes-x86_64-asm_64.S  |   45 --
 arch/x86/crypto/aesni-intel_asm.S|8 -
 arch/x86/crypto/aesni-intel_avx-x86_64.S |6 
 

Re: [Xen-devel] [PATCH v3 21/27] x86/ftrace: Adapt function tracing for PIE support

2018-05-29 Thread Thomas Garnier
On Thu, May 24, 2018 at 1:41 PM Thomas Garnier  wrote:


> On Thu, May 24, 2018 at 1:16 PM Steven Rostedt 
wrote:

> > On Thu, 24 May 2018 13:40:24 +0200
> > Petr Mladek  wrote:

> > > On Wed 2018-05-23 12:54:15, Thomas Garnier wrote:
> > > > When using -fPIE/PIC with function tracing, the compiler generates a
> > > > call through the GOT (call *__fentry__@GOTPCREL). This instruction
> > > > takes 6 bytes instead of 5 on the usual relative call.
> > > >
> > > > If PIE is enabled, replace the 6th byte of the GOT call by a 1-byte
> nop
> > > > so ftrace can handle the previous 5-bytes as before.
> > > >
> > > > Position Independent Executable (PIE) support will allow to extended
> the
> > > > KASLR randomization range below the -2G memory limit.
> > > >
> > > > Signed-off-by: Thomas Garnier 
> > > > ---
> > > >  arch/x86/include/asm/ftrace.h   |  6 +++--
> > > >  arch/x86/include/asm/sections.h |  4 
> > > >  arch/x86/kernel/ftrace.c| 42
> +++--
> > > >  3 files changed, 48 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/arch/x86/include/asm/ftrace.h
> b/arch/x86/include/asm/ftrace.h
> > > > index c18ed65287d5..8f2decce38d8 100644
> > > > --- a/arch/x86/include/asm/ftrace.h
> > > > +++ b/arch/x86/include/asm/ftrace.h
> > > > @@ -25,9 +25,11 @@ extern void __fentry__(void);
> > > >  static inline unsigned long ftrace_call_adjust(unsigned long addr)
> > > >  {
> > > > /*
> > > > -* addr is the address of the mcount call instruction.
> > > > -* recordmcount does the necessary offset calculation.
> > > > +* addr is the address of the mcount call instruction. PIE has
> always a
> > > > +* byte added to the start of the function.
> > > >  */
> > > > +   if (IS_ENABLED(CONFIG_X86_PIE))
> > > > +   addr -= 1;
> > >
> > > This seems to modify the address even for modules that are _not_
> compiled with
> > > PIE, see below.

> > Can one load a module not compiled for PIE in a kernel with PIE?

> > >
> > > > return addr;
> > > >  }
> > > >
> > > > diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> > > > index 01ebcb6f263e..73b3c30cb7a3 100644
> > > > --- a/arch/x86/kernel/ftrace.c
> > > > +++ b/arch/x86/kernel/ftrace.c
> > > > @@ -135,6 +135,44 @@ ftrace_modify_code_direct(unsigned long ip,
> unsigned const char *old_code,
> > > > return 0;
> > > >  }
> > > >
> > > > +/* Bytes before call GOT offset */
> > > > +const unsigned char got_call_preinsn[] = { 0xff, 0x15 };
> > > > +
> > > > +static int
> > > > +ftrace_modify_initial_code(unsigned long ip, unsigned const char
> *old_code,
> > > > +  unsigned const char *new_code)
> > > > +{
> > > > +   unsigned char replaced[MCOUNT_INSN_SIZE + 1];
> > > > +
> > > > +   ftrace_expected = old_code;
> > > > +
> > > > +   /*
> > > > +* If PIE is not enabled or no GOT call was found, default to
the
> > > > +* original approach to code modification.
> > > > +*/
> > > > +   if (!IS_ENABLED(CONFIG_X86_PIE) ||
> > > > +   probe_kernel_read(replaced, (void *)ip, sizeof(replaced)) ||
> > > > +   memcmp(replaced, got_call_preinsn,
sizeof(got_call_preinsn)))
> > > > +   return ftrace_modify_code_direct(ip, old_code,
new_code);
> > >
> > > And this looks like an attempt to handle modules compiled without
> > > PIE. Does it works with the right ip in that case?

> > I'm guessing the || is for the "or no GOT call was found", but it
> > doesn't explain why no GOT would be found.

> Yes, maybe I could have made it work by using text_ip_addr earlier.


> > >
> > > I wonder if a better solution would be to update
> > > scripts/recordmcount.c to store the incremented location into the
> module.

> I will look into it.

Found a way to properly change the __mcount_loc using the preprocessing
(removing the need for -1 on the addr). It will be part of the next version.

Thanks for the feedback.



> > If recordmcount.c can handle this, then I think that's the preferred
> > approach. Thanks!

> > -- Steve

> > >
> > > IMPORTANT: I have only vague picture about how this all works. It is
> > > possible that I am completely wrong. The code might be correct,
> > > especially if you tested this situation.
> > >
> > > Best Regards,
> > > Petr



> --
> Thomas



-- 
Thomas
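
The size difference discussed in this thread (a 5-byte relative call
versus a 6-byte call through the GOT) can be sketched in plain C. This is
a simplified user-space model, not the ftrace code: the helper names and
the REL_CALL_SIZE/GOT_CALL_SIZE constants are hypothetical, and only the
0xff 0x15 prefix is taken from the quoted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REL_CALL_SIZE 5	/* e8 + rel32 */
#define GOT_CALL_SIZE 6	/* ff 15 + disp32: call *sym@GOTPCREL(%rip) */

static const unsigned char got_call_prefix[] = { 0xff, 0x15 };

/* True when the bytes at insn start with the GOT-indirect call opcode. */
static bool is_got_call(const unsigned char *insn, size_t len)
{
	return len >= GOT_CALL_SIZE &&
	       memcmp(insn, got_call_prefix, sizeof(got_call_prefix)) == 0;
}

/* Size of the call site found at insn. */
static size_t call_site_size(const unsigned char *insn, size_t len)
{
	assert(insn != NULL);
	return is_got_call(insn, len) ? GOT_CALL_SIZE : REL_CALL_SIZE;
}

/*
 * Rewrite a 6-byte GOT call as a 5-byte relative call followed by a
 * 1-byte nop, so later patching only has to deal with 5 bytes.
 * Assumes a little-endian host for the rel32 copy.
 */
static void patch_got_call(unsigned char site[GOT_CALL_SIZE], int32_t rel32)
{
	site[0] = 0xe8;
	memcpy(&site[1], &rel32, sizeof(rel32));
	site[5] = 0x90;
}
```

After patch_got_call() the site no longer matches the GOT prefix and is
handled like any other 5-byte call, with the sixth byte left as a nop.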


Re: [Xen-devel] [PATCH v3 09/27] x86/acpi: Adapt assembly for PIE support

2018-05-29 Thread Thomas Garnier
On Tue, May 29, 2018 at 5:31 AM Pavel Machek  wrote:

> On Fri 2018-05-25 10:00:04, Thomas Garnier wrote:
> > On Fri, May 25, 2018 at 2:14 AM Pavel Machek  wrote:
> >
> > > On Thu 2018-05-24 09:35:42, Thomas Garnier wrote:
> > > > On Thu, May 24, 2018 at 4:03 AM Pavel Machek  wrote:
> > > >
> > > > > On Wed 2018-05-23 12:54:03, Thomas Garnier wrote:
> > > > > > Change the assembly code to use only relative references of
symbols
> > for
> > > > the
> > > > > > kernel to be PIE compatible.
> > > > > >
> > > > > > Position Independent Executable (PIE) support will allow to
> > extended the
> > > > > > KASLR randomization range below the -2G memory limit.
> > > >
> > > > > What testing did this get?
> > > >
> > > > Tested boot, hibernation and performance on qemu and dedicated
machine.
> >
> > > Well, this is suspend, not hibernation code.
> >
> > > So "sudo pm-suspend" or "echo mem > /sys/power/state" would be good
> > > way to test this.
> >
> > Thanks, it worked. I added this to the testsuite I use for KASLR.

> Thanks!

> You can add my Acked-by:.

Will do. Thanks for the review.


>  Pavel

> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures)
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



-- 
Thomas


Re: [Xen-devel] [PATCH v3 09/27] x86/acpi: Adapt assembly for PIE support

2018-05-25 Thread Thomas Garnier
On Fri, May 25, 2018 at 2:14 AM Pavel Machek <pa...@ucw.cz> wrote:

> On Thu 2018-05-24 09:35:42, Thomas Garnier wrote:
> > On Thu, May 24, 2018 at 4:03 AM Pavel Machek <pa...@ucw.cz> wrote:
> >
> > > On Wed 2018-05-23 12:54:03, Thomas Garnier wrote:
> > > > Change the assembly code to use only relative references of symbols
for
> > the
> > > > kernel to be PIE compatible.
> > > >
> > > > Position Independent Executable (PIE) support will allow to
extended the
> > > > KASLR randomization range below the -2G memory limit.
> >
> > > What testing did this get?
> >
> > Tested boot, hibernation and performance on qemu and dedicated machine.

> Well, this is suspend, not hibernation code.

> So "sudo pm-suspend" or "echo mem > /sys/power/state" would be good
> way to test this.

Thanks, it worked. I added this to the testsuite I use for KASLR.


> Thanks,
>  Pavel

> > > > diff --git a/arch/x86/kernel/acpi/wakeup_64.S
> > b/arch/x86/kernel/acpi/wakeup_64.S
> > > > index 50b8ed0317a3..472659c0f811 100644
> > > > --- a/arch/x86/kernel/acpi/wakeup_64.S
> > > > +++ b/arch/x86/kernel/acpi/wakeup_64.S
> > > > @@ -14,7 +14,7 @@
> > > >* Hooray, we are in Long 64-bit mode (but still running in
low
> > memory)
> > > >*/
> > > >  ENTRY(wakeup_long64)
> > > > - movq saved_magic, %rax
> > > > + movq saved_magic(%rip), %rax
> > > >   movq $0x123456789abcdef0, %rdx
> > > >   cmpq %rdx, %rax
> > > >   jne bogus_64_magic
> >
> > > Because, as comment says, this is rather tricky code.
> >
> > I agree, I think maintainers feedback is very important for this
patchset.


> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures)
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



-- 
Thomas


Re: [Xen-devel] [PATCH v3 21/27] x86/ftrace: Adapt function tracing for PIE support

2018-05-24 Thread Thomas Garnier
On Thu, May 24, 2018 at 1:16 PM Steven Rostedt <rost...@goodmis.org> wrote:

> On Thu, 24 May 2018 13:40:24 +0200
> Petr Mladek <pmla...@suse.com> wrote:

> > On Wed 2018-05-23 12:54:15, Thomas Garnier wrote:
> > > When using -fPIE/PIC with function tracing, the compiler generates a
> > > call through the GOT (call *__fentry__@GOTPCREL). This instruction
> > > takes 6 bytes instead of 5 on the usual relative call.
> > >
> > > If PIE is enabled, replace the 6th byte of the GOT call by a 1-byte
nop
> > > so ftrace can handle the previous 5-bytes as before.
> > >
> > > Position Independent Executable (PIE) support will allow to extended
the
> > > KASLR randomization range below the -2G memory limit.
> > >
> > > Signed-off-by: Thomas Garnier <thgar...@google.com>
> > > ---
> > >  arch/x86/include/asm/ftrace.h   |  6 +++--
> > >  arch/x86/include/asm/sections.h |  4 
> > >  arch/x86/kernel/ftrace.c| 42
+++--
> > >  3 files changed, 48 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/arch/x86/include/asm/ftrace.h
b/arch/x86/include/asm/ftrace.h
> > > index c18ed65287d5..8f2decce38d8 100644
> > > --- a/arch/x86/include/asm/ftrace.h
> > > +++ b/arch/x86/include/asm/ftrace.h
> > > @@ -25,9 +25,11 @@ extern void __fentry__(void);
> > >  static inline unsigned long ftrace_call_adjust(unsigned long addr)
> > >  {
> > > /*
> > > -* addr is the address of the mcount call instruction.
> > > -* recordmcount does the necessary offset calculation.
> > > +* addr is the address of the mcount call instruction. PIE has
always a
> > > +* byte added to the start of the function.
> > >  */
> > > +   if (IS_ENABLED(CONFIG_X86_PIE))
> > > +   addr -= 1;
> >
> > This seems to modify the address even for modules that are _not_
compiled with
> > PIE, see below.

> Can one load a module not compiled for PIE in a kernel with PIE?

> >
> > > return addr;
> > >  }
> > >
> > > diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> > > index 01ebcb6f263e..73b3c30cb7a3 100644
> > > --- a/arch/x86/kernel/ftrace.c
> > > +++ b/arch/x86/kernel/ftrace.c
> > > @@ -135,6 +135,44 @@ ftrace_modify_code_direct(unsigned long ip,
unsigned const char *old_code,
> > > return 0;
> > >  }
> > >
> > > +/* Bytes before call GOT offset */
> > > +const unsigned char got_call_preinsn[] = { 0xff, 0x15 };
> > > +
> > > +static int
> > > +ftrace_modify_initial_code(unsigned long ip, unsigned const char
*old_code,
> > > +  unsigned const char *new_code)
> > > +{
> > > +   unsigned char replaced[MCOUNT_INSN_SIZE + 1];
> > > +
> > > +   ftrace_expected = old_code;
> > > +
> > > +   /*
> > > +* If PIE is not enabled or no GOT call was found, default to the
> > > +* original approach to code modification.
> > > +*/
> > > +   if (!IS_ENABLED(CONFIG_X86_PIE) ||
> > > +   probe_kernel_read(replaced, (void *)ip, sizeof(replaced)) ||
> > > +   memcmp(replaced, got_call_preinsn, sizeof(got_call_preinsn)))
> > > +   return ftrace_modify_code_direct(ip, old_code, new_code);
> >
> > And this looks like an attempt to handle modules compiled without
> > PIE. Does it works with the right ip in that case?

> I'm guessing the || is for the "or no GOT call was found", but it
> doesn't explain why no GOT would be found.

Yes, maybe I could have made it work by using text_ip_addr earlier.


> >
> > I wonder if a better solution would be to update
> > scripts/recordmcount.c to store the incremented location into the
module.

I will look into it.


> If recordmcount.c can handle this, then I think that's the preferred
> approach. Thanks!

> -- Steve

> >
> > IMPORTANT: I have only vague picture about how this all works. It is
> > possible that I am completely wrong. The code might be correct,
> > especially if you tested this situation.
> >
> > Best Regards,
> > Petr



-- 
Thomas


Re: [Xen-devel] [PATCH v3 11/27] x86/power/64: Adapt assembly for PIE support

2018-05-24 Thread Thomas Garnier
On Thu, May 24, 2018 at 4:04 AM Pavel Machek <pa...@ucw.cz> wrote:

> On Wed 2018-05-23 12:54:05, Thomas Garnier wrote:
> > Change the assembly code to use only relative references of symbols for
the
> > kernel to be PIE compatible.
> >
> > Position Independent Executable (PIE) support will allow to extended the
> > KASLR randomization range below the -2G memory limit.
> >
> > Signed-off-by: Thomas Garnier <thgar...@google.com>

> Again, was this tested?

Hibernation was tested as much as I can with qemu and my dedicated machine.
Any specific test you think I should use?


> > diff --git a/arch/x86/power/hibernate_asm_64.S
b/arch/x86/power/hibernate_asm_64.S
> > index ce8da3a0412c..6fdd7bbc3c33 100644
> > --- a/arch/x86/power/hibernate_asm_64.S
> > +++ b/arch/x86/power/hibernate_asm_64.S
> > @@ -24,7 +24,7 @@
> >  #include 
> >
> >  ENTRY(swsusp_arch_suspend)
> > - movq $saved_context, %rax
> > + leaq saved_context(%rip), %rax
> >   movq %rsp, pt_regs_sp(%rax)
> >   movq %rbp, pt_regs_bp(%rax)
> >   movq %rsi, pt_regs_si(%rax)
> > @@ -115,7 +115,7 @@ ENTRY(restore_registers)
> >   movq %rax, %cr4;  # turn PGE back on
> >
> >   /* We don't restore %rax, it must be 0 anyway */
> > - movq $saved_context, %rax
> > + leaq saved_context(%rip), %rax
> >   movq pt_regs_sp(%rax), %rsp
> >   movq pt_regs_bp(%rax), %rbp
> >   movq pt_regs_si(%rax), %rsi

> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures)
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



-- 
Thomas


Re: [Xen-devel] [PATCH v3 09/27] x86/acpi: Adapt assembly for PIE support

2018-05-24 Thread Thomas Garnier
On Thu, May 24, 2018 at 4:03 AM Pavel Machek <pa...@ucw.cz> wrote:

> On Wed 2018-05-23 12:54:03, Thomas Garnier wrote:
> > Change the assembly code to use only relative references of symbols for
the
> > kernel to be PIE compatible.
> >
> > Position Independent Executable (PIE) support will allow to extended the
> > KASLR randomization range below the -2G memory limit.

> What testing did this get?

Tested boot, hibernation and performance on qemu and dedicated machine.


> > diff --git a/arch/x86/kernel/acpi/wakeup_64.S
b/arch/x86/kernel/acpi/wakeup_64.S
> > index 50b8ed0317a3..472659c0f811 100644
> > --- a/arch/x86/kernel/acpi/wakeup_64.S
> > +++ b/arch/x86/kernel/acpi/wakeup_64.S
> > @@ -14,7 +14,7 @@
> >* Hooray, we are in Long 64-bit mode (but still running in low
memory)
> >*/
> >  ENTRY(wakeup_long64)
> > - movq saved_magic, %rax
> > + movq saved_magic(%rip), %rax
> >   movq $0x123456789abcdef0, %rdx
> >   cmpq %rdx, %rax
> >   jne bogus_64_magic

> Because, as comment says, this is rather tricky code.

I agree, I think maintainers feedback is very important for this patchset.


Pavel

> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures)
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



-- 
Thomas


Re: [Xen-devel] [PATCH v3 23/27] x86/modules: Adapt module loading for PIE support

2018-05-23 Thread Thomas Garnier
On Wed, May 23, 2018 at 2:27 PM Randy Dunlap <rdun...@infradead.org> wrote:

> Hi,

> (for several patches in this series:)
> The commit message is confusing.  See below.

Thanks for the edits, I will change the different commit messages.



> On 05/23/2018 12:54 PM, Thomas Garnier wrote:
> > Adapt module loading to support PIE relocations. Generate dynamic GOT if
> > a symbol requires it but no entry exist in the kernel GOT.

>  exists

> >
> > Position Independent Executable (PIE) support will allow to extended the

>  will allow us to extend
the

> > KASLR randomization range below the -2G memory limit.

> Does that say "below the negative 2G memory limit"?
> I don't get it.

Yes, below 0xffffffff80000000 basically. I think I will just say that.



> >
> > Signed-off-by: Thomas Garnier <thgar...@google.com>
> > ---
> >  arch/x86/Makefile   |   4 +
> >  arch/x86/include/asm/module.h   |  11 ++
> >  arch/x86/include/asm/sections.h |   4 +
> >  arch/x86/kernel/module.c| 181 +++-
> >  arch/x86/kernel/module.lds  |   3 +
> >  5 files changed, 198 insertions(+), 5 deletions(-)
> >  create mode 100644 arch/x86/kernel/module.lds


> Thanks,
> --
> ~Randy



-- 
Thomas


[Xen-devel] [PATCH v3 22/27] x86/modules: Add option to start module section after kernel

2018-05-23 Thread Thomas Garnier
Add an option so the module section is just after the mapped kernel. It
will ensure position independent modules are always at the right
distance from the kernel and do not require mcmodel=large. It also
optimizes the available size for modules by getting rid of the empty
space in the kernel randomization range.
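
For readers following along, here is a minimal user-space sketch of the base
computation this option introduces (one page past the end of the kernel,
rounded up to a PMD boundary). The sketch_ names and the 4 KiB / 2 MiB sizes
are assumptions made for illustration; the patch itself uses ALIGN(), _end,
PAGE_SIZE and PMD_SIZE.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration: 4 KiB pages, 2 MiB PMD mappings. */
#define SKETCH_PAGE_SIZE 0x1000ULL
#define SKETCH_PMD_SIZE  0x200000ULL

/* Round x up to the next multiple of a, where a must be a power of two. */
static uint64_t sketch_align_up(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Module base: one page past the kernel end, rounded up to a PMD boundary. */
static uint64_t sketch_modules_vaddr(uint64_t kernel_end)
{
	return sketch_align_up(kernel_end + SKETCH_PAGE_SIZE, SKETCH_PMD_SIZE);
}
```

With these assumed sizes, a kernel ending at 0x1000000 would place the module
area at 0x1200000, so the gap between kernel and modules stays below one PMD
plus one page wherever the kernel was randomized.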

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 Documentation/x86/x86_64/mm.txt | 3 +++
 arch/x86/Kconfig| 4 
 arch/x86/include/asm/pgtable_64_types.h | 6 ++
 arch/x86/kernel/head64.c| 5 -
 arch/x86/mm/dump_pagetables.c   | 3 ++-
 5 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index 600bc2afa27d..e3810a1db74b 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -79,3 +79,6 @@ Their order is preserved but their base will be offset early at boot time.
 Be very careful vs. KASLR when changing anything here. The KASLR address
 range must not overlap with anything except the KASAN shadow area, which is
 correct as KASAN disables KASLR.
+
+If CONFIG_DYNAMIC_MODULE_BASE is enabled, the module section follows the end of
+the mapped kernel.
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0fc2e981458d..28eb2b3757bf 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2199,6 +2199,10 @@ config RANDOMIZE_MEMORY_PHYSICAL_PADDING
 
   If unsure, leave at the default value.
 
+# Module section starts just after the end of the mapped kernel
+config DYNAMIC_MODULE_BASE
+   bool
+
 config X86_GLOBAL_STACKPROTECTOR
bool "Stack cookie using a global variable"
depends on CC_STACKPROTECTOR_AUTO
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index adb47552e6bb..3ab25b908879 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -7,6 +7,7 @@
 #ifndef __ASSEMBLY__
 #include 
 #include 
+#include 
 
 /*
  * These are used to make use of C type-checking..
@@ -126,7 +127,12 @@ extern unsigned int ptrs_per_p4d;
 
 #define VMALLOC_END(VMALLOC_START + (VMALLOC_SIZE_TB << 40) - 1)
 
+#ifdef CONFIG_DYNAMIC_MODULE_BASE
+#define MODULES_VADDR  ALIGN(((unsigned long)_end + PAGE_SIZE), PMD_SIZE)
+#else
 #define MODULES_VADDR  (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
+#endif
+
 /* The module sections ends with the start of the fixmap */
 #define MODULES_END_AC(0xff00, UL)
 #define MODULES_LEN(MODULES_END - MODULES_VADDR)
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index fa661fb97127..3a1ce822e1c0 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -394,12 +394,15 @@ asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
 * Build-time sanity checks on the kernel image and module
 * area mappings. (these are purely build-time and produce no code)
 */
+#ifndef CONFIG_DYNAMIC_MODULE_BASE
BUILD_BUG_ON(MODULES_VADDR < __START_KERNEL_map);
BUILD_BUG_ON(MODULES_VADDR - __START_KERNEL_map < KERNEL_IMAGE_SIZE);
-   BUILD_BUG_ON(MODULES_LEN + KERNEL_IMAGE_SIZE > 2*PUD_SIZE);
+   BUILD_BUG_ON(!IS_ENABLED(CONFIG_RANDOMIZE_BASE_LARGE) &&
+MODULES_LEN + KERNEL_IMAGE_SIZE > 2*PUD_SIZE);
BUILD_BUG_ON((__START_KERNEL_map & ~PMD_MASK) != 0);
BUILD_BUG_ON((MODULES_VADDR & ~PMD_MASK) != 0);
BUILD_BUG_ON(!(MODULES_VADDR > __START_KERNEL));
+#endif
MAYBE_BUILD_BUG_ON(!(((MODULES_END - 1) & PGDIR_MASK) ==
(__START_KERNEL & PGDIR_MASK)));
BUILD_BUG_ON(__fix_to_virt(__end_of_fixed_addresses) <= MODULES_END);
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index cc7ff5957194..dca4098ce4fd 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -105,7 +105,7 @@ static struct addr_marker address_markers[] = {
[EFI_END_NR]= { EFI_VA_END, "EFI Runtime Services" },
 #endif
[HIGH_KERNEL_NR]= { __START_KERNEL_map, "High Kernel Mapping" },
-   [MODULES_VADDR_NR]  = { MODULES_VADDR,  "Modules" },
+   [MODULES_VADDR_NR]  = { 0/*MODULES_VADDR*/, "Modules" },
[MODULES_END_NR]= { MODULES_END,"End Modules" },
[FIXADDR_START_NR]  = { FIXADDR_START,  "Fixmap Area" },
[END_OF_SPACE_NR]   = { -1, NULL }
@@ -600,6 +600,7 @@ static int __init pt_dump_init(void)
address_markers[KASAN_SHADOW_START_NR].start_address = KASAN_SHADOW_START;
address_markers[KASAN_SHADOW_END_NR].start_address = KASAN_SHADOW_END;
 #endif
+   address_markers[MODULES_VADDR_NR].start_address = MODULES_

[Xen-devel] [PATCH v3 26/27] x86/relocs: Add option to generate 64-bit relocations

2018-05-23 Thread Thomas Garnier
The x86 relocation tool generates a list of 32-bit signed integers. There
was no need to use 64-bit integers because all addresses were within the
top 2G of memory.

This change adds a large-reloc option to generate 64-bit unsigned integers.
It can be used when the kernel plans to go below the top 2G and 32-bit
integers are not enough.
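
As a rough illustration of why 32-bit entries stop being enough, the sketch
below checks whether a 64-bit address survives being stored as a signed 32-bit
value and sign-extended back. It mirrors the "(int32_t)offset !=
(int64_t)offset" test in the relocation tool; the function name is
hypothetical, and the narrowing cast assumes the usual truncating behaviour of
gcc and clang.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if addr can be stored as a signed 32-bit value and recovered by
 * sign extension. The narrowing cast is assumed to truncate, as gcc and
 * clang do on x86.
 */
static bool sketch_fits_in_s32(uint64_t addr)
{
	return (int64_t)(int32_t)addr == (int64_t)addr;
}
```

Addresses in the top 2G, such as 0xffffffff80000000, pass this check. An
address just below that range does not, and that is the case the 64-bit entry
format is meant to cover.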

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/tools/relocs.c| 60 +++---
 arch/x86/tools/relocs.h|  4 +--
 arch/x86/tools/relocs_common.c | 15 ++---
 3 files changed, 60 insertions(+), 19 deletions(-)

diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index 29283ad3950f..a29eaac6 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -13,8 +13,14 @@
 
 static Elf_Ehdr ehdr;
 
+#if ELF_BITS == 64
+typedef uint64_t rel_off_t;
+#else
+typedef uint32_t rel_off_t;
+#endif
+
 struct relocs {
-   uint32_t*offset;
+   rel_off_t   *offset;
unsigned long   count;
unsigned long   size;
 };
@@ -685,7 +691,7 @@ static void print_absolute_relocs(void)
printf("\n");
 }
 
-static void add_reloc(struct relocs *r, uint32_t offset)
+static void add_reloc(struct relocs *r, rel_off_t offset)
 {
if (r->count == r->size) {
unsigned long newsize = r->size + 5;
@@ -1061,26 +1067,48 @@ static void sort_relocs(struct relocs *r)
qsort(r->offset, r->count, sizeof(r->offset[0]), cmp_relocs);
 }
 
-static int write32(uint32_t v, FILE *f)
+static int write32(rel_off_t rel, FILE *f)
 {
-   unsigned char buf[4];
+   unsigned char buf[sizeof(uint32_t)];
+   uint32_t v = (uint32_t)rel;
 
put_unaligned_le32(v, buf);
-   return fwrite(buf, 1, 4, f) == 4 ? 0 : -1;
+   return fwrite(buf, 1, sizeof(buf), f) == sizeof(buf) ? 0 : -1;
 }
 
-static int write32_as_text(uint32_t v, FILE *f)
+static int write32_as_text(rel_off_t rel, FILE *f)
 {
+   uint32_t v = (uint32_t)rel;
return fprintf(f, "\t.long 0x%08"PRIx32"\n", v) > 0 ? 0 : -1;
 }
 
-static void emit_relocs(int as_text, int use_real_mode)
+static int write64(rel_off_t rel, FILE *f)
+{
+   unsigned char buf[sizeof(uint64_t)];
+   uint64_t v = (uint64_t)rel;
+
+   put_unaligned_le64(v, buf);
+   return fwrite(buf, 1, sizeof(buf), f) == sizeof(buf) ? 0 : -1;
+}
+
+static int write64_as_text(rel_off_t rel, FILE *f)
+{
+   uint64_t v = (uint64_t)rel;
+   return fprintf(f, "\t.quad 0x%016"PRIx64"\n", v) > 0 ? 0 : -1;
+}
+
+static void emit_relocs(int as_text, int use_real_mode, int use_large_reloc)
 {
int i;
-   int (*write_reloc)(uint32_t, FILE *) = write32;
+   int (*write_reloc)(rel_off_t, FILE *);
int (*do_reloc)(struct section *sec, Elf_Rel *rel, Elf_Sym *sym,
const char *symname);
 
+   if (use_large_reloc)
+   write_reloc = write64;
+   else
+   write_reloc = write32;
+
 #if ELF_BITS == 64
if (!use_real_mode)
do_reloc = do_reloc64;
@@ -1091,6 +1119,9 @@ static void emit_relocs(int as_text, int use_real_mode)
do_reloc = do_reloc32;
else
do_reloc = do_reloc_real;
+
+   /* Large relocations only for 64-bit */
+   use_large_reloc = 0;
 #endif
 
/* Collect up the relocations */
@@ -1114,8 +1145,13 @@ static void emit_relocs(int as_text, int use_real_mode)
 * gas will like.
 */
printf(".section \".data.reloc\",\"a\"\n");
-   printf(".balign 4\n");
-   write_reloc = write32_as_text;
+   if (use_large_reloc) {
+   printf(".balign 8\n");
+   write_reloc = write64_as_text;
+   } else {
+   printf(".balign 4\n");
+   write_reloc = write32_as_text;
+   }
}
 
if (use_real_mode) {
@@ -1183,7 +1219,7 @@ static void print_reloc_info(void)
 
 void process(FILE *fp, int use_real_mode, int as_text,
 int show_absolute_syms, int show_absolute_relocs,
-int show_reloc_info)
+int show_reloc_info, int use_large_reloc)
 {
regex_init(use_real_mode);
read_ehdr(fp);
@@ -1206,5 +1242,5 @@ void process(FILE *fp, int use_real_mode, int as_text,
print_reloc_info();
return;
}
-   emit_relocs(as_text, use_real_mode);
+   emit_relocs(as_text, use_real_mode, use_large_reloc);
 }
diff --git a/arch/x86/tools/relocs.h b/arch/x86/tools/relocs.h
index 43c83c0fd22c..3d401da59df7 100644
--- a/arch/x86/tools/relocs.h
+++ b/arch/x86/tools/relocs.h
@@ -31,8 +31,8 @@ enum symtype {
 
 void process_32(FILE *fp, int use_real_mode, int as_text,
int 

[Xen-devel] [PATCH v3 17/27] x86/relocs: Handle PIE relocations

2018-05-23 Thread Thomas Garnier
Change the relocation tool to correctly handle relocations generated by
the -fPIE option:

 - Add relocation for each entry of the .got section given the linker does not
   generate R_X86_64_GLOB_DAT on a simple link.
 - Ignore R_X86_64_GOTPCREL.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/tools/relocs.c | 93 -
 1 file changed, 92 insertions(+), 1 deletion(-)

diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index 220e97841e49..a35cc337f883 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -32,6 +32,7 @@ struct section {
Elf_Sym*symtab;
Elf_Rel*reltab;
char   *strtab;
+   Elf_Addr   *got;
 };
 static struct section *secs;
 
@@ -293,6 +294,35 @@ static Elf_Sym *sym_lookup(const char *symname)
return 0;
 }
 
+static Elf_Sym *sym_lookup_addr(Elf_Addr addr, const char **name)
+{
+   int i;
+   for (i = 0; i < ehdr.e_shnum; i++) {
+   struct section *sec = &secs[i];
+   long nsyms;
+   Elf_Sym *symtab;
+   Elf_Sym *sym;
+
+   if (sec->shdr.sh_type != SHT_SYMTAB)
+   continue;
+
+   nsyms = sec->shdr.sh_size/sizeof(Elf_Sym);
+   symtab = sec->symtab;
+
+   for (sym = symtab; --nsyms >= 0; sym++) {
+   if (sym->st_value == addr) {
+   if (name) {
+   *name = sym_name(sec->link->strtab,
+sym);
+   }
+   return sym;
+   }
+   }
+   }
+   return 0;
+}
+
+
 #if BYTE_ORDER == LITTLE_ENDIAN
 #define le16_to_cpu(val) (val)
 #define le32_to_cpu(val) (val)
@@ -513,6 +543,33 @@ static void read_relocs(FILE *fp)
}
 }
 
+static void read_got(FILE *fp)
+{
+   int i;
+   for (i = 0; i < ehdr.e_shnum; i++) {
+   struct section *sec = &secs[i];
+   sec->got = NULL;
+   if (sec->shdr.sh_type != SHT_PROGBITS ||
+   strcmp(sec_name(i), ".got")) {
+   continue;
+   }
+   sec->got = malloc(sec->shdr.sh_size);
+   if (!sec->got) {
+   die("malloc of %d bytes for got failed\n",
+   sec->shdr.sh_size);
+   }
+   if (fseek(fp, sec->shdr.sh_offset, SEEK_SET) < 0) {
+   die("Seek to %d failed: %s\n",
+   sec->shdr.sh_offset, strerror(errno));
+   }
+   if (fread(sec->got, 1, sec->shdr.sh_size, fp)
+   != sec->shdr.sh_size) {
+   die("Cannot read got: %s\n",
+   strerror(errno));
+   }
+   }
+}
+
 
 static void print_absolute_symbols(void)
 {
@@ -643,6 +700,32 @@ static void add_reloc(struct relocs *r, uint32_t offset)
r->offset[r->count++] = offset;
 }
 
+/*
+ * The linker does not generate relocations for the GOT for the kernel.
+ * If a GOT is found, simulate the relocations that should have been included.
+ */
+static void walk_got_table(int (*process)(struct section *sec, Elf_Rel *rel,
+ Elf_Sym *sym, const char *symname),
+  struct section *sec)
+{
+   int i;
+   Elf_Addr entry;
+   Elf_Sym *sym;
+   const char *symname;
+   Elf_Rel rel;
+
+   for (i = 0; i < sec->shdr.sh_size/sizeof(Elf_Addr); i++) {
+   entry = sec->got[i];
+   sym = sym_lookup_addr(entry, &symname);
+   if (!sym)
+   die("Could not find got symbol for entry %d\n", i);
+   rel.r_offset = sec->shdr.sh_addr + i * sizeof(Elf_Addr);
+   rel.r_info = ELF_BITS == 64 ? R_X86_64_GLOB_DAT
+: R_386_GLOB_DAT;
+   process(sec, &rel, sym, symname);
+   }
+}
+
 static void walk_relocs(int (*process)(struct section *sec, Elf_Rel *rel,
Elf_Sym *sym, const char *symname))
 {
@@ -656,6 +739,8 @@ static void walk_relocs(int (*process)(struct section *sec, Elf_Rel *rel,
struct section *sec = &secs[i];
 
if (sec->shdr.sh_type != SHT_REL_TYPE) {
+   if (sec->got)
+   walk_got_table(process, sec);
continue;
}
sec_symtab  = sec->link;
@@ -765,6 +850,7 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
offset += per_cpu_load_addr;
 
switch (r_type) {
+   case R_X86_64_GOTPCREL:
case R_X86

[Xen-devel] [PATCH v3 25/27] x86/pie: Add option to build the kernel as PIE

2018-05-23 Thread Thomas Garnier
Add the CONFIG_X86_PIE option, which builds the kernel as a Position
Independent Executable (PIE). The kernel is currently built with the
mcmodel=kernel option, which forces it to stay in the top 2G of the
virtual address space. With PIE, the kernel will be able to move below
the current limit.

The --emit-relocs linker option was kept instead of using -pie to limit
the impact on mapped sections. Any incompatible relocation will be
caught by the arch/x86/tools/relocs binary at compile time.

If segment based stack cookies are enabled, try to use the compiler
option to select the segment register. If it is not available,
automatically enable the global stack cookie in auto mode. Otherwise,
recommend a compiler update or the global stack cookie option.

Performance/Size impact:

Size of vmlinux (Default configuration):
 File size:
 - PIE disabled: +0.18%
 - PIE enabled: -1.977% (fewer relocations)
 .text section:
 - PIE disabled: same
 - PIE enabled: same

Size of vmlinux (Ubuntu configuration):
 File size:
 - PIE disabled: +0.21%
 - PIE enabled: +10%
 .text section:
 - PIE disabled: same
 - PIE enabled: +0.001%

The size increase is mainly due to not having access to the 32-bit signed
relocation that can be used with mcmodel=kernel. A small part is due to reduced
optimization for PIE code. Bug [1] was opened against gcc to request better
code generation for kernel PIE.

Hackbench (50% and 1600% on thread/process for pipe/sockets):
 - PIE disabled: no significant change (avg -/+ 0.5% on latest test).
 - PIE enabled: between -1% and +1% on average (default and Ubuntu config).

Kernbench (average of 10 Half and Optimal runs):
 Elapsed Time:
 - PIE disabled: no significant change (avg -0.5%)
 - PIE enabled: average -0.5% to +0.5%
 System Time:
 - PIE disabled: no significant change (avg -0.1%)
 - PIE enabled: average -0.4% to +0.4%.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

Signed-off-by: Thomas Garnier <thgar...@google.com>

merge pie
---
 arch/x86/Kconfig  |  8 
 arch/x86/Makefile | 45 -
 2 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 28eb2b3757bf..26d5d4942777 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2215,6 +2215,14 @@ config X86_GLOBAL_STACKPROTECTOR
 
   If unsure, say N
 
+config X86_PIE
+   bool
+   depends on X86_64
+   select DEFAULT_HIDDEN
+   select WEAK_PROVIDE_HIDDEN
+   select DYNAMIC_MODULE_BASE
+   select MODULE_REL_CRCS if MODVERSIONS
+
 config HOTPLUG_CPU
bool "Support for hot-pluggable CPUs"
depends on SMP
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 20bb6cbd8938..c92bcca4400c 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -60,6 +60,8 @@ endif
 KBUILD_CFLAGS += -mno-sse -mno-mmx -mno-sse2 -mno-3dnow
 KBUILD_CFLAGS += $(call cc-option,-mno-avx,)
 
+stackglobal := $(call cc-option-yn, -mstack-protector-guard=global)
+
 ifeq ($(CONFIG_X86_32),y)
 BITS := 32
 UTS_MACHINE := i386
@@ -135,7 +137,48 @@ else
 
 KBUILD_CFLAGS += -mno-red-zone
 ifdef CONFIG_X86_PIE
+KBUILD_CFLAGS += -fPIE
 KBUILD_LDFLAGS_MODULE += -T $(srctree)/arch/x86/kernel/module.lds
+
+# Relax relocation in both CFLAGS and LDFLAGS to support older compilers
+KBUILD_CFLAGS += $(call cc-option,-Wa$(comma)-mrelax-relocations=no)
+LDFLAGS_vmlinux += $(call ld-option,--no-relax)
+KBUILD_LDFLAGS_MODULE += $(call ld-option,--no-relax)
+
+# Stack validation is not yet supported due to self-referenced switches
+ifdef CONFIG_STACK_VALIDATION
+$(warning CONFIG_STACK_VALIDATION is not yet supported for x86_64 pie \
+   build.)
+SKIP_STACK_VALIDATION := 1
+export SKIP_STACK_VALIDATION
+endif
+
+ifndef CONFIG_CC_STACKPROTECTOR_NONE
+ifndef CONFIG_X86_GLOBAL_STACKPROTECTOR
+stackseg-flag := -mstack-protector-guard-reg=%gs
+ifeq ($(call cc-option-yn,$(stackseg-flag)),n)
+# Try to enable global stack cookie if possible
+ifeq ($(stackglobal), y)
+$(warning Cannot use CONFIG_CC_STACKPROTECTOR_* while \
+building a position independent kernel. \
+Default to global stack protector \
+(CONFIG_X86_GLOBAL_STACKPROTECTOR).)
+CONFIG_X86_GLOBAL_STACKPROTECTOR := y
+KBUILD_CFLAGS += -DCONFIG_X86_GLOBAL_STACKPROTECTOR
+KBUILD_AFLAGS += -DCONFIG_X86_GLOBAL_STACKPROTECTOR
+else
+$(error echo Cannot use \
+CONFIG_CC_STACKPROTECTOR_(REGULAR|STRONG|AUTO) \
+while building a position independent binary. \
+Update your

[Xen-devel] [PATCH v3 15/27] compiler: Option to default to hidden symbols

2018-05-23 Thread Thomas Garnier
Provide an option to default visibility to hidden except for key
symbols. This option is disabled by default and will be used by x86_64
PIE support to remove errors between compilation units.

Default visibility is also enabled for external symbols that are
compared with each other and may be equal (start/end of sections). In
this case, older versions of GCC will remove the comparison if the
symbols are hidden. This issue exists at least on gcc 4.9 and before.
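
A small sketch of the mechanism, for illustration only: a pragma switches the
default visibility to hidden and an attribute opts individual symbols back
out. The sketch_ names are hypothetical and the pragma is a gcc/clang
extension; the patch does this with CONFIG_DEFAULT_HIDDEN and a
__default_visibility macro.

```c
/* Everything declared from here on defaults to hidden visibility. */
#pragma GCC visibility push(hidden)

/* Hypothetical stand-in for the patch's __default_visibility macro. */
#define sketch_default_visibility __attribute__((visibility("default")))

/* Hidden: the compiler may assume the symbol is defined in this link unit. */
int sketch_hidden_counter = 1;

/* Explicitly default: opted out of hidden, like the section markers. */
sketch_default_visibility int sketch_exported_counter = 2;

int sketch_sum(void)
{
	return sketch_hidden_counter + sketch_exported_counter;
}

/* Restore the previous default for anything declared afterwards. */
#pragma GCC visibility pop
```

Building this as a shared object and inspecting it with "readelf -s" should
show sketch_hidden_counter as a local symbol and sketch_exported_counter as a
global one with default visibility.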

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/boot/boot.h |  2 +-
 arch/x86/include/asm/setup.h |  2 +-
 arch/x86/kernel/cpu/microcode/core.c |  4 ++--
 drivers/base/firmware_loader/main.c  |  4 ++--
 include/asm-generic/sections.h   |  6 ++
 include/linux/compiler.h |  7 +++
 init/Kconfig |  7 +++
 kernel/kallsyms.c| 16 
 kernel/trace/trace.h |  4 ++--
 lib/dynamic_debug.c  |  4 ++--
 10 files changed, 38 insertions(+), 18 deletions(-)

diff --git a/arch/x86/boot/boot.h b/arch/x86/boot/boot.h
index ef5a9cc66fb8..d726c35bdd96 100644
--- a/arch/x86/boot/boot.h
+++ b/arch/x86/boot/boot.h
@@ -193,7 +193,7 @@ static inline bool memcmp_gs(const void *s1, addr_t s2, size_t len)
 }
 
 /* Heap -- available for dynamic lists. */
-extern char _end[];
+extern char _end[] __default_visibility;
 extern char *HEAP;
 extern char *heap_end;
 #define RESET_HEAP() ((void *)( HEAP = _end ))
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index ae13bc974416..083a6e99b884 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -68,7 +68,7 @@ static inline void x86_ce4100_early_setup(void) { }
  * This is set up by the setup-routine at boot-time
  */
 extern struct boot_params boot_params;
-extern char _text[];
+extern char _text[] __default_visibility;
 
 static inline bool kaslr_enabled(void)
 {
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index 77e201301528..6a4f5d9d7eb6 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -149,8 +149,8 @@ static bool __init check_loader_disabled_bsp(void)
return *res;
 }
 
-extern struct builtin_fw __start_builtin_fw[];
-extern struct builtin_fw __end_builtin_fw[];
+extern struct builtin_fw __start_builtin_fw[] __default_visibility;
+extern struct builtin_fw __end_builtin_fw[] __default_visibility;
 
 bool get_builtin_firmware(struct cpio_data *cd, const char *name)
 {
diff --git a/drivers/base/firmware_loader/main.c b/drivers/base/firmware_loader/main.c
index 0943e7065e0e..2ffd019af2d4 100644
--- a/drivers/base/firmware_loader/main.c
+++ b/drivers/base/firmware_loader/main.c
@@ -94,8 +94,8 @@ static struct firmware_cache fw_cache;
 
 #ifdef CONFIG_FW_LOADER
 
-extern struct builtin_fw __start_builtin_fw[];
-extern struct builtin_fw __end_builtin_fw[];
+extern struct builtin_fw __start_builtin_fw[] __default_visibility;
+extern struct builtin_fw __end_builtin_fw[] __default_visibility;
 
 static void fw_copy_to_prealloc_buf(struct firmware *fw,
void *buf, size_t size)
diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index 849cd8eb5ca0..0a0e23405ddd 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -32,6 +32,9 @@
  * __softirqentry_text_start, __softirqentry_text_end
  * __start_opd, __end_opd
  */
+#ifdef CONFIG_DEFAULT_HIDDEN
+#pragma GCC visibility push(default)
+#endif
 extern char _text[], _stext[], _etext[];
 extern char _data[], _sdata[], _edata[];
 extern char __bss_start[], __bss_stop[];
@@ -49,6 +52,9 @@ extern char __start_once[], __end_once[];
 
 /* Start and end of .ctors section - used for constructor calls. */
 extern char __ctors_start[], __ctors_end[];
+#ifdef CONFIG_DEFAULT_HIDDEN
+#pragma GCC visibility pop
+#endif
 
 /* Start and end of .opd section - used for function descriptors. */
 extern char __start_opd[], __end_opd[];
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 341b6cf8c029..81a9986cad78 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -278,6 +278,13 @@ unsigned long read_word_at_a_time(const void *addr)
__u.__val;  \
 })
 
+#ifdef CONFIG_DEFAULT_HIDDEN
+#pragma GCC visibility push(hidden)
+#define __default_visibility  __attribute__((visibility ("default")))
+#else
+#define __default_visibility
+#endif
+
 #endif /* __KERNEL__ */
 
 #endif /* __ASSEMBLY__ */
diff --git a/init/Kconfig b/init/Kconfig
index 8915a3ce5f0c..0fc3a58d9f2f 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1694,6 +1694,13 @@ config PROFILING
 config TRACEPOINTS
bool
 
+#
+# Default to hidden visibility for all symbols.
+# Useful for Position Independent Code to reduce global references.
+#
+config DEFAULT_HID

[Xen-devel] [PATCH v3 21/27] x86/ftrace: Adapt function tracing for PIE support

2018-05-23 Thread Thomas Garnier
When using -fPIE/PIC with function tracing, the compiler generates a
call through the GOT (call *__fentry__@GOTPCREL). This instruction
takes 6 bytes instead of the 5 bytes of the usual relative call.

If PIE is enabled, replace the 6th byte of the GOT call with a 1-byte nop
so ftrace can handle the preceding 5 bytes as before.
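
To make the byte layout concrete, here is a user-space sketch of the
replacement described above. It recognises the two leading bytes of the
6-byte GOT call and overwrites the instruction with a 5-byte nop followed by
a 1-byte nop. The sketch_ names and the particular 5-byte nop encoding are
assumptions for illustration; the kernel selects its nops from ideal_nops and
writes through probe_kernel_write().

```c
#include <stdbool.h>
#include <string.h>

/* Size of a relative call (MCOUNT_INSN_SIZE in the patch). */
#define SKETCH_CALL_SIZE 5

/* Opcode and ModRM bytes of "call *disp32(%rip)". */
static const unsigned char sketch_got_call_prefix[] = { 0xff, 0x15 };

/* Assumed 5-byte nop encoding: nopl 0x0(%rax,%rax,1). */
static const unsigned char sketch_nop5[SKETCH_CALL_SIZE] = {
	0x0f, 0x1f, 0x44, 0x00, 0x00
};

/* True if insn starts with the GOT call prefix bytes. */
static bool sketch_is_got_call(const unsigned char *insn)
{
	return memcmp(insn, sketch_got_call_prefix,
		      sizeof(sketch_got_call_prefix)) == 0;
}

/*
 * Overwrite a 6-byte GOT call with a 5-byte nop plus a 1-byte nop.
 * Returns false and leaves insn untouched if it is not a GOT call.
 */
static bool sketch_nop_out_got_call(unsigned char *insn)
{
	if (!sketch_is_got_call(insn))
		return false;
	memcpy(insn, sketch_nop5, SKETCH_CALL_SIZE);
	insn[SKETCH_CALL_SIZE] = 0x90; /* single-byte nop */
	return true;
}
```

After the rewrite, the first 5 bytes form a single instruction that later
ftrace updates can patch as usual, and the trailing 0x90 executes as a
harmless nop.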

Position Independent Executable (PIE) support will allow us to extend the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/include/asm/ftrace.h   |  6 +++--
 arch/x86/include/asm/sections.h |  4 
 arch/x86/kernel/ftrace.c| 42 +++--
 3 files changed, 48 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index c18ed65287d5..8f2decce38d8 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -25,9 +25,11 @@ extern void __fentry__(void);
 static inline unsigned long ftrace_call_adjust(unsigned long addr)
 {
/*
-* addr is the address of the mcount call instruction.
-* recordmcount does the necessary offset calculation.
+* addr is the address of the mcount call instruction. PIE has always a
+* byte added to the start of the function.
 */
+   if (IS_ENABLED(CONFIG_X86_PIE))
+   addr -= 1;
return addr;
 }
 
diff --git a/arch/x86/include/asm/sections.h b/arch/x86/include/asm/sections.h
index 5c019d23d06b..da3d98bb2bcb 100644
--- a/arch/x86/include/asm/sections.h
+++ b/arch/x86/include/asm/sections.h
@@ -13,4 +13,8 @@ extern char __end_rodata_hpage_align[];
 extern char __entry_trampoline_start[], __entry_trampoline_end[];
 #endif
 
+#if defined(CONFIG_X86_PIE)
+extern char __start_got[], __end_got[];
+#endif
+
 #endif /* _ASM_X86_SECTIONS_H */
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 01ebcb6f263e..73b3c30cb7a3 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -102,7 +102,7 @@ static const unsigned char *ftrace_nop_replace(void)
 
 static int
 ftrace_modify_code_direct(unsigned long ip, unsigned const char *old_code,
-  unsigned const char *new_code)
+ unsigned const char *new_code)
 {
unsigned char replaced[MCOUNT_INSN_SIZE];
 
@@ -135,6 +135,44 @@ ftrace_modify_code_direct(unsigned long ip, unsigned const char *old_code,
return 0;
 }
 
+/* Bytes before call GOT offset */
+const unsigned char got_call_preinsn[] = { 0xff, 0x15 };
+
+static int
+ftrace_modify_initial_code(unsigned long ip, unsigned const char *old_code,
+  unsigned const char *new_code)
+{
+   unsigned char replaced[MCOUNT_INSN_SIZE + 1];
+
+   ftrace_expected = old_code;
+
+   /*
+* If PIE is not enabled or no GOT call was found, default to the
+* original approach to code modification.
+*/
+   if (!IS_ENABLED(CONFIG_X86_PIE) ||
+   probe_kernel_read(replaced, (void *)ip, sizeof(replaced)) ||
+   memcmp(replaced, got_call_preinsn, sizeof(got_call_preinsn)))
+   return ftrace_modify_code_direct(ip, old_code, new_code);
+
+   /*
+* Build a nop slide with a 5-byte nop and 1-byte nop to keep the ftrace
+* hooking algorithm working with the expected 5 bytes instruction.
+*/
+   memcpy(replaced, new_code, MCOUNT_INSN_SIZE);
+   replaced[MCOUNT_INSN_SIZE] = ideal_nops[1][0];
+
+   ip = text_ip_addr(ip);
+
+   if (probe_kernel_write((void *)ip, replaced, sizeof(replaced)))
+   return -EPERM;
+
+   sync_core();
+
+   return 0;
+
+}
+
 int ftrace_make_nop(struct module *mod,
struct dyn_ftrace *rec, unsigned long addr)
 {
@@ -153,7 +191,7 @@ int ftrace_make_nop(struct module *mod,
 * just modify the code directly.
 */
if (addr == MCOUNT_ADDR)
-   return ftrace_modify_code_direct(rec->ip, old, new);
+   return ftrace_modify_initial_code(rec->ip, old, new);
 
ftrace_expected = NULL;
 
-- 
2.17.0.441.gb46fe60e1d-goog



[Xen-devel] [PATCH v3 18/27] xen: Adapt assembly for PIE support

2018-05-23 Thread Thomas Garnier
Change the assembly code to use the new _ASM_MOVABS macro, which gets a
symbol reference while being PIE compatible. Adapt the relocation tool
to ignore 32-bit Xen code.

Position Independent Executable (PIE) support will allow us to extend the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/tools/relocs.c | 16 +++-
 arch/x86/xen/xen-head.S | 11 ++-
 arch/x86/xen/xen-pvh.S  | 13 +
 3 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index a35cc337f883..29283ad3950f 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -832,6 +832,16 @@ static int is_percpu_sym(ElfW(Sym) *sym, const char *symname)
strncmp(symname, "init_per_cpu_", 13);
 }
 
+/*
+ * Check if the 32-bit relocation is within the xenpvh 32-bit code.
+ * If so, ignores it.
+ */
+static int is_in_xenpvh_assembly(ElfW(Addr) offset)
+{
+   ElfW(Sym) *sym = sym_lookup("pvh_start_xen");
+   return sym && (offset >= sym->st_value) &&
+   (offset < (sym->st_value + sym->st_size));
+}
 
 static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
  const char *symname)
@@ -895,8 +905,12 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 * the relocations are processed.
 * Make sure that the offset will fit.
 */
-   if (r_type != R_X86_64_64 && (int32_t)offset != (int64_t)offset)
+   if (r_type != R_X86_64_64 &&
+   (int32_t)offset != (int64_t)offset) {
+   if (is_in_xenpvh_assembly(offset))
+   break;
die("Relocation offset doesn't fit in 32 bits\n");
+   }
 
if (r_type == R_X86_64_64)
add_reloc(, offset);
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 5077ead5e59c..4418ff0a1d96 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -28,14 +28,15 @@ ENTRY(startup_xen)
 
/* Clear .bss */
xor %eax,%eax
-   mov $__bss_start, %_ASM_DI
-   mov $__bss_stop, %_ASM_CX
+   _ASM_MOVABS $__bss_start, %_ASM_DI
+   _ASM_MOVABS $__bss_stop, %_ASM_CX
sub %_ASM_DI, %_ASM_CX
shr $__ASM_SEL(2, 3), %_ASM_CX
rep __ASM_SIZE(stos)
 
-   mov %_ASM_SI, xen_start_info
-   mov $init_thread_union+THREAD_SIZE, %_ASM_SP
+   _ASM_MOVABS $xen_start_info, %_ASM_AX
+   _ASM_MOV %_ASM_SI, (%_ASM_AX)
+   _ASM_MOVABS $init_thread_union+THREAD_SIZE, %_ASM_SP
 
 #ifdef CONFIG_X86_64
/* Set up %gs.
@@ -46,7 +47,7 @@ ENTRY(startup_xen)
 * init data section till per cpu areas are set up.
 */
movl$MSR_GS_BASE,%ecx
-   movq$INIT_PER_CPU_VAR(irq_stack_union),%rax
+   movabsq $INIT_PER_CPU_VAR(irq_stack_union),%rax
cdq
wrmsr
 #endif
diff --git a/arch/x86/xen/xen-pvh.S b/arch/x86/xen/xen-pvh.S
index e1a5fbeae08d..43e234c7c2de 100644
--- a/arch/x86/xen/xen-pvh.S
+++ b/arch/x86/xen/xen-pvh.S
@@ -101,8 +101,8 @@ ENTRY(pvh_start_xen)
call xen_prepare_pvh
 
/* startup_64 expects boot_params in %rsi. */
-   mov $_pa(pvh_bootparams), %rsi
-   mov $_pa(startup_64), %rax
+   movabs $_pa(pvh_bootparams), %rsi
+   movabs $_pa(startup_64), %rax
jmp *%rax
 
 #else /* CONFIG_X86_64 */
@@ -137,10 +137,15 @@ END(pvh_start_xen)
 
.section ".init.data","aw"
.balign 8
+   /*
+* Use a quad for _pa(gdt_start) because PIE does not understand a
+* long is enough. The resulting value will still be in the lower long
+* part.
+*/
 gdt:
.word gdt_end - gdt_start
-   .long _pa(gdt_start)
-   .word 0
+   .quad _pa(gdt_start)
+   .balign 8
 gdt_start:
.quad 0x/* NULL descriptor */
.quad 0x/* reserved */
-- 
2.17.0.441.gb46fe60e1d-goog



[Xen-devel] [PATCH v3 23/27] x86/modules: Adapt module loading for PIE support

2018-05-23 Thread Thomas Garnier
Adapt module loading to support PIE relocations. Generate a dynamic GOT if
a symbol requires it but no entry exists in the kernel GOT.
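
The lookup order can be sketched in user space as follows: reuse a kernel GOT
slot if one already holds the symbol value, otherwise reuse the most recently
allocated module slot when it holds the same value, otherwise append a new
slot. The sketch_ structure and function names are hypothetical. The addend
handling from the patch is left out, and reusing only the last slot relies on
the relocations having been sorted beforehand, as the patch does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-module GOT bookkeeping. */
struct sketch_got {
	uint64_t *entries;  /* backing storage for the module GOT */
	size_t    count;    /* slots in use */
	size_t    capacity; /* slots available */
};

/* Return the slot in [start, end) holding value, or NULL if absent. */
static uint64_t *sketch_find_entry(uint64_t *start, uint64_t *end,
				   uint64_t value)
{
	for (uint64_t *pos = start; pos < end; pos++)
		if (*pos == value)
			return pos;
	return NULL;
}

/*
 * Pick a GOT slot for sym_value: the kernel GOT first, then the last
 * module slot if it is a duplicate, then a fresh module slot.
 * Returns NULL when the module GOT is full.
 */
static uint64_t *sketch_got_slot(uint64_t *kgot_start, uint64_t *kgot_end,
				 struct sketch_got *mgot, uint64_t sym_value)
{
	uint64_t *slot = sketch_find_entry(kgot_start, kgot_end, sym_value);

	if (slot)
		return slot;
	if (mgot->count > 0 && mgot->entries[mgot->count - 1] == sym_value)
		return &mgot->entries[mgot->count - 1];
	if (mgot->count == mgot->capacity)
		return NULL;
	mgot->entries[mgot->count] = sym_value;
	return &mgot->entries[mgot->count++];
}
```

A caller would size the module GOT up front by counting the relocations that
may need a slot, which is the role got_max_entries plays in the patch.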

Position Independent Executable (PIE) support will allow us to extend the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/Makefile   |   4 +
 arch/x86/include/asm/module.h   |  11 ++
 arch/x86/include/asm/sections.h |   4 +
 arch/x86/kernel/module.c| 181 +++-
 arch/x86/kernel/module.lds  |   3 +
 5 files changed, 198 insertions(+), 5 deletions(-)
 create mode 100644 arch/x86/kernel/module.lds

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 277ffc57ae13..20bb6cbd8938 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -134,7 +134,11 @@ else
 KBUILD_CFLAGS += $(cflags-y)
 
 KBUILD_CFLAGS += -mno-red-zone
+ifdef CONFIG_X86_PIE
+KBUILD_LDFLAGS_MODULE += -T $(srctree)/arch/x86/kernel/module.lds
+else
 KBUILD_CFLAGS += -mcmodel=kernel
+endif
 
 # -funit-at-a-time shrinks the kernel .text considerably
 # unfortunately it makes reading oopses harder.
diff --git a/arch/x86/include/asm/module.h b/arch/x86/include/asm/module.h
index 7948a17febb4..68ff05e14288 100644
--- a/arch/x86/include/asm/module.h
+++ b/arch/x86/include/asm/module.h
@@ -5,12 +5,23 @@
 #include 
 #include 
 
+#ifdef CONFIG_X86_PIE
+struct mod_got_sec {
+   struct elf64_shdr   *got;
+   int got_num_entries;
+   int got_max_entries;
+};
+#endif
+
 struct mod_arch_specific {
 #ifdef CONFIG_UNWINDER_ORC
unsigned int num_orcs;
int *orc_unwind_ip;
struct orc_entry *orc_unwind;
 #endif
+#ifdef CONFIG_X86_PIE
+   struct mod_got_sec  core;
+#endif
 };
 
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/include/asm/sections.h b/arch/x86/include/asm/sections.h
index da3d98bb2bcb..89b3a95c8d11 100644
--- a/arch/x86/include/asm/sections.h
+++ b/arch/x86/include/asm/sections.h
@@ -17,4 +17,8 @@ extern char __entry_trampoline_start[], __entry_trampoline_end[];
 extern char __start_got[], __end_got[];
 #endif
 
+#if defined(CONFIG_X86_PIE)
+extern char __start_got[], __end_got[];
+#endif
+
 #endif /* _ASM_X86_SECTIONS_H */
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index f58336af095c..88895f3d474b 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -77,6 +78,173 @@ static unsigned long int get_module_load_offset(void)
 }
 #endif
 
+#ifdef CONFIG_X86_PIE
+static u64 find_got_kernel_entry(Elf64_Sym *sym, const Elf64_Rela *rela)
+{
+   u64 *pos;
+
+   for (pos = (u64*)__start_got; pos < (u64*)__end_got; pos++) {
+   if (*pos == sym->st_value)
+   return (u64)pos + rela->r_addend;
+   }
+
+   return 0;
+}
+
+static u64 module_emit_got_entry(struct module *mod, void *loc,
+const Elf64_Rela *rela, Elf64_Sym *sym)
+{
+   struct mod_got_sec *gotsec = &mod->arch.core;
+   u64 *got = (u64*)gotsec->got->sh_addr;
+   int i = gotsec->got_num_entries;
+   u64 ret;
+
+   /* Check if we can use the kernel GOT */
+   ret = find_got_kernel_entry(sym, rela);
+   if (ret)
+   return ret;
+
+   got[i] = sym->st_value;
+
+   /*
+* Check if the entry we just created is a duplicate. Given that the
+* relocations are sorted, this will be the last entry we allocated.
+* (if one exists).
+*/
+   if (i > 0 && got[i] == got[i - 2]) {
+   ret = (u64)&got[i - 1];
+   } else {
+   gotsec->got_num_entries++;
+   BUG_ON(gotsec->got_num_entries > gotsec->got_max_entries);
+   ret = (u64)&got[i];
+   }
+
+   return ret + rela->r_addend;
+}
+
+#define cmp_3way(a,b)  ((a) < (b) ? -1 : (a) > (b))
+
+static int cmp_rela(const void *a, const void *b)
+{
+   const Elf64_Rela *x = a, *y = b;
+   int i;
+
+   /* sort by type, symbol index and addend */
+   i = cmp_3way(ELF64_R_TYPE(x->r_info), ELF64_R_TYPE(y->r_info));
+   if (i == 0)
+   i = cmp_3way(ELF64_R_SYM(x->r_info), ELF64_R_SYM(y->r_info));
+   if (i == 0)
+   i = cmp_3way(x->r_addend, y->r_addend);
+   return i;
+}
+
+static bool duplicate_rel(const Elf64_Rela *rela, int num)
+{
+   /*
+* Entries are sorted by type, symbol index and addend. That means
+* that, if a duplicate entry exists, it must be in the preceding
+* slot.
+*/
+   return num > 0 && cmp_rela(rela + num, rela + num - 1) == 0;
+}
+
+static unsigned int count_gots(Elf64_Sym *syms, Elf64_Rela *rela, int num)
+{
+   unsigned int ret = 0;
+   Elf64_Sym *s;
+

[Xen-devel] [PATCH v3 27/27] x86/kaslr: Add option to extend KASLR range from 1GB to 3GB

2018-05-23 Thread Thomas Garnier
Add a new CONFIG_RANDOMIZE_BASE_LARGE option to benefit from PIE
support. It increases the KASLR range from 1GB to 3GB. The new range
starts at 0x just above the EFI memory region. This
option is off by default.

The boot code is adapted to create the appropriate page table spanning
three PUD pages.

The relocation table uses 64-bit integers generated with the updated
relocation tool with the large-reloc option.
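
To illustrate why the relocation table needs wider entries, here is a
minimal sketch that is not part of the patch. A relocation entry stored as
a sign-extended 32-bit integer can only name addresses in the top 2GB of
the 64-bit address space, so a kernel placed below -2G needs 64-bit
entries. The helper name fits_in_s32 and the sample addresses mentioned
after the block are assumptions chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if a 64-bit virtual address can be stored
 * as a 32-bit integer and recovered by sign extension.
 */
static bool fits_in_s32(uint64_t addr)
{
	int64_t v = (int64_t)addr;	/* two's-complement view of the address */

	return v >= INT32_MIN && v <= INT32_MAX;
}
```

With these illustrative values, fits_in_s32(0xffffffff80000000ULL) (exactly
-2G) holds while fits_in_s32(0xffffffff40000000ULL) (-3G) does not, which
is consistent with rel_t switching from int to long in this patch.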

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/Kconfig | 21 +
 arch/x86/boot/compressed/Makefile|  5 +
 arch/x86/boot/compressed/misc.c  | 10 +-
 arch/x86/include/asm/page_64_types.h |  9 +
 arch/x86/kernel/head64.c | 15 ---
 arch/x86/kernel/head_64.S| 11 ++-
 6 files changed, 66 insertions(+), 5 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 26d5d4942777..3596a7a76ff0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2223,6 +2223,27 @@ config X86_PIE
select DYNAMIC_MODULE_BASE
select MODULE_REL_CRCS if MODVERSIONS
 
+config RANDOMIZE_BASE_LARGE
+   bool "Increase the randomization range of the kernel image"
+   depends on X86_64 && RANDOMIZE_BASE
+   select X86_PIE
+   select X86_MODULE_PLTS if MODULES
+   default n
+   ---help---
+ Build the kernel as a Position Independent Executable (PIE) and
+ increase the available randomization range from 1GB to 3GB.
+
+ This option impacts performance on kernel CPU intensive workloads up
+ to 10% due to PIE generated code. Impact on user-mode processes and
+ typical usage would be significantly less (0.50% when you build the
+ kernel).
+
+ The kernel and modules will generate slightly more assembly (1 to 2%
+ increase on the .text sections). The vmlinux binary will be
+ significantly smaller due to less relocations.
+
+ If unsure say N
+
 config HOTPLUG_CPU
bool "Support for hot-pluggable CPUs"
depends on SMP
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index fa42f895fdde..8497ebd5e078 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -116,7 +116,12 @@ $(obj)/vmlinux.bin: vmlinux FORCE
 
 targets += $(patsubst $(obj)/%,%,$(vmlinux-objs-y)) vmlinux.bin.all vmlinux.relocs
 
+# Large randomization require bigger relocation table
+ifeq ($(CONFIG_RANDOMIZE_BASE_LARGE),y)
+CMD_RELOCS = arch/x86/tools/relocs --large-reloc
+else
 CMD_RELOCS = arch/x86/tools/relocs
+endif
 quiet_cmd_relocs = RELOCS  $@
   cmd_relocs = $(CMD_RELOCS) $< > $@;$(CMD_RELOCS) --abs-relocs $<
 $(obj)/vmlinux.relocs: vmlinux FORCE
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 8dd1d5ccae58..28d17bd5bad8 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -171,10 +171,18 @@ void __puthex(unsigned long value)
 }
 
 #if CONFIG_X86_NEED_RELOCS
+
+/* Large randomization go lower than -2G and use large relocation table */
+#ifdef CONFIG_RANDOMIZE_BASE_LARGE
+typedef long rel_t;
+#else
+typedef int rel_t;
+#endif
+
 static void handle_relocations(void *output, unsigned long output_len,
   unsigned long virt_addr)
 {
-   int *reloc;
+   rel_t *reloc;
unsigned long delta, map, ptr;
unsigned long min_addr = (unsigned long)output;
unsigned long max_addr = min_addr + (VO___bss_start - VO__text);
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 2c5a966dc222..85ea681421d2 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -46,7 +46,11 @@
 #define __PAGE_OFFSET   __PAGE_OFFSET_BASE_L4
 #endif /* CONFIG_DYNAMIC_MEMORY_LAYOUT */
 
+#ifdef CONFIG_RANDOMIZE_BASE_LARGE
+#define __START_KERNEL_map _AC(0x, UL)
+#else
 #define __START_KERNEL_map _AC(0x8000, UL)
+#endif /* CONFIG_RANDOMIZE_BASE_LARGE */
 
 /* See Documentation/x86/x86_64/mm.txt for a description of the memory map. */
 
@@ -64,9 +68,14 @@
  * 512MiB by default, leaving 1.5GiB for modules once the page tables
  * are fully set up. If kernel ASLR is configured, it can extend the
  * kernel page table mapping, reducing the size of the modules area.
+ * On PIE, we relocate the binary 2G lower so add this extra space.
  */
 #if defined(CONFIG_RANDOMIZE_BASE)
+#ifdef CONFIG_RANDOMIZE_BASE_LARGE
+#define KERNEL_IMAGE_SIZE  (_AC(3, UL) * 1024 * 1024 * 1024)
+#else
 #define KERNEL_IMAGE_SIZE  (1024 * 1024 * 1024)
+#endif
 #else
 #define KERNEL_IMAGE_SIZE  (512 * 1024 * 1024)
 #endif
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 3a1ce822e1c0..e18cc23b9d99 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@

[Xen-devel] [PATCH v3 20/27] x86: Support global stack cookie

2018-05-23 Thread Thomas Garnier
Add an off-by-default configuration option to use a global stack cookie
instead of the default TLS. This configuration option will only be used
with PIE binaries.

For the kernel stack cookie, the compiler uses mcmodel=kernel to switch
from the fs segment to the gs segment. A PIE binary does not use
mcmodel=kernel because it can be relocated anywhere, therefore the
compiler will default to the fs segment register. This is fixed in the
latest version of gcc.

If the segment selector is available, it will be automatically added. If
the automatic configuration was selected, a warning is written and the
global variable stack cookie is used. If a specific stack mode was
selected (regular or strong) and the compiler does not support selecting
the segment register, an error is emitted.
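
As a rough illustration of the global-cookie scheme (a sketch under stated
assumptions, not what the compiler literally emits): the guard value lives
in one ordinary global variable, is copied next to a buffer on function
entry and compared again on exit. demo_stack_guard, struct demo_frame and
frame_write_ok are hypothetical names, and the guard value is an arbitrary
constant.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical global cookie; a real one would be randomized at boot. */
static uintptr_t demo_stack_guard = 0x5a5aa5a5u;

/* Models a stack frame: a buffer followed by the saved cookie. */
struct demo_frame {
	char buf[8];
	uintptr_t canary;
};

/*
 * Copy n bytes into the modelled frame and report whether the cookie
 * survived. Writes longer than the frame are rejected outright.
 */
static bool frame_write_ok(const void *src, size_t n)
{
	struct demo_frame f;

	if (n > sizeof(f))
		return false;

	memset(&f, 0, sizeof(f));
	f.canary = demo_stack_guard;	/* function entry: load from the global */
	memcpy(&f, src, n);		/* may run past buf into canary */

	return f.canary == demo_stack_guard;	/* function exit: compare */
}
```

A write that stays within buf leaves the cookie intact; one that spills
past it changes the cookie and is detected at the final comparison. The
only difference from the segment-based scheme is where the reference value
is read from.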

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/Kconfig  | 12 
 arch/x86/Makefile |  9 +
 arch/x86/entry/entry_32.S |  3 ++-
 arch/x86/entry/entry_64.S |  3 ++-
 arch/x86/include/asm/processor.h  |  3 ++-
 arch/x86/include/asm/stackprotector.h | 19 ++-
 arch/x86/kernel/asm-offsets.c |  3 ++-
 arch/x86/kernel/asm-offsets_32.c  |  3 ++-
 arch/x86/kernel/asm-offsets_64.c  |  3 ++-
 arch/x86/kernel/cpu/common.c  |  3 ++-
 arch/x86/kernel/head_32.S |  3 ++-
 arch/x86/kernel/process.c |  5 +
 12 files changed, 56 insertions(+), 13 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index dda87a331a7e..0fc2e981458d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2199,6 +2199,18 @@ config RANDOMIZE_MEMORY_PHYSICAL_PADDING
 
   If unsure, leave at the default value.
 
+config X86_GLOBAL_STACKPROTECTOR
+   bool "Stack cookie using a global variable"
+   depends on CC_STACKPROTECTOR_AUTO
+   default n
+   ---help---
+  This option turns on the "stack-protector" GCC feature using a global
+  variable instead of a segment register. It is useful when the
+  compiler does not support custom segment registers when building a
+  position independent (PIE) binary.
+
+  If unsure, say N
+
 config HOTPLUG_CPU
bool "Support for hot-pluggable CPUs"
depends on SMP
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 60135cbd905c..277ffc57ae13 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -141,6 +141,15 @@ else
 KBUILD_CFLAGS += $(call cc-option,-funit-at-a-time)
 endif
 
+ifdef CONFIG_X86_GLOBAL_STACKPROTECTOR
+ifeq ($(call cc-option, -mstack-protector-guard=global),)
+$(error Cannot use CONFIG_X86_GLOBAL_STACKPROTECTOR: \
+-mstack-protector-guard=global not supported \
+by compiler)
+endif
+KBUILD_CFLAGS += -mstack-protector-guard=global
+endif
+
 ifdef CONFIG_X86_X32
x32_ld_ok := $(call try-run,\
/bin/echo -e '1: .quad 1b' | \
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index bb4f540be234..2f9bdbc6be6d 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -241,7 +241,8 @@ ENTRY(__switch_to_asm)
    movl    %esp, TASK_threadsp(%eax)
    movl    TASK_threadsp(%edx), %esp
 
-#ifdef CONFIG_CC_STACKPROTECTOR
+#if defined(CONFIG_CC_STACKPROTECTOR) && \
+   !defined(CONFIG_X86_GLOBAL_STACKPROTECTOR)
    movl    TASK_stack_canary(%edx), %ebx
    movl    %ebx, PER_CPU_VAR(stack_canary)+stack_canary_offset
 #endif
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index c1700b00b1b6..c8b4e8a7d1e1 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -359,7 +359,8 @@ ENTRY(__switch_to_asm)
    movq    %rsp, TASK_threadsp(%rdi)
    movq    TASK_threadsp(%rsi), %rsp
 
-#ifdef CONFIG_CC_STACKPROTECTOR
+#if defined(CONFIG_CC_STACKPROTECTOR) && \
+   !defined(CONFIG_X86_GLOBAL_STACKPROTECTOR)
    movq    TASK_stack_canary(%rsi), %rbx
    movq    %rbx, PER_CPU_VAR(irq_stack_union + stack_canary_offset)
 #endif
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 5cf36fa30254..6e5d9ac3bf17 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -414,7 +414,8 @@ extern asmlinkage void ignore_sysret(void);
 void save_fsgs_for_kvm(void);
 #endif
 #else  /* X86_64 */
-#ifdef CONFIG_CC_STACKPROTECTOR
+#if defined(CONFIG_CC_STACKPROTECTOR) && \
+   !defined(CONFIG_X86_GLOBAL_STACKPROTECTOR)
 /*
  * Make sure stack canary segment base is cached-aligned:
  *   "For Intel Atom processors, avoid non zero segment base address
diff --git a/arch/x86/include/asm/stackprotector.h b/arch/x86/include/asm/stackprotector.h
index 371b3a4af000..5063f57d99f5 100644
--- a/arch/x86/include/asm/stackprotector.h
+++ b/

[Xen-devel] [PATCH v3 01/27] x86/crypto: Adapt assembly for PIE support

2018-05-23 Thread Thomas Garnier
Change the assembly code to use only relative references of symbols for the
kernel to be PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.
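
The pattern applied throughout this patch can be pictured in C, as a
sketch only. Instead of folding an absolute table address into the load
(TAB+1024(,idx,4)), the table base is first materialised in a register
(what leaq TAB+1024(%rip), RBASE does) and the element is then loaded
relative to it. The names demo_tab, demo_tab_init and round_lookup are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup table: two 256-entry sub-tables of 32-bit words. */
static uint32_t demo_tab[512];

static void demo_tab_init(void)
{
	for (size_t i = 0; i < 512; i++)
		demo_tab[i] = (uint32_t)i * 2654435761u;
}

/*
 * Two-step lookup: compute the sub-table base first (the leaq step),
 * then load base[idx] with a scale of 4 bytes (the movl/xorl step).
 * byte_off must be a multiple of 4 and stay inside demo_tab.
 */
static uint32_t round_lookup(size_t byte_off, uint8_t idx)
{
	const uint32_t *base = demo_tab + byte_off / sizeof(uint32_t);

	return base[idx];
}
```

The extra base register is the cost of this form: in the AES code below
it is %r12 (RBASE), which is why the prologue and epilogue gain a
pushq/popq pair.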

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/crypto/aes-x86_64-asm_64.S  | 45 +
 arch/x86/crypto/aesni-intel_asm.S|  8 +-
 arch/x86/crypto/aesni-intel_avx-x86_64.S |  6 +-
 arch/x86/crypto/camellia-aesni-avx-asm_64.S  | 42 -
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 44 -
 arch/x86/crypto/camellia-x86_64-asm_64.S |  8 +-
 arch/x86/crypto/cast5-avx-x86_64-asm_64.S| 50 +-
 arch/x86/crypto/cast6-avx-x86_64-asm_64.S| 44 +
 arch/x86/crypto/des3_ede-asm_64.S| 96 +---
 arch/x86/crypto/ghash-clmulni-intel_asm.S|  4 +-
 arch/x86/crypto/glue_helper-asm-avx.S|  4 +-
 arch/x86/crypto/glue_helper-asm-avx2.S   |  6 +-
 arch/x86/crypto/sha256-avx2-asm.S| 23 +++--
 13 files changed, 221 insertions(+), 159 deletions(-)

diff --git a/arch/x86/crypto/aes-x86_64-asm_64.S b/arch/x86/crypto/aes-x86_64-asm_64.S
index 8739cf7795de..86fa068e5e81 100644
--- a/arch/x86/crypto/aes-x86_64-asm_64.S
+++ b/arch/x86/crypto/aes-x86_64-asm_64.S
@@ -48,8 +48,12 @@
 #define R10 %r10
 #define R11 %r11
 
+/* Hold global for PIE support */
+#define RBASE  %r12
+
 #define prologue(FUNC,KEY,B128,B192,r1,r2,r5,r6,r7,r8,r9,r10,r11) \
    ENTRY(FUNC); \
+   pushq   RBASE; \
    movq    r1,r2; \
    leaq    KEY+48(r8),r9; \
    movq    r10,r11; \
@@ -74,54 +78,63 @@
    movl    r6 ## E,4(r9); \
    movl    r7 ## E,8(r9); \
    movl    r8 ## E,12(r9); \
+   popq    RBASE; \
    ret; \
    ENDPROC(FUNC);
 
+#define round_mov(tab_off, reg_i, reg_o) \
+   leaq    tab_off(%rip), RBASE; \
+   movl    (RBASE,reg_i,4), reg_o;
+
+#define round_xor(tab_off, reg_i, reg_o) \
+   leaq    tab_off(%rip), RBASE; \
+   xorl    (RBASE,reg_i,4), reg_o;
+
 #define round(TAB,OFFSET,r1,r2,r3,r4,r5,r6,r7,r8,ra,rb,rc,rd) \
    movzbl  r2 ## H,r5 ## E; \
    movzbl  r2 ## L,r6 ## E; \
-   movl    TAB+1024(,r5,4),r5 ## E; \
+   round_mov(TAB+1024, r5, r5 ## E) \
    movw    r4 ## X,r2 ## X; \
-   movl    TAB(,r6,4),r6 ## E; \
+   round_mov(TAB, r6, r6 ## E) \
    roll    $16,r2 ## E; \
    shrl    $16,r4 ## E; \
    movzbl  r4 ## L,r7 ## E; \
    movzbl  r4 ## H,r4 ## E; \
    xorl    OFFSET(r8),ra ## E; \
    xorl    OFFSET+4(r8),rb ## E; \
-   xorl    TAB+3072(,r4,4),r5 ## E; \
-   xorl    TAB+2048(,r7,4),r6 ## E; \
+   round_xor(TAB+3072, r4, r5 ## E) \
+   round_xor(TAB+2048, r7, r6 ## E) \
    movzbl  r1 ## L,r7 ## E; \
    movzbl  r1 ## H,r4 ## E; \
-   movl    TAB+1024(,r4,4),r4 ## E; \
+   round_mov(TAB+1024, r4, r4 ## E) \
    movw    r3 ## X,r1 ## X; \
    roll    $16,r1 ## E; \
    shrl    $16,r3 ## E; \
-   xorl    TAB(,r7,4),r5 ## E; \
+   round_xor(TAB, r7, r5 ## E) \
    movzbl  r3 ## L,r7 ## E; \
    movzbl  r3 ## H,r3 ## E; \
-   xorl    TAB+3072(,r3,4),r4 ## E; \
-   xorl    TAB+2048(,r7,4),r5 ## E; \
+   round_xor(TAB+3072, r3, r4 ## E) \
+   round_xor(TAB+2048, r7, r5 ## E) \
    movzbl  r1 ## L,r7 ## E; \
    movzbl  r1 ## H,r3 ## E; \
    shrl    $16,r1 ## E; \
-   xorl    TAB+3072(,r3,4),r6 ## E; \
-   movl    TAB+2048(,r7,4),r3 ## E; \
+   round_xor(TAB+3072, r3, r6 ## E) \
+   round_mov(TAB+2048, r7, r3 ## E) \
    movzbl  r1 ## L,r7 ## E; \
    movzbl  r1 ## H,r1 ## E; \
-   xorl    TAB+1024(,r1,4),r6 ## E; \
-   xorl    TAB(,r7,4),r3 ## E; \
+   round_xor(TAB+1024, r1, r6 ## E) \
+   round_xor(TAB, r7, r3 ## E) \
    movzbl  r2 ## H,r1 ## E; \
    movzbl  r2 ## L,r7 ## E; \
    shrl    $16,r2 ## E; \
-   xorl    TAB+3072(,r1,4),r3 ## E; \
-   xorl    TAB+2048(,r7,4),r4 ## E; \
+   round_xor(TAB+3072, r1, r3 ## E) \
+   round_xor(TAB+2048, r7, r4 ## E) \
    movzbl  r2 ## H,r1 ## E; \
    movzbl  r2 ## L,r2 ## E; \
    xorl    OFFSET+8(r8),rc ## E; \
    xorl    OFFSET+12(r8),rd ## E; \
-   xorl    TAB+1024(,r1,4),r3 ## E; \
-   xorl    TAB(,r2,4),r4 ## E;
+   round_xor(TAB+1024, r1, r3 ## E) \
+   round_xor(TAB, r2, r4 ## E)
 
 #define move_regs(r1,r2,r3,r4) \
    movl    r3 ## E,r1 ## E; \
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index e762ef417562..4df029aa5fc1 

[Xen-devel] [PATCH v3 24/27] x86/mm: Make the x86 GOT read-only

2018-05-23 Thread Thomas Garnier
The GOT is changed during early boot when relocations are applied. Make
it read-only directly. This table exists only for PIE binaries.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 include/asm-generic/vmlinux.lds.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index e373e2e10f6a..e5b0710fe693 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -314,6 +314,17 @@
__end_ro_after_init = .;
 #endif
 
+#ifdef CONFIG_X86_PIE
+#define RO_GOT_X86 \
+   .got: AT(ADDR(.got) - LOAD_OFFSET) {\
+   VMLINUX_SYMBOL(__start_got) = .;\
+   *(.got);\
+   VMLINUX_SYMBOL(__end_got) = .;  \
+   }
+#else
+#define RO_GOT_X86
+#endif
+
 /*
  * Read only Data
  */
@@ -370,6 +381,7 @@
__end_builtin_fw = .;   \
}   \
\
+   RO_GOT_X86  \
TRACEDATA   \
\
/* Kernel symbol table: Normal symbols */   \
-- 
2.17.0.441.gb46fe60e1d-goog


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 16/27] compiler: Option to add PROVIDE_HIDDEN replacement for weak symbols

2018-05-23 Thread Thomas Garnier
Provide an option to have a PROVIDE_HIDDEN (linker script) entry for
each weak symbol. This option solves an error on x86_64 where the linker
optimizes PIE-generated code into non-PIE code because --emit-relocs was
used instead of -pie (to reduce dynamic relocations).
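
The transformation done by the sed expression in gen_weak_provide_hidden()
below can be approximated in C, as a sketch only: an nm line for an
undefined weak symbol (blank address column, type w, then the name) becomes
one PROVIDE_HIDDEN entry. The function name weak_to_provide_hidden is
hypothetical, and the matching is slightly looser than the sed pattern
(any run of leading whitespace is accepted).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: turn one nm output line (no trailing newline)
 * such as "                 w foo" into "PROVIDE_HIDDEN(foo = .);".
 * Returns false for lines that are not undefined weak symbols.
 */
static bool weak_to_provide_hidden(const char *nm_line, char *out,
				   size_t out_len)
{
	char name[128];
	char type;
	int consumed = 0;

	/* Undefined symbols have no address, so the line starts blank. */
	if (nm_line[0] != ' ' && nm_line[0] != '\t')
		return false;

	if (sscanf(nm_line, " %c %127[A-Za-z0-9_]%n",
		   &type, name, &consumed) != 2)
		return false;

	/* Only type 'w', and nothing may follow the symbol name. */
	if (type != 'w' || nm_line[consumed] != '\0')
		return false;

	snprintf(out, out_len, "PROVIDE_HIDDEN(%s = .);", name);
	return true;
}
```

Defined symbols (which start with an address) and other undefined types
such as U are skipped, so only weak references end up in the generated
linker script fragment.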

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 init/Kconfig|  7 +++
 scripts/link-vmlinux.sh | 14 ++
 2 files changed, 21 insertions(+)

diff --git a/init/Kconfig b/init/Kconfig
index 0fc3a58d9f2f..2866cca86b4a 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1954,6 +1954,13 @@ config ASN1
  inform it as to what tags are to be expected in a stream and what
  functions to call on what tags.
 
+config WEAK_PROVIDE_HIDDEN
+   bool
+   help
+ Generate linker script PROVIDE_HIDDEN entries for all weak symbols. It
+ allows to prevent non-pie code being replaced by the linker if the
+ emit-relocs option is used instead of pie (useful for x86_64 pie).
+
 source "kernel/Kconfig.locks"
 
 config ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 4bf811c09f59..f5d31119b9d7 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -142,6 +142,17 @@ kallsyms()
${CC} ${aflags} -c -o ${2} ${afile}
 }
 
+gen_weak_provide_hidden()
+{
+if [ -n "${CONFIG_WEAK_PROVIDE_HIDDEN}" ]; then
+local pattern="s/^\s\+ w \(\w\+\)$/PROVIDE_HIDDEN(\1 = .);/gp"
+echo -e "SECTIONS {\n. = _end;" > .tmp_vmlinux_hiddenld
+${NM} ${1} | sed -n "${pattern}" >> .tmp_vmlinux_hiddenld
+echo "}" >> .tmp_vmlinux_hiddenld
+LDFLAGS_vmlinux="${LDFLAGS_vmlinux} -T .tmp_vmlinux_hiddenld"
+fi
+}
+
 # Create map file with all symbols from ${1}
 # See mksymap for additional details
 mksysmap()
@@ -226,6 +237,9 @@ modpost_link vmlinux.o
 # modpost vmlinux.o to check for section mismatches
 ${MAKE} -f "${srctree}/scripts/Makefile.modpost" vmlinux.o
 
+# Generate weak linker script
+gen_weak_provide_hidden vmlinux.o
+
 kallsymso=""
 kallsyms_vmlinux=""
 if [ -n "${CONFIG_KALLSYMS}" ]; then
-- 
2.17.0.441.gb46fe60e1d-goog



[Xen-devel] [PATCH v3 12/27] x86/paravirt: Adapt assembly for PIE support

2018-05-23 Thread Thomas Garnier
If PIE is enabled, switch the paravirt assembly constraints to be
compatible. The %c/i constraints generate smaller code, so they are kept
by default.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/include/asm/paravirt_types.h | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 180bc0bff0fb..140747a98d94 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -337,9 +337,17 @@ extern struct pv_lock_ops pv_lock_ops;
 #define PARAVIRT_PATCH(x)  \
(offsetof(struct paravirt_patch_template, x) / sizeof(void *))
 
+#ifdef CONFIG_X86_PIE
+#define paravirt_opptr_call "a"
+#define paravirt_opptr_type "p"
+#else
+#define paravirt_opptr_call "c"
+#define paravirt_opptr_type "i"
+#endif
+
 #define paravirt_type(op)  \
[paravirt_typenum] "i" (PARAVIRT_PATCH(op)),\
-   [paravirt_opptr] "i" (&(op))
+   [paravirt_opptr] paravirt_opptr_type (&(op))
 #define paravirt_clobber(clobber)  \
[paravirt_clobber] "i" (clobber)
 
@@ -395,7 +403,7 @@ int paravirt_disable_iospace(void);
  */
 #define PARAVIRT_CALL  \
ANNOTATE_RETPOLINE_SAFE \
-   "call *%c[paravirt_opptr];"
+   "call *%" paravirt_opptr_call "[paravirt_opptr];"
 
 /*
  * These macros are intended to wrap calls through one of the paravirt
-- 
2.17.0.441.gb46fe60e1d-goog



[Xen-devel] [PATCH v3 06/27] x86/entry/64: Adapt assembly for PIE support

2018-05-23 Thread Thomas Garnier
Change the assembly code to use only relative references of symbols for the
kernel to be PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/entry/entry_64.S| 18 --
 arch/x86/kernel/relocate_kernel_64.S |  8 +++-
 2 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index c9648b287d7f..8638dca78191 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -191,7 +191,7 @@ ENTRY(entry_SYSCALL_64_trampoline)
     * spill RDI and restore it in a second-stage trampoline.
     */
    pushq   %rdi
-   movq    $entry_SYSCALL_64_stage2, %rdi
+   movabsq $entry_SYSCALL_64_stage2, %rdi
    JMP_NOSPEC %rdi
 END(entry_SYSCALL_64_trampoline)
 
@@ -1279,7 +1279,8 @@ ENTRY(error_entry)
    movl    %ecx, %eax  /* zero extend */
    cmpq    %rax, RIP+8(%rsp)
    je  .Lbstep_iret
-   cmpq    $.Lgs_change, RIP+8(%rsp)
+   leaq    .Lgs_change(%rip), %rcx
+   cmpq    %rcx, RIP+8(%rsp)
    jne .Lerror_entry_done
 
    /*
@@ -1484,10 +1485,10 @@ ENTRY(nmi)
     * resume the outer NMI.
     */
 
-   movq    $repeat_nmi, %rdx
+   leaq    repeat_nmi(%rip), %rdx
    cmpq    8(%rsp), %rdx
    ja  1f
-   movq    $end_repeat_nmi, %rdx
+   leaq    end_repeat_nmi(%rip), %rdx
    cmpq    8(%rsp), %rdx
    ja  nested_nmi_out
 1:
@@ -1541,7 +1542,8 @@ nested_nmi:
    pushq   %rdx
    pushfq
    pushq   $__KERNEL_CS
-   pushq   $repeat_nmi
+   leaq    repeat_nmi(%rip), %rdx
+   pushq   %rdx
 
    /* Put stack back */
    addq    $(6*8), %rsp
@@ -1580,7 +1582,11 @@ first_nmi:
    addq    $8, (%rsp)  /* Fix up RSP */
    pushfq  /* RFLAGS */
    pushq   $__KERNEL_CS    /* CS */
-   pushq   $1f /* RIP */
+   pushq   $0  /* Future return address */
+   pushq   %rax    /* Save RAX */
+   leaq    1f(%rip), %rax  /* RIP */
+   movq    %rax, 8(%rsp)   /* Put 1f on return address */
+   popq    %rax    /* Restore RAX */
    iretq   /* continues at repeat_nmi below */
    UNWIND_HINT_IRET_REGS
 1:
diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index a7227dfe1a2b..0c0fc259a4e2 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -208,11 +208,9 @@ identity_mapped:
    movq    %rax, %cr3
    lea PAGE_SIZE(%r8), %rsp
    call    swap_pages
-   jmp *virtual_mapped_addr(%rip)
-
-   /* Absolute value for PIE support */
-virtual_mapped_addr:
-   .quad virtual_mapped
+   movabsq $virtual_mapped, %rax
+   pushq   %rax
+   ret
 
 virtual_mapped:
    movq    RSP(%r8), %rsp
-- 
2.17.0.441.gb46fe60e1d-goog



[Xen-devel] [PATCH v3 09/27] x86/acpi: Adapt assembly for PIE support

2018-05-23 Thread Thomas Garnier
Change the assembly code to use only relative references of symbols for the
kernel to be PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/kernel/acpi/wakeup_64.S | 31 ---
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/acpi/wakeup_64.S b/arch/x86/kernel/acpi/wakeup_64.S
index 50b8ed0317a3..472659c0f811 100644
--- a/arch/x86/kernel/acpi/wakeup_64.S
+++ b/arch/x86/kernel/acpi/wakeup_64.S
@@ -14,7 +14,7 @@
 * Hooray, we are in Long 64-bit mode (but still running in low memory)
 */
 ENTRY(wakeup_long64)
-   movq    saved_magic, %rax
+   movq    saved_magic(%rip), %rax
    movq    $0x123456789abcdef0, %rdx
    cmpq    %rdx, %rax
    jne bogus_64_magic
@@ -25,14 +25,14 @@ ENTRY(wakeup_long64)
    movw    %ax, %es
    movw    %ax, %fs
    movw    %ax, %gs
-   movq    saved_rsp, %rsp
+   movq    saved_rsp(%rip), %rsp
 
-   movq    saved_rbx, %rbx
-   movq    saved_rdi, %rdi
-   movq    saved_rsi, %rsi
-   movq    saved_rbp, %rbp
+   movq    saved_rbx(%rip), %rbx
+   movq    saved_rdi(%rip), %rdi
+   movq    saved_rsi(%rip), %rsi
+   movq    saved_rbp(%rip), %rbp
 
-   movq    saved_rip, %rax
+   movq    saved_rip(%rip), %rax
    jmp *%rax
 ENDPROC(wakeup_long64)
 
@@ -45,7 +45,7 @@ ENTRY(do_suspend_lowlevel)
    xorl    %eax, %eax
    call    save_processor_state
 
-   movq    $saved_context, %rax
+   leaq    saved_context(%rip), %rax
    movq    %rsp, pt_regs_sp(%rax)
    movq    %rbp, pt_regs_bp(%rax)
    movq    %rsi, pt_regs_si(%rax)
@@ -64,13 +64,14 @@ ENTRY(do_suspend_lowlevel)
    pushfq
    popq    pt_regs_flags(%rax)
 
-   movq    $.Lresume_point, saved_rip(%rip)
+   leaq    .Lresume_point(%rip), %rax
+   movq    %rax, saved_rip(%rip)
 
-   movq    %rsp, saved_rsp
-   movq    %rbp, saved_rbp
-   movq    %rbx, saved_rbx
-   movq    %rdi, saved_rdi
-   movq    %rsi, saved_rsi
+   movq    %rsp, saved_rsp(%rip)
+   movq    %rbp, saved_rbp(%rip)
+   movq    %rbx, saved_rbx(%rip)
+   movq    %rdi, saved_rdi(%rip)
+   movq    %rsi, saved_rsi(%rip)
 
    addq    $8, %rsp
    movl    $3, %edi
@@ -82,7 +83,7 @@ ENTRY(do_suspend_lowlevel)
    .align 4
 .Lresume_point:
    /* We don't restore %rax, it must be 0 anyway */
-   movq    $saved_context, %rax
+   leaq    saved_context(%rip), %rax
    movq    saved_context_cr4(%rax), %rbx
    movq    %rbx, %cr4
    movq    saved_context_cr3(%rax), %rbx
-- 
2.17.0.441.gb46fe60e1d-goog



[Xen-devel] [PATCH v3 07/27] x86: pm-trace - Adapt assembly for PIE support

2018-05-23 Thread Thomas Garnier
Change assembly to use the new _ASM_MOVABS macro instead of _ASM_MOV for
the assembly to be PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/include/asm/pm-trace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pm-trace.h b/arch/x86/include/asm/pm-trace.h
index bfa32aa428e5..972070806ce9 100644
--- a/arch/x86/include/asm/pm-trace.h
+++ b/arch/x86/include/asm/pm-trace.h
@@ -8,7 +8,7 @@
 do {   \
if (pm_trace_enabled) { \
const void *tracedata;  \
-   asm volatile(_ASM_MOV " $1f,%0\n"   \
+   asm volatile(_ASM_MOVABS " $1f,%0\n"\
 ".section .tracedata,\"a\"\n"  \
 "1:\t.word %c1\n\t"\
 _ASM_PTR " %c2\n"  \
-- 
2.17.0.441.gb46fe60e1d-goog



[Xen-devel] [PATCH v3 19/27] kvm: Adapt assembly for PIE support

2018-05-23 Thread Thomas Garnier
Change the assembly code to use only relative references of symbols for the
kernel to be PIE compatible. The new __ASM_MOVABS macro is used to
get the address of a symbol on both 32 and 64-bit with PIE support.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/include/asm/kvm_host.h | 8 ++--
 arch/x86/kernel/kvm.c   | 6 --
 arch/x86/kvm/svm.c  | 4 ++--
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b27de80f5870..312a398465e8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1389,9 +1389,13 @@ asmlinkage void kvm_spurious_fault(void);
".pushsection .fixup, \"ax\" \n" \
"667: \n\t" \
cleanup_insn "\n\t"   \
-   "cmpb $0, kvm_rebooting \n\t" \
+   "cmpb $0, kvm_rebooting" __ASM_SEL(,(%%rip)) " \n\t" \
"jne 668b \n\t"   \
-   __ASM_SIZE(push) " $666b \n\t"\
+   __ASM_SIZE(push) "$0 \n\t"  \
+   __ASM_SIZE(push) "%%" _ASM_AX " \n\t"   \
+   _ASM_MOVABS " $666b, %%" _ASM_AX "\n\t" \
+   _ASM_MOV " %%" _ASM_AX ", " __ASM_SEL(4,8) "(%%" _ASM_SP ") \n\t" \
+   __ASM_SIZE(pop) "%%" _ASM_AX " \n\t"\
"call kvm_spurious_fault \n\t"\
".popsection \n\t" \
_ASM_EXTABLE(666b, 667b)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 7867417cfaff..394c00f21f05 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -726,8 +726,10 @@ asm(
 ".global __raw_callee_save___kvm_vcpu_is_preempted;"
 ".type __raw_callee_save___kvm_vcpu_is_preempted, @function;"
 "__raw_callee_save___kvm_vcpu_is_preempted:"
-"movq  __per_cpu_offset(,%rdi,8), %rax;"
-"cmpb  $0, " __stringify(KVM_STEAL_TIME_preempted) "+steal_time(%rax);"
+"leaq  __per_cpu_offset(%rip), %rax;"
+"movq  (%rax,%rdi,8), %rax;"
+"addq  " __stringify(KVM_STEAL_TIME_preempted) "+steal_time(%rip), %rax;"
+"cmpb  $0, (%rax);"
 "setne %al;"
 "ret;"
 ".popsection");
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 220e5a89465a..2b0b25be5236 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -701,12 +701,12 @@ static u32 svm_msrpm_offset(u32 msr)
 
 static inline void clgi(void)
 {
-   asm volatile (__ex(SVM_CLGI));
+   asm volatile (__ex(SVM_CLGI) : :);
 }
 
 static inline void stgi(void)
 {
-   asm volatile (__ex(SVM_STGI));
+   asm volatile (__ex(SVM_STGI) : :);
 }
 
 static inline void invlpga(unsigned long addr, u32 asid)
-- 
2.17.0.441.gb46fe60e1d-goog



[Xen-devel] [PATCH v3 13/27] x86/boot/64: Build head64.c as mcmodel large when PIE is enabled

2018-05-23 Thread Thomas Garnier
The __startup_64 function assumes all symbols have relocated addresses
instead of the current boot virtual address. PIE-generated code favors
relative addresses, making all virtual and physical address math incorrect.
If PIE is enabled, build head64.c as mcmodel large instead to ensure absolute
references on all memory accesses. Add a global __force_order variable required
when using a large model with read_cr* functions.
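
The address math at stake can be sketched with plain integers. The
fixup_pointer() helper visible in this patch's context computes
ptr - _text + physaddr, which, as I read the description above, is only
right when ptr is the link-time (absolute) address of the symbol. If the
compiler hands back a RIP-relative address that already reflects where the
code is running, the same adjustment is applied a second time. demo_fixup
and the sample addresses below are hypothetical.

```c
#include <stdint.h>

/* Integer model of fixup_pointer(): ptr - _text + physaddr. */
static uint64_t demo_fixup(uint64_t ptr, uint64_t text, uint64_t physaddr)
{
	return ptr - text + physaddr;
}
```

For example, with an assumed _text of 0xffffffff81000000 loaded at physical
0x1000000, a symbol linked at 0xffffffff81002000 maps to 0x1002000, whereas
feeding the already-adjusted 0x1002000 through the helper again yields an
unrelated value.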

To build head64.c as mcmodel=large, disable the retpoline gcc flags.
This code is used at early boot and removed later, it doesn't need
retpoline mitigation.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/kernel/Makefile | 6 ++
 arch/x86/kernel/head64.c | 3 +++
 2 files changed, 9 insertions(+)

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 02d6f5cf4e70..0f6da4b216e0 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -22,6 +22,12 @@ CFLAGS_REMOVE_early_printk.o = -pg
 CFLAGS_REMOVE_head64.o = -pg
 endif
 
+ifdef CONFIG_X86_PIE
+# Remove PIE and retpoline flags that are incompatible with mcmodel=large
+CFLAGS_REMOVE_head64.o += -fPIE -mindirect-branch=thunk-extern -mindirect-branch-register
+CFLAGS_head64.o = -mcmodel=large
+endif
+
 KASAN_SANITIZE_head$(BITS).o   := n
 KASAN_SANITIZE_dumpstack.o := n
 KASAN_SANITIZE_dumpstack_$(BITS).o := n
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 2d29e47c056e..fa661fb97127 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -64,6 +64,9 @@ EXPORT_SYMBOL(vmemmap_base);
 
 #define __head __section(.head.text)
 
+/* Required for read_cr3 when building as PIE */
+unsigned long __force_order;
+
 static void __head *fixup_pointer(void *ptr, unsigned long physaddr)
 {
return ptr - (void *)_text + (void *)physaddr;
-- 
2.17.0.441.gb46fe60e1d-goog



[Xen-devel] [PATCH v3 08/27] x86/CPU: Adapt assembly for PIE support

2018-05-23 Thread Thomas Garnier
Change the assembly code to use only relative references of symbols for the
kernel to be PIE compatible. Use the new _ASM_MOVABS macro instead of
the 'mov $symbol, %dst' construct.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/include/asm/processor.h | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index c119d423eacb..81ae6877df29 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -50,7 +50,7 @@ static inline void *current_text_addr(void)
 {
void *pc;
 
-   asm volatile("mov $1f, %0; 1:":"=r" (pc));
+   asm volatile(_ASM_MOVABS " $1f, %0; 1:":"=r" (pc));
 
return pc;
 }
@@ -718,6 +718,7 @@ static inline void sync_core(void)
: ASM_CALL_CONSTRAINT : : "memory");
 #else
unsigned int tmp;
+   unsigned long tmp2;
 
asm volatile (
UNWIND_HINT_SAVE
@@ -728,11 +729,13 @@ static inline void sync_core(void)
"pushfq\n\t"
"mov %%cs, %0\n\t"
"pushq %q0\n\t"
-   "pushq $1f\n\t"
+   "leaq 1f(%%rip), %1\n\t"
+   "pushq %1\n\t"
"iretq\n\t"
UNWIND_HINT_RESTORE
"1:"
-   : "=&r" (tmp), ASM_CALL_CONSTRAINT : : "cc", "memory");
+   : "=&r" (tmp), "=&r" (tmp2), ASM_CALL_CONSTRAINT
+   : : "cc", "memory");
 #endif
 }
 
-- 
2.17.0.441.gb46fe60e1d-goog



[Xen-devel] [PATCH v3 02/27] x86: Use symbol name on bug table for PIE support

2018-05-23 Thread Thomas Garnier
Replace the %c constraint with %P. The %c is incompatible with PIE
because it implies an immediate value whereas %P references a symbol.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/include/asm/bug.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/bug.h b/arch/x86/include/asm/bug.h
index 6804d6642767..3d690a4abf50 100644
--- a/arch/x86/include/asm/bug.h
+++ b/arch/x86/include/asm/bug.h
@@ -35,7 +35,7 @@ do {  
\
asm volatile("1:\t" ins "\n"\
 ".pushsection __bug_table,\"aw\"\n"\
 "2:\t" __BUG_REL(1b) "\t# bug_entry::bug_addr\n"   \
-"\t"  __BUG_REL(%c0) "\t# bug_entry::file\n"   \
+"\t"  __BUG_REL(%P0) "\t# bug_entry::file\n"   \
 "\t.word %c1""\t# bug_entry::line\n"   \
 "\t.word %c2""\t# bug_entry::flags\n"  \
 "\t.org 2b+%c3\n"  \
-- 
2.17.0.441.gb46fe60e1d-goog



[Xen-devel] [PATCH v3 11/27] x86/power/64: Adapt assembly for PIE support

2018-05-23 Thread Thomas Garnier
Change the assembly code to use only relative references of symbols for the
kernel to be PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/power/hibernate_asm_64.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/power/hibernate_asm_64.S b/arch/x86/power/hibernate_asm_64.S
index ce8da3a0412c..6fdd7bbc3c33 100644
--- a/arch/x86/power/hibernate_asm_64.S
+++ b/arch/x86/power/hibernate_asm_64.S
@@ -24,7 +24,7 @@
 #include 
 
 ENTRY(swsusp_arch_suspend)
-       movq    $saved_context, %rax
+       leaq    saved_context(%rip), %rax
        movq    %rsp, pt_regs_sp(%rax)
        movq    %rbp, pt_regs_bp(%rax)
        movq    %rsi, pt_regs_si(%rax)
@@ -115,7 +115,7 @@ ENTRY(restore_registers)
        movq    %rax, %cr4;  # turn PGE back on

        /* We don't restore %rax, it must be 0 anyway */
-       movq    $saved_context, %rax
+       leaq    saved_context(%rip), %rax
        movq    pt_regs_sp(%rax), %rsp
        movq    pt_regs_bp(%rax), %rbp
        movq    pt_regs_si(%rax), %rsi
-- 
2.17.0.441.gb46fe60e1d-goog



[Xen-devel] [PATCH v3 05/27] x86: relocate_kernel - Adapt assembly for PIE support

2018-05-23 Thread Thomas Garnier
Change the assembly code to use only relative references of symbols for the
kernel to be PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/kernel/relocate_kernel_64.S | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index 11eda21eb697..a7227dfe1a2b 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -208,9 +208,11 @@ identity_mapped:
        movq    %rax, %cr3
        lea     PAGE_SIZE(%r8), %rsp
        call    swap_pages
-       movq    $virtual_mapped, %rax
-       pushq   %rax
-       ret
+       jmp     *virtual_mapped_addr(%rip)
+
+       /* Absolute value for PIE support */
+virtual_mapped_addr:
+       .quad virtual_mapped

 virtual_mapped:
        movq    RSP(%r8), %rsp
-- 
2.17.0.441.gb46fe60e1d-goog



[Xen-devel] [PATCH v3 10/27] x86/boot/64: Adapt assembly for PIE support

2018-05-23 Thread Thomas Garnier
Change the assembly code to use only relative references of symbols for the
kernel to be PIE compatible.

Early at boot, the kernel is mapped at a temporary address while preparing
the page table. To know the changes needed for the page table with KASLR,
the boot code calculates the difference between the expected address of the
kernel and the one chosen by KASLR. This does not work with PIE because all
symbol references in code are relative: instead of getting the future
relocated virtual address, you get the current temporary mapping. The
solution is to use global variables, which are relocated as expected.
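
To make the arithmetic concrete, here is a small user-space model of what
the stored offsets are used for. It is only a sketch: SKETCH_START_KERNEL_MAP
mirrors the usual x86_64 value of __START_KERNEL_map, the function names are
hypothetical, and "base" stands for whatever the register already holds
before the addq.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of __START_KERNEL_map on x86_64. */
#define SKETCH_START_KERNEL_MAP 0xffffffff80000000ULL

/*
 * Link-time constant, as emitted by a data directive such as
 * ".quad early_top_pgt - __START_KERNEL_map".
 */
uint64_t pgt_link_offset(uint64_t pgt_link_vaddr)
{
	assert(pgt_link_vaddr >= SKETCH_START_KERNEL_MAP);
	return pgt_link_vaddr - SKETCH_START_KERNEL_MAP;
}

/*
 * Models "addq _early_top_pgt_offset(%rip), %rax": the offset is read
 * from memory instead of being encoded as an immediate in the code.
 */
uint64_t form_cr3(uint64_t base, uint64_t stored_offset)
{
	return base + stored_offset;
}
```

The point of the indirection is that the constant lives in data, where the
relocation tool can handle it, rather than in an instruction immediate that
PIE code generation would turn into a RIP-relative value.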

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/kernel/head_64.S | 26 --
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 8344dd2f310a..7c8f7ce93b9e 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -89,8 +89,9 @@ startup_64:
        popq    %rsi

        /* Form the CR3 value being sure to include the CR3 modifier */
-       addq    $(early_top_pgt - __START_KERNEL_map), %rax
+       addq    _early_top_pgt_offset(%rip), %rax
        jmp 1f
+
 ENTRY(secondary_startup_64)
UNWIND_HINT_EMPTY
/*
@@ -119,7 +120,7 @@ ENTRY(secondary_startup_64)
        popq    %rsi

        /* Form the CR3 value being sure to include the CR3 modifier */
-       addq    $(init_top_pgt - __START_KERNEL_map), %rax
+       addq    _init_top_offset(%rip), %rax
 1:
 
/* Enable PAE mode, PGE and LA57 */
@@ -137,7 +138,7 @@ ENTRY(secondary_startup_64)
        movq    %rax, %cr3

        /* Ensure I am executing from virtual addresses */
-       movq    $1f, %rax
+       movabs  $1f, %rax
        ANNOTATE_RETPOLINE_SAFE
        jmp     *%rax
 1:
@@ -234,11 +235,12 @@ ENTRY(secondary_startup_64)
 *  REX.W + FF /5 JMP m16:64 Jump far, absolute indirect,
 *  address given in m16:64.
 */
-       pushq   $.Lafter_lret   # put return address on stack for unwinder
+       leaq    .Lafter_lret(%rip), %rax
+       pushq   %rax            # put return address on stack for unwinder
        xorq    %rbp, %rbp      # clear frame pointer
-       movq    initial_code(%rip), %rax
+       leaq    initial_code(%rip), %rax
        pushq   $__KERNEL_CS    # set correct cs
-       pushq   %rax            # target address in negative space
+       pushq   (%rax)          # target address in negative space
        lretq
 .Lafter_lret:
 END(secondary_startup_64)
@@ -342,6 +344,18 @@ END(early_idt_handler_common)
 GLOBAL(early_recursion_flag)
.long 0
 
+   /*
+* Position Independent Code takes only relative references in code
+* meaning a global variable address is relative to RIP and not its
+* future virtual address. Global variables can be used instead as they
+* are still relocated on the expected kernel mapping address.
+*/
+   .align 8
+_early_top_pgt_offset:
+   .quad early_top_pgt - __START_KERNEL_map
+_init_top_offset:
+   .quad init_top_pgt - __START_KERNEL_map
+
 #define NEXT_PAGE(name) \
.balign PAGE_SIZE; \
 GLOBAL(name)
-- 
2.17.0.441.gb46fe60e1d-goog



[Xen-devel] [PATCH v3 04/27] x86: Add macro to get symbol address for PIE support

2018-05-23 Thread Thomas Garnier
Add a new _ASM_MOVABS macro to fetch a symbol address. It will be used
to replace the "_ASM_MOV $symbol, %dst" code constructs that are not
compatible with PIE.
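
For readers unfamiliar with the asm.h helpers, the sketch below shows how
such a macro turns into the instruction text handed to an inline asm
statement. The SKETCH_* macros are simplified stand-ins written for this
example (the real __ASM_SEL also has an assembler-side variant and pads the
string with spaces), and SKETCH_32BIT is a hypothetical switch.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-ins for the kernel macros; names are hypothetical. */
#define SKETCH_STR(x) #x
#ifdef SKETCH_32BIT
# define SKETCH_ASM_SEL(a, b) SKETCH_STR(a)
#else
# define SKETCH_ASM_SEL(a, b) SKETCH_STR(b)
#endif
#define SKETCH_ASM_MOVABS SKETCH_ASM_SEL(movl, movabsq)

/* Returns the template string an inline asm statement would receive. */
const char *current_text_addr_template(void)
{
	/* Adjacent string literals are concatenated by the compiler. */
	return SKETCH_ASM_MOVABS " $1f, %0; 1:";
}
```

On a 64-bit build this yields a movabsq with a full 64-bit immediate, so
the symbol address is absolute and does not depend on a 32-bit
sign-extended encoding.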

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/include/asm/asm.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index 219faaec51df..4492a35fad69 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -30,6 +30,7 @@
 #define _ASM_ALIGN __ASM_SEL(.balign 4, .balign 8)
 
 #define _ASM_MOV   __ASM_SIZE(mov)
+#define _ASM_MOVABS__ASM_SEL(movl, movabsq)
 #define _ASM_INC   __ASM_SIZE(inc)
 #define _ASM_DEC   __ASM_SIZE(dec)
 #define _ASM_ADD   __ASM_SIZE(add)
-- 
2.17.0.441.gb46fe60e1d-goog



[Xen-devel] [PATCH v3 03/27] x86: Use symbol name in jump table for PIE support

2018-05-23 Thread Thomas Garnier
Replace the %c constraint with %P. The %c is incompatible with PIE
because it implies an immediate value whereas %P references a symbol.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/include/asm/jump_label.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index 8c0de4282659..dfdcdc39604a 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -37,9 +37,9 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
".byte " __stringify(STATIC_KEY_INIT_NOP) "\n\t"
".pushsection __jump_table,  \"aw\" \n\t"
_ASM_ALIGN "\n\t"
-   _ASM_PTR "1b, %l[l_yes], %c0 + %c1 \n\t"
+   _ASM_PTR "1b, %l[l_yes], %P0 \n\t"
".popsection \n\t"
-   : :  "i" (key), "i" (branch) : : l_yes);
+   : :  "X" (&((char *)key)[branch]) : : l_yes);
 
return false;
 l_yes:
@@ -53,9 +53,9 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool
"2:\n\t"
".pushsection __jump_table,  \"aw\" \n\t"
_ASM_ALIGN "\n\t"
-   _ASM_PTR "1b, %l[l_yes], %c0 + %c1 \n\t"
+   _ASM_PTR "1b, %l[l_yes], %P0 \n\t"
".popsection \n\t"
-   : :  "i" (key), "i" (branch) : : l_yes);
+   : :  "X" (&((char *)key)[branch]) : : l_yes);
 
return false;
 l_yes:
-- 
2.17.0.441.gb46fe60e1d-goog



[Xen-devel] [PATCH v3 00/27] x86: PIE support and option to extend KASLR randomization

2018-05-23 Thread Thomas Garnier
Changes:
 - patch v3:
   - Update on message to describe longer term PIE goal.
   - Minor change on ftrace if condition.
   - Changed code using xchgq.
 - patch v2:
   - Adapt patch to work post KPTI and compiler changes
   - Redo all performance testing with latest configs and compilers
   - Simplify mov macro on PIE (MOVABS now)
   - Reduce GOT footprint
 - patch v1:
   - Simplify ftrace implementation.
   - Use gcc mstack-protector-guard-reg=%gs with PIE when possible.
 - rfc v3:
   - Use --emit-relocs instead of -pie to reduce dynamic relocation space on
     mapped memory. It also simplifies the relocation process.
   - Move the start of the module section next to the kernel. Remove the need
     for -mcmodel=large on modules. Extend module space from 1G to 2G maximum.
   - Support for XEN PVH as 32-bit relocations can be ignored with
     --emit-relocs.
   - Support for GOT relocations previously done automatically with -pie.
   - Remove need for dynamic PLT in modules.
   - Support dynamic GOT for modules.
 - rfc v2:
   - Add support for global stack cookie while the compiler defaults to fs
     without mcmodel=kernel.
   - Change patch 7 to correctly jump out of the identity mapping on kexec load
     preserve.

These patches make the changes necessary to build the kernel as a Position
Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below
the top 2G of the virtual address space. This makes it possible to optionally
extend the KASLR randomization range from 1G to 3G. The chosen range is the
one currently available; future changes will allow the kernel module to have
a wider randomization range.
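
As background for the "top 2G" wording: code built with mcmodel=kernel
reaches symbols through sign-extended 32-bit values, so every address has
to sit in the last 2G of the 64-bit space. The helper below is a sketch of
that constraint only; fits_signed_32 is a name made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when addr can be encoded as a sign-extended 32-bit value. */
bool fits_signed_32(uint64_t addr)
{
	int64_t v = (int64_t)addr;

	return v >= INT32_MIN && v <= INT32_MAX;
}
```

An address such as 0xffffffff80000000 (exactly -2G) passes, while anything
lower in the negative half, for example 0xffffffff40000000, does not. Those
lower addresses are the ones a PIE kernel can use.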

Thanks a lot to Ard Biesheuvel & Kees Cook for their feedback on compiler
changes, PIE support and KASLR in general. Thanks to Roland McGrath for his
feedback on using -pie versus --emit-relocs and details on compiler code
generation.

The patches:
 - 1-3, 5-13, 18-19: Change in assembly code to be PIE compliant.
 - 4: Add a new _ASM_MOVABS macro to fetch a symbol address generically.
 - 14: Adapt percpu design to work correctly when PIE is enabled.
 - 15: Provide an option to default visibility to hidden except for key symbols.
   It removes errors between compilation units.
 - 16: Add PROVIDE_HIDDEN replacement on the linker script for weak symbols to
   reduce GOT footprint.
 - 17: Adapt relocation tool to handle PIE binary correctly.
 - 20: Add support for global cookie.
 - 21: Support ftrace with PIE (used on Ubuntu config).
 - 22: Add option to move the module section just after the kernel.
 - 23: Adapt module loading to support PIE with dynamic GOT.
 - 24: Make the GOT read-only.
 - 25: Add the CONFIG_X86_PIE option (off by default).
 - 26: Adapt relocation tool to generate a 64-bit relocation table.
 - 27: Add the CONFIG_RANDOMIZE_BASE_LARGE option to increase relocation range
   from 1G to 3G (off by default).

Performance/Size impact:

Size of vmlinux (Default configuration):
 File size:
 - PIE disabled: +0.18%
 - PIE enabled: -1.977% (fewer relocations)
 .text section:
 - PIE disabled: same
 - PIE enabled: same

Size of vmlinux (Ubuntu configuration):
 File size:
 - PIE disabled: +0.21%
 - PIE enabled: +10%
 .text section:
 - PIE disabled: same
 - PIE enabled: +0.001%

The size increase is mainly due to not having access to the 32-bit signed
relocation that can be used with mcmodel=kernel. A small part is due to reduced
optimization for PIE code. This bug [1] was opened with gcc to provide a better
code generation for kernel PIE.

Hackbench (50% and 1600% on thread/process for pipe/sockets):
 - PIE disabled: no significant change (avg -/+ 0.5% on latest test).
 - PIE enabled: between -1% and +1% on average (default and Ubuntu config).

Kernbench (average of 10 Half and Optimal runs):
 Elapsed Time:
 - PIE disabled: no significant change (avg -0.5%)
 - PIE enabled: average -0.5% to +0.5%
 System Time:
 - PIE disabled: no significant change (avg -0.1%)
 - PIE enabled: average -0.4% to +0.4%.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

diffstat:
 Documentation/x86/x86_64/mm.txt  |3 
 arch/x86/Kconfig |   45 ++
 arch/x86/Makefile|   58 
 arch/x86/boot/boot.h |2 
 arch/x86/boot/compressed/Makefile|5 
 arch/x86/boot/compressed/misc.c  |   10 +
 arch/x86/crypto/aes-x86_64-asm_64.S  |   45 --
 arch/x86/crypto/aesni-intel_asm.S|8 -
 arch/x86/crypto/aesni-intel_avx-x86_64.S |6 
 arch/x86/crypto/camellia-aesni-avx-asm_64.S  |   42 +++---
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S |   44 +++---
 arch/x86/crypto/camellia-x86_64-asm_64.S |8 -
 arch/x86/crypto/cast5-avx-x86_64-asm_64.S|   50 ---
 arch/x86/crypto/cast6-avx-x86_64-asm_64.S|   44 +++---
 arch/x86/crypto/des3_ede-asm_64.S|   96 +-
 arch/x86/crypto/ghash-clmulni-intel_asm.S|4 
 

Re: [Xen-devel] [PATCH v2 06/27] x86/entry/64: Adapt assembly for PIE support

2018-03-14 Thread Thomas Garnier
On Wed, Mar 14, 2018 at 8:55 AM Christopher Lameter <c...@linux.com> wrote:



> On Wed, 14 Mar 2018, Peter Zijlstra wrote:

> > On Tue, Mar 13, 2018 at 01:59:24PM -0700, Thomas Garnier wrote:
> > > @@ -1576,7 +1578,9 @@ first_nmi:
> > >         addq    $8, (%rsp)      /* Fix up RSP */
> > >         pushfq                  /* RFLAGS */
> > >         pushq   $__KERNEL_CS    /* CS */
> > > -       pushq   $1f             /* RIP */
> > > +       pushq   %rax            /* Support Position Independent Code */
> > > +       leaq    1f(%rip), %rax  /* RIP */
> > > +       xchgq   %rax, (%rsp)    /* Restore RAX, put 1f */
> > >         iretq                   /* continues at repeat_nmi below */
> > >         UNWIND_HINT_IRET_REGS
> > >  1:
> >
> > Urgh, xchg with a memop has an implicit LOCK prefix.

> this_cpu_xchg uses no lock cmpxchg as a replacement to reduce latency.

Great, I will update my implementation.

Thanks Peter and Christoph.
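
For reference, the idea behind the percpu_xchg_op() macro quoted below can
be modelled in plain C. This is a sketch of the semantics only:
cmpxchg_nolock and xchg_via_cmpxchg are hypothetical names, the code is
not atomic across CPUs, and it is not what the kernel compiles to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Emulates x86 cmpxchg without a lock prefix: if *mem equals *acc, store
 * newval and report success; otherwise reload *acc from *mem and report
 * failure.
 */
bool cmpxchg_nolock(uint64_t *mem, uint64_t *acc, uint64_t newval)
{
	if (*mem == *acc) {
		*mem = newval;
		return true;
	}
	*acc = *mem;
	return false;
}

/* Exchange built from the cmpxchg loop; returns the previous value. */
uint64_t xchg_via_cmpxchg(uint64_t *mem, uint64_t newval)
{
	uint64_t old = *mem;		/* corresponds to "mov mem, %rax" */

	while (!cmpxchg_nolock(mem, &old, newval))
		;			/* corresponds to "jnz 1b" */
	return old;
}
```

The loop retries only if the memory changed between the load and the
compare, which for CPU-local data should be rare.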


>  From linux/arch/x86/include/asm/percpu.h

> /*
>   * xchg is implemented using cmpxchg without a lock prefix. xchg is
>   * expensive due to the implied lock prefix.  The processor cannot
>   * prefetch cachelines if xchg is used.
>   */
> #define percpu_xchg_op(var, nval)   \
> ({  \
>  typeof(var) pxo_ret__;  \
>  typeof(var) pxo_new__ = (nval); \
>  switch (sizeof(var)) {  \
>  case 1: \
>          asm("\n\tmov "__percpu_arg(1)",%%al" \
>              "\n1:\tcmpxchgb %2, "__percpu_arg(1) \
>              "\n\tjnz 1b" \
>              : "=&a" (pxo_ret__), "+m" (var) \
>              : "q" (pxo_new__) \
>              : "memory"); \
>          break; \
>  case 2: \
>          asm("\n\tmov "__percpu_arg(1)",%%ax" \
>              "\n1:\tcmpxchgw %2, "__percpu_arg(1) \
>              "\n\tjnz 1b" \
>              : "=&a" (pxo_ret__), "+m" (var) \
>              : "r" (pxo_new__) \
>              : "memory"); \
>          break; \
>  case 4: \
>          asm("\n\tmov "__percpu_arg(1)",%%eax" \
>              "\n1:\tcmpxchgl %2, "__percpu_arg(1) \
>              "\n\tjnz 1b" \
>              : "=&a" (pxo_ret__), "+m" (var) \
>              : "r" (pxo_new__) \
>              : "memory"); \
>          break; \
>  case 8: \
>          asm("\n\tmov "__percpu_arg(1)",%%rax" \
>              "\n1:\tcmpxchgq %2, "__percpu_arg(1) \
>              "\n\tjnz 1b" \
>              : "=&a" (pxo_ret__), "+m" (var) \
>              : "r" (pxo_new__) \
>              : "memory"); \
>  break;  \
>  default: __bad_percpu_size();   \
>  }   \
>  pxo_ret__;  \




-- 
Thomas


[Xen-devel] [PATCH v2 19/27] kvm: Adapt assembly for PIE support

2018-03-13 Thread Thomas Garnier
Change the assembly code to use only relative references of symbols for the
kernel to be PIE compatible. The new _ASM_MOVABS macro is used to
get the address of a symbol on both 32 and 64-bit with PIE support.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/include/asm/kvm_host.h | 6 --
 arch/x86/kernel/kvm.c   | 6 --
 arch/x86/kvm/svm.c  | 4 ++--
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b605a5b6a30c..7bd6ba79e778 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1380,9 +1380,11 @@ asmlinkage void kvm_spurious_fault(void);
".pushsection .fixup, \"ax\" \n" \
"667: \n\t" \
cleanup_insn "\n\t"   \
-   "cmpb $0, kvm_rebooting \n\t" \
+   "cmpb $0, kvm_rebooting" __ASM_SEL(,(%%rip)) " \n\t" \
"jne 668b \n\t"   \
-   __ASM_SIZE(push) " $666b \n\t"\
+   __ASM_SIZE(push) "%%" _ASM_AX " \n\t"   \
+   _ASM_MOVABS " $666b, %%" _ASM_AX "\n\t" \
+   "xchg %%" _ASM_AX ", (%%" _ASM_SP ") \n\t"  \
"call kvm_spurious_fault \n\t"\
".popsection \n\t" \
_ASM_EXTABLE(666b, 667b)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index bc1a27280c4b..5e4dd958ea95 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -711,8 +711,10 @@ asm(
 ".global __raw_callee_save___kvm_vcpu_is_preempted;"
 ".type __raw_callee_save___kvm_vcpu_is_preempted, @function;"
 "__raw_callee_save___kvm_vcpu_is_preempted:"
-"movq  __per_cpu_offset(,%rdi,8), %rax;"
-"cmpb  $0, " __stringify(KVM_STEAL_TIME_preempted) "+steal_time(%rax);"
+"leaq  __per_cpu_offset(%rip), %rax;"
+"movq  (%rax,%rdi,8), %rax;"
+"addq  " __stringify(KVM_STEAL_TIME_preempted) "+steal_time(%rip), %rax;"
+"cmpb  $0, (%rax);"
 "setne %al;"
 "ret;"
 ".popsection");
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index be9c839e2c89..6835a2ce02e5 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -626,12 +626,12 @@ static u32 svm_msrpm_offset(u32 msr)
 
 static inline void clgi(void)
 {
-   asm volatile (__ex(SVM_CLGI));
+   asm volatile (__ex(SVM_CLGI) : :);
 }
 
 static inline void stgi(void)
 {
-   asm volatile (__ex(SVM_STGI));
+   asm volatile (__ex(SVM_STGI) : :);
 }
 
 static inline void invlpga(unsigned long addr, u32 asid)
-- 
2.16.2.660.g709887971b-goog



[Xen-devel] [PATCH v2 20/27] x86: Support global stack cookie

2018-03-13 Thread Thomas Garnier
Add an off-by-default configuration option to use a global stack cookie
instead of the default TLS. This configuration option will only be used
with PIE binaries.

For the kernel stack cookie, the compiler uses mcmodel=kernel to switch
from the fs segment to the gs segment. A PIE binary does not use
mcmodel=kernel because it can be relocated anywhere, therefore the
compiler will default to the fs segment register. This is fixed in the
latest version of gcc.

If the compiler supports selecting the segment register, the option is
added automatically. Otherwise, if the automatic stack-protector
configuration was selected, a warning is printed and the global variable
stack cookie is used. If a specific stack mode was selected (regular or
strong) and the compiler does not support selecting the segment register,
an error is emitted.
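
To illustrate what "global stack cookie" means in practice, the sketch
below mimics by hand what the compiler inserts in a protected function:
copy a global value next to the buffer on entry and compare it on exit.
With -mstack-protector-guard=global, gcc reads the cookie from a global
symbol instead of a segment-relative slot. All sketch_* names are
hypothetical and the constant is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical global cookie; a real build would randomize it at boot. */
unsigned long sketch_stack_guard = 0x5bd1e995UL;

struct sketch_frame {
	char buf[16];
	unsigned long canary;	/* placed after the buffer, like a canary */
};

/* Prologue: clear the buffer and copy the global cookie into the frame. */
void sketch_frame_enter(struct sketch_frame *f)
{
	memset(f->buf, 0, sizeof(f->buf));
	f->canary = sketch_stack_guard;
}

/* Epilogue: the frame is intact only if the copy still matches. */
bool sketch_frame_intact(const struct sketch_frame *f)
{
	return f->canary == sketch_stack_guard;
}
```

The trade-off is that a single global value is shared by every task, which
is why the option stays off by default and is meant as a fallback.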

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/Kconfig  | 12 
 arch/x86/Makefile |  9 +
 arch/x86/entry/entry_32.S |  3 ++-
 arch/x86/entry/entry_64.S |  3 ++-
 arch/x86/include/asm/processor.h  |  3 ++-
 arch/x86/include/asm/stackprotector.h | 19 ++-
 arch/x86/kernel/asm-offsets.c |  3 ++-
 arch/x86/kernel/asm-offsets_32.c  |  3 ++-
 arch/x86/kernel/asm-offsets_64.c  |  3 ++-
 arch/x86/kernel/cpu/common.c  |  3 ++-
 arch/x86/kernel/head_32.S |  3 ++-
 arch/x86/kernel/process.c |  5 +
 12 files changed, 56 insertions(+), 13 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a0a777ce4c7c..0cb1ae187c3e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2236,6 +2236,18 @@ config RANDOMIZE_MEMORY_PHYSICAL_PADDING
 
   If unsure, leave at the default value.
 
+config X86_GLOBAL_STACKPROTECTOR
+   bool "Stack cookie using a global variable"
+   depends on CC_STACKPROTECTOR_AUTO
+   default n
+   ---help---
+  This option turns on the "stack-protector" GCC feature using a global
+  variable instead of a segment register. It is useful when the
+  compiler does not support custom segment registers when building a
+  position independent (PIE) binary.
+
+  If unsure, say N
+
 config HOTPLUG_CPU
bool "Support for hot-pluggable CPUs"
depends on SMP
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 498c1b812300..16dafc551f3b 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -142,6 +142,15 @@ else
 KBUILD_CFLAGS += $(call cc-option,-funit-at-a-time)
 endif
 
+ifdef CONFIG_X86_GLOBAL_STACKPROTECTOR
+ifeq ($(call cc-option, -mstack-protector-guard=global),)
+$(error Cannot use CONFIG_X86_GLOBAL_STACKPROTECTOR: \
+-mstack-protector-guard=global not supported \
+by compiler)
+endif
+KBUILD_CFLAGS += -mstack-protector-guard=global
+endif
+
 ifdef CONFIG_X86_X32
x32_ld_ok := $(call try-run,\
/bin/echo -e '1: .quad 1b' | \
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index bef8e2b202a8..b7d5bc710ae7 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -239,7 +239,8 @@ ENTRY(__switch_to_asm)
        movl    %esp, TASK_threadsp(%eax)
        movl    TASK_threadsp(%edx), %esp

-#ifdef CONFIG_CC_STACKPROTECTOR
+#if defined(CONFIG_CC_STACKPROTECTOR) && \
+       !defined(CONFIG_X86_GLOBAL_STACKPROTECTOR)
        movl    TASK_stack_canary(%edx), %ebx
        movl    %ebx, PER_CPU_VAR(stack_canary)+stack_canary_offset
 #endif
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index d34794368e20..fbf2b63b4e78 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -356,7 +356,8 @@ ENTRY(__switch_to_asm)
        movq    %rsp, TASK_threadsp(%rdi)
        movq    TASK_threadsp(%rsi), %rsp

-#ifdef CONFIG_CC_STACKPROTECTOR
+#if defined(CONFIG_CC_STACKPROTECTOR) && \
+       !defined(CONFIG_X86_GLOBAL_STACKPROTECTOR)
        movq    TASK_stack_canary(%rsi), %rbx
        movq    %rbx, PER_CPU_VAR(irq_stack_union + stack_canary_offset)
 #endif
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 1b9488b1018a..f11284481597 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -411,7 +411,8 @@ DECLARE_PER_CPU(char *, irq_stack_ptr);
 DECLARE_PER_CPU(unsigned int, irq_count);
 extern asmlinkage void ignore_sysret(void);
 #else  /* X86_64 */
-#ifdef CONFIG_CC_STACKPROTECTOR
+#if defined(CONFIG_CC_STACKPROTECTOR) && \
+   !defined(CONFIG_X86_GLOBAL_STACKPROTECTOR)
 /*
  * Make sure stack canary segment base is cached-aligned:
  *   "For Intel Atom processors, avoid non zero segment base address
diff --git a/arch/x86/include/asm/stackprotector.h b/arch/x86/include/asm/stackprotector.h
index 371b3a4af000..5063f57d99f5 100644
--- a

[Xen-devel] [PATCH v2 24/27] x86/mm: Make the x86 GOT read-only

2018-03-13 Thread Thomas Garnier
The GOT is changed during early boot when relocations are applied. Make
it read-only directly. This table exists only for PIE binaries.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 include/asm-generic/vmlinux.lds.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 1ab0e520d6fc..89398d042f78 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -295,6 +295,17 @@
VMLINUX_SYMBOL(__end_ro_after_init) = .;
 #endif
 
+#ifdef CONFIG_X86_PIE
+#define RO_GOT_X86 \
+   .got: AT(ADDR(.got) - LOAD_OFFSET) {\
+   VMLINUX_SYMBOL(__start_got) = .;\
+   *(.got);\
+   VMLINUX_SYMBOL(__end_got) = .;  \
+   }
+#else
+#define RO_GOT_X86
+#endif
+
 /*
  * Read only Data
  */
@@ -351,6 +362,7 @@
VMLINUX_SYMBOL(__end_builtin_fw) = .;   \
}   \
\
+   RO_GOT_X86  \
TRACEDATA   \
\
/* Kernel symbol table: Normal symbols */   \
-- 
2.16.2.660.g709887971b-goog



[Xen-devel] [PATCH v2 26/27] x86/relocs: Add option to generate 64-bit relocations

2018-03-13 Thread Thomas Garnier
The x86 relocation tool generates a list of 32-bit signed integers. There
was no need to use 64-bit integers because all addresses were within the
top 2G of memory.

This change adds a large-reloc option to generate 64-bit unsigned integers.
It can be used when the kernel plans to go below the top 2G and 32-bit
integers are not enough.
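
As a sketch of what changes in the emitted table, the helper below writes
one entry in little-endian order using either width. The function names
are hypothetical and only loosely follow the write32/write64 split in the
patch; they are not the relocs.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store the low `width` bytes of v in little-endian order; returns width. */
size_t put_reloc_le(uint64_t v, unsigned char *buf, size_t width)
{
	assert(width == 4 || width == 8);
	for (size_t i = 0; i < width; i++)
		buf[i] = (unsigned char)(v >> (8 * i));
	return width;
}

/* Pick the entry width the way a large-reloc switch might. */
size_t reloc_entry_width(int use_large_reloc)
{
	return use_large_reloc ? sizeof(uint64_t) : sizeof(uint32_t);
}
```

With the 4-byte width, the upper half of an offset such as
0xffffffff40000000 is dropped and has to be recovered by sign extension,
which only works for addresses in the top 2G. The 8-byte width keeps the
full value.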

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/tools/relocs.c| 60 +++---
 arch/x86/tools/relocs.h|  4 +--
 arch/x86/tools/relocs_common.c | 15 ++---
 3 files changed, 60 insertions(+), 19 deletions(-)

diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index 29283ad3950f..a29eaac6 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -13,8 +13,14 @@
 
 static Elf_Ehdr ehdr;
 
+#if ELF_BITS == 64
+typedef uint64_t rel_off_t;
+#else
+typedef uint32_t rel_off_t;
+#endif
+
 struct relocs {
-   uint32_t*offset;
+   rel_off_t   *offset;
unsigned long   count;
unsigned long   size;
 };
@@ -685,7 +691,7 @@ static void print_absolute_relocs(void)
printf("\n");
 }
 
-static void add_reloc(struct relocs *r, uint32_t offset)
+static void add_reloc(struct relocs *r, rel_off_t offset)
 {
if (r->count == r->size) {
unsigned long newsize = r->size + 5;
@@ -1061,26 +1067,48 @@ static void sort_relocs(struct relocs *r)
qsort(r->offset, r->count, sizeof(r->offset[0]), cmp_relocs);
 }
 
-static int write32(uint32_t v, FILE *f)
+static int write32(rel_off_t rel, FILE *f)
 {
-   unsigned char buf[4];
+   unsigned char buf[sizeof(uint32_t)];
+   uint32_t v = (uint32_t)rel;
 
put_unaligned_le32(v, buf);
-   return fwrite(buf, 1, 4, f) == 4 ? 0 : -1;
+   return fwrite(buf, 1, sizeof(buf), f) == sizeof(buf) ? 0 : -1;
 }
 
-static int write32_as_text(uint32_t v, FILE *f)
+static int write32_as_text(rel_off_t rel, FILE *f)
 {
+   uint32_t v = (uint32_t)rel;
return fprintf(f, "\t.long 0x%08"PRIx32"\n", v) > 0 ? 0 : -1;
 }
 
-static void emit_relocs(int as_text, int use_real_mode)
+static int write64(rel_off_t rel, FILE *f)
+{
+   unsigned char buf[sizeof(uint64_t)];
+   uint64_t v = (uint64_t)rel;
+
+   put_unaligned_le64(v, buf);
+   return fwrite(buf, 1, sizeof(buf), f) == sizeof(buf) ? 0 : -1;
+}
+
+static int write64_as_text(rel_off_t rel, FILE *f)
+{
+   uint64_t v = (uint64_t)rel;
+   return fprintf(f, "\t.quad 0x%016"PRIx64"\n", v) > 0 ? 0 : -1;
+}
+
+static void emit_relocs(int as_text, int use_real_mode, int use_large_reloc)
 {
int i;
-   int (*write_reloc)(uint32_t, FILE *) = write32;
+   int (*write_reloc)(rel_off_t, FILE *);
int (*do_reloc)(struct section *sec, Elf_Rel *rel, Elf_Sym *sym,
const char *symname);
 
+   if (use_large_reloc)
+   write_reloc = write64;
+   else
+   write_reloc = write32;
+
 #if ELF_BITS == 64
if (!use_real_mode)
do_reloc = do_reloc64;
@@ -1091,6 +1119,9 @@ static void emit_relocs(int as_text, int use_real_mode)
do_reloc = do_reloc32;
else
do_reloc = do_reloc_real;
+
+   /* Large relocations only for 64-bit */
+   use_large_reloc = 0;
 #endif
 
/* Collect up the relocations */
@@ -1114,8 +1145,13 @@ static void emit_relocs(int as_text, int use_real_mode)
 * gas will like.
 */
printf(".section \".data.reloc\",\"a\"\n");
-   printf(".balign 4\n");
-   write_reloc = write32_as_text;
+   if (use_large_reloc) {
+   printf(".balign 8\n");
+   write_reloc = write64_as_text;
+   } else {
+   printf(".balign 4\n");
+   write_reloc = write32_as_text;
+   }
}
 
if (use_real_mode) {
@@ -1183,7 +1219,7 @@ static void print_reloc_info(void)
 
 void process(FILE *fp, int use_real_mode, int as_text,
 int show_absolute_syms, int show_absolute_relocs,
-int show_reloc_info)
+int show_reloc_info, int use_large_reloc)
 {
regex_init(use_real_mode);
read_ehdr(fp);
@@ -1206,5 +1242,5 @@ void process(FILE *fp, int use_real_mode, int as_text,
print_reloc_info();
return;
}
-   emit_relocs(as_text, use_real_mode);
+   emit_relocs(as_text, use_real_mode, use_large_reloc);
 }
diff --git a/arch/x86/tools/relocs.h b/arch/x86/tools/relocs.h
index 43c83c0fd22c..3d401da59df7 100644
--- a/arch/x86/tools/relocs.h
+++ b/arch/x86/tools/relocs.h
@@ -31,8 +31,8 @@ enum symtype {
 
 void process_32(FILE *fp, int use_real_mode, int as_text,
int 

[Xen-devel] [PATCH v2 18/27] xen: Adapt assembly for PIE support

2018-03-13 Thread Thomas Garnier
Change the assembly code to use the new _ASM_MOVABS macro, which gets a
symbol reference while being PIE compatible. Adapt the relocation tool
to ignore 32-bit Xen code.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/tools/relocs.c | 16 +++-
 arch/x86/xen/xen-head.S | 11 ++-
 arch/x86/xen/xen-pvh.S  | 13 +
 3 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index a35cc337f883..29283ad3950f 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -832,6 +832,16 @@ static int is_percpu_sym(ElfW(Sym) *sym, const char *symname)
strncmp(symname, "init_per_cpu_", 13);
 }
 
+/*
+ * Check if the 32-bit relocation is within the xenpvh 32-bit code.
+ * If so, ignores it.
+ */
+static int is_in_xenpvh_assembly(ElfW(Addr) offset)
+{
+   ElfW(Sym) *sym = sym_lookup("pvh_start_xen");
+   return sym && (offset >= sym->st_value) &&
+   (offset < (sym->st_value + sym->st_size));
+}
 
 static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
  const char *symname)
@@ -895,8 +905,12 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 * the relocations are processed.
 * Make sure that the offset will fit.
 */
-   if (r_type != R_X86_64_64 && (int32_t)offset != (int64_t)offset)
+   if (r_type != R_X86_64_64 &&
+   (int32_t)offset != (int64_t)offset) {
+   if (is_in_xenpvh_assembly(offset))
+   break;
die("Relocation offset doesn't fit in 32 bits\n");
+   }
 
if (r_type == R_X86_64_64)
add_reloc(, offset);
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 96f26e026783..210568e63c84 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -28,14 +28,15 @@ ENTRY(startup_xen)
 
/* Clear .bss */
xor %eax,%eax
-   mov $__bss_start, %_ASM_DI
-   mov $__bss_stop, %_ASM_CX
+   _ASM_MOVABS $__bss_start, %_ASM_DI
+   _ASM_MOVABS $__bss_stop, %_ASM_CX
sub %_ASM_DI, %_ASM_CX
shr $__ASM_SEL(2, 3), %_ASM_CX
rep __ASM_SIZE(stos)
 
-   mov %_ASM_SI, xen_start_info
-   mov $init_thread_union+THREAD_SIZE, %_ASM_SP
+   _ASM_MOVABS $xen_start_info, %_ASM_AX
+   _ASM_MOV %_ASM_SI, (%_ASM_AX)
+   _ASM_MOVABS $init_thread_union+THREAD_SIZE, %_ASM_SP
 
 #ifdef CONFIG_X86_64
/* Set up %gs.
@@ -46,7 +47,7 @@ ENTRY(startup_xen)
 * init data section till per cpu areas are set up.
 */
movl$MSR_GS_BASE,%ecx
-   movq$INIT_PER_CPU_VAR(irq_stack_union),%rax
+   movabsq $INIT_PER_CPU_VAR(irq_stack_union),%rax
cdq
wrmsr
 #endif
diff --git a/arch/x86/xen/xen-pvh.S b/arch/x86/xen/xen-pvh.S
index e1a5fbeae08d..43e234c7c2de 100644
--- a/arch/x86/xen/xen-pvh.S
+++ b/arch/x86/xen/xen-pvh.S
@@ -101,8 +101,8 @@ ENTRY(pvh_start_xen)
call xen_prepare_pvh
 
/* startup_64 expects boot_params in %rsi. */
-   mov $_pa(pvh_bootparams), %rsi
-   mov $_pa(startup_64), %rax
+   movabs $_pa(pvh_bootparams), %rsi
+   movabs $_pa(startup_64), %rax
jmp *%rax
 
 #else /* CONFIG_X86_64 */
@@ -137,10 +137,15 @@ END(pvh_start_xen)
 
.section ".init.data","aw"
.balign 8
+   /*
+* Use a quad for _pa(gdt_start) because PIE does not understand a
+* long is enough. The resulting value will still be in the lower long
+* part.
+*/
 gdt:
.word gdt_end - gdt_start
-   .long _pa(gdt_start)
-   .word 0
+   .quad _pa(gdt_start)
+   .balign 8
 gdt_start:
.quad 0x/* NULL descriptor */
.quad 0x/* reserved */
-- 
2.16.2.660.g709887971b-goog



[Xen-devel] [PATCH v2 27/27] x86/kaslr: Add option to extend KASLR range from 1GB to 3GB

2018-03-13 Thread Thomas Garnier
Add a new CONFIG_RANDOMIZE_BASE_LARGE option to benefit from PIE
support. It increases the KASLR range from 1GB to 3GB. The new range
starts at 0x just above the EFI memory region. This
option is off by default.

The boot code is adapted to create the appropriate page table spanning
three PUD pages.

The relocation table uses 64-bit integers generated with the updated
relocation tool with the large-reloc option.
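
The need for wider entries can be seen from how a 32-bit entry is turned
back into an address. A minimal sketch, assuming entries are sign-extended
when read (the helper names are illustrative, not taken from the tool):

```c
#include <stdint.h>

/* Recover a 64-bit address from a 32-bit table entry by sign extension. */
static uint64_t addr_from_reloc32(int32_t entry)
{
    return (uint64_t)(int64_t)entry;
}

/*
 * True when addr can be represented as a sign-extended 32-bit value,
 * i.e. it lies in the bottom 2G or the top 2G of the address space.
 */
static int addr_fits_reloc32(uint64_t addr)
{
    return addr <= 0x7fffffffULL || addr >= 0xffffffff80000000ULL;
}
```

An address below the top 2G fails the second test, which is why a table
covering the larger range has to store full 64-bit values.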

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/Kconfig | 21 +
 arch/x86/boot/compressed/Makefile|  5 +
 arch/x86/boot/compressed/misc.c  | 10 +-
 arch/x86/include/asm/page_64_types.h |  9 +
 arch/x86/kernel/head64.c | 15 ---
 arch/x86/kernel/head_64.S| 11 ++-
 6 files changed, 66 insertions(+), 5 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4b1615e661d6..7ea69cb0153f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2260,6 +2260,27 @@ config X86_PIE
select DYNAMIC_MODULE_BASE
select MODULE_REL_CRCS if MODVERSIONS
 
+config RANDOMIZE_BASE_LARGE
+   bool "Increase the randomization range of the kernel image"
+   depends on X86_64 && RANDOMIZE_BASE
+   select X86_PIE
+   select X86_MODULE_PLTS if MODULES
+   default n
+   ---help---
+ Build the kernel as a Position Independent Executable (PIE) and
+ increase the available randomization range from 1GB to 3GB.
+
+ This option impacts performance on kernel CPU intensive workloads up
+ to 10% due to PIE generated code. Impact on user-mode processes and
+ typical usage would be significantly less (0.50% when you build the
+ kernel).
+
+ The kernel and modules will generate slightly more assembly (1 to 2%
+ increase on the .text sections). The vmlinux binary will be
+ significantly smaller due to less relocations.
+
+ If unsure say N
+
 config HOTPLUG_CPU
bool "Support for hot-pluggable CPUs"
depends on SMP
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 1f734cd98fd3..fb72f53defd0 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -116,7 +116,12 @@ $(obj)/vmlinux.bin: vmlinux FORCE
 
 targets += $(patsubst $(obj)/%,%,$(vmlinux-objs-y)) vmlinux.bin.all vmlinux.relocs
 
+# Large randomization require bigger relocation table
+ifeq ($(CONFIG_RANDOMIZE_BASE_LARGE),y)
+CMD_RELOCS = arch/x86/tools/relocs --large-reloc
+else
 CMD_RELOCS = arch/x86/tools/relocs
+endif
 quiet_cmd_relocs = RELOCS  $@
   cmd_relocs = $(CMD_RELOCS) $< > $@;$(CMD_RELOCS) --abs-relocs $<
 $(obj)/vmlinux.relocs: vmlinux FORCE
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index b50c42455e25..746a968690d5 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -170,10 +170,18 @@ void __puthex(unsigned long value)
 }
 
 #if CONFIG_X86_NEED_RELOCS
+
+/* Large randomization go lower than -2G and use large relocation table */
+#ifdef CONFIG_RANDOMIZE_BASE_LARGE
+typedef long rel_t;
+#else
+typedef int rel_t;
+#endif
+
 static void handle_relocations(void *output, unsigned long output_len,
   unsigned long virt_addr)
 {
-   int *reloc;
+   rel_t *reloc;
unsigned long delta, map, ptr;
unsigned long min_addr = (unsigned long)output;
unsigned long max_addr = min_addr + (VO___bss_start - VO__text);
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 2c5a966dc222..85ea681421d2 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -46,7 +46,11 @@
 #define __PAGE_OFFSET   __PAGE_OFFSET_BASE_L4
 #endif /* CONFIG_DYNAMIC_MEMORY_LAYOUT */
 
+#ifdef CONFIG_RANDOMIZE_BASE_LARGE
+#define __START_KERNEL_map _AC(0x, UL)
+#else
 #define __START_KERNEL_map _AC(0x8000, UL)
+#endif /* CONFIG_RANDOMIZE_BASE_LARGE */
 
 /* See Documentation/x86/x86_64/mm.txt for a description of the memory map. */
 
@@ -64,9 +68,14 @@
  * 512MiB by default, leaving 1.5GiB for modules once the page tables
  * are fully set up. If kernel ASLR is configured, it can extend the
  * kernel page table mapping, reducing the size of the modules area.
+ * On PIE, we relocate the binary 2G lower so add this extra space.
  */
 #if defined(CONFIG_RANDOMIZE_BASE)
+#ifdef CONFIG_RANDOMIZE_BASE_LARGE
+#define KERNEL_IMAGE_SIZE  (_AC(3, UL) * 1024 * 1024 * 1024)
+#else
 #define KERNEL_IMAGE_SIZE  (1024 * 1024 * 1024)
+#endif
 #else
 #define KERNEL_IMAGE_SIZE  (512 * 1024 * 1024)
 #endif
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index ea4c498369d8..577b47381ba2 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@

[Xen-devel] [PATCH v2 25/27] x86/pie: Add option to build the kernel as PIE

2018-03-13 Thread Thomas Garnier
Add the CONFIG_X86_PIE option which builds the kernel as a Position
Independent Executable (PIE). The kernel is currently built with the
mcmodel=kernel option which forces it to stay in the top 2G of the
virtual address space. With PIE, the kernel will be able to move below
the current limit.

The --emit-relocs linker option was kept instead of using -pie to limit
the impact on mapped sections. Any incompatible relocation will be
caught by the arch/x86/tools/relocs binary at compile time.

If segment based stack cookies are enabled, try to use the compiler
option to select the segment register. If not available, automatically
enable the global stack cookie in auto mode. Otherwise, recommend a
compiler update or the global stack cookie option.
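
The selection order described above can be written as a small decision
function. This is only a sketch of the logic as stated in this message:
the enum, the function name, and the four boolean inputs are hypothetical,
and the "auto mode" restriction is left out for brevity.

```c
#include <stdbool.h>

enum cookie_mode {
    COOKIE_OFF,         /* stack protector disabled */
    COOKIE_SEGMENT,     /* cookie read through a segment register */
    COOKIE_GLOBAL,      /* cookie read from a global variable */
    COOKIE_UNSUPPORTED  /* toolchain too old, the build must stop */
};

/* Pick the stack cookie mode for a PIE build from toolchain capabilities. */
static enum cookie_mode pick_cookie_mode(bool protector_on,
                                         bool global_selected,
                                         bool cc_has_guard_reg,
                                         bool cc_has_guard_global)
{
    if (!protector_on)
        return COOKIE_OFF;
    if (global_selected)
        return COOKIE_GLOBAL;
    if (cc_has_guard_reg)
        return COOKIE_SEGMENT;
    if (cc_has_guard_global)
        return COOKIE_GLOBAL;   /* fallback, reported with a warning */
    return COOKIE_UNSUPPORTED;
}
```

The segment register option is preferred whenever the compiler accepts it,
so the global cookie is chosen only on explicit request or as a fallback.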

Performance/Size impact:

Size of vmlinux (Default configuration):
 File size:
 - PIE disabled: +0.18%
 - PIE enabled: -1.977% (less relocations)
 .text section:
 - PIE disabled: same
 - PIE enabled: same

Size of vmlinux (Ubuntu configuration):
 File size:
 - PIE disabled: +0.21%
 - PIE enabled: +10%
 .text section:
 - PIE disabled: same
 - PIE enabled: +0.001%

The size increase is mainly due to not having access to the 32-bit signed
relocation that can be used with mcmodel=kernel. A small part is due to reduced
optimization for PIE code. This bug [1] was opened with gcc to provide better
code generation for kernel PIE.

Hackbench (50% and 1600% on thread/process for pipe/sockets):
 - PIE disabled: no significant change (avg -/+ 0.5% on latest test).
 - PIE enabled: between -1% and +1% on average (default and Ubuntu config).

Kernbench (average of 10 Half and Optimal runs):
 Elapsed Time:
 - PIE disabled: no significant change (avg -0.5%)
 - PIE enabled: average -0.5% to +0.5%
 System Time:
 - PIE disabled: no significant change (avg -0.1%)
 - PIE enabled: average -0.4% to +0.4%.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

Signed-off-by: Thomas Garnier <thgar...@google.com>

merge pie
---
 arch/x86/Kconfig  |  8 
 arch/x86/Makefile | 45 -
 2 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index df4134fd3247..4b1615e661d6 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2252,6 +2252,14 @@ config X86_GLOBAL_STACKPROTECTOR
 
   If unsure, say N
 
+config X86_PIE
+   bool
+   depends on X86_64
+   select DEFAULT_HIDDEN
+   select WEAK_PROVIDE_HIDDEN
+   select DYNAMIC_MODULE_BASE
+   select MODULE_REL_CRCS if MODVERSIONS
+
 config HOTPLUG_CPU
bool "Support for hot-pluggable CPUs"
depends on SMP
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index f24d200c0d9d..ab0cf88c7059 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -61,6 +61,8 @@ endif
 KBUILD_CFLAGS += -mno-sse -mno-mmx -mno-sse2 -mno-3dnow
 KBUILD_CFLAGS += $(call cc-option,-mno-avx,)
 
+stackglobal := $(call cc-option-yn, -mstack-protector-guard=global)
+
 ifeq ($(CONFIG_X86_32),y)
 BITS := 32
 UTS_MACHINE := i386
@@ -136,7 +138,48 @@ else
 
 KBUILD_CFLAGS += -mno-red-zone
 ifdef CONFIG_X86_PIE
+KBUILD_CFLAGS += -fPIE
 KBUILD_LDFLAGS_MODULE += -T $(srctree)/arch/x86/kernel/module.lds
+
+# Relax relocation in both CFLAGS and LDFLAGS to support older compilers
+KBUILD_CFLAGS += $(call cc-option,-Wa$(comma)-mrelax-relocations=no)
+LDFLAGS_vmlinux += $(call ld-option,--no-relax)
+KBUILD_LDFLAGS_MODULE += $(call ld-option,--no-relax)
+
+# Stack validation is not yet support due to self-referenced switches
+ifdef CONFIG_STACK_VALIDATION
+$(warning CONFIG_STACK_VALIDATION is not yet supported for x86_64 pie \
+   build.)
+SKIP_STACK_VALIDATION := 1
+export SKIP_STACK_VALIDATION
+endif
+
+ifndef CONFIG_CC_STACKPROTECTOR_NONE
+ifndef CONFIG_X86_GLOBAL_STACKPROTECTOR
+stackseg-flag := -mstack-protector-guard-reg=%gs
+ifeq ($(call cc-option-yn,$(stackseg-flag)),n)
+# Try to enable global stack cookie if possible
+ifeq ($(stackglobal), y)
+$(warning Cannot use CONFIG_CC_STACKPROTECTOR_* while \
+building a position independent kernel. \
+Default to global stack protector \
+(CONFIG_X86_GLOBAL_STACKPROTECTOR).)
+CONFIG_X86_GLOBAL_STACKPROTECTOR := y
+KBUILD_CFLAGS += -DCONFIG_X86_GLOBAL_STACKPROTECTOR
+KBUILD_AFLAGS += -DCONFIG_X86_GLOBAL_STACKPROTECTOR
+else
+$(error echo Cannot use \
+CONFIG_CC_STACKPROTECTOR_(REGULAR|STRONG|AUTO) \
+while building a position independent binary. \
+Update your

[Xen-devel] [PATCH v2 21/27] x86/ftrace: Adapt function tracing for PIE support

2018-03-13 Thread Thomas Garnier
When using -fPIE/PIC with function tracing, the compiler generates a
call through the GOT (call *__fentry__@GOTPCREL). This instruction
takes 6 bytes instead of the 5 bytes of the usual relative call.

If PIE is enabled, replace the 6th byte of the GOT call with a 1-byte nop
so ftrace can handle the previous 5 bytes as before.
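
The byte-level effect can be sketched on a plain buffer. The helper below
is illustrative only: it works on a local array rather than kernel text,
and it assumes one common 5-byte nop encoding, whereas the kernel picks
its nops at run time.

```c
#include <string.h>

#define CALL_REL_SIZE 5 /* usual relative call: e8 + rel32 */
#define CALL_GOT_SIZE 6 /* call through the GOT: ff 15 + rel32 */

/* Opcode and ModRM bytes that start an indirect call through the GOT. */
static const unsigned char got_call_prefix[] = { 0xff, 0x15 };

/* Assumed encodings: nopl 0x0(%rax,%rax,1) and the single-byte nop. */
static const unsigned char nop5[CALL_REL_SIZE] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
static const unsigned char nop1 = 0x90;

/*
 * If insn starts with the GOT call prefix, overwrite all 6 bytes with a
 * 5-byte nop followed by a 1-byte nop and return 1. Otherwise leave the
 * buffer untouched and return 0.
 */
static int patch_got_call(unsigned char insn[CALL_GOT_SIZE])
{
    if (memcmp(insn, got_call_prefix, sizeof(got_call_prefix)) != 0)
        return 0;
    memcpy(insn, nop5, sizeof(nop5));
    insn[CALL_REL_SIZE] = nop1;
    return 1;
}
```

After patching, the first 5 bytes look exactly like the site ftrace
expects for a non-PIE build, and the trailing byte is harmless padding.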

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/include/asm/ftrace.h   |  6 +++--
 arch/x86/include/asm/sections.h |  4 
 arch/x86/kernel/ftrace.c| 42 +++--
 3 files changed, 48 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index 09ad88572746..61fa02d81b95 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -25,9 +25,11 @@ extern void __fentry__(void);
 static inline unsigned long ftrace_call_adjust(unsigned long addr)
 {
/*
-* addr is the address of the mcount call instruction.
-* recordmcount does the necessary offset calculation.
+* addr is the address of the mcount call instruction. PIE has always a
+* byte added to the start of the function.
 */
+   if (IS_ENABLED(CONFIG_X86_PIE))
+   addr -= 1;
return addr;
 }
 
diff --git a/arch/x86/include/asm/sections.h b/arch/x86/include/asm/sections.h
index d6baf23782bc..cad292f62eed 100644
--- a/arch/x86/include/asm/sections.h
+++ b/arch/x86/include/asm/sections.h
@@ -12,4 +12,8 @@ extern struct exception_table_entry __stop___ex_table[];
 extern char __end_rodata_hpage_align[];
 #endif
 
+#if defined(CONFIG_X86_PIE)
+extern char __start_got[], __end_got[];
+#endif
+
 #endif /* _ASM_X86_SECTIONS_H */
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 01ebcb6f263e..21bde498f1a9 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -102,7 +102,7 @@ static const unsigned char *ftrace_nop_replace(void)
 
 static int
 ftrace_modify_code_direct(unsigned long ip, unsigned const char *old_code,
-  unsigned const char *new_code)
+ unsigned const char *new_code)
 {
unsigned char replaced[MCOUNT_INSN_SIZE];
 
@@ -135,6 +135,44 @@ ftrace_modify_code_direct(unsigned long ip, unsigned const char *old_code,
return 0;
 }
 
+/* Bytes before call GOT offset */
+const unsigned char got_call_preinsn[] = { 0xff, 0x15 };
+
+static int
+ftrace_modify_initial_code(unsigned long ip, unsigned const char *old_code,
+  unsigned const char *new_code)
+{
+   unsigned char replaced[MCOUNT_INSN_SIZE + 1];
+
+   ftrace_expected = old_code;
+
+   /*
+* If PIE is not enabled or no GOT call was found, default to the
+* original approach to code modification.
+*/
+   if (!IS_ENABLED(CONFIG_X86_PIE)
+   || probe_kernel_read(replaced, (void *)ip, sizeof(replaced))
+   || memcmp(replaced, got_call_preinsn, sizeof(got_call_preinsn)))
+   return ftrace_modify_code_direct(ip, old_code, new_code);
+
+   /*
+* Build a nop slide with a 5-byte nop and 1-byte nop to keep the ftrace
+* hooking algorithm working with the expected 5 bytes instruction.
+*/
+   memcpy(replaced, new_code, MCOUNT_INSN_SIZE);
+   replaced[MCOUNT_INSN_SIZE] = ideal_nops[1][0];
+
+   ip = text_ip_addr(ip);
+
+   if (probe_kernel_write((void *)ip, replaced, sizeof(replaced)))
+   return -EPERM;
+
+   sync_core();
+
+   return 0;
+
+}
+
 int ftrace_make_nop(struct module *mod,
struct dyn_ftrace *rec, unsigned long addr)
 {
@@ -153,7 +191,7 @@ int ftrace_make_nop(struct module *mod,
 * just modify the code directly.
 */
if (addr == MCOUNT_ADDR)
-   return ftrace_modify_code_direct(rec->ip, old, new);
+   return ftrace_modify_initial_code(rec->ip, old, new);
 
ftrace_expected = NULL;
 
-- 
2.16.2.660.g709887971b-goog



[Xen-devel] [PATCH v2 22/27] x86/modules: Add option to start module section after kernel

2018-03-13 Thread Thomas Garnier
Add an option so the module section is just after the mapped kernel. It
will ensure position independent modules are always at the right
distance from the kernel and do not require mcmodel=large. It also
optimizes the available size for modules by getting rid of the empty
space in the kernel randomization range.
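
The placement rule is simple arithmetic: skip one page past the end of the
kernel, then round up to a PMD boundary. A minimal sketch, assuming 4 KiB
pages and 2 MiB PMD entries (the macro and function names are local to
this example):

```c
#include <stdint.h>

#define PAGE_SZ 4096ULL         /* assumed page size: 4 KiB */
#define PMD_SZ  (2ULL << 20)    /* assumed span of one PMD entry: 2 MiB */

/* Round x up to the next multiple of a, where a is a power of two. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* First module address: one page past the kernel end, PMD-aligned. */
static uint64_t modules_vaddr(uint64_t kernel_end)
{
    return ALIGN_UP(kernel_end + PAGE_SZ, PMD_SZ);
}
```

Because the base depends on where the kernel ends, it moves together with
the randomized kernel and cannot be a build-time constant.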

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 Documentation/x86/x86_64/mm.txt | 3 +++
 arch/x86/Kconfig| 4 
 arch/x86/include/asm/pgtable_64_types.h | 6 ++
 arch/x86/kernel/head64.c| 5 -
 arch/x86/mm/dump_pagetables.c   | 3 ++-
 5 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index ea91cb61a602..ec1dfe4c3cfe 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -77,3 +77,6 @@ Their order is preserved but their base will be offset early at boot time.
 Be very careful vs. KASLR when changing anything here. The KASLR address
 range must not overlap with anything except the KASAN shadow area, which is
 correct as KASAN disables KASLR.
+
+If CONFIG_DYNAMIC_MODULE_BASE is enabled, the module section follows the end of
+the mapped kernel.
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0cb1ae187c3e..df4134fd3247 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2236,6 +2236,10 @@ config RANDOMIZE_MEMORY_PHYSICAL_PADDING
 
   If unsure, leave at the default value.
 
+# Module section starts just after the end of the kernel module
+config DYNAMIC_MODULE_BASE
+   bool
+
 config X86_GLOBAL_STACKPROTECTOR
bool "Stack cookie using a global variable"
depends on CC_STACKPROTECTOR_AUTO
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index d5c21a382475..d4d3b21d5b3d 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -7,6 +7,7 @@
 #ifndef __ASSEMBLY__
 #include 
 #include 
+#include 
 
 /*
  * These are used to make use of C type-checking..
@@ -126,7 +127,12 @@ extern unsigned int ptrs_per_p4d;
 
 #define VMALLOC_END(VMALLOC_START + (VMALLOC_SIZE_TB << 40) - 1)
 
+#ifdef CONFIG_DYNAMIC_MODULE_BASE
+#define MODULES_VADDR  ALIGN(((unsigned long)_end + PAGE_SIZE), PMD_SIZE)
+#else
 #define MODULES_VADDR  (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
+#endif
+
 /* The module sections ends with the start of the fixmap */
 #define MODULES_END_AC(0xff00, UL)
 #define MODULES_LEN(MODULES_END - MODULES_VADDR)
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 2fe60e661227..ea4c498369d8 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -384,12 +384,15 @@ asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
 * Build-time sanity checks on the kernel image and module
 * area mappings. (these are purely build-time and produce no code)
 */
+#ifndef CONFIG_DYNAMIC_MODULE_BASE
BUILD_BUG_ON(MODULES_VADDR < __START_KERNEL_map);
BUILD_BUG_ON(MODULES_VADDR - __START_KERNEL_map < KERNEL_IMAGE_SIZE);
-   BUILD_BUG_ON(MODULES_LEN + KERNEL_IMAGE_SIZE > 2*PUD_SIZE);
+   BUILD_BUG_ON(!IS_ENABLED(CONFIG_RANDOMIZE_BASE_LARGE) &&
+MODULES_LEN + KERNEL_IMAGE_SIZE > 2*PUD_SIZE);
BUILD_BUG_ON((__START_KERNEL_map & ~PMD_MASK) != 0);
BUILD_BUG_ON((MODULES_VADDR & ~PMD_MASK) != 0);
BUILD_BUG_ON(!(MODULES_VADDR > __START_KERNEL));
+#endif
MAYBE_BUILD_BUG_ON(!(((MODULES_END - 1) & PGDIR_MASK) ==
(__START_KERNEL & PGDIR_MASK)));
BUILD_BUG_ON(__fix_to_virt(__end_of_fixed_addresses) <= MODULES_END);
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 62a7e9f65dec..6f0b1fa2a71a 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -104,7 +104,7 @@ static struct addr_marker address_markers[] = {
[EFI_END_NR]= { EFI_VA_END, "EFI Runtime Services" },
 #endif
[HIGH_KERNEL_NR]= { __START_KERNEL_map, "High Kernel Mapping" },
-   [MODULES_VADDR_NR]  = { MODULES_VADDR,  "Modules" },
+   [MODULES_VADDR_NR]  = { 0/*MODULES_VADDR*/, "Modules" },
[MODULES_END_NR]= { MODULES_END,"End Modules" },
[FIXADDR_START_NR]  = { FIXADDR_START,  "Fixmap Area" },
[END_OF_SPACE_NR]   = { -1, NULL }
@@ -599,6 +599,7 @@ static int __init pt_dump_init(void)
address_markers[KASAN_SHADOW_START_NR].start_address = KASAN_SHADOW_START;
address_markers[KASAN_SHADOW_END_NR].start_address = KASAN_SHADOW_END;
 #endif
+   address_markers[MODULES_VADDR_NR].start_address = MODULES_

[Xen-devel] [PATCH v2 11/27] x86/power/64: Adapt assembly for PIE support

2018-03-13 Thread Thomas Garnier
Change the assembly code to use only relative references of symbols for the
kernel to be PIE compatible.
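
A relative reference stays valid after relocation because only the
distance between the instruction and the symbol is encoded. A minimal
sketch of that arithmetic (helper names are illustrative):

```c
#include <stdint.h>

/* Displacement of a RIP-relative operand: target minus next-insn address. */
static int64_t rip_disp(uint64_t next_insn, uint64_t target)
{
    if (target >= next_insn)
        return (int64_t)(target - next_insn);
    return -(int64_t)(next_insn - target);
}

/* True when the displacement fits the signed 32-bit field of the operand. */
static int fits_disp32(int64_t disp)
{
    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

Moving code and data by the same amount leaves rip_disp() unchanged, so
such an operand needs no run-time fixup, unlike an absolute immediate.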

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/power/hibernate_asm_64.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/power/hibernate_asm_64.S b/arch/x86/power/hibernate_asm_64.S
index ce8da3a0412c..6fdd7bbc3c33 100644
--- a/arch/x86/power/hibernate_asm_64.S
+++ b/arch/x86/power/hibernate_asm_64.S
@@ -24,7 +24,7 @@
 #include 
 
 ENTRY(swsusp_arch_suspend)
-   movq$saved_context, %rax
+   leaqsaved_context(%rip), %rax
movq%rsp, pt_regs_sp(%rax)
movq%rbp, pt_regs_bp(%rax)
movq%rsi, pt_regs_si(%rax)
@@ -115,7 +115,7 @@ ENTRY(restore_registers)
movq%rax, %cr4;  # turn PGE back on
 
/* We don't restore %rax, it must be 0 anyway */
-   movq$saved_context, %rax
+   leaqsaved_context(%rip), %rax
movqpt_regs_sp(%rax), %rsp
movqpt_regs_bp(%rax), %rbp
movqpt_regs_si(%rax), %rsi
-- 
2.16.2.660.g709887971b-goog



[Xen-devel] [PATCH v2 13/27] x86/boot/64: Build head64.c as mcmodel large when PIE is enabled

2018-03-13 Thread Thomas Garnier
The __startup_64 function assumes all symbols have relocated addresses
instead of the current boot virtual address. PIE generated code favors
relative addresses, making all virtual and physical address math incorrect.
If PIE is enabled, build head64.c as mcmodel=large instead to ensure absolute
references on all memory accesses. Add a global __force_order variable required
when using a large model with read_cr* functions.
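
The address math at stake is the translation from a link-time address to
where the byte currently sits. A minimal sketch on plain integers, with a
function name and parameters chosen for this example only:

```c
#include <stdint.h>

/*
 * Translate a link-time virtual address into the address where the same
 * byte currently sits, given the link-time address of the image start
 * (text_vaddr) and the address the image is actually loaded at (load_addr).
 */
static uint64_t fixup_addr(uint64_t vaddr, uint64_t text_vaddr,
                           uint64_t load_addr)
{
    return vaddr - text_vaddr + load_addr;
}
```

The formula is only correct when vaddr really is the link-time address. If
the compiler produces the current address instead, the subtraction is
applied to the wrong input, which is the failure described above.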

To build head64.c as mcmodel=large, disable the retpoline gcc flags.
This code is used at early boot and removed later, so it doesn't need
retpoline mitigation.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/kernel/Makefile | 6 ++
 arch/x86/kernel/head64.c | 3 +++
 2 files changed, 9 insertions(+)

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 29786c87e864..1ff6be34de66 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -22,6 +22,12 @@ CFLAGS_REMOVE_early_printk.o = -pg
 CFLAGS_REMOVE_head64.o = -pg
 endif
 
+ifdef CONFIG_X86_PIE
+# Remove PIE and retpoline flags that are incompatible with mcmodel=large
+CFLAGS_REMOVE_head64.o += -fPIE -mindirect-branch=thunk-extern -mindirect-branch-register
+CFLAGS_head64.o = -mcmodel=large
+endif
+
 KASAN_SANITIZE_head$(BITS).o   := n
 KASAN_SANITIZE_dumpstack.o := n
 KASAN_SANITIZE_dumpstack_$(BITS).o := n
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 0c855deee165..2fe60e661227 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -64,6 +64,9 @@ EXPORT_SYMBOL(vmemmap_base);
 
 #define __head __section(.head.text)
 
+/* Required for read_cr3 when building as PIE */
+unsigned long __force_order;
+
 static void __head *fixup_pointer(void *ptr, unsigned long physaddr)
 {
return ptr - (void *)_text + (void *)physaddr;
-- 
2.16.2.660.g709887971b-goog



[Xen-devel] [PATCH v2 17/27] x86/relocs: Handle PIE relocations

2018-03-13 Thread Thomas Garnier
Change the relocation tool to correctly handle relocations generated by
the -fPIE option:

 - Add a relocation for each entry of the .got section, given the linker does
   not generate R_X86_64_GLOB_DAT on a simple link.
 - Ignore R_X86_64_GOTPCREL.
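
The first item can be sketched as a loop that produces one record per GOT
slot. The record type and function below are hypothetical simplifications
of the tool's ELF structures, and 8-byte slots are assumed:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified relocation record: where to patch and how. */
struct fake_rel {
    uint64_t offset;    /* virtual address of the GOT slot */
    unsigned type;      /* relocation type tag */
};

#define FAKE_GLOB_DAT 6u    /* numeric value of R_X86_64_GLOB_DAT */

/*
 * Emit one synthetic relocation per 8-byte GOT slot. got_addr is the
 * section's virtual address, got_size its size in bytes. Returns the
 * number of records written, at most max.
 */
static size_t synth_got_relocs(uint64_t got_addr, uint64_t got_size,
                               struct fake_rel *out, size_t max)
{
    size_t i, n = (size_t)(got_size / sizeof(uint64_t));

    if (n > max)
        n = max;
    for (i = 0; i < n; i++) {
        out[i].offset = got_addr + i * sizeof(uint64_t);
        out[i].type = FAKE_GLOB_DAT;
    }
    return n;
}
```

Each slot holds an absolute address, so every slot needs its own entry for
the value to follow the kernel when it is moved.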

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/tools/relocs.c | 93 -
 1 file changed, 92 insertions(+), 1 deletion(-)

diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index 220e97841e49..a35cc337f883 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -32,6 +32,7 @@ struct section {
Elf_Sym*symtab;
Elf_Rel*reltab;
char   *strtab;
+   Elf_Addr   *got;
 };
 static struct section *secs;
 
@@ -293,6 +294,35 @@ static Elf_Sym *sym_lookup(const char *symname)
return 0;
 }
 
+static Elf_Sym *sym_lookup_addr(Elf_Addr addr, const char **name)
+{
+   int i;
+   for (i = 0; i < ehdr.e_shnum; i++) {
+   struct section *sec = &secs[i];
+   long nsyms;
+   Elf_Sym *symtab;
+   Elf_Sym *sym;
+
+   if (sec->shdr.sh_type != SHT_SYMTAB)
+   continue;
+
+   nsyms = sec->shdr.sh_size/sizeof(Elf_Sym);
+   symtab = sec->symtab;
+
+   for (sym = symtab; --nsyms >= 0; sym++) {
+   if (sym->st_value == addr) {
+   if (name) {
+   *name = sym_name(sec->link->strtab,
+sym);
+   }
+   return sym;
+   }
+   }
+   }
+   return 0;
+}
+
+
 #if BYTE_ORDER == LITTLE_ENDIAN
 #define le16_to_cpu(val) (val)
 #define le32_to_cpu(val) (val)
@@ -513,6 +543,33 @@ static void read_relocs(FILE *fp)
}
 }
 
+static void read_got(FILE *fp)
+{
+   int i;
+   for (i = 0; i < ehdr.e_shnum; i++) {
+   struct section *sec = &secs[i];
+   sec->got = NULL;
+   if (sec->shdr.sh_type != SHT_PROGBITS ||
+   strcmp(sec_name(i), ".got")) {
+   continue;
+   }
+   sec->got = malloc(sec->shdr.sh_size);
+   if (!sec->got) {
+   die("malloc of %d bytes for got failed\n",
+   sec->shdr.sh_size);
+   }
+   if (fseek(fp, sec->shdr.sh_offset, SEEK_SET) < 0) {
+   die("Seek to %d failed: %s\n",
+   sec->shdr.sh_offset, strerror(errno));
+   }
+   if (fread(sec->got, 1, sec->shdr.sh_size, fp)
+   != sec->shdr.sh_size) {
+   die("Cannot read got: %s\n",
+   strerror(errno));
+   }
+   }
+}
+
 
 static void print_absolute_symbols(void)
 {
@@ -643,6 +700,32 @@ static void add_reloc(struct relocs *r, uint32_t offset)
r->offset[r->count++] = offset;
 }
 
+/*
+ * The linker does not generate relocations for the GOT for the kernel.
+ * If a GOT is found, simulate the relocations that should have been included.
+ */
+static void walk_got_table(int (*process)(struct section *sec, Elf_Rel *rel,
+ Elf_Sym *sym, const char *symname),
+  struct section *sec)
+{
+   int i;
+   Elf_Addr entry;
+   Elf_Sym *sym;
+   const char *symname;
+   Elf_Rel rel;
+
+   for (i = 0; i < sec->shdr.sh_size/sizeof(Elf_Addr); i++) {
+   entry = sec->got[i];
+   sym = sym_lookup_addr(entry, &symname);
+   if (!sym)
+   die("Could not found got symbol for entry %d\n", i);
+   rel.r_offset = sec->shdr.sh_addr + i * sizeof(Elf_Addr);
+   rel.r_info = ELF_BITS == 64 ? R_X86_64_GLOB_DAT
+: R_386_GLOB_DAT;
+   process(sec, &rel, sym, symname);
+   }
+}
+
 static void walk_relocs(int (*process)(struct section *sec, Elf_Rel *rel,
Elf_Sym *sym, const char *symname))
 {
@@ -656,6 +739,8 @@ static void walk_relocs(int (*process)(struct section *sec, Elf_Rel *rel,
struct section *sec = &secs[i];
 
if (sec->shdr.sh_type != SHT_REL_TYPE) {
+   if (sec->got)
+   walk_got_table(process, sec);
continue;
}
sec_symtab  = sec->link;
@@ -765,6 +850,7 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
offset += per_cpu_load_addr;
 
switch (r_type) {
+   case R_X86_64_GOTPCREL:
case R_X86

[Xen-devel] [PATCH v2 10/27] x86/boot/64: Adapt assembly for PIE support

2018-03-13 Thread Thomas Garnier
Change the assembly code to use only relative references of symbols for the
kernel to be PIE compatible.

Early at boot, the kernel is mapped at a temporary address while preparing
the page table. To know the changes needed for the page table with KASLR,
the boot code calculates the difference between the expected address of the
kernel and the one chosen by KASLR. It does not work with PIE because all
symbols in code are relative. Instead of getting the future relocated
virtual address, you will get the current temporary mapping. The solution
is to use global variables that will be relocated as expected.
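
The difference between the two kinds of reference can be modeled with
plain integers. This is a sketch under simplifying assumptions: both
helpers and their parameters are hypothetical, and the model ignores the
details of how the relocation is applied.

```c
#include <stdint.h>

/*
 * What a RIP-relative reference yields: the symbol's address in the
 * mapping the code currently runs from (here plus the link distance).
 */
static uint64_t rip_relative_addr(uint64_t here, int64_t distance)
{
    return here + (uint64_t)distance;
}

/*
 * What a relocated data word yields: the link-time address shifted by
 * the KASLR delta, independent of where the code runs right now.
 */
static uint64_t relocated_word(uint64_t link_addr, uint64_t kaslr_delta)
{
    return link_addr + kaslr_delta;
}
```

With code running from a low temporary mapping, the first helper returns a
low address while the second returns the final high address, which is the
value the page table setup needs.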

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/kernel/head_64.S | 26 --
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 48385c1074a5..48652f3ec46a 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -89,8 +89,9 @@ startup_64:
popq%rsi
 
/* Form the CR3 value being sure to include the CR3 modifier */
-   addq$(early_top_pgt - __START_KERNEL_map), %rax
+   addq_early_top_pgt_offset(%rip), %rax
jmp 1f
+
 ENTRY(secondary_startup_64)
UNWIND_HINT_EMPTY
/*
@@ -119,7 +120,7 @@ ENTRY(secondary_startup_64)
popq%rsi
 
/* Form the CR3 value being sure to include the CR3 modifier */
-   addq$(init_top_pgt - __START_KERNEL_map), %rax
+   addq_init_top_offset(%rip), %rax
 1:
 
/* Enable PAE mode, PGE and LA57 */
@@ -137,7 +138,7 @@ ENTRY(secondary_startup_64)
movq%rax, %cr3
 
/* Ensure I am executing from virtual addresses */
-   movq$1f, %rax
+   movabs  $1f, %rax
ANNOTATE_RETPOLINE_SAFE
jmp *%rax
 1:
@@ -234,11 +235,12 @@ ENTRY(secondary_startup_64)
 *  REX.W + FF /5 JMP m16:64 Jump far, absolute indirect,
 *  address given in m16:64.
 */
-   pushq   $.Lafter_lret   # put return address on stack for unwinder
+   leaq.Lafter_lret(%rip), %rax
+   pushq   %rax# put return address on stack for unwinder
xorq%rbp, %rbp  # clear frame pointer
-   movqinitial_code(%rip), %rax
+   leaqinitial_code(%rip), %rax
pushq   $__KERNEL_CS# set correct cs
-   pushq   %rax# target address in negative space
+   pushq   (%rax)  # target address in negative space
lretq
 .Lafter_lret:
 END(secondary_startup_64)
@@ -342,6 +344,18 @@ END(early_idt_handler_common)
 GLOBAL(early_recursion_flag)
.long 0
 
+   /*
+* Position Independent Code takes only relative references in code
+* meaning a global variable address is relative to RIP and not its
+* future virtual address. Global variables can be used instead as they
+* are still relocated on the expected kernel mapping address.
+*/
+   .align 8
+_early_top_pgt_offset:
+   .quad early_top_pgt - __START_KERNEL_map
+_init_top_offset:
+   .quad init_top_pgt - __START_KERNEL_map
+
 #define NEXT_PAGE(name) \
.balign PAGE_SIZE; \
 GLOBAL(name)
-- 
2.16.2.660.g709887971b-goog



[Xen-devel] [PATCH v2 09/27] x86/acpi: Adapt assembly for PIE support

2018-03-13 Thread Thomas Garnier
Change the assembly code to use only relative references of symbols for the
kernel to be PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/kernel/acpi/wakeup_64.S | 31 ---
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/acpi/wakeup_64.S b/arch/x86/kernel/acpi/wakeup_64.S
index 50b8ed0317a3..472659c0f811 100644
--- a/arch/x86/kernel/acpi/wakeup_64.S
+++ b/arch/x86/kernel/acpi/wakeup_64.S
@@ -14,7 +14,7 @@
 * Hooray, we are in Long 64-bit mode (but still running in low memory)
 */
 ENTRY(wakeup_long64)
-   movq    saved_magic, %rax
+   movq    saved_magic(%rip), %rax
    movq    $0x123456789abcdef0, %rdx
    cmpq    %rdx, %rax
    jne     bogus_64_magic
@@ -25,14 +25,14 @@ ENTRY(wakeup_long64)
    movw    %ax, %es
    movw    %ax, %fs
    movw    %ax, %gs
-   movq    saved_rsp, %rsp
+   movq    saved_rsp(%rip), %rsp
 
-   movq    saved_rbx, %rbx
-   movq    saved_rdi, %rdi
-   movq    saved_rsi, %rsi
-   movq    saved_rbp, %rbp
+   movq    saved_rbx(%rip), %rbx
+   movq    saved_rdi(%rip), %rdi
+   movq    saved_rsi(%rip), %rsi
+   movq    saved_rbp(%rip), %rbp
 
-   movq    saved_rip, %rax
+   movq    saved_rip(%rip), %rax
    jmp     *%rax
 ENDPROC(wakeup_long64)
 
@@ -45,7 +45,7 @@ ENTRY(do_suspend_lowlevel)
    xorl    %eax, %eax
    call    save_processor_state
 
-   movq    $saved_context, %rax
+   leaq    saved_context(%rip), %rax
    movq    %rsp, pt_regs_sp(%rax)
    movq    %rbp, pt_regs_bp(%rax)
    movq    %rsi, pt_regs_si(%rax)
@@ -64,13 +64,14 @@ ENTRY(do_suspend_lowlevel)
    pushfq
    popq    pt_regs_flags(%rax)
 
-   movq    $.Lresume_point, saved_rip(%rip)
+   leaq    .Lresume_point(%rip), %rax
+   movq    %rax, saved_rip(%rip)
 
-   movq    %rsp, saved_rsp
-   movq    %rbp, saved_rbp
-   movq    %rbx, saved_rbx
-   movq    %rdi, saved_rdi
-   movq    %rsi, saved_rsi
+   movq    %rsp, saved_rsp(%rip)
+   movq    %rbp, saved_rbp(%rip)
+   movq    %rbx, saved_rbx(%rip)
+   movq    %rdi, saved_rdi(%rip)
+   movq    %rsi, saved_rsi(%rip)
 
    addq    $8, %rsp
    movl    $3, %edi
@@ -82,7 +83,7 @@ ENTRY(do_suspend_lowlevel)
    .align 4
 .Lresume_point:
    /* We don't restore %rax, it must be 0 anyway */
-   movq    $saved_context, %rax
+   leaq    saved_context(%rip), %rax
    movq    saved_context_cr4(%rax), %rbx
    movq    %rbx, %cr4
    movq    saved_context_cr3(%rax), %rbx
-- 
2.16.2.660.g709887971b-goog
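
Illustration (not part of the patch): the conversion in this patch replaces
absolute symbol references such as "movq $sym, %reg" with RIP-relative ones
such as "leaq sym(%rip), %reg". Below is a minimal user-space sketch in C of
the same idea. It is not kernel code: saved_magic is a hypothetical stand-in
variable and saved_magic_addr() is an illustrative helper. It assumes GCC or
Clang on x86_64, where the "m" operand is normally printed as a RIP-relative
reference; other targets take the plain C fallback.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's saved_magic variable. */
uint64_t saved_magic = 0x123456789abcdef0ULL;

/* Illustrative helper: fetch the address of saved_magic with LEA. */
uint64_t *saved_magic_addr(void)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
	uint64_t *p;

	/*
	 * The "m" operand lets the compiler choose the addressing mode;
	 * in a PIE build it is printed as saved_magic(%rip), so no
	 * absolute address is encoded in the instruction.
	 */
	__asm__("leaq %1, %0" : "=r" (p) : "m" (saved_magic));
	return p;
#else
	/* Portable fallback (assumption: no x86_64 inline asm available). */
	return &saved_magic;
#endif
}
```

Either path returns the same pointer as &saved_magic; only the encoding of
the reference differs.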



[Xen-devel] [PATCH v2 12/27] x86/paravirt: Adapt assembly for PIE support

2018-03-13 Thread Thomas Garnier
If PIE is enabled, switch the paravirt assembly constraints to be
compatible. The %c/i constraints generate smaller code, so they are kept
by default.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/include/asm/paravirt_types.h | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 180bc0bff0fb..140747a98d94 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -337,9 +337,17 @@ extern struct pv_lock_ops pv_lock_ops;
 #define PARAVIRT_PATCH(x)  \
(offsetof(struct paravirt_patch_template, x) / sizeof(void *))
 
+#ifdef CONFIG_X86_PIE
+#define paravirt_opptr_call "a"
+#define paravirt_opptr_type "p"
+#else
+#define paravirt_opptr_call "c"
+#define paravirt_opptr_type "i"
+#endif
+
 #define paravirt_type(op)  \
[paravirt_typenum] "i" (PARAVIRT_PATCH(op)),\
-   [paravirt_opptr] "i" (&(op))
+   [paravirt_opptr] paravirt_opptr_type (&(op))
 #define paravirt_clobber(clobber)  \
[paravirt_clobber] "i" (clobber)
 
@@ -395,7 +403,7 @@ int paravirt_disable_iospace(void);
  */
 #define PARAVIRT_CALL  \
ANNOTATE_RETPOLINE_SAFE \
-   "call *%c[paravirt_opptr];"
+   "call *%" paravirt_opptr_call "[paravirt_opptr];"
 
 /*
  * These macros are intended to wrap calls through one of the paravirt
-- 
2.16.2.660.g709887971b-goog
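
Illustration (not part of the patch): the PARAVIRT_CALL change relies on
adjacent C string literals being concatenated at compile time, so the operand
modifier can be chosen by a macro. A minimal sketch follows, assuming
CONFIG_X86_PIE is left undefined. PARAVIRT_CALL_TEMPLATE and both helper
functions are hypothetical names used only here, and the retpoline annotation
from the real macro is omitted.

```c
#include <string.h>

#ifdef CONFIG_X86_PIE
#define paravirt_opptr_call "a"
#else
#define paravirt_opptr_call "c"
#endif

/* Adjacent string literals are merged into one asm template string. */
#define PARAVIRT_CALL_TEMPLATE \
	"call *%" paravirt_opptr_call "[paravirt_opptr];"

/* Hypothetical helper: return the template that would be handed to asm(). */
const char *paravirt_call_template(void)
{
	return PARAVIRT_CALL_TEMPLATE;
}

/* Hypothetical helper: 1 if the PIE ("a") modifier was selected. */
int paravirt_template_is_pie(void)
{
	return strstr(PARAVIRT_CALL_TEMPLATE, "%a[") != NULL;
}
```

Building the same file with -DCONFIG_X86_PIE switches the template to
"call *%a[paravirt_opptr];".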



[Xen-devel] [PATCH v2 06/27] x86/entry/64: Adapt assembly for PIE support

2018-03-13 Thread Thomas Garnier
Change the assembly code to use only relative references to symbols so that
the kernel is PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/entry/entry_64.S| 16 ++--
 arch/x86/kernel/relocate_kernel_64.S |  8 +++-
 2 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index bd53c57617e6..c53123468364 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -191,7 +191,7 @@ ENTRY(entry_SYSCALL_64_trampoline)
 * spill RDI and restore it in a second-stage trampoline.
 */
    pushq   %rdi
-   movq    $entry_SYSCALL_64_stage2, %rdi
+   movabsq $entry_SYSCALL_64_stage2, %rdi
    JMP_NOSPEC %rdi
 END(entry_SYSCALL_64_trampoline)
 
@@ -1275,7 +1275,8 @@ ENTRY(error_entry)
    movl    %ecx, %eax      /* zero extend */
    cmpq    %rax, RIP+8(%rsp)
    je      .Lbstep_iret
-   cmpq    $.Lgs_change, RIP+8(%rsp)
+   leaq    .Lgs_change(%rip), %rcx
+   cmpq    %rcx, RIP+8(%rsp)
    jne     .Lerror_entry_done
 
    /*
@@ -1480,10 +1481,10 @@ ENTRY(nmi)
     * resume the outer NMI.
     */
 
-   movq    $repeat_nmi, %rdx
+   leaq    repeat_nmi(%rip), %rdx
    cmpq    8(%rsp), %rdx
    ja      1f
-   movq    $end_repeat_nmi, %rdx
+   leaq    end_repeat_nmi(%rip), %rdx
    cmpq    8(%rsp), %rdx
    ja      nested_nmi_out
 1:
@@ -1537,7 +1538,8 @@ nested_nmi:
    pushq   %rdx
    pushfq
    pushq   $__KERNEL_CS
-   pushq   $repeat_nmi
+   leaq    repeat_nmi(%rip), %rdx
+   pushq   %rdx
 
    /* Put stack back */
    addq    $(6*8), %rsp
@@ -1576,7 +1578,9 @@ first_nmi:
    addq    $8, (%rsp)      /* Fix up RSP */
    pushfq                  /* RFLAGS */
    pushq   $__KERNEL_CS    /* CS */
-   pushq   $1f             /* RIP */
+   pushq   %rax            /* Support Position Independent Code */
+   leaq    1f(%rip), %rax  /* RIP */
+   xchgq   %rax, (%rsp)    /* Restore RAX, put 1f */
    iretq                   /* continues at repeat_nmi below */
    UNWIND_HINT_IRET_REGS
 1:
diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index a7227dfe1a2b..0c0fc259a4e2 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -208,11 +208,9 @@ identity_mapped:
    movq    %rax, %cr3
    lea     PAGE_SIZE(%r8), %rsp
    call    swap_pages
-   jmp     *virtual_mapped_addr(%rip)
-
-   /* Absolute value for PIE support */
-virtual_mapped_addr:
-   .quad virtual_mapped
+   movabsq $virtual_mapped, %rax
+   pushq   %rax
+   ret
 
 virtual_mapped:
    movq    RSP(%r8), %rsp
-- 
2.16.2.660.g709887971b-goog



[Xen-devel] [PATCH v2 03/27] x86: Use symbol name in jump table for PIE support

2018-03-13 Thread Thomas Garnier
Replace the %c operand modifier with %P. %c is incompatible with PIE
because it implies an immediate value, whereas %P references a symbol.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/include/asm/jump_label.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index 8c0de4282659..dfdcdc39604a 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -37,9 +37,9 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
".byte " __stringify(STATIC_KEY_INIT_NOP) "\n\t"
".pushsection __jump_table,  \"aw\" \n\t"
_ASM_ALIGN "\n\t"
-   _ASM_PTR "1b, %l[l_yes], %c0 + %c1 \n\t"
+   _ASM_PTR "1b, %l[l_yes], %P0 \n\t"
".popsection \n\t"
-   : :  "i" (key), "i" (branch) : : l_yes);
+   : :  "X" (&((char *)key)[branch]) : : l_yes);
 
return false;
 l_yes:
@@ -53,9 +53,9 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool
"2:\n\t"
".pushsection __jump_table,  \"aw\" \n\t"
_ASM_ALIGN "\n\t"
-   _ASM_PTR "1b, %l[l_yes], %c0 + %c1 \n\t"
+   _ASM_PTR "1b, %l[l_yes], %P0 \n\t"
".popsection \n\t"
-   : :  "i" (key), "i" (branch) : : l_yes);
+   : :  "X" (&((char *)key)[branch]) : : l_yes);
 
return false;
 l_yes:
-- 
2.16.2.660.g709887971b-goog
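
Illustration (not part of the patch): the new operand &((char *)key)[branch]
folds the branch flag into the key address, so a single symbol-based
expression replaces the former "%c0 + %c1" sum of two immediates. The sketch
below shows the arithmetic in plain C. The struct is a simplified stand-in,
jump_entry_encode() is a hypothetical name, and the two decode helpers are
only an assumption about how a consumer could split the value again; they
rely on the key being at least 2-byte aligned.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the kernel structure (int gives 4-byte alignment). */
struct static_key {
	int enabled;
};

/* Same expression as the "X" operand in the patch: key address plus 0 or 1. */
static inline uintptr_t jump_entry_encode(struct static_key *key, bool branch)
{
	return (uintptr_t)&((char *)key)[branch];
}

/* Assumed decoding: clear the low bit to recover the key pointer. */
static inline struct static_key *jump_entry_get_key(uintptr_t v)
{
	return (struct static_key *)(v & ~(uintptr_t)1);
}

/* Assumed decoding: the low bit carries the branch flag. */
static inline bool jump_entry_get_branch(uintptr_t v)
{
	return (v & 1) != 0;
}
```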



[Xen-devel] [PATCH v2 16/27] compiler: Option to add PROVIDE_HIDDEN replacement for weak symbols

2018-03-13 Thread Thomas Garnier
Provide an option to have a PROVIDE_HIDDEN (linker script) entry for
each weak symbol. This option solves an error on x86_64 where the linker
optimizes PIE-generated code to be non-PIE because --emit-relocs was used
instead of -pie (to reduce dynamic relocations).

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 init/Kconfig|  7 +++
 scripts/link-vmlinux.sh | 14 ++
 2 files changed, 21 insertions(+)

diff --git a/init/Kconfig b/init/Kconfig
index c924babc6d47..fe9f9ada4db0 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1927,6 +1927,13 @@ config ASN1
  inform it as to what tags are to be expected in a stream and what
  functions to call on what tags.
 
+config WEAK_PROVIDE_HIDDEN
+   bool
+   help
+ Generate linker script PROVIDE_HIDDEN entries for all weak symbols. It
+ allows to prevent non-pie code being replaced by the linker if the
+ emit-relocs option is used instead of pie (useful for x86_64 pie).
+
 source "kernel/Kconfig.locks"
 
 config ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 08ca08e9105c..c015f5142ecf 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -146,6 +146,17 @@ kallsyms()
${CC} ${aflags} -c -o ${2} ${afile}
 }
 
+gen_weak_provide_hidden()
+{
+if [ -n "${CONFIG_WEAK_PROVIDE_HIDDEN}" ]; then
+local pattern="s/^\s\+ w \(\w\+\)$/PROVIDE_HIDDEN(\1 = .);/gp"
+echo -e "SECTIONS {\n. = _end;" > .tmp_vmlinux_hiddenld
+${NM} ${1} | sed -n "${pattern}" >> .tmp_vmlinux_hiddenld
+echo "}" >> .tmp_vmlinux_hiddenld
+LDFLAGS_vmlinux="${LDFLAGS_vmlinux} -T .tmp_vmlinux_hiddenld"
+fi
+}
+
 # Create map file with all symbols from ${1}
 # See mksymap for additional details
 mksysmap()
@@ -230,6 +241,9 @@ modpost_link vmlinux.o
 # modpost vmlinux.o to check for section mismatches
 ${MAKE} -f "${srctree}/scripts/Makefile.modpost" vmlinux.o
 
+# Generate weak linker script
+gen_weak_provide_hidden vmlinux.o
+
 kallsymso=""
 kallsyms_vmlinux=""
 if [ -n "${CONFIG_KALLSYMS}" ]; then
-- 
2.16.2.660.g709887971b-goog
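
Illustration (not part of the patch): the sed expression in
gen_weak_provide_hidden() turns each undefined weak symbol line printed by nm
(blank address column, type "w", symbol name) into a PROVIDE_HIDDEN entry.
The C sketch below re-implements that rewrite for a single line so the
matching rule is easier to follow. weak_to_provide_hidden() is a hypothetical
name, and the function assumes the trailing newline has already been removed
and that the leading blanks are spaces or tabs.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 and writes "PROVIDE_HIDDEN(<sym> = .);" to out when line has
 * the form matched by the sed pattern, otherwise returns 0.
 */
int weak_to_provide_hidden(const char *line, char *out, size_t outlen)
{
	size_t ws = strspn(line, " \t");	/* length of leading blanks */
	const char *p = line + ws;
	const char *sym, *c;

	/* "\s\+ w ": at least two blanks, the last one a space, then "w ". */
	if (ws < 2 || p[-1] != ' ' || p[0] != 'w' || p[1] != ' ')
		return 0;

	/* "\(\w\+\)$": the rest of the line must be one non-empty word. */
	sym = p + 2;
	if (*sym == '\0')
		return 0;
	for (c = sym; *c; c++)
		if (!isalnum((unsigned char)*c) && *c != '_')
			return 0;

	snprintf(out, outlen, "PROVIDE_HIDDEN(%s = .);", sym);
	return 1;
}
```

Lines for defined symbols (which start with an address) and for other symbol
types such as "U" are rejected, matching the behaviour of "sed -n ... p".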



[Xen-devel] [PATCH v2 07/27] x86: pm-trace - Adapt assembly for PIE support

2018-03-13 Thread Thomas Garnier
Change assembly to use the new _ASM_MOVABS macro instead of _ASM_MOV for
the assembly to be PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/include/asm/pm-trace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pm-trace.h b/arch/x86/include/asm/pm-trace.h
index bfa32aa428e5..972070806ce9 100644
--- a/arch/x86/include/asm/pm-trace.h
+++ b/arch/x86/include/asm/pm-trace.h
@@ -8,7 +8,7 @@
 do {   \
if (pm_trace_enabled) { \
const void *tracedata;  \
-   asm volatile(_ASM_MOV " $1f,%0\n"   \
+   asm volatile(_ASM_MOVABS " $1f,%0\n"\
 ".section .tracedata,\"a\"\n"  \
 "1:\t.word %c1\n\t"\
 _ASM_PTR " %c2\n"  \
-- 
2.16.2.660.g709887971b-goog



[Xen-devel] [PATCH v2 15/27] compiler: Option to default to hidden symbols

2018-03-13 Thread Thomas Garnier
Provide an option to default visibility to hidden except for key
symbols. This option is disabled by default and will be used by x86_64
PIE support to remove errors between compilation units.

Default visibility is also kept for external symbols that are compared
with each other, as they may be equal (start/end of sections). In this
case, older versions of GCC will remove the comparison if the symbols are
hidden. This issue exists at least on GCC 4.9 and earlier.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/boot/boot.h |  2 +-
 arch/x86/include/asm/setup.h |  2 +-
 arch/x86/kernel/cpu/microcode/core.c |  4 ++--
 drivers/base/firmware_class.c|  4 ++--
 include/asm-generic/sections.h   |  6 ++
 include/linux/compiler.h |  7 +++
 init/Kconfig |  7 +++
 kernel/kallsyms.c| 16 
 kernel/trace/trace.h |  4 ++--
 lib/dynamic_debug.c  |  4 ++--
 10 files changed, 38 insertions(+), 18 deletions(-)

diff --git a/arch/x86/boot/boot.h b/arch/x86/boot/boot.h
index ef5a9cc66fb8..d726c35bdd96 100644
--- a/arch/x86/boot/boot.h
+++ b/arch/x86/boot/boot.h
@@ -193,7 +193,7 @@ static inline bool memcmp_gs(const void *s1, addr_t s2, size_t len)
 }
 
 /* Heap -- available for dynamic lists. */
-extern char _end[];
+extern char _end[] __default_visibility;
 extern char *HEAP;
 extern char *heap_end;
 #define RESET_HEAP() ((void *)( HEAP = _end ))
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index 3108e297d87d..dfba64fe1c7e 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -70,7 +70,7 @@ static inline void x86_ce4100_early_setup(void) { }
  * This is set up by the setup-routine at boot-time
  */
 extern struct boot_params boot_params;
-extern char _text[];
+extern char _text[] __default_visibility;
 
 static inline bool kaslr_enabled(void)
 {
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index aa1b9a422f2b..ed5675db6e82 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -141,8 +141,8 @@ static bool __init check_loader_disabled_bsp(void)
return *res;
 }
 
-extern struct builtin_fw __start_builtin_fw[];
-extern struct builtin_fw __end_builtin_fw[];
+extern struct builtin_fw __start_builtin_fw[] __default_visibility;
+extern struct builtin_fw __end_builtin_fw[] __default_visibility;
 
 bool get_builtin_firmware(struct cpio_data *cd, const char *name)
 {
diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 7dd36ace6152..939a1952d0ab 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -136,8 +136,8 @@ static struct firmware_cache fw_cache;
 
 #ifdef CONFIG_FW_LOADER
 
-extern struct builtin_fw __start_builtin_fw[];
-extern struct builtin_fw __end_builtin_fw[];
+extern struct builtin_fw __start_builtin_fw[] __default_visibility;
+extern struct builtin_fw __end_builtin_fw[] __default_visibility;
 
 static void fw_copy_to_prealloc_buf(struct firmware *fw,
void *buf, size_t size)
diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index 849cd8eb5ca0..0a0e23405ddd 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -32,6 +32,9 @@
  * __softirqentry_text_start, __softirqentry_text_end
  * __start_opd, __end_opd
  */
+#ifdef CONFIG_DEFAULT_HIDDEN
+#pragma GCC visibility push(default)
+#endif
 extern char _text[], _stext[], _etext[];
 extern char _data[], _sdata[], _edata[];
 extern char __bss_start[], __bss_stop[];
@@ -49,6 +52,9 @@ extern char __start_once[], __end_once[];
 
 /* Start and end of .ctors section - used for constructor calls. */
 extern char __ctors_start[], __ctors_end[];
+#ifdef CONFIG_DEFAULT_HIDDEN
+#pragma GCC visibility pop
+#endif
 
 /* Start and end of .opd section - used for function descriptors. */
 extern char __start_opd[], __end_opd[];
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index ab4711c63601..a9ac84e37af9 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -278,6 +278,13 @@ unsigned long read_word_at_a_time(const void *addr)
__u.__val;  \
 })
 
+#ifdef CONFIG_DEFAULT_HIDDEN
+#pragma GCC visibility push(hidden)
+#define __default_visibility  __attribute__((visibility ("default")))
+#else
+#define __default_visibility
+#endif
+
 #endif /* __KERNEL__ */
 
 #endif /* __ASSEMBLY__ */
diff --git a/init/Kconfig b/init/Kconfig
index acc9087546ac..c924babc6d47 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1667,6 +1667,13 @@ config PROFILING
 config TRACEPOINTS
bool
 
+#
+# Default to hidden visibility for all symbols.
+# Useful for Position Independent Code to reduce global references.
+#
+config DEFAULT_HIDDEN
+   bool
+
 sour
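
Illustration (not part of the patch): the compiler.h hunk above makes hidden
visibility the default and provides __default_visibility to opt individual
symbols back out. A minimal sketch of that mechanism follows, assuming GCC or
Clang. CONFIG_DEFAULT_HIDDEN is defined by hand here, the variable and
function names are hypothetical, and unlike the patch the sketch pops the
pragma again at the end so that it only covers its own declarations.

```c
/* Hand-defined stand-in for CONFIG_DEFAULT_HIDDEN=y. */
#define CONFIG_DEFAULT_HIDDEN 1

#ifdef CONFIG_DEFAULT_HIDDEN
#pragma GCC visibility push(hidden)
#define __default_visibility  __attribute__((visibility ("default")))
#else
#define __default_visibility
#endif

/* Hidden by default: not exported from the linked object. */
int hidden_counter = 41;

/* Explicitly kept at default visibility, like _text or _end in the patch. */
int exported_marker __default_visibility = 1;

/* Both symbols remain usable inside the same linked object. */
int next_count(void)
{
	return hidden_counter + exported_marker;
}

#ifdef CONFIG_DEFAULT_HIDDEN
#pragma GCC visibility pop
#endif
```

On an ELF target, building this as a shared object and listing its dynamic
symbols should show exported_marker but not hidden_counter or next_count.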

[Xen-devel] [PATCH v2 05/27] x86: relocate_kernel - Adapt assembly for PIE support

2018-03-13 Thread Thomas Garnier
Change the assembly code to use only relative references to symbols so that
the kernel is PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/kernel/relocate_kernel_64.S | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index 11eda21eb697..a7227dfe1a2b 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -208,9 +208,11 @@ identity_mapped:
    movq    %rax, %cr3
    lea     PAGE_SIZE(%r8), %rsp
    call    swap_pages
-   movq    $virtual_mapped, %rax
-   pushq   %rax
-   ret
+   jmp     *virtual_mapped_addr(%rip)
+
+   /* Absolute value for PIE support */
+virtual_mapped_addr:
+   .quad virtual_mapped
 
 virtual_mapped:
    movq    RSP(%r8), %rsp
-- 
2.16.2.660.g709887971b-goog



[Xen-devel] [PATCH v2 01/27] x86/crypto: Adapt assembly for PIE support

2018-03-13 Thread Thomas Garnier
Change the assembly code to use only relative references to symbols so that
the kernel is PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/crypto/aes-x86_64-asm_64.S  | 45 +
 arch/x86/crypto/aesni-intel_asm.S|  8 +-
 arch/x86/crypto/aesni-intel_avx-x86_64.S |  6 +-
 arch/x86/crypto/camellia-aesni-avx-asm_64.S  | 42 -
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 44 -
 arch/x86/crypto/camellia-x86_64-asm_64.S |  8 +-
 arch/x86/crypto/cast5-avx-x86_64-asm_64.S| 50 +-
 arch/x86/crypto/cast6-avx-x86_64-asm_64.S| 44 +
 arch/x86/crypto/des3_ede-asm_64.S| 96 +---
 arch/x86/crypto/ghash-clmulni-intel_asm.S|  4 +-
 arch/x86/crypto/glue_helper-asm-avx.S|  4 +-
 arch/x86/crypto/glue_helper-asm-avx2.S   |  6 +-
 arch/x86/crypto/sha256-avx2-asm.S| 23 +++--
 13 files changed, 221 insertions(+), 159 deletions(-)

diff --git a/arch/x86/crypto/aes-x86_64-asm_64.S b/arch/x86/crypto/aes-x86_64-asm_64.S
index 8739cf7795de..86fa068e5e81 100644
--- a/arch/x86/crypto/aes-x86_64-asm_64.S
+++ b/arch/x86/crypto/aes-x86_64-asm_64.S
@@ -48,8 +48,12 @@
 #define R10    %r10
 #define R11    %r11
 
+/* Hold global for PIE support */
+#define RBASE  %r12
+
 #define prologue(FUNC,KEY,B128,B192,r1,r2,r5,r6,r7,r8,r9,r10,r11) \
    ENTRY(FUNC);            \
+   pushq   RBASE;          \
    movq    r1,r2;          \
    leaq    KEY+48(r8),r9;  \
    movq    r10,r11;        \
@@ -74,54 +78,63 @@
    movl    r6 ## E,4(r9);  \
    movl    r7 ## E,8(r9);  \
    movl    r8 ## E,12(r9); \
+   popq    RBASE;          \
    ret;                    \
    ENDPROC(FUNC);
 
+#define round_mov(tab_off, reg_i, reg_o) \
+   leaq    tab_off(%rip), RBASE; \
+   movl    (RBASE,reg_i,4), reg_o;
+
+#define round_xor(tab_off, reg_i, reg_o) \
+   leaq    tab_off(%rip), RBASE; \
+   xorl    (RBASE,reg_i,4), reg_o;
+
 #define round(TAB,OFFSET,r1,r2,r3,r4,r5,r6,r7,r8,ra,rb,rc,rd) \
    movzbl  r2 ## H,r5 ## E; \
    movzbl  r2 ## L,r6 ## E; \
-   movl    TAB+1024(,r5,4),r5 ## E; \
+   round_mov(TAB+1024, r5, r5 ## E) \
    movw    r4 ## X,r2 ## X; \
-   movl    TAB(,r6,4),r6 ## E; \
+   round_mov(TAB, r6, r6 ## E) \
    roll    $16,r2 ## E; \
    shrl    $16,r4 ## E; \
    movzbl  r4 ## L,r7 ## E; \
    movzbl  r4 ## H,r4 ## E; \
    xorl    OFFSET(r8),ra ## E; \
    xorl    OFFSET+4(r8),rb ## E; \
-   xorl    TAB+3072(,r4,4),r5 ## E; \
-   xorl    TAB+2048(,r7,4),r6 ## E; \
+   round_xor(TAB+3072, r4, r5 ## E) \
+   round_xor(TAB+2048, r7, r6 ## E) \
    movzbl  r1 ## L,r7 ## E; \
    movzbl  r1 ## H,r4 ## E; \
-   movl    TAB+1024(,r4,4),r4 ## E; \
+   round_mov(TAB+1024, r4, r4 ## E) \
    movw    r3 ## X,r1 ## X; \
    roll    $16,r1 ## E; \
    shrl    $16,r3 ## E; \
-   xorl    TAB(,r7,4),r5 ## E; \
+   round_xor(TAB, r7, r5 ## E) \
    movzbl  r3 ## L,r7 ## E; \
    movzbl  r3 ## H,r3 ## E; \
-   xorl    TAB+3072(,r3,4),r4 ## E; \
-   xorl    TAB+2048(,r7,4),r5 ## E; \
+   round_xor(TAB+3072, r3, r4 ## E) \
+   round_xor(TAB+2048, r7, r5 ## E) \
    movzbl  r1 ## L,r7 ## E; \
    movzbl  r1 ## H,r3 ## E; \
    shrl    $16,r1 ## E; \
-   xorl    TAB+3072(,r3,4),r6 ## E; \
-   movl    TAB+2048(,r7,4),r3 ## E; \
+   round_xor(TAB+3072, r3, r6 ## E) \
+   round_mov(TAB+2048, r7, r3 ## E) \
    movzbl  r1 ## L,r7 ## E; \
    movzbl  r1 ## H,r1 ## E; \
-   xorl    TAB+1024(,r1,4),r6 ## E; \
-   xorl    TAB(,r7,4),r3 ## E; \
+   round_xor(TAB+1024, r1, r6 ## E) \
+   round_xor(TAB, r7, r3 ## E) \
    movzbl  r2 ## H,r1 ## E; \
    movzbl  r2 ## L,r7 ## E; \
    shrl    $16,r2 ## E; \
-   xorl    TAB+3072(,r1,4),r3 ## E; \
-   xorl    TAB+2048(,r7,4),r4 ## E; \
+   round_xor(TAB+3072, r1, r3 ## E) \
+   round_xor(TAB+2048, r7, r4 ## E) \
    movzbl  r2 ## H,r1 ## E; \
    movzbl  r2 ## L,r2 ## E; \
    xorl    OFFSET+8(r8),rc ## E; \
    xorl    OFFSET+12(r8),rd ## E; \
-   xorl    TAB+1024(,r1,4),r3 ## E; \
-   xorl    TAB(,r2,4),r4 ## E;
+   round_xor(TAB+1024, r1, r3 ## E) \
+   round_xor(TAB, r2, r4 ## E)
 
 #define move_regs(r1,r2,r3,r4) \
    movl    r3 ## E,r1 ## E; \
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index e762ef417562..4df029aa5fc1 

[Xen-devel] [PATCH v2 00/27] x86: PIE support and option to extend KASLR randomization

2018-03-13 Thread Thomas Garnier
Changes:
 - patch v2:
   - Adapt patch to work post KPTI and compiler changes
   - Redo all performance testing with latest configs and compilers
   - Simplify mov macro on PIE (MOVABS now)
   - Reduce GOT footprint
 - patch v1:
   - Simplify ftrace implementation.
   - Use gcc mstack-protector-guard-reg=%gs with PIE when possible.
 - rfc v3:
   - Use --emit-relocs instead of -pie to reduce dynamic relocation space on
 mapped memory. It also simplifies the relocation process.
   - Move the start of the module section next to the kernel. Remove the need for
 -mcmodel=large on modules. Extends module space from 1 to 2G maximum.
   - Support for XEN PVH as 32-bit relocations can be ignored with
 --emit-relocs.
   - Support for GOT relocations previously done automatically with -pie.
   - Remove need for dynamic PLT in modules.
   - Support dynamic GOT for modules.
 - rfc v2:
   - Add support for global stack cookie while the compiler defaults to fs without
 mcmodel=kernel
   - Change patch 7 to correctly jump out of the identity mapping on kexec load
 preserve.

These patches make the changes necessary to build the kernel as Position
Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below
the top 2G of the virtual address space. This makes it possible to optionally
extend the KASLR randomization range from 1G to 3G.

Thanks a lot to Ard Biesheuvel & Kees Cook for their feedback on compiler
changes, PIE support and KASLR in general. Thanks to Roland McGrath for his
feedback on using -pie versus --emit-relocs and details on compiler code
generation.

The patches:
 - 1-3, 5-13, 18-19: Change in assembly code to be PIE compliant.
 - 4: Add a new _ASM_MOVABS macro to fetch a symbol address generically.
 - 14: Adapt percpu design to work correctly when PIE is enabled.
 - 15: Provide an option to default visibility to hidden except for key symbols.
   It removes errors between compilation units.
 - 16: Add PROVIDE_HIDDEN replacement on the linker script for weak symbols to
   reduce GOT footprint.
 - 17: Adapt relocation tool to handle PIE binary correctly.
 - 20: Add support for global cookie.
 - 21: Support ftrace with PIE (used on Ubuntu config).
 - 22: Add option to move the module section just after the kernel.
 - 23: Adapt module loading to support PIE with dynamic GOT.
 - 24: Make the GOT read-only.
 - 25: Add the CONFIG_X86_PIE option (off by default).
 - 26: Adapt relocation tool to generate a 64-bit relocation table.
 - 27: Add the CONFIG_RANDOMIZE_BASE_LARGE option to increase relocation range
   from 1G to 3G (off by default).

Performance/Size impact:

Size of vmlinux (Default configuration):
 File size:
 - PIE disabled: +0.18%
 - PIE enabled: -1.977% (fewer relocations)
 .text section:
 - PIE disabled: same
 - PIE enabled: same

Size of vmlinux (Ubuntu configuration):
 File size:
 - PIE disabled: +0.21%
 - PIE enabled: +10%
 .text section:
 - PIE disabled: same
 - PIE enabled: +0.001%

The size increase is mainly due to not having access to the 32-bit signed
relocation that can be used with mcmodel=kernel. A small part is due to reduced
optimization for PIE code. This bug [1] was opened with gcc to provide better
code generation for kernel PIE.

Hackbench (50% and 1600% on thread/process for pipe/sockets):
 - PIE disabled: no significant change (avg -/+ 0.5% on latest test).
 - PIE enabled: between -1% and +1% on average (default and Ubuntu config).

Kernbench (average of 10 Half and Optimal runs):
 Elapsed Time:
 - PIE disabled: no significant change (avg -0.5%)
 - PIE enabled: average -0.5% to +0.5%
 System Time:
 - PIE disabled: no significant change (avg -0.1%)
 - PIE enabled: average -0.4% to +0.4%.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

diffstat:
 Documentation/x86/x86_64/mm.txt  |3 
 arch/x86/Kconfig |   45 ++
 arch/x86/Makefile|   58 
 arch/x86/boot/boot.h |2 
 arch/x86/boot/compressed/Makefile|5 
 arch/x86/boot/compressed/misc.c  |   10 +
 arch/x86/crypto/aes-x86_64-asm_64.S  |   45 --
 arch/x86/crypto/aesni-intel_asm.S|8 -
 arch/x86/crypto/aesni-intel_avx-x86_64.S |6 
 arch/x86/crypto/camellia-aesni-avx-asm_64.S  |   42 +++---
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S |   44 +++---
 arch/x86/crypto/camellia-x86_64-asm_64.S |8 -
 arch/x86/crypto/cast5-avx-x86_64-asm_64.S|   50 ---
 arch/x86/crypto/cast6-avx-x86_64-asm_64.S|   44 +++---
 arch/x86/crypto/des3_ede-asm_64.S|   96 +-
 arch/x86/crypto/ghash-clmulni-intel_asm.S|4 
 arch/x86/crypto/glue_helper-asm-avx.S|4 
 arch/x86/crypto/glue_helper-asm-avx2.S   |6 
 arch/x86/crypto/sha256-avx2-asm.S|   23 ++-
 arch/x86/entry/calling.h |2 
 arch/x86/entry/entry_32.S|3 
 

[Xen-devel] [PATCH v2 04/27] x86: Add macro to get symbol address for PIE support

2018-03-13 Thread Thomas Garnier
Add a new _ASM_MOVABS macro to fetch a symbol address. It will be used
to replace "_ASM_MOV $, %dst" code constructs that are not compatible
with PIE.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/include/asm/asm.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index 386a6900e206..653d8b82f015 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -30,6 +30,7 @@
 #define _ASM_ALIGN     __ASM_SEL(.balign 4, .balign 8)
 
 #define _ASM_MOV       __ASM_SIZE(mov)
+#define _ASM_MOVABS    __ASM_SEL(movl, movabsq)
 #define _ASM_INC       __ASM_SIZE(inc)
 #define _ASM_DEC       __ASM_SIZE(dec)
 #define _ASM_ADD       __ASM_SIZE(add)
-- 
2.16.2.660.g709887971b-goog
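
Illustration (not part of the patch): _ASM_MOVABS selects "movl" on 32-bit
builds and "movabsq" on 64-bit builds, and the result is pasted into an asm
template such as the one in the pm-trace patch. The sketch below mimics that
selection with simplified, hypothetical macros (STR, ASM_SEL, ASM_MOVABS);
the kernel's real __ASM_SEL differs in detail. CONFIG_X86_32 is assumed to be
undefined.

```c
#include <string.h>

/* Two-level stringification so that macro arguments are expanded first. */
#define STR_1(...) #__VA_ARGS__
#define STR(...) STR_1(__VA_ARGS__)

/* Simplified stand-in for __ASM_SEL: pick the 32-bit or 64-bit mnemonic. */
#ifdef CONFIG_X86_32
#define ASM_SEL(a, b) STR(a)
#else
#define ASM_SEL(a, b) STR(b)
#endif

#define ASM_MOVABS ASM_SEL(movl, movabsq)

/* Template in the style of the pm-trace patch: "<mnemonic> $1f,%0\n". */
const char *movabs_template(void)
{
	return ASM_MOVABS " $1f,%0\n";
}

/* 1 when the 64-bit absolute move was selected. */
int movabs_is_64bit(void)
{
	return strncmp(ASM_MOVABS, "movabsq", 7) == 0;
}
```

movabsq takes a full 64-bit immediate, which is why it can still load an
absolute symbol address once the 32-bit sign-extended form is unavailable.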



[Xen-devel] [PATCH v2 02/27] x86: Use symbol name on bug table for PIE support

2018-03-13 Thread Thomas Garnier
Replace the %c operand modifier with %P. %c is incompatible with PIE
because it implies an immediate value, whereas %P references a symbol.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgar...@google.com>
---
 arch/x86/include/asm/bug.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/bug.h b/arch/x86/include/asm/bug.h
index 6804d6642767..3d690a4abf50 100644
--- a/arch/x86/include/asm/bug.h
+++ b/arch/x86/include/asm/bug.h
@@ -35,7 +35,7 @@ do { \
asm volatile("1:\t" ins "\n"\
 ".pushsection __bug_table,\"aw\"\n"\
 "2:\t" __BUG_REL(1b) "\t# bug_entry::bug_addr\n"   \
-"\t"  __BUG_REL(%c0) "\t# bug_entry::file\n"   \
+"\t"  __BUG_REL(%P0) "\t# bug_entry::file\n"   \
 "\t.word %c1""\t# bug_entry::line\n"   \
 "\t.word %c2""\t# bug_entry::flags\n"  \
 "\t.org 2b+%c3\n"  \
-- 
2.16.2.660.g709887971b-goog

