Re: [RFC PATCH] compiler.h: Introduce ptr_eq() to preserve address dependency
On Fri, Sep 27, 2024 at 07:05:53PM -0400, Mathieu Desnoyers wrote: > On 2024-09-27 22:33, Mathieu Desnoyers wrote: > [...] > > > --- > > include/linux/compiler.h | 62 > > 1 file changed, 62 insertions(+) > > > > I'm wondering if this really belongs in compiler.h, or if it's so > RCU/HP specific that it should be implemented in rcupdate.h ? > > [... ] > > +static __always_inline > > +int ptr_eq(const volatile void *a, const volatile void *b) > > And perhaps rename this to rcu_ptr_eq() ? For either location or name: Acked-by: Paul E. McKenney There are non-RCU uses, but RCU is currently by far the most common. Thanx, Paul > Thanks, > > Mathieu > > > +{ > > + OPTIMIZER_HIDE_VAR(a); > > + OPTIMIZER_HIDE_VAR(b); > > + return a == b; > > +} > > + > > > > -- > Mathieu Desnoyers > EfficiOS Inc. > https://www.efficios.com >
Re: [RFC PATCH] compiler.h: Introduce ptr_eq() to preserve address dependency
On Fri, Sep 27, 2024 at 07:05:53PM -0400, Mathieu Desnoyers wrote: > On 2024-09-27 22:33, Mathieu Desnoyers wrote: > [...] > > > --- > > include/linux/compiler.h | 62 > > 1 file changed, 62 insertions(+) > > > > I'm wondering if this really belongs in compiler.h, or if it's so > RCU/HP specific that it should be implemented in rcupdate.h ? > > [... ] > > +static __always_inline > > +int ptr_eq(const volatile void *a, const volatile void *b) > > And perhaps rename this to rcu_ptr_eq() ? > I think the current place and name is fine, yes RCU is the most important address dependency user, but there are other users as well. Regards, Boqun > Thanks, > > Mathieu > > > +{ > > + OPTIMIZER_HIDE_VAR(a); > > + OPTIMIZER_HIDE_VAR(b); > > + return a == b; > > +} > > + > > > > -- > Mathieu Desnoyers > EfficiOS Inc. > https://www.efficios.com >
Re: [RFC PATCH] compiler.h: Introduce ptr_eq() to preserve address dependency
[Cc Gary] On Fri, Sep 27, 2024 at 04:33:34PM -0400, Mathieu Desnoyers wrote: > Compiler CSE and SSA GVN optimizations can cause the address dependency > of addresses returned by rcu_dereference to be lost when comparing those > pointers with either constants or previously loaded pointers. > > Introduce ptr_eq() to compare two addresses while preserving the address > dependencies for later use of the address. It should be used when > comparing an address returned by rcu_dereference(). > > This is needed to prevent the compiler CSE and SSA GVN optimizations > from replacing the registers holding @a or @b based on their > equality, which does not preserve address dependencies and allows the > following misordering speculations: > > - If @b is a constant, the compiler can issue the loads which depend > on @a before loading @a. > - If @b is a register populated by a prior load, weakly-ordered > CPUs can speculate loads which depend on @a before loading @a. > > The same logic applies with @a and @b swapped. > > The compiler barrier() is ineffective at fixing this issue. > It does not prevent the compiler CSE from losing the address dependency: > > int fct_2_volatile_barriers(void) > { > int *a, *b; > > do { > a = READ_ONCE(p); > asm volatile ("" : : : "memory"); > b = READ_ONCE(p); > } while (a != b); > asm volatile ("" : : : "memory"); <- barrier() > return *b; > } > > With gcc 14.2 (arm64): > > fct_2_volatile_barriers: > adrpx0, .LANCHOR0 > add x0, x0, :lo12:.LANCHOR0 > .L2: > ldr x1, [x0]<-- x1 populated by first load. > ldr x2, [x0] > cmp x1, x2 > bne .L2 > ldr w0, [x1]<-- x1 is used for access which should depend > on b. > ret > > On weakly-ordered architectures, this lets CPU speculation use the > result from the first load to speculate "ldr w0, [x1]" before > "ldr x2, [x0]". > Based on the RCU documentation, the control dependency does not prevent > the CPU from speculating loads. > > Suggested-by: Linus Torvalds > Suggested-by: Boqun Feng > Signed-off-by: Mathieu Desnoyers > Cc: Greg Kroah-Hartman > Cc: Sebastian Andrzej Siewior > Cc: "Paul E. McKenney" > Cc: Will Deacon > Cc: Peter Zijlstra > Cc: Boqun Feng > Cc: Alan Stern > Cc: John Stultz > Cc: Neeraj Upadhyay > Cc: Linus Torvalds > Cc: Boqun Feng > Cc: Frederic Weisbecker > Cc: Joel Fernandes > Cc: Josh Triplett > Cc: Uladzislau Rezki > Cc: Steven Rostedt > Cc: Lai Jiangshan > Cc: Zqiang > Cc: Ingo Molnar > Cc: Waiman Long > Cc: Mark Rutland > Cc: Thomas Gleixner > Cc: Vlastimil Babka > Cc: maged.mich...@gmail.com > Cc: Mateusz Guzik > Cc: r...@vger.kernel.org > Cc: linux...@kvack.org > Cc: l...@lists.linux.dev > --- > include/linux/compiler.h | 62 > 1 file changed, 62 insertions(+) > > diff --git a/include/linux/compiler.h b/include/linux/compiler.h > index 2df665fa2964..f26705c267e8 100644 > --- a/include/linux/compiler.h > +++ b/include/linux/compiler.h > @@ -186,6 +186,68 @@ void ftrace_likely_update(struct ftrace_likely_data *f, > int val, > __asm__ ("" : "=r" (var) : "0" (var)) > #endif > > +/* > + * Compare two addresses while preserving the address dependencies for > + * later use of the address. It should be used when comparing an address > + * returned by rcu_dereference(). > + * > + * This is needed to prevent the compiler CSE and SSA GVN optimizations > + * from replacing the registers holding @a or @b based on their > + * equality, which does not preserve address dependencies and allows the > + * following misordering speculations: > + * > + * - If @b is a constant, the compiler can issue the loads which depend > + * on @a before loading @a. > + * - If @b is a register populated by a prior load, weakly-ordered > + * CPUs can speculate loads which depend on @a before loading @a. > + * > + * The same logic applies with @a and @b swapped. > + * > + * Return value: true if pointers are equal, false otherwise. > + * > + * The compiler barrier() is ineffective at fixing this issue. It does > + * not prevent the compiler CSE from losing the address dependency: > + * > + * int fct_2_volatile_barriers(void) > + * { > + * int *a, *b; > + * > + * do { > + * a = READ_ONCE(p); > + * asm volatile ("" : : : "memory"); > + * b = READ_ONCE(p); > + * } while (a != b); > + * asm volatile ("" : : : "memory"); <-- barrier() > + * return *b; > + * } > + * > + * With gcc 14.2 (arm64): > + * > + * fct_2_volatile_barriers: > + * adrpx0, .LANCHOR0 > + * add x0, x0, :lo12:.LANCHOR0 > + * .L2: > + * ldr x1, [x0] <-- x1 populated by first load. > + * ldr x2, [x0] > + * cmp x1, x2 > + * bne .L2 > + * ldr w0, [x1] <-- x1 is used for access which should depend > on b. > + * re
Re: [RFC PATCH] compiler.h: Introduce ptr_eq() to preserve address dependency
On 2024-09-27 22:33, Mathieu Desnoyers wrote: [...] --- include/linux/compiler.h | 62 1 file changed, 62 insertions(+) I'm wondering if this really belongs in compiler.h, or if it's so RCU/HP specific that it should be implemented in rcupdate.h ? [... ] +static __always_inline +int ptr_eq(const volatile void *a, const volatile void *b) And perhaps rename this to rcu_ptr_eq() ? Thanks, Mathieu +{ + OPTIMIZER_HIDE_VAR(a); + OPTIMIZER_HIDE_VAR(b); + return a == b; +} + -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
[RFC PATCH] compiler.h: Introduce ptr_eq() to preserve address dependency
Compiler CSE and SSA GVN optimizations can cause the address dependency of addresses returned by rcu_dereference to be lost when comparing those pointers with either constants or previously loaded pointers. Introduce ptr_eq() to compare two addresses while preserving the address dependencies for later use of the address. It should be used when comparing an address returned by rcu_dereference(). This is needed to prevent the compiler CSE and SSA GVN optimizations from replacing the registers holding @a or @b based on their equality, which does not preserve address dependencies and allows the following misordering speculations: - If @b is a constant, the compiler can issue the loads which depend on @a before loading @a. - If @b is a register populated by a prior load, weakly-ordered CPUs can speculate loads which depend on @a before loading @a. The same logic applies with @a and @b swapped. The compiler barrier() is ineffective at fixing this issue. It does not prevent the compiler CSE from losing the address dependency: int fct_2_volatile_barriers(void) { int *a, *b; do { a = READ_ONCE(p); asm volatile ("" : : : "memory"); b = READ_ONCE(p); } while (a != b); asm volatile ("" : : : "memory"); <- barrier() return *b; } With gcc 14.2 (arm64): fct_2_volatile_barriers: adrpx0, .LANCHOR0 add x0, x0, :lo12:.LANCHOR0 .L2: ldr x1, [x0]<-- x1 populated by first load. ldr x2, [x0] cmp x1, x2 bne .L2 ldr w0, [x1]<-- x1 is used for access which should depend on b. ret On weakly-ordered architectures, this lets CPU speculation use the result from the first load to speculate "ldr w0, [x1]" before "ldr x2, [x0]". Based on the RCU documentation, the control dependency does not prevent the CPU from speculating loads. Suggested-by: Linus Torvalds Suggested-by: Boqun Feng Signed-off-by: Mathieu Desnoyers Cc: Greg Kroah-Hartman Cc: Sebastian Andrzej Siewior Cc: "Paul E. McKenney" Cc: Will Deacon Cc: Peter Zijlstra Cc: Boqun Feng Cc: Alan Stern Cc: John Stultz Cc: Neeraj Upadhyay Cc: Linus Torvalds Cc: Boqun Feng Cc: Frederic Weisbecker Cc: Joel Fernandes Cc: Josh Triplett Cc: Uladzislau Rezki Cc: Steven Rostedt Cc: Lai Jiangshan Cc: Zqiang Cc: Ingo Molnar Cc: Waiman Long Cc: Mark Rutland Cc: Thomas Gleixner Cc: Vlastimil Babka Cc: maged.mich...@gmail.com Cc: Mateusz Guzik Cc: r...@vger.kernel.org Cc: linux...@kvack.org Cc: l...@lists.linux.dev --- include/linux/compiler.h | 62 1 file changed, 62 insertions(+) diff --git a/include/linux/compiler.h b/include/linux/compiler.h index 2df665fa2964..f26705c267e8 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -186,6 +186,68 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val, __asm__ ("" : "=r" (var) : "0" (var)) #endif +/* + * Compare two addresses while preserving the address dependencies for + * later use of the address. It should be used when comparing an address + * returned by rcu_dereference(). + * + * This is needed to prevent the compiler CSE and SSA GVN optimizations + * from replacing the registers holding @a or @b based on their + * equality, which does not preserve address dependencies and allows the + * following misordering speculations: + * + * - If @b is a constant, the compiler can issue the loads which depend + * on @a before loading @a. + * - If @b is a register populated by a prior load, weakly-ordered + * CPUs can speculate loads which depend on @a before loading @a. + * + * The same logic applies with @a and @b swapped. + * + * Return value: true if pointers are equal, false otherwise. + * + * The compiler barrier() is ineffective at fixing this issue. It does + * not prevent the compiler CSE from losing the address dependency: + * + * int fct_2_volatile_barriers(void) + * { + * int *a, *b; + * + * do { + * a = READ_ONCE(p); + * asm volatile ("" : : : "memory"); + * b = READ_ONCE(p); + * } while (a != b); + * asm volatile ("" : : : "memory"); <-- barrier() + * return *b; + * } + * + * With gcc 14.2 (arm64): + * + * fct_2_volatile_barriers: + * adrpx0, .LANCHOR0 + * add x0, x0, :lo12:.LANCHOR0 + * .L2: + * ldr x1, [x0] <-- x1 populated by first load. + * ldr x2, [x0] + * cmp x1, x2 + * bne .L2 + * ldr w0, [x1] <-- x1 is used for access which should depend on b. + * ret + * + * On weakly-ordered architectures, this lets CPU speculation use the + * result from the first load to speculate "ldr w0, [x1]" before + * "ldr x2, [x0]". + * Based on the RCU documentation, the control dependency does not + * prevent the CPU from speculating loads. + */ +static __always_inline +int ptr_eq(const volatile void *a, const volatile void *b) +{ +