Re: [patch 22/37] arm64: smp: Switch to hotplug core state synchronization

2023-04-26 Thread Mark Rutland
On Tue, Apr 25, 2023 at 09:51:12PM +0200, Thomas Gleixner wrote:
> On Mon, Apr 17 2023 at 16:50, Mark Rutland wrote:
> > As a tangent/aside, we might need to improve that for confidential compute
> > architectures, and we might want to generically track cpus which might
> > still be using kernel text/data. On arm64 we ensure that via our
> > cpu_kill() callback (which'll use PSCI CPU_AFFINITY_INFO), but I'm not
> > sure if TDX and/or SEV-SNP have a similar mechanism.
> >
> > Otherwise, a malicious hypervisor can pause a vCPU just before it leaves
> > the kernel (e.g. immediately after the arch_cpuhp_cleanup_dead_cpu()
> > call), wait for a kexec (or reuse of stack memory), and unpause the vCPU
> > to cause things to blow up.
> 
> There are a gazillion ways for a malicious hypervisor to blow up a
> 'squint enough to be confident' guest.
> 
> The real question is whether it can utilize such a blow up to extract
> confidential information from the guest.
>
> If not then it's just yet another way of DoS which is an "acceptable"
> attack as it only affects availability but not confidentiality.

Sure.

My thinking is that this is an attack against the *integrity* of the guest
(since the vCPU that gets unpaused may write to memory), and so it's
potentially more than just a DoS.

I only mention this because I'd like to account for that on arm64, and if other
architectures also wanted to handle that it might make sense to have some
common infrastructure to track whether CPUs are potentially still within the
kernel.
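To make that concrete, a minimal sketch of what such common tracking could
look like (hypothetical helpers; none of this is existing kernel API):

#include <linux/cpumask.h>

/*
 * CPUs which have been marked offline but which may still be executing
 * kernel text/data. Arch code (e.g. arm64's cpu_kill() callback polling
 * PSCI CPU_AFFINITY_INFO) clears a CPU's bit once it is known to be gone.
 */
static struct cpumask cpus_maybe_in_kernel;

static void cpuhp_mark_cpu_maybe_in_kernel(unsigned int cpu)
{
        cpumask_set_cpu(cpu, &cpus_maybe_in_kernel);
}

static void cpuhp_mark_cpu_left_kernel(unsigned int cpu)
{
        cpumask_clear_cpu(cpu, &cpus_maybe_in_kernel);
}

static bool cpuhp_any_cpu_maybe_in_kernel(void)
{
        /* e.g. kexec or stack reuse could be deferred while this is true */
        return !cpumask_empty(&cpus_maybe_in_kernel);
}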

Thanks,
Mark.



Re: [patch 22/37] arm64: smp: Switch to hotplug core state synchronization

2023-04-17 Thread Mark Rutland
On Sat, Apr 15, 2023 at 01:44:49AM +0200, Thomas Gleixner wrote:
> Switch to the CPU hotplug core state tracking and synchronization
> mechanism. No functional change intended.
> 
> Signed-off-by: Thomas Gleixner 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: linux-arm-ker...@lists.infradead.org

I gave this a spin on arm64 (in a 64-vCPU VM on an M1 host), and it seems to
work fine with a bunch of vCPUs being hotplugged off and on again randomly.

FWIW:

Tested-by: Mark Rutland 

I also hacked the code to have the dying CPU spin forever before the call to
cpuhp_ap_report_dead(). In that case I see a warning, and that we don't call
arch_cpuhp_cleanup_dead_cpu(), and that the CPU is marked as offline (per
/sys/devices/system/cpu/$N/online).

As a tangent/aside, we might need to improve that for confidential compute
architectures, and we might want to generically track cpus which might still be
using kernel text/data. On arm64 we ensure that via our cpu_kill() callback
(which'll use PSCI CPU_AFFINITY_INFO), but I'm not sure if TDX and/or SEV-SNP
have a similar mechanism.

Otherwise, a malicious hypervisor can pause a vCPU just before it leaves the
kernel (e.g. immediately after the arch_cpuhp_cleanup_dead_cpu() call), wait
for a kexec (or reuse of stack memory), and unpause the vCPU to cause things
to blow up.
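
For illustration, the arm64/PSCI side amounts to polling the firmware from
the cpu_kill() callback until it reports the CPU's affinity level as OFF; a
simplified sketch (helper name made up, timeout handling abbreviated):

#include <linux/delay.h>
#include <linux/errno.h>
#include <linux/psci.h>
#include <asm/smp_plat.h>       /* cpu_logical_map() */

static int psci_wait_for_cpu_off(unsigned int cpu)
{
        int i;

        for (i = 0; i < 10; i++) {
                int ret = psci_ops.affinity_info(cpu_logical_map(cpu), 0);

                /* Firmware says the CPU can no longer touch kernel memory. */
                if (ret == PSCI_0_2_AFFINITY_LEVEL_OFF)
                        return 0;

                msleep(10);
        }

        return -ETIMEDOUT;
}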

Thanks,
Mark.

> ---
>  arch/arm64/Kconfig   |1 +
>  arch/arm64/include/asm/smp.h |2 +-
>  arch/arm64/kernel/smp.c  |   14 +-
>  3 files changed, 7 insertions(+), 10 deletions(-)
> 
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -216,6 +216,7 @@ config ARM64
>   select HAVE_KPROBES
>   select HAVE_KRETPROBES
>   select HAVE_GENERIC_VDSO
> + select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
>   select IRQ_DOMAIN
>   select IRQ_FORCED_THREADING
>   select KASAN_VMALLOC if KASAN
> --- a/arch/arm64/include/asm/smp.h
> +++ b/arch/arm64/include/asm/smp.h
> @@ -99,7 +99,7 @@ static inline void arch_send_wakeup_ipi_
>  
>  extern int __cpu_disable(void);
>  
> -extern void __cpu_die(unsigned int cpu);
> +static inline void __cpu_die(unsigned int cpu) { }
>  extern void cpu_die(void);
>  extern void cpu_die_early(void);
>  
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -333,17 +333,13 @@ static int op_cpu_kill(unsigned int cpu)
>  }
>  
>  /*
> - * called on the thread which is asking for a CPU to be shutdown -
> - * waits until shutdown has completed, or it is timed out.
> + * Called on the thread which is asking for a CPU to be shutdown after the
> + * shutdown completed.
>   */
> -void __cpu_die(unsigned int cpu)
> +void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu)
>  {
>   int err;
>  
> - if (!cpu_wait_death(cpu, 5)) {
> - pr_crit("CPU%u: cpu didn't die\n", cpu);
> - return;
> - }
>   pr_debug("CPU%u: shutdown\n", cpu);
>  
>   /*
> @@ -370,8 +366,8 @@ void cpu_die(void)
>  
>   local_daif_mask();
>  
> - /* Tell __cpu_die() that this CPU is now safe to dispose of */
> - (void)cpu_report_death();
> + /* Tell cpuhp_bp_sync_dead() that this CPU is now safe to dispose of */
> + cpuhp_ap_report_dead();
>  
>   /*
>* Actually shutdown the CPU. This must never fail. The specific hotplug
> 



Re: [PATCH 00/36] cpuidle,rcu: Cleanup the mess

2022-06-14 Thread Mark Rutland
On Tue, Jun 14, 2022 at 06:58:30PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 14, 2022 at 12:19:29PM +0100, Mark Rutland wrote:
> > On Wed, Jun 08, 2022 at 04:27:23PM +0200, Peter Zijlstra wrote:
> > > Hi All! (omg so many)
> > 
> > Hi Peter,
> > 
> > Sorry for the delay; my plate has also been rather full recently. I'm 
> > beginning
> > to page this in now.
> 
> No worries; we all have too much to do ;-)
> 
> > > These here few patches mostly clear out the utter mess that is cpuidle vs 
> > > rcuidle.
> > > 
> > > At the end of the ride there's only 2 real RCU_NONIDLE() users left
> > > 
> > >   arch/arm64/kernel/suspend.c:    RCU_NONIDLE(__cpu_suspend_exit());
> > >   drivers/perf/arm_pmu.c: RCU_NONIDLE(armpmu_start(event, PERF_EF_RELOAD));
> > 
> > The latter of these is necessary because apparently PM notifiers are called
> > with RCU not watching. Is that still the case today (or at the end of this
> > series)? If so, that feels like fertile land for more issues (yaey...). If
> > not, we should be able to drop this.
> 
> That should be fixed; fingers crossed :-)

Cool; I'll try to give that a spin when I'm sat next to some relevant hardware. :)

> > >   kernel/cfi.c:   RCU_NONIDLE({
> > > 
> > > (the CFI one is likely dead in the kCFI rewrite) and there's only a hand
> > > full of trace_.*_rcuidle() left:
> > > 
> > >   kernel/trace/trace_preemptirq.c:
> > > trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> > >   kernel/trace/trace_preemptirq.c:
> > > trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> > >   kernel/trace/trace_preemptirq.c:
> > > trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
> > >   kernel/trace/trace_preemptirq.c:
> > > trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
> > >   kernel/trace/trace_preemptirq.c:
> > > trace_preempt_enable_rcuidle(a0, a1);
> > >   kernel/trace/trace_preemptirq.c:
> > > trace_preempt_disable_rcuidle(a0, a1);
> > > 
> > > All of them are in 'deprecated' code that is unused for GENERIC_ENTRY.
> > I think those are also unused on arm64 too?
> > 
> > If not, I can go attack that.
> 
> My grep spots:
> 
> arch/arm64/kernel/entry-common.c:   trace_hardirqs_on();
> arch/arm64/include/asm/daifflags.h: trace_hardirqs_off();
> arch/arm64/include/asm/daifflags.h: trace_hardirqs_off();

Ah; I hadn't realised those used trace_.*_rcuidle() behind the scenes.

That affects local_irq_{enable,disable,restore}() too (which is what the
daifflags.h bits are emulating), and also the generic entry code's
irqentry_exit().

So it feels to me like we should be fixing those more generally? e.g. say that
with a new STRICT_ENTRY[_RCU], we can only call trace_hardirqs_{on,off}() with
RCU watching, and alter the definition of those?

> The _on thing should be replaced with something like:
> 
>   trace_hardirqs_on_prepare();
>   lockdep_hardirqs_on_prepare();
>   instrumentation_end();
>   rcu_irq_exit();
>   lockdep_hardirqs_on(CALLER_ADDR0);
> 
> (as I think you know, since you have some of that already). And
> something similar for the _off thing, but with _off_finish().

Sure; I knew that was necessary for the outermost parts of entry (and I think
that's all handled), I just hadn't realised that trace_hardirqs_{on,off} did
the rcuidle thing in the middle.

It'd be nice to not have to open-code the whole sequence everywhere for the
portions which run after entry and are instrumentable, so (as above) I reckon
we want to make trace_hardirqs_{on,off}() not do the rcuidle part
unnecessarily (which IIUC is an end-goal anyway)?
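
As an illustration of that end state, trace_hardirqs_on() could assert that
RCU is watching and use the plain tracepoint rather than the _rcuidle
variant; a simplified sketch (not the actual kernel/trace/trace_preemptirq.c
implementation, which does a bit more):

void trace_hardirqs_on(void)
{
        /* Callers (entry code included) must now keep RCU watching. */
        RCU_LOCKDEP_WARN(!rcu_is_watching(),
                         "trace_hardirqs_on() called with RCU not watching");

        if (this_cpu_read(tracing_irq_cpu)) {
                trace_irq_enable(CALLER_ADDR0, CALLER_ADDR1);
                this_cpu_write(tracing_irq_cpu, 0);
        }

        lockdep_hardirqs_on(CALLER_ADDR0);
}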

> > > I've touched a _lot_ of code that I can't test and likely broken some of
> > > it :/ In particular, the whole ARM cpuidle stuff was quite involved with
> > > OMAP being the absolute 'winner'.
> > > 
> > > I'm hoping Mark can help me sort the remaining ARM64 bits as he moves
> > > that to GENERIC_ENTRY.
> > 
> > Moving to GENERIC_ENTRY as a whole is going to take a tonne of work
> > (refactoring both arm64 and the generic portion to be more amenable to each
> > other), but we can certainly move closer to that for the bits that matter
> > here.
> 
> I know ... been ther

Re: [PATCH 15/36] cpuidle,cpu_pm: Remove RCU fiddling from cpu_pm_{enter,exit}()

2022-06-14 Thread Mark Rutland
On Tue, Jun 14, 2022 at 06:42:14PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 14, 2022 at 05:13:16PM +0100, Mark Rutland wrote:
> > On Wed, Jun 08, 2022 at 04:27:38PM +0200, Peter Zijlstra wrote:
> > > All callers should still have RCU enabled.
> > 
> > IIUC with that true we should be able to drop the RCU_NONIDLE() from
> > drivers/perf/arm_pmu.c, as we only needed that for an invocation via a pm
> > notifier.
> > 
> > I should be able to give that a spin on some hardware.
> > 
> > > 
> > > Signed-off-by: Peter Zijlstra (Intel) 
> > > ---
> > >  kernel/cpu_pm.c |9 -
> > >  1 file changed, 9 deletions(-)
> > > 
> > > --- a/kernel/cpu_pm.c
> > > +++ b/kernel/cpu_pm.c
> > > @@ -30,16 +30,9 @@ static int cpu_pm_notify(enum cpu_pm_eve
> > >  {
> > >   int ret;
> > >  
> > > - /*
> > > -  * This introduces a RCU read critical section, which could be
> > > -  * disfunctional in cpu idle. Copy RCU_NONIDLE code to let RCU know
> > > -  * this.
> > > -  */
> > > - rcu_irq_enter_irqson();
> > >   rcu_read_lock();
> > >   ret = raw_notifier_call_chain(&cpu_pm_notifier.chain, event, NULL);
> > >   rcu_read_unlock();
> > > - rcu_irq_exit_irqson();
> > 
> > To make this easier to debug, is it worth adding an assertion that RCU is
> > watching here? e.g.
> > 
> > RCU_LOCKDEP_WARN(!rcu_is_watching(),
> >  "cpu_pm_notify() used illegally from EQS");
> > 
> 
> My understanding is that rcu_read_lock() implies something along those
> lines when PROVE_RCU.

Ah, duh. Given that:

Acked-by: Mark Rutland 

Mark.
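
(For reference, the check Peter refers to is the lockdep assertion built into
rcu_read_lock() under PROVE_RCU; roughly, paraphrased rather than verbatim:)

static __always_inline void rcu_read_lock(void)
{
        __rcu_read_lock();
        __acquire(RCU);
        rcu_lock_acquire(&rcu_lock_map);
        /* This fires for exactly the "used illegally from EQS" case: */
        RCU_LOCKDEP_WARN(!rcu_is_watching(),
                         "rcu_read_lock() used illegally while idle");
}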



Re: [PATCH 14/36] cpuidle: Fix rcu_idle_*() usage

2022-06-14 Thread Mark Rutland
On Tue, Jun 14, 2022 at 06:40:53PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 14, 2022 at 01:41:13PM +0100, Mark Rutland wrote:
> > On Wed, Jun 08, 2022 at 04:27:37PM +0200, Peter Zijlstra wrote:
> > > --- a/kernel/time/tick-broadcast.c
> > > +++ b/kernel/time/tick-broadcast.c
> > > @@ -622,9 +622,13 @@ struct cpumask *tick_get_broadcast_onesh
> > >   * to avoid a deep idle transition as we are about to get the
> > >   * broadcast IPI right away.
> > >   */
> > > -int tick_check_broadcast_expired(void)
> > > +noinstr int tick_check_broadcast_expired(void)
> > >  {
> > > +#ifdef _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_H
> > > + return arch_test_bit(smp_processor_id(), 
> > > cpumask_bits(tick_broadcast_force_mask));
> > > +#else
> > >   return cpumask_test_cpu(smp_processor_id(), tick_broadcast_force_mask);
> > > +#endif
> > >  }
> > 
> > This is somewhat not-ideal. :/
> 
> I'll say.
> 
> > Could we unconditionally do the arch_test_bit() variant, with a comment, or
> > does that not exist in some cases?
> 
> Loads of build errors ensued, which is how I ended up with this mess ...

Yaey :(

I see the same is true for the thread flag manipulation too.

I'll take a look and see if we can layer things so that we can use the arch_*()
helpers and wrap those consistently so that we don't have to check the CPP
guard.
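
As an illustration of the layering (a sketch, not what the headers currently
provide): if every configuration supplied an uninstrumented arch_test_bit(),
the instrumented test_bit() could be a thin wrapper over it, and noinstr code
could call arch_test_bit() directly without any CPP guard:

#include <linux/bits.h>
#include <linux/instrumented.h>

/* Always-available, uninstrumented primitive. */
static __always_inline bool arch_test_bit(long nr, const volatile unsigned long *addr)
{
        return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG - 1)));
}

/* Instrumented wrapper for ordinary (non-noinstr) code. */
static __always_inline bool test_bit(long nr, const volatile unsigned long *addr)
{
        instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
        return arch_test_bit(nr, addr);
}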

Ideally we'd have a better language that allows us to make some
context-sensitive decisions, then we could hide all this gunk in the lower
levels with something like:

if (!THIS_IS_A_NOINSTR_FUNCTION()) {
explicit_instrumentation(...);
}

... ho hum.

Mark.



Re: [PATCH 16/36] rcu: Fix rcu_idle_exit()

2022-06-14 Thread Mark Rutland
On Wed, Jun 08, 2022 at 04:27:39PM +0200, Peter Zijlstra wrote:
> Current rcu_idle_exit() is terminally broken because it uses
> local_irq_{save,restore}(), which are traced which uses RCU.
> 
> However, now that all the callers are sure to have IRQs disabled, we
> can remove these calls.
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> Acked-by: Paul E. McKenney 

Acked-by: Mark Rutland 

Mark.

> ---
>  kernel/rcu/tree.c |9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -659,7 +659,7 @@ static noinstr void rcu_eqs_enter(bool u
>   * If you add or remove a call to rcu_idle_enter(), be sure to test with
>   * CONFIG_RCU_EQS_DEBUG=y.
>   */
> -void rcu_idle_enter(void)
> +void noinstr rcu_idle_enter(void)
>  {
>   lockdep_assert_irqs_disabled();
>   rcu_eqs_enter(false);
> @@ -896,13 +896,10 @@ static void noinstr rcu_eqs_exit(bool us
>   * If you add or remove a call to rcu_idle_exit(), be sure to test with
>   * CONFIG_RCU_EQS_DEBUG=y.
>   */
> -void rcu_idle_exit(void)
> +void noinstr rcu_idle_exit(void)
>  {
> - unsigned long flags;
> -
> - local_irq_save(flags);
> + lockdep_assert_irqs_disabled();
>   rcu_eqs_exit(false);
> - local_irq_restore(flags);
>  }
>  EXPORT_SYMBOL_GPL(rcu_idle_exit);
>  
> 
> 



Re: [PATCH 20/36] arch/idle: Change arch_cpu_idle() IRQ behaviour

2022-06-14 Thread Mark Rutland
On Wed, Jun 08, 2022 at 04:27:43PM +0200, Peter Zijlstra wrote:
> Current arch_cpu_idle() is called with IRQs disabled, but will return
> with IRQs enabled.
> 
> However, the very first thing the generic code does after calling
> arch_cpu_idle() is raw_local_irq_disable(). This means that
> architectures that can idle with IRQs disabled end up doing a
> pointless 'enable-disable' dance.
> 
> Therefore, push this IRQ disabling into the idle function, meaning
> that those architectures can avoid the pointless IRQ state flipping.
> 
> Signed-off-by: Peter Zijlstra (Intel) 

Nice!

  Acked-by: Mark Rutland  [arm64]

Mark.

> ---
>  arch/alpha/kernel/process.c  |1 -
>  arch/arc/kernel/process.c|3 +++
>  arch/arm/kernel/process.c|1 -
>  arch/arm/mach-gemini/board-dt.c  |3 ++-
>  arch/arm64/kernel/idle.c |1 -
>  arch/csky/kernel/process.c   |1 -
>  arch/csky/kernel/smp.c   |2 +-
>  arch/hexagon/kernel/process.c|1 -
>  arch/ia64/kernel/process.c   |1 +
>  arch/microblaze/kernel/process.c |1 -
>  arch/mips/kernel/idle.c  |8 +++-
>  arch/nios2/kernel/process.c  |1 -
>  arch/openrisc/kernel/process.c   |1 +
>  arch/parisc/kernel/process.c |2 --
>  arch/powerpc/kernel/idle.c   |5 ++---
>  arch/riscv/kernel/process.c  |1 -
>  arch/s390/kernel/idle.c  |1 -
>  arch/sh/kernel/idle.c|1 +
>  arch/sparc/kernel/leon_pmc.c |4 
>  arch/sparc/kernel/process_32.c   |1 -
>  arch/sparc/kernel/process_64.c   |3 ++-
>  arch/um/kernel/process.c |1 -
>  arch/x86/coco/tdx/tdx.c  |3 +++
>  arch/x86/kernel/process.c|   15 ---
>  arch/xtensa/kernel/process.c |1 +
>  kernel/sched/idle.c  |2 --
>  26 files changed, 28 insertions(+), 37 deletions(-)
> 
> --- a/arch/alpha/kernel/process.c
> +++ b/arch/alpha/kernel/process.c
> @@ -57,7 +57,6 @@ EXPORT_SYMBOL(pm_power_off);
>  void arch_cpu_idle(void)
>  {
>   wtint(0);
> - raw_local_irq_enable();
>  }
>  
>  void arch_cpu_idle_dead(void)
> --- a/arch/arc/kernel/process.c
> +++ b/arch/arc/kernel/process.c
> @@ -114,6 +114,8 @@ void arch_cpu_idle(void)
>   "sleep %0   \n"
>   :
>   :"I"(arg)); /* can't be "r" has to be embedded const */
> +
> + raw_local_irq_disable();
>  }
>  
>  #else/* ARC700 */
> @@ -122,6 +124,7 @@ void arch_cpu_idle(void)
>  {
>   /* sleep, but enable both set E1/E2 (levels of interrupts) before 
> committing */
>   __asm__ __volatile__("sleep 0x3 \n");
> + raw_local_irq_disable();
>  }
>  
>  #endif
> --- a/arch/arm/kernel/process.c
> +++ b/arch/arm/kernel/process.c
> @@ -78,7 +78,6 @@ void arch_cpu_idle(void)
>   arm_pm_idle();
>   else
>   cpu_do_idle();
> - raw_local_irq_enable();
>  }
>  
>  void arch_cpu_idle_prepare(void)
> --- a/arch/arm/mach-gemini/board-dt.c
> +++ b/arch/arm/mach-gemini/board-dt.c
> @@ -42,8 +42,9 @@ static void gemini_idle(void)
>*/
>  
>   /* FIXME: Enabling interrupts here is racy! */
> - local_irq_enable();
> + raw_local_irq_enable();
>   cpu_do_idle();
> + raw_local_irq_disable();
>  }
>  
>  static void __init gemini_init_machine(void)
> --- a/arch/arm64/kernel/idle.c
> +++ b/arch/arm64/kernel/idle.c
> @@ -42,5 +42,4 @@ void noinstr arch_cpu_idle(void)
>* tricks
>*/
>   cpu_do_idle();
> - raw_local_irq_enable();
>  }
> --- a/arch/csky/kernel/process.c
> +++ b/arch/csky/kernel/process.c
> @@ -101,6 +101,5 @@ void arch_cpu_idle(void)
>  #ifdef CONFIG_CPU_PM_STOP
>   asm volatile("stop\n");
>  #endif
> - raw_local_irq_enable();
>  }
>  #endif
> --- a/arch/csky/kernel/smp.c
> +++ b/arch/csky/kernel/smp.c
> @@ -314,7 +314,7 @@ void arch_cpu_idle_dead(void)
>   while (!secondary_stack)
>   arch_cpu_idle();
>  
> - local_irq_disable();
> + raw_local_irq_disable();
>  
>   asm volatile(
>   "movsp, %0\n"
> --- a/arch/hexagon/kernel/process.c
> +++ b/arch/hexagon/kernel/process.c
> @@ -44,7 +44,6 @@ void arch_cpu_idle(void)
>  {
>   __vmwait();
>   /*  interrupts wake us up, but irqs are still disabled */
> - raw_local_irq_enable();
>  }
>  
>  /*
> --- a/arch/ia64/kernel/process.c
> +++ b/arch/ia64/kernel/process.c
> @@ -241,6 +241,7 @@ void arch_cpu_idle(void)
>   (*mark_idle)(1);
>  
>

Re: [PATCH 25/36] time/tick-broadcast: Remove RCU_NONIDLE usage

2022-06-14 Thread Mark Rutland
On Wed, Jun 08, 2022 at 04:27:48PM +0200, Peter Zijlstra wrote:
> No callers left that have already disabled RCU.
> 
> Signed-off-by: Peter Zijlstra (Intel) 

Acked-by: Mark Rutland 

Mark.

> ---
>  kernel/time/tick-broadcast-hrtimer.c |   29 -
>  1 file changed, 12 insertions(+), 17 deletions(-)
> 
> --- a/kernel/time/tick-broadcast-hrtimer.c
> +++ b/kernel/time/tick-broadcast-hrtimer.c
> @@ -56,25 +56,20 @@ static int bc_set_next(ktime_t expires,
>* hrtimer callback function is currently running, then
>* hrtimer_start() cannot move it and the timer stays on the CPU on
>* which it is assigned at the moment.
> +  */
> + hrtimer_start(&bctimer, expires, HRTIMER_MODE_ABS_PINNED_HARD);
> + /*
> +  * The core tick broadcast mode expects bc->bound_on to be set
> +  * correctly to prevent a CPU which has the broadcast hrtimer
> +  * armed from going deep idle.
>*
> -  * As this can be called from idle code, the hrtimer_start()
> -  * invocation has to be wrapped with RCU_NONIDLE() as
> -  * hrtimer_start() can call into tracing.
> +  * As tick_broadcast_lock is held, nothing can change the cpu
> +  * base which was just established in hrtimer_start() above. So
> +  * the below access is safe even without holding the hrtimer
> +  * base lock.
>*/
> - RCU_NONIDLE( {
> - hrtimer_start(&bctimer, expires, HRTIMER_MODE_ABS_PINNED_HARD);
> - /*
> -  * The core tick broadcast mode expects bc->bound_on to be set
> -  * correctly to prevent a CPU which has the broadcast hrtimer
> -  * armed from going deep idle.
> -  *
> -  * As tick_broadcast_lock is held, nothing can change the cpu
> -  * base which was just established in hrtimer_start() above. So
> -  * the below access is safe even without holding the hrtimer
> -  * base lock.
> -  */
> - bc->bound_on = bctimer.base->cpu_base->cpu;
> - } );
> + bc->bound_on = bctimer.base->cpu_base->cpu;
> +
>   return 0;
>  }
>  
> 
> 



Re: [PATCH 23/36] arm64,smp: Remove trace_.*_rcuidle() usage

2022-06-14 Thread Mark Rutland
On Wed, Jun 08, 2022 at 04:27:46PM +0200, Peter Zijlstra wrote:
> Ever since commit d3afc7f12987 ("arm64: Allow IPIs to be handled as
> normal interrupts") this function is called in regular IRQ context.
> 
> Signed-off-by: Peter Zijlstra (Intel) 

[adding Marc since he authored that commit]

Makes sense to me:

  Acked-by: Mark Rutland 

Mark.

> ---
>  arch/arm64/kernel/smp.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -865,7 +865,7 @@ static void do_handle_IPI(int ipinr)
>   unsigned int cpu = smp_processor_id();
>  
>   if ((unsigned)ipinr < NR_IPI)
> - trace_ipi_entry_rcuidle(ipi_types[ipinr]);
> + trace_ipi_entry(ipi_types[ipinr]);
>  
>   switch (ipinr) {
>   case IPI_RESCHEDULE:
> @@ -914,7 +914,7 @@ static void do_handle_IPI(int ipinr)
>   }
>  
>   if ((unsigned)ipinr < NR_IPI)
> - trace_ipi_exit_rcuidle(ipi_types[ipinr]);
> + trace_ipi_exit(ipi_types[ipinr]);
>  }
>  
>  static irqreturn_t ipi_handler(int irq, void *data)
> 
> 



Re: [PATCH 15/36] cpuidle,cpu_pm: Remove RCU fiddling from cpu_pm_{enter,exit}()

2022-06-14 Thread Mark Rutland
On Wed, Jun 08, 2022 at 04:27:38PM +0200, Peter Zijlstra wrote:
> All callers should still have RCU enabled.

IIUC with that true we should be able to drop the RCU_NONIDLE() from
drivers/perf/arm_pmu.c, as we only needed that for an invocation via a pm
notifier.

I should be able to give that a spin on some hardware.

> 
> Signed-off-by: Peter Zijlstra (Intel) 
> ---
>  kernel/cpu_pm.c |9 -
>  1 file changed, 9 deletions(-)
> 
> --- a/kernel/cpu_pm.c
> +++ b/kernel/cpu_pm.c
> @@ -30,16 +30,9 @@ static int cpu_pm_notify(enum cpu_pm_eve
>  {
>   int ret;
>  
> - /*
> -  * This introduces a RCU read critical section, which could be
> -  * disfunctional in cpu idle. Copy RCU_NONIDLE code to let RCU know
> -  * this.
> -  */
> - rcu_irq_enter_irqson();
>   rcu_read_lock();
>   ret = raw_notifier_call_chain(&cpu_pm_notifier.chain, event, NULL);
>   rcu_read_unlock();
> - rcu_irq_exit_irqson();

To make this easier to debug, is it worth adding an assertion that RCU is
watching here? e.g.

RCU_LOCKDEP_WARN(!rcu_is_watching(),
 "cpu_pm_notify() used illegally from EQS");

>  
>   return notifier_to_errno(ret);
>  }
> @@ -49,11 +42,9 @@ static int cpu_pm_notify_robust(enum cpu
>   unsigned long flags;
>   int ret;
>  
> - rcu_irq_enter_irqson();
>   raw_spin_lock_irqsave(&cpu_pm_notifier.lock, flags);
>   ret = raw_notifier_call_chain_robust(&cpu_pm_notifier.chain, event_up, 
> event_down, NULL);
>   raw_spin_unlock_irqrestore(&cpu_pm_notifier.lock, flags);
> - rcu_irq_exit_irqson();


... and likewise here?

Thanks,
Mark.

>  
>   return notifier_to_errno(ret);
>  }
> 
> 



Re: [PATCH 14/36] cpuidle: Fix rcu_idle_*() usage

2022-06-14 Thread Mark Rutland
On Wed, Jun 08, 2022 at 04:27:37PM +0200, Peter Zijlstra wrote:
> --- a/kernel/time/tick-broadcast.c
> +++ b/kernel/time/tick-broadcast.c
> @@ -622,9 +622,13 @@ struct cpumask *tick_get_broadcast_onesh
>   * to avoid a deep idle transition as we are about to get the
>   * broadcast IPI right away.
>   */
> -int tick_check_broadcast_expired(void)
> +noinstr int tick_check_broadcast_expired(void)
>  {
> +#ifdef _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_H
> + return arch_test_bit(smp_processor_id(), 
> cpumask_bits(tick_broadcast_force_mask));
> +#else
>   return cpumask_test_cpu(smp_processor_id(), tick_broadcast_force_mask);
> +#endif
>  }

This is somewhat not-ideal. :/

Could we unconditionally do the arch_test_bit() variant, with a comment, or
does that not exist in some cases?

Thanks,
Mark.



Re: [PATCH 00/36] cpuidle,rcu: Cleanup the mess

2022-06-14 Thread Mark Rutland
On Wed, Jun 08, 2022 at 04:27:23PM +0200, Peter Zijlstra wrote:
> Hi All! (omg so many)

Hi Peter,

Sorry for the delay; my plate has also been rather full recently. I'm beginning
to page this in now.

> These here few patches mostly clear out the utter mess that is cpuidle vs 
> rcuidle.
> 
> At the end of the ride there's only 2 real RCU_NONIDLE() users left
> 
>   arch/arm64/kernel/suspend.c:RCU_NONIDLE(__cpu_suspend_exit());
>   drivers/perf/arm_pmu.c: RCU_NONIDLE(armpmu_start(event, PERF_EF_RELOAD));

The latter of these is necessary because apparently PM notifiers are called
with RCU not watching. Is that still the case today (or at the end of this
series)? If so, that feels like fertile land for more issues (yaey...). If not,
we should be able to drop this.

I can go dig into that some more.

>   kernel/cfi.c:   RCU_NONIDLE({
> 
> (the CFI one is likely dead in the kCFI rewrite) and there's only a hand full
> of trace_.*_rcuidle() left:
> 
>   kernel/trace/trace_preemptirq.c:
> trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
>   kernel/trace/trace_preemptirq.c:
> trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
>   kernel/trace/trace_preemptirq.c:
> trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
>   kernel/trace/trace_preemptirq.c:
> trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
>   kernel/trace/trace_preemptirq.c:
> trace_preempt_enable_rcuidle(a0, a1);
>   kernel/trace/trace_preemptirq.c:
> trace_preempt_disable_rcuidle(a0, a1);
> 
> All of them are in 'deprecated' code that is unused for GENERIC_ENTRY.

I think those are also unused on arm64 too?

If not, I can go attack that.

> I've touched a _lot_ of code that I can't test and likely broken some of it :/
> In particular, the whole ARM cpuidle stuff was quite involved with OMAP being
> the absolute 'winner'.
> 
> I'm hoping Mark can help me sort the remaining ARM64 bits as he moves that to
> GENERIC_ENTRY.

Moving to GENERIC_ENTRY as a whole is going to take a tonne of work
(refactoring both arm64 and the generic portion to be more amenable to each
other), but we can certainly move closer to that for the bits that matter here.

Maybe we want a STRICT_ENTRY option to get rid of all the deprecated stuff that
we can select regardless of GENERIC_ENTRY to make that easier.

> I've also got a note that says ARM64 can probably do a WFE based
> idle state and employ TIF_POLLING_NRFLAG to avoid some IPIs.

Possibly; I'm not sure how much of a win that'll be given that by default we'll
have a ~10KHz WFE wakeup from the timer, but we could take a peek.

Thanks,
Mark.



Re: [PATCH V7 01/18] perf/core: Use static_call to optimize perf_guest_info_callbacks

2021-07-02 Thread Mark Rutland
On Fri, Jul 02, 2021 at 09:00:22AM -0700, Joe Perches wrote:
> On Fri, 2021-07-02 at 13:22 +0200, Peter Zijlstra wrote:
> > On Tue, Jun 22, 2021 at 05:42:49PM +0800, Zhu Lingshan wrote:
> > > diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> []
> > > @@ -90,6 +90,27 @@ DEFINE_STATIC_CALL_NULL(x86_pmu_pebs_aliases, 
> > > *x86_pmu.pebs_aliases);
> > >   */
> > >  DEFINE_STATIC_CALL_RET0(x86_pmu_guest_get_msrs, *x86_pmu.guest_get_msrs);
> > >  
> > > 
> > > +DEFINE_STATIC_CALL_RET0(x86_guest_state, *(perf_guest_cbs->state));
> > > +DEFINE_STATIC_CALL_RET0(x86_guest_get_ip, *(perf_guest_cbs->get_ip));
> > > +DEFINE_STATIC_CALL_RET0(x86_guest_handle_intel_pt_intr, 
> > > *(perf_guest_cbs->handle_intel_pt_intr));
> > > +
> > > +void arch_perf_update_guest_cbs(void)
> > > +{
> > > + static_call_update(x86_guest_state, (void *)&__static_call_return0);
> > > + static_call_update(x86_guest_get_ip, (void *)&__static_call_return0);
> > > + static_call_update(x86_guest_handle_intel_pt_intr, (void 
> > > *)&__static_call_return0);
> > > +
> > > + if (perf_guest_cbs && perf_guest_cbs->state)
> > > + static_call_update(x86_guest_state, perf_guest_cbs->state);
> > > +
> > > + if (perf_guest_cbs && perf_guest_cbs->get_ip)
> > > + static_call_update(x86_guest_get_ip, perf_guest_cbs->get_ip);
> > > +
> > > + if (perf_guest_cbs && perf_guest_cbs->handle_intel_pt_intr)
> > > + static_call_update(x86_guest_handle_intel_pt_intr,
> > > +perf_guest_cbs->handle_intel_pt_intr);
> > > +}
> > 
> > Coding style wants { } on that last if().
> 
> That's just your personal preference.
> 
> The coding-style document doesn't require that.
> 
> It just says single statement.  It's not the number of
> vertical lines or characters required for the statement.
> 
> --
> 
> Do not unnecessarily use braces where a single statement will do.
> 
> .. code-block:: c
> 
>   if (condition)
>   action();
> 
> and
> 
> .. code-block:: none
> 
>   if (condition)
>   do_this();
>   else
>   do_that();
> 
> This does not apply if only one branch of a conditional statement is a single
> statement; in the latter case use braces in both branches:

Immediately after this, we say:

| Also, use braces when a loop contains more than a single simple statement:
|
| .. code-block:: c
| 
| while (condition) {
| if (test)
| do_something();
| }
| 

... and while that says "a loop", the principle is obviously supposed to
apply to conditionals too; structurally they're no different. We should
just fix the documentation to say "a loop or conditional", or something
to that effect.
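
For instance, the same principle applied to a conditional:

	if (condition) {
		if (test)
			do_something();
	}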

Mark.



Re: [PATCH v2 05/12] x86: rework arch_local_irq_restore() to not use popf

2020-12-10 Thread Mark Rutland
On Wed, Dec 09, 2020 at 07:54:26PM +0100, Thomas Gleixner wrote:
> On Wed, Dec 09 2020 at 18:15, Mark Rutland wrote:
> > In arch/x86/kernel/apic/io_apic.c's timer_irq_works() we do:
> >
> > local_irq_save(flags);
> > local_irq_enable();
> >
> > [ trigger an IRQ here ]
> >
> > local_irq_restore(flags);
> >
> > ... and in check_timer() we call that a number of times after either a
> > local_irq_save() or local_irq_disable(), eventually trailing with a
> > local_irq_disable() that will balance things up before calling
> > local_irq_restore().

I gave the patchlet below a spin with my debug patch, and it boots
cleanly for me under QEMU. If you spin it as a real patch, feel free to
add:

Tested-by: Mark Rutland 

Mark.

> ---
>  arch/x86/kernel/apic/io_apic.c |   22 ++
>  1 file changed, 6 insertions(+), 16 deletions(-)
> 
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -1618,21 +1618,16 @@ static void __init delay_without_tsc(voi
>  static int __init timer_irq_works(void)
>  {
>   unsigned long t1 = jiffies;
> - unsigned long flags;
>  
>   if (no_timer_check)
>   return 1;
>  
> - local_save_flags(flags);
>   local_irq_enable();
> -
>   if (boot_cpu_has(X86_FEATURE_TSC))
>   delay_with_tsc();
>   else
>   delay_without_tsc();
>  
> - local_irq_restore(flags);
> -
>   /*
>* Expect a few ticks at least, to be sure some possible
>* glue logic does not lock up after one or two first
> @@ -1641,10 +1636,10 @@ static int __init timer_irq_works(void)
>* least one tick may be lost due to delays.
>*/
>  
> - /* jiffies wrap? */
> - if (time_after(jiffies, t1 + 4))
> - return 1;
> - return 0;
> + local_irq_disable();
> +
> + /* Did jiffies advance? */
> + return time_after(jiffies, t1 + 4);
>  }
>  
>  /*
> @@ -2117,13 +2112,12 @@ static inline void __init check_timer(vo
>   struct irq_cfg *cfg = irqd_cfg(irq_data);
>   int node = cpu_to_node(0);
>   int apic1, pin1, apic2, pin2;
> - unsigned long flags;
>   int no_pin1 = 0;
>  
>   if (!global_clock_event)
>   return;
>  
> - local_irq_save(flags);
> + local_irq_disable();
>  
>   /*
>* get/set the timer IRQ vector:
> @@ -2191,7 +2185,6 @@ static inline void __init check_timer(vo
>   goto out;
>   }
>   panic_if_irq_remap("timer doesn't work through 
> Interrupt-remapped IO-APIC");
> - local_irq_disable();
>   clear_IO_APIC_pin(apic1, pin1);
>   if (!no_pin1)
>   apic_printk(APIC_QUIET, KERN_ERR "..MP-BIOS bug: "
> @@ -2215,7 +2208,6 @@ static inline void __init check_timer(vo
>   /*
>* Cleanup, just in case ...
>*/
> - local_irq_disable();
>   legacy_pic->mask(0);
>   clear_IO_APIC_pin(apic2, pin2);
>   apic_printk(APIC_QUIET, KERN_INFO "... failed.\n");
> @@ -2232,7 +2224,6 @@ static inline void __init check_timer(vo
>   apic_printk(APIC_QUIET, KERN_INFO ". works.\n");
>   goto out;
>   }
> - local_irq_disable();
>   legacy_pic->mask(0);
>   apic_write(APIC_LVT0, APIC_LVT_MASKED | APIC_DM_FIXED | cfg->vector);
>   apic_printk(APIC_QUIET, KERN_INFO ". failed.\n");
> @@ -2251,7 +2242,6 @@ static inline void __init check_timer(vo
>   apic_printk(APIC_QUIET, KERN_INFO ". works.\n");
>   goto out;
>   }
> - local_irq_disable();
>   apic_printk(APIC_QUIET, KERN_INFO ". failed :(.\n");
>   if (apic_is_x2apic_enabled())
>   apic_printk(APIC_QUIET, KERN_INFO
> @@ -2260,7 +2250,7 @@ static inline void __init check_timer(vo
>   panic("IO-APIC + timer doesn't work!  Boot with apic=debug and send a "
>   "report.  Then try booting with the 'noapic' option.\n");
>  out:
> - local_irq_restore(flags);
> + local_irq_enable();
>  }
>  
>  /*



Re: [PATCH v2 05/12] x86: rework arch_local_irq_restore() to not use popf

2020-12-09 Thread Mark Rutland
On Fri, Nov 20, 2020 at 12:59:43PM +0100, Peter Zijlstra wrote:
> On Fri, Nov 20, 2020 at 12:46:23PM +0100, Juergen Gross wrote:
> > +static __always_inline void arch_local_irq_restore(unsigned long flags)
> > +{
> > +   if (!arch_irqs_disabled_flags(flags))
> > +   arch_local_irq_enable();
> > +}
> 
> If someone were to write horrible code like:
> 
>   local_irq_disable();
>   local_irq_save(flags);
>   local_irq_enable();
>   local_irq_restore(flags);
> 
> we'd be up some creek without a paddle... now I don't _think_ we have
> genius code like that, but I'd feel safer if we can haz an assertion in
> there somewhere...

I've cobbled that together locally (I'll post it momentarily), and gave it a
spin on both arm64 and x86, whereupon it exploded at boot time on x86.

In arch/x86/kernel/apic/io_apic.c's timer_irq_works() we do:

local_irq_save(flags);
local_irq_enable();

[ trigger an IRQ here ]

local_irq_restore(flags);

... and in check_timer() we call that a number of times after either a
local_irq_save() or local_irq_disable(), eventually trailing with a
local_irq_disable() that will balance things up before calling
local_irq_restore().

I guess that timer_irq_works() should instead do:

local_irq_save(flags);
local_irq_enable();
...
local_irq_disable();
local_irq_restore(flags);

... assuming we consider that legitimate?
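
Concretely, timer_irq_works() would end up something like the following (a
sketch stitched together from the function as quoted elsewhere in this
thread, not a tested patch):

static int __init timer_irq_works(void)
{
        unsigned long t1 = jiffies;
        unsigned long flags;

        if (no_timer_check)
                return 1;

        local_irq_save(flags);
        local_irq_enable();

        if (boot_cpu_has(X86_FEATURE_TSC))
                delay_with_tsc();
        else
                delay_without_tsc();

        /* Balance the local_irq_enable() above before restoring flags. */
        local_irq_disable();
        local_irq_restore(flags);

        /*
         * Expect a few ticks at least, to be sure some possible glue logic
         * does not lock up after one or two first ticks.
         */
        return time_after(jiffies, t1 + 4);
}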

With that, and all the calls to local_irq_disable() in check_timer() removed
(diff below) I get a clean boot under QEMU with the assertion hacked in and
DEBUG_LOCKDEP enabled.

Thanks
Mark.

>8
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 7b3c7e0d4a09..e79e665a3aeb 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1631,6 +1631,7 @@ static int __init timer_irq_works(void)
else
delay_without_tsc();
 
+   local_irq_disable();
local_irq_restore(flags);
 
/*
@@ -2191,7 +2192,6 @@ static inline void __init check_timer(void)
goto out;
}
panic_if_irq_remap("timer doesn't work through 
Interrupt-remapped IO-APIC");
-   local_irq_disable();
clear_IO_APIC_pin(apic1, pin1);
if (!no_pin1)
apic_printk(APIC_QUIET, KERN_ERR "..MP-BIOS bug: "
@@ -2215,7 +2215,6 @@ static inline void __init check_timer(void)
/*
 * Cleanup, just in case ...
 */
-   local_irq_disable();
legacy_pic->mask(0);
clear_IO_APIC_pin(apic2, pin2);
apic_printk(APIC_QUIET, KERN_INFO "... failed.\n");
@@ -2232,7 +2231,6 @@ static inline void __init check_timer(void)
apic_printk(APIC_QUIET, KERN_INFO ". works.\n");
goto out;
}
-   local_irq_disable();
legacy_pic->mask(0);
apic_write(APIC_LVT0, APIC_LVT_MASKED | APIC_DM_FIXED | cfg->vector);
apic_printk(APIC_QUIET, KERN_INFO ". failed.\n");
@@ -2251,7 +2249,6 @@ static inline void __init check_timer(void)
apic_printk(APIC_QUIET, KERN_INFO ". works.\n");
goto out;
}
-   local_irq_disable();
apic_printk(APIC_QUIET, KERN_INFO ". failed :(.\n");
if (apic_is_x2apic_enabled())
apic_printk(APIC_QUIET, KERN_INFO



Re: [PATCH v2 05/12] x86: rework arch_local_irq_restore() to not use popf

2020-12-09 Thread Mark Rutland
On Wed, Dec 09, 2020 at 01:27:10PM +, Mark Rutland wrote:
> On Sun, Nov 22, 2020 at 01:44:53PM -0800, Andy Lutomirski wrote:
> > On Sat, Nov 21, 2020 at 10:55 PM Jürgen Groß  wrote:
> > > On 20.11.20 12:59, Peter Zijlstra wrote:
> > > > If someone were to write horrible code like:
> > > >
> > > >   local_irq_disable();
> > > >   local_irq_save(flags);
> > > >   local_irq_enable();
> > > >   local_irq_restore(flags);
> > > >
> > > > we'd be up some creek without a paddle... now I don't _think_ we have
> > > > genius code like that, but I'd feel safer if we can haz an assertion in
> > > > there somewhere...

> I was just talking to Peter on IRC about implementing the same thing for
> arm64, so could we put this in the generic irqflags code? IIUC we can
> use raw_irqs_disabled() to do the check.
> 
> As this isn't really entry specific (and IIUC the cases this should
> catch would break lockdep today), maybe we should add a new
> DEBUG_IRQFLAGS for this, that DEBUG_LOCKDEP can also select?
> 
> Something like:
> 
> #define local_irq_restore(flags)   \
>do {\
>if (!raw_irqs_disabled_flags(flags)) {  \
>trace_hardirqs_on();\
>} else if (IS_ENABLED(CONFIG_DEBUG_IRQFLAGS) {  \
>if (unlikely(raw_irqs_disabled())   \

Whoops; that should be !raw_irqs_disabled().

>warn_bogus_irqrestore();\
>}   \
>raw_local_irq_restore(flags);   \
> } while (0)
> 
> ... perhaps? (ignoring however we deal with once-ness).
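
Folding that correction in (and balancing the parentheses), the sketch would
read roughly (warn_bogus_irqrestore() and the once-ness handling still
hand-waved):

#define local_irq_restore(flags)					\
	do {								\
		if (!raw_irqs_disabled_flags(flags)) {			\
			trace_hardirqs_on();				\
		} else if (IS_ENABLED(CONFIG_DEBUG_IRQFLAGS)) {		\
			if (unlikely(!raw_irqs_disabled()))		\
				warn_bogus_irqrestore();		\
		}							\
		raw_local_irq_restore(flags);				\
	} while (0)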

If no-one shouts in the next day or two I'll spin this as its own patch.

Mark.



Re: [PATCH v2 05/12] x86: rework arch_local_irq_restore() to not use popf

2020-12-09 Thread Mark Rutland
On Sun, Nov 22, 2020 at 01:44:53PM -0800, Andy Lutomirski wrote:
> On Sat, Nov 21, 2020 at 10:55 PM Jürgen Groß  wrote:
> >
> > On 20.11.20 12:59, Peter Zijlstra wrote:
> > > On Fri, Nov 20, 2020 at 12:46:23PM +0100, Juergen Gross wrote:
> > >> +static __always_inline void arch_local_irq_restore(unsigned long flags)
> > >> +{
> > >> +if (!arch_irqs_disabled_flags(flags))
> > >> +arch_local_irq_enable();
> > >> +}
> > >
> > > If someone were to write horrible code like:
> > >
> > >   local_irq_disable();
> > >   local_irq_save(flags);
> > >   local_irq_enable();
> > >   local_irq_restore(flags);
> > >
> > > we'd be up some creek without a paddle... now I don't _think_ we have
> > > genius code like that, but I'd feel safer if we can haz an assertion in
> > > there somewhere...
> > >
> > > Maybe something like:
> > >
> > > #ifdef CONFIG_DEBUG_ENTRY // for lack of something saner
> > >   WARN_ON_ONCE((arch_local_save_flags() ^ flags) & X86_EFLAGS_IF);
> > > #endif
> > >
> > > At the end?
> >
> > I'd like to, but using WARN_ON_ONCE() in include/asm/irqflags.h sounds
> > like a perfect recipe for include dependency hell.
> >
> > We could use a plain asm("ud2") instead.
> 
> How about out-of-lining it:
> 
> #ifdef CONFIG_DEBUG_ENTRY
> extern void warn_bogus_irqrestore();
> #endif
> 
> static __always_inline void arch_local_irq_restore(unsigned long flags)
> {
>if (!arch_irqs_disabled_flags(flags)) {
>arch_local_irq_enable();
>} else {
> #ifdef CONFIG_DEBUG_ENTRY
>if (unlikely(arch_local_irq_save() & X86_EFLAGS_IF))
> warn_bogus_irqrestore();
> #endif
> }

I was just talking to Peter on IRC about implementing the same thing for
arm64, so could we put this in the generic irqflags code? IIUC we can
use raw_irqs_disabled() to do the check.

As this isn't really entry specific (and IIUC the cases this should
catch would break lockdep today), maybe we should add a new
DEBUG_IRQFLAGS for this, that DEBUG_LOCKDEP can also select?

Something like:

#define local_irq_restore(flags)   \
   do {\
   if (!raw_irqs_disabled_flags(flags)) {  \
   trace_hardirqs_on();\
   } else if (IS_ENABLED(CONFIG_DEBUG_IRQFLAGS) {  \
   if (unlikely(raw_irqs_disabled())   \
   warn_bogus_irqrestore();\
   }   \
   raw_local_irq_restore(flags);   \
} while (0)

... perhaps? (ignoring however we deal with once-ness).

Thanks,
Mark.



Re: [Xen-devel] [PATCH 2/3] arm64: remove uaccess_ttbr0 asm macros from cache functions

2019-11-28 Thread Mark Rutland
On Wed, Nov 27, 2019 at 01:44:52PM -0500, Pavel Tatashin wrote:
> We currently duplicate the logic to enable/disable uaccess via TTBR0,
> with C functions and assembly macros. This is a maintenance burden
> and is liable to lead to subtle bugs, so let's get rid of the assembly
> macros, and always use the C functions. This requires refactoring
> some assembly functions to have a C wrapper.

[...]

> +static inline int invalidate_icache_range(unsigned long start,
> +   unsigned long end)
> +{
> + int rv;

Please make this 'ret', for consistency with other arm64 code. We only
use 'rv' in one place where it's short for "Revision and Variant", and
'ret' is our usual name for a temporary variable used to hold a return
value.

> +
> + if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC)) {
> + isb();
> + return 0;
> + }
> + uaccess_ttbr0_enable();

Please place a newline between these two, for consistency with other
arm64 code.

> + rv = asm_invalidate_icache_range(start, end);
> + uaccess_ttbr0_disable();
> +
> + return rv;
> +}
> +
>  static inline void flush_icache_range(unsigned long start, unsigned long end)
>  {
>   __flush_icache_range(start, end);
> diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
> index db767b072601..a48b6dba304e 100644
> --- a/arch/arm64/mm/cache.S
> +++ b/arch/arm64/mm/cache.S
> @@ -15,7 +15,7 @@
>  #include 
>  
>  /*
> - *   flush_icache_range(start,end)
> + *   __asm_flush_icache_range(start,end)
>   *
>   *   Ensure that the I and D caches are coherent within specified region.
>   *   This is typically used when code has been written to a memory region,
> @@ -24,11 +24,11 @@
>   *   - start   - virtual start address of region
>   *   - end - virtual end address of region
>   */
> -ENTRY(__flush_icache_range)
> +ENTRY(__asm_flush_icache_range)
>   /* FALLTHROUGH */
>  
>  /*
> - *   __flush_cache_user_range(start,end)
> + *   __asm_flush_cache_user_range(start,end)
>   *
>   *   Ensure that the I and D caches are coherent within specified region.
>   *   This is typically used when code has been written to a memory region,
> @@ -37,8 +37,7 @@ ENTRY(__flush_icache_range)
>   *   - start   - virtual start address of region
>   *   - end - virtual end address of region
>   */
> -ENTRY(__flush_cache_user_range)
> - uaccess_ttbr0_enable x2, x3, x4
> +ENTRY(__asm_flush_cache_user_range)
>  alternative_if ARM64_HAS_CACHE_IDC

It's unfortunate that we pulled the IDC alternative out for
invalidate_icache_range(), but not here.

If we want to pull out that, then I reckon what we might want to do is
have two asm primitives:

* __asm_clean_dcache_range
* __asm_invalidate_icache_range

... which just do the by_line op, with a fixup that can return -EFAULT.

Then we can give each a C wrapper that just does the IDC/DIC check, e.g.

static int __clean_dcache_range(unsigned long start, unsigned long end)
{
if (cpus_have_const_cap(ARM64_HAS_CACHE_IDC)) {
dsb(ishst);
return 0;
}

return __asm_clean_dcache_range(start, end);
}

Then we can build all the more complicated variants atop of those, e.g.

static int __flush_cache_user_range(unsigned long start, unsigned long end)
{
int ret;

uaccess_ttbr0_enable();

ret = __clean_dcache_range(start, end);
if (ret)
goto out;

ret = __invalidate_icache_range(start, end);

out:
uaccess_ttbr0_disable();
return ret;
}

Thanks,
Mark.


Re: [Xen-devel] [PATCH v2 3/3] arm64: remove the rest of asm-uaccess.h

2019-11-27 Thread Mark Rutland
On Wed, Nov 27, 2019 at 11:09:35AM -0500, Pavel Tatashin wrote:
> On Wed, Nov 27, 2019 at 11:03 AM Mark Rutland  wrote:
> >
> > On Wed, Nov 27, 2019 at 10:31:54AM -0500, Pavel Tatashin wrote:
> > > On Wed, Nov 27, 2019 at 10:12 AM Mark Rutland  
> > > wrote:
> > > >
> > > > On Thu, Nov 21, 2019 at 09:24:06PM -0500, Pavel Tatashin wrote:
> > > > > The __uaccess_ttbr0_disable and __uaccess_ttbr0_enable,
> > > > > are the last two macros defined in asm-uaccess.h.
> > > > >
> > > > > Replace them with C wrappers and call C functions from
> > > > > kernel_entry and kernel_exit.
> > > >
> > > > For now, please leave those as-is.
> > > >
> > > > I don't think we want to have out-of-line C wrappers in the middle of
> > > > the entry assembly where we don't have a complete kernel environment.
> > > > The use in entry code can also assume non-preemptibility, while the C
> > > > functions have to explicitly disable that.
> > >
> > > I do not understand, if C function is called from non-preemptible
> > > context it stays non-preemptible. kernel_exit already may call C
> > > functions around the time __uaccess_ttbr0_enable is called (it may
> > > call post_ttbr_update_workaround), and that C functions does not do
> > > explicit preempt disable:
> >
> > Sorry, I meant that IRQs are disabled here.
> >
> > The C wrapper calls __uaccess_ttbr0_enable(), which calls
> > local_irq_save() and local_irq_restore(). Those are pointless in the
> > bowels of the entry code, and potentially expensive if IRQ prio masking
> > is in use.
> >
> > I'd rather not add more out-of-line C code calls here right now as I'd
> > prefer to factor out the logic to C in a better way.
> 
> Ah, yes, this makes sense. I could certainly factor out C calls in a
> better way, or is this something you want to work on?

I'm hoping to do that as part of ongoing entry-deasm work, now that a
lot of the prerequisite work was merged in v5.4.

> Without removing these assembly macros I do not think we want to
> address this suggestion from Kees Cook:
> https://lore.kernel.org/lkml/ca+ck2bcbs2fkotmtfm13iv3u5tbpwpocsyeep352dve-gs9...@mail.gmail.com/

In the mean time, we could add checks around addr_limit_user_check(),
and in the context-switch path. I have some preparatory cleanup to allow
for the context-switch check, which I'll send out at -rc1. That was what
I used to detect the case you reported previously.

Thanks,
Mark.


Re: [Xen-devel] [PATCH v2 3/3] arm64: remove the rest of asm-uaccess.h

2019-11-27 Thread Mark Rutland
On Wed, Nov 27, 2019 at 10:31:54AM -0500, Pavel Tatashin wrote:
> On Wed, Nov 27, 2019 at 10:12 AM Mark Rutland  wrote:
> >
> > On Thu, Nov 21, 2019 at 09:24:06PM -0500, Pavel Tatashin wrote:
> > > The __uaccess_ttbr0_disable and __uaccess_ttbr0_enable,
> > > are the last two macros defined in asm-uaccess.h.
> > >
> > > Replace them with C wrappers and call C functions from
> > > kernel_entry and kernel_exit.
> >
> > For now, please leave those as-is.
> >
> > I don't think we want to have out-of-line C wrappers in the middle of
> > the entry assembly where we don't have a complete kernel environment.
> > The use in entry code can also assume non-preemptibility, while the C
> > functions have to explicitly disable that.
> 
> I do not understand, if C function is called from non-preemptible
> context it stays non-preemptible. kernel_exit already may call C
> functions around the time __uaccess_ttbr0_enable is called (it may
> call post_ttbr_update_workaround), and that C functions does not do
> explicit preempt disable:

Sorry, I meant that IRQs are disabled here.

The C wrapper calls __uaccess_ttbr0_enable(), which calls
local_irq_save() and local_irq_restore(). Those are pointless in the
bowels of the entry code, and potentially expensive if IRQ prio masking
is in use.

I'd rather not add more out-of-line C code calls here right now as I'd
prefer to factor out the logic to C in a better way.

> > We can certainly remove the includes of  elsewhere,
> > and maybe fold the macros into entry.S if it's not too crowded.
> 
> I can do this as a separate patch.

That sounds fine to me,

Thanks,
Mark.


Re: [Xen-devel] [PATCH v2 2/3] arm64: remove uaccess_ttbr0 asm macros from cache functions

2019-11-27 Thread Mark Rutland
On Wed, Nov 27, 2019 at 10:10:07AM -0500, Pavel Tatashin wrote:
> Hi Mark,
> 
> Thank you for reviewing this work.
 
> > The 'arch_' prefix should probably be 'asm_' (or have an '_asm' suffix),
> > since this is entirely local to the arch code, and even then should only
> > be called from the C wrappers.
> 
> Sure, I can change it to asm_*, I was using arch_* to be consistent
> with __arch_copy_from_user() and friends.

FWIW, that naming was from before the common uaccess code took on the
raw_* naming for the arch functions, and I was expecting that the arch_*
functions would end up being called from core code.

For now it's probably too churny to change that existing case.

Thanks,
Mark.


Re: [Xen-devel] [PATCH v2 3/3] arm64: remove the rest of asm-uaccess.h

2019-11-27 Thread Mark Rutland
On Thu, Nov 21, 2019 at 09:24:06PM -0500, Pavel Tatashin wrote:
> The __uaccess_ttbr0_disable and __uaccess_ttbr0_enable,
> are the last two macros defined in asm-uaccess.h.
> 
> Replace them with C wrappers and call C functions from
> kernel_entry and kernel_exit.

For now, please leave those as-is.

I don't think we want to have out-of-line C wrappers in the middle of
the entry assembly where we don't have a complete kernel environment.
The use in entry code can also assume non-preemptibility, while the C
functions have to explicitly disable that.

We can certainly remove the includes of  elsewhere,
and maybe fold the macros into entry.S if it's not too crowded.

Thanks,
Mark.

> 
> Signed-off-by: Pavel Tatashin 
> ---
>  arch/arm64/include/asm/asm-uaccess.h | 38 
>  arch/arm64/kernel/entry.S|  6 ++---
>  arch/arm64/lib/clear_user.S  |  2 +-
>  arch/arm64/lib/copy_from_user.S  |  2 +-
>  arch/arm64/lib/copy_in_user.S|  2 +-
>  arch/arm64/lib/copy_to_user.S|  2 +-
>  arch/arm64/mm/cache.S|  1 -
>  arch/arm64/mm/context.c  | 12 +
>  8 files changed, 19 insertions(+), 46 deletions(-)
>  delete mode 100644 arch/arm64/include/asm/asm-uaccess.h
> 
> diff --git a/arch/arm64/include/asm/asm-uaccess.h 
> b/arch/arm64/include/asm/asm-uaccess.h
> deleted file mode 100644
> index 8f763e5b41b1..
> --- a/arch/arm64/include/asm/asm-uaccess.h
> +++ /dev/null
> @@ -1,38 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0 */
> -#ifndef __ASM_ASM_UACCESS_H
> -#define __ASM_ASM_UACCESS_H
> -
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -
> -/*
> - * User access enabling/disabling macros.
> - */
> -#ifdef CONFIG_ARM64_SW_TTBR0_PAN
> - .macro  __uaccess_ttbr0_disable, tmp1
> - mrs \tmp1, ttbr1_el1// swapper_pg_dir
> - bic \tmp1, \tmp1, #TTBR_ASID_MASK
> - sub \tmp1, \tmp1, #RESERVED_TTBR0_SIZE  // reserved_ttbr0 just 
> before swapper_pg_dir
> - msr ttbr0_el1, \tmp1// set reserved 
> TTBR0_EL1
> - isb
> - add \tmp1, \tmp1, #RESERVED_TTBR0_SIZE
> - msr ttbr1_el1, \tmp1// set reserved ASID
> - isb
> - .endm
> -
> - .macro  __uaccess_ttbr0_enable, tmp1, tmp2
> - get_current_task \tmp1
> - ldr \tmp1, [\tmp1, #TSK_TI_TTBR0]   // load saved TTBR0_EL1
> - mrs \tmp2, ttbr1_el1
> - extr\tmp2, \tmp2, \tmp1, #48
> - ror \tmp2, \tmp2, #16
> - msr ttbr1_el1, \tmp2// set the active ASID
> - isb
> - msr ttbr0_el1, \tmp1// set the non-PAN TTBR0_EL1
> - isb
> - .endm
> -#endif
> -#endif
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index 583f71abbe98..c7b571e6d0f2 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -22,8 +22,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
> -#include 
>  #include 
>  
>  /*
> @@ -219,7 +219,7 @@ alternative_else_nop_endif
>   and x23, x23, #~PSR_PAN_BIT // Clear the emulated PAN in 
> the saved SPSR
>   .endif
>  
> - __uaccess_ttbr0_disable x21
> + bl __uaccess_ttbr0_disable_c
>  1:
>  #endif
>  
> @@ -293,7 +293,7 @@ alternative_else_nop_endif
>   tbnzx22, #22, 1f// Skip re-enabling TTBR0 
> access if the PSR_PAN_BIT is set
>   .endif
>  
> - __uaccess_ttbr0_enable x0, x1
> + bl  __uaccess_ttbr0_enable_c
>  
>   .if \el == 0
>   /*
> diff --git a/arch/arm64/lib/clear_user.S b/arch/arm64/lib/clear_user.S
> index aeafc03e961a..b0b4a86a09e2 100644
> --- a/arch/arm64/lib/clear_user.S
> +++ b/arch/arm64/lib/clear_user.S
> @@ -6,7 +6,7 @@
>   */
>  #include 
>  
> -#include 
> +#include 
>  #include 
>  
>   .text
> diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S
> index ebb3c06cbb5d..142bc7505518 100644
> --- a/arch/arm64/lib/copy_from_user.S
> +++ b/arch/arm64/lib/copy_from_user.S
> @@ -5,7 +5,7 @@
>  
>  #include 
>  
> -#include 
> +#include 
>  #include 
>  #include 
>  
> diff --git a/arch/arm64/lib/copy_in_user.S b/arch/arm64/lib/copy_in_user.S
> index 3d8153a1ebce..04dc48ca26f7 100644
> --- a/arch/arm64/lib/copy_in_user.S
> +++ b/arch/arm64/lib/copy_in_user.S
> @@ -7,7 +7,7 @@
>  
>  #include 
>  
> -#include 
> +#include 
>  #include 
>  #include 
>  
> diff --git a/arch/arm64/lib/copy_to_user.S b/arch/arm64/lib/copy_to_user.S
> index 357eae2c18eb..8f3218ae88ab 100644
> --- a/arch/arm64/lib/copy_to_user.S
> +++ b/arch/arm64/lib/copy_to_user.S
> @@ -5,7 +5,7 @@
>  
>  #include 
>  
> -#include 
> +#include 
>  #include 
>  #include 
>  
> diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
> index 408d317a47d2..7940d6ef5da5 100644
> --- a/arch/arm64/mm/cache.S
> +++ b/arch/arm64/mm/cache.S
> @@ -12,7 +12,6 @@
>  #include 
>  #include 
>  #include 
> 

Re: [Xen-devel] [PATCH v2 2/3] arm64: remove uaccess_ttbr0 asm macros from cache functions

2019-11-27 Thread Mark Rutland
Hi Pavel,

On Thu, Nov 21, 2019 at 09:24:05PM -0500, Pavel Tatashin wrote:
> Replace the uaccess_ttbr0_disable/uaccess_ttbr0_enable via
> inline variants, and remove asm macros.

A commit message should provide rationale, rather than just a
description of the patch. Something like:

| We currently duplicate the logic to enable/disable uaccess via TTBR0,
| with C functions and assembly macros. This is a maintenance burden
| and is liable to lead to subtle bugs, so let's get rid of the assembly
| macros, and always use the C functions. This requires refactoring
| some assembly functions to have a C wrapper.

[...]

> +static inline int invalidate_icache_range(unsigned long start,
> +   unsigned long end)
> +{
> + int rv;
> +#if ARM64_HAS_CACHE_DIC
> + rv = arch_invalidate_icache_range(start, end);
> +#else
> + uaccess_ttbr0_enable();
> + rv = arch_invalidate_icache_range(start, end);
> + uaccess_ttbr0_disable();
> +#endif
> + return rv;
> +}

This ifdeffery is not the same as an alternative_if, and even if it were
the ARM64_HAS_CACHE_DIC behaviour is not the same as the existing
assembly.

This should be:

static inline int invalidate_icache_range(unsigned long start,
  unsigned long end)
{
int ret;

if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC)) {
isb();
return 0;
}

uaccess_ttbr0_enable();
ret = arch_invalidate_icache_range(start, end);
uaccess_ttbr0_disable();

return ret;
}

The 'arch_' prefix should probably be 'asm_' (or have an '_asm' suffix),
since this is entirely local to the arch code, and even then should only
be called from the C wrappers.

Thanks,
Mark.


Re: [Xen-devel] [PATCH] ARM: xen: unexport HYPERVISOR_platform_op function

2019-10-01 Thread Mark Rutland
On Tue, Oct 01, 2019 at 03:39:41PM +0100, Julien Grall wrote:
> On 01/10/2019 15:33, Mark Rutland wrote:
> > On Sat, Sep 07, 2019 at 11:05:45AM +0100, Julien Grall wrote:
> > > On 9/6/19 6:20 PM, Andrew Cooper wrote:
> > > > On 06/09/2019 17:00, Arnd Bergmann wrote:
> > > > > On Fri, Sep 6, 2019 at 5:55 PM Andrew Cooper 
> > > > >  wrote:
> > > > > > On 06/09/2019 16:39, Arnd Bergmann wrote:
> > > > > > > HYPERVISOR_platform_op() is an inline function and should not
> > > > > > > be exported. Since commit 15bfc2348d54 ("modpost: check for
> > > > > > > static EXPORT_SYMBOL* functions"), this causes a warning:
> > > > > > > 
> > > > > > > WARNING: "HYPERVISOR_platform_op" [vmlinux] is a static 
> > > > > > > EXPORT_SYMBOL_GPL
> > > > > > > 
> > > > > > > Remove the extraneous export.
> > > > > > > 
> > > > > > > Fixes: 15bfc2348d54 ("modpost: check for static EXPORT_SYMBOL* 
> > > > > > > functions")
> > > > > > > Signed-off-by: Arnd Bergmann 

[...]

> > > While looking at the code, I also realized that the implementation of
> > > HYPERCALL_dm_op might be incorrect for Arm32. Similarly do privcmd call, I
> > > think dm_op call should enable user access as they will be used by
> > > userspace.
> > > 
> > > We don't use dm_op on Arm so far, hence why I think this was unnoticed. I
> > > will see if I can reproduce it and send a patch.
> > 
> > I'm seeing this when building arm64 defconfig v5.4-rc1:
> > 
> > | [mark@lakrids:~/src/linux]% usekorg 8.1.0  make ARCH=arm64 
> > CROSS_COMPILE=aarch64-linux- -j56 -s
> > | arch/arm64/Makefile:62: CROSS_COMPILE_COMPAT not defined or empty, the 
> > compat vDSO will not be built
> > | WARNING: "HYPERVISOR_platform_op" [vmlinux] is a static EXPORT_SYMBOL_GPL
> > | WARNING: "HYPERVISOR_platform_op" [vmlinux] is a static EXPORT_SYMBOL_GPL
> > 
> > I couldn't see a follow-up; do you have a patch for this?
> 
> The first e-mail of the thread should contain a patch to address the warning
> (see [1]). But it is still waiting on an Ack from Stefano so it can get
> merged.

Ah, sorry. I misunderstood what you were planning to send a patch for,
and assumed you were going to propose an alternative to Arnd's patch.

Stefano, do you see any problem with Arnd's patch? If not, it would be
good to get this merged soon.

Thanks,
Mark.

> 
> Cheers,
> 
> [1] https://patchwork.kernel.org/patch/11135601/
> 
> -- 
> Julien Grall


Re: [Xen-devel] [PATCH] ARM: xen: unexport HYPERVISOR_platform_op function

2019-10-01 Thread Mark Rutland
Hi Julien,

On Sat, Sep 07, 2019 at 11:05:45AM +0100, Julien Grall wrote:
> On 9/6/19 6:20 PM, Andrew Cooper wrote:
> > On 06/09/2019 17:00, Arnd Bergmann wrote:
> > > On Fri, Sep 6, 2019 at 5:55 PM Andrew Cooper  
> > > wrote:
> > > > On 06/09/2019 16:39, Arnd Bergmann wrote:
> > > > > HYPERVISOR_platform_op() is an inline function and should not
> > > > > be exported. Since commit 15bfc2348d54 ("modpost: check for
> > > > > static EXPORT_SYMBOL* functions"), this causes a warning:
> > > > > 
> > > > > WARNING: "HYPERVISOR_platform_op" [vmlinux] is a static 
> > > > > EXPORT_SYMBOL_GPL
> > > > > 
> > > > > Remove the extraneous export.
> > > > > 
> > > > > Fixes: 15bfc2348d54 ("modpost: check for static EXPORT_SYMBOL* 
> > > > > functions")
> > > > > Signed-off-by: Arnd Bergmann 
> > > > Something is wonky.  That symbol is (/ really ought to be) in the
> > > > hypercall page and most definitely not inline.
> > > > 
> > > > Which tree is that changeset from?  I can't find the SHA.
> > > This is from linux-next, I think from the kbuild tree.
> > 
> > Thanks.
> > 
> > Julien/Stefano: Why are any of these hypercalls out-of-line?  ARM
> > doesn't use the hypercall page, and there is no argument translation
> > (not even in arm32 as there are no 5-argument hypercalls declared).
> 
> I am not sure how the hypercall page makes things different. You still have
> to store the arguments in the correct register so...
> 
> > 
> > They'd surely be easier to implement with a few static inlines and some
> > common code, than to try and replicate the x86 side hypercall_page
> > interface ?
> 
> ... I don't think they will be easier to implement with a few static
> inlines. The implementation will likely end up to be similar to
> arch/x86/asm/xen/hypercall.h.
> 
> Furthermore, one of the downside of per-arch static inline is it is more
> difficult to ensure the prototype match for all the architectures. Although,
> it might be possible to make them common by only requesting per-arch to
> implement HYPERCALL_N(...).
> 
> So I think the code is better as it is.
> 
> While looking at the code, I also realized that the implementation of
> HYPERCALL_dm_op might be incorrect for Arm32. Similarly do privcmd call, I
> think dm_op call should enable user access as they will be used by
> userspace.
> 
> We don't use dm_op on Arm so far, hence why I think this was unnoticed. I
> will see if I can reproduce it and send a patch.

I'm seeing this when building arm64 defconfig v5.4-rc1:

| [mark@lakrids:~/src/linux]% usekorg 8.1.0  make ARCH=arm64 
CROSS_COMPILE=aarch64-linux- -j56 -s
| arch/arm64/Makefile:62: CROSS_COMPILE_COMPAT not defined or empty, the compat 
vDSO will not be built
| WARNING: "HYPERVISOR_platform_op" [vmlinux] is a static EXPORT_SYMBOL_GPL
| WARNING: "HYPERVISOR_platform_op" [vmlinux] is a static EXPORT_SYMBOL_GPL

I couldn't see a follow-up; do you have a patch for this?

Thanks,
Mark.
