Re: [PATCH] acpi_idle: use raw_safe_halt() from acpi_idle_play_dead()
On Mon, Nov 20, 2023 at 1:20 PM David Woodhouse wrote: > > On Fri, 2023-10-27 at 21:14 +0200, Peter Zijlstra wrote: > > On Fri, Oct 27, 2023 at 07:36:51PM +0100, David Woodhouse wrote: > > > From: David Woodhouse > > > > > > Xen HVM guests were observed taking triple-faults when attempting to > > > online a previously offlined vCPU. > > > > > > Investigation showed that the fault was coming from a failing call > > > to lockdep_assert_irqs_disabled(), in load_current_idt() which was > > > too early in the CPU bringup to actually catch the exception and > > > report the failure cleanly. > > > > > > This was a false positive, caused by acpi_idle_play_dead() setting > > > the per-cpu hardirqs_enabled flag by calling safe_halt(). Switch it > > > to use raw_safe_halt() instead, which doesn't do so. > > > > > > Signed-off-by: David Woodhouse > > > --- > > > We might {also,instead} explicitly set the hardirqs_enabled flag to > > > zero when bringing up an AP? > > > > So I fixed up the idle paths the other day (see all that __cpuidle > > stuff) but I've not yet gone through the whole hotplug thing :/ > > > > This seems right, at this point everything, including RCU is very much > > gone, any instrumentation is undesired. > > > > Acked-by: Peter Zijlstra (Intel) > > Ping? Who's taking this? I'm going to apply it. > Needs a Cc:sta...@vger.kernel.org now too, to fix 6.6.x. Sure.
Re: [PATCH] acpi_idle: use raw_safe_halt() from acpi_idle_play_dead()
On Fri, 2023-10-27 at 21:14 +0200, Peter Zijlstra wrote: > On Fri, Oct 27, 2023 at 07:36:51PM +0100, David Woodhouse wrote: > > From: David Woodhouse > > > > Xen HVM guests were observed taking triple-faults when attempting to > > online a previously offlined vCPU. > > > > Investigation showed that the fault was coming from a failing call > > to lockdep_assert_irqs_disabled(), in load_current_idt() which was > > too early in the CPU bringup to actually catch the exception and > > report the failure cleanly. > > > > This was a false positive, caused by acpi_idle_play_dead() setting > > the per-cpu hardirqs_enabled flag by calling safe_halt(). Switch it > > to use raw_safe_halt() instead, which doesn't do so. > > > > Signed-off-by: David Woodhouse > > --- > > We might {also,instead} explicitly set the hardirqs_enabled flag to > > zero when bringing up an AP? > > So I fixed up the idle paths the other day (see all that __cpuidle > stuff) but I've not yet gone through the whole hotplug thing :/ > > This seems right, at this point everything, including RCU is very much > gone, any instrumentation is undesired. > > Acked-by: Peter Zijlstra (Intel) Ping? Who's taking this? Needs a Cc:sta...@vger.kernel.org now too, to fix 6.6.x. smime.p7s Description: S/MIME cryptographic signature
Re: [PATCH] acpi_idle: use raw_safe_halt() from acpi_idle_play_dead()
On Fri, Oct 27, 2023 at 07:36:51PM +0100, David Woodhouse wrote: > From: David Woodhouse > > Xen HVM guests were observed taking triple-faults when attempting to > online a previously offlined vCPU. > > Investigation showed that the fault was coming from a failing call > to lockdep_assert_irqs_disabled(), in load_current_idt() which was > too early in the CPU bringup to actually catch the exception and > report the failure cleanly. > > This was a false positive, caused by acpi_idle_play_dead() setting > the per-cpu hardirqs_enabled flag by calling safe_halt(). Switch it > to use raw_safe_halt() instead, which doesn't do so. > > Signed-off-by: David Woodhouse > --- > We might {also,instead} explicitly set the hardirqs_enabled flag to > zero when bringing up an AP? So I fixed up the idle paths the other day (see all that __cpuidle stuff) but I've not yet gone through the whole hotplug thing :/ This seems right, at this point everything, including RCU is very much gone, any instrumentation is undesired. Acked-by: Peter Zijlstra (Intel) > > drivers/acpi/processor_idle.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c > index 3a34a8c425fe..55437f5e0c3a 100644 > --- a/drivers/acpi/processor_idle.c > +++ b/drivers/acpi/processor_idle.c > @@ -592,7 +592,7 @@ static int acpi_idle_play_dead(struct cpuidle_device > *dev, int index) > while (1) { > > if (cx->entry_method == ACPI_CSTATE_HALT) > - safe_halt(); > + raw_safe_halt(); > else if (cx->entry_method == ACPI_CSTATE_SYSTEMIO) { > io_idle(cx->address); > } else > -- > 2.41.0 > >
[PATCH] acpi_idle: use raw_safe_halt() from acpi_idle_play_dead()
From: David Woodhouse Xen HVM guests were observed taking triple-faults when attempting to online a previously offlined vCPU. Investigation showed that the fault was coming from a failing call to lockdep_assert_irqs_disabled(), in load_current_idt() which was too early in the CPU bringup to actually catch the exception and report the failure cleanly. This was a false positive, caused by acpi_idle_play_dead() setting the per-cpu hardirqs_enabled flag by calling safe_halt(). Switch it to use raw_safe_halt() instead, which doesn't do so. Signed-off-by: David Woodhouse --- We might {also,instead} explicitly set the hardirqs_enabled flag to zero when bringing up an AP? drivers/acpi/processor_idle.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c index 3a34a8c425fe..55437f5e0c3a 100644 --- a/drivers/acpi/processor_idle.c +++ b/drivers/acpi/processor_idle.c @@ -592,7 +592,7 @@ static int acpi_idle_play_dead(struct cpuidle_device *dev, int index) while (1) { if (cx->entry_method == ACPI_CSTATE_HALT) - safe_halt(); + raw_safe_halt(); else if (cx->entry_method == ACPI_CSTATE_SYSTEMIO) { io_idle(cx->address); } else -- 2.41.0 smime.p7s Description: S/MIME cryptographic signature