On Wed, 6 Sep 2023, Anthony Chan wrote:
> On Tue, 05 Sep 2023, Stefano Stabellini wrote:
> > On Thu, 31 Aug 2023, Anthony Chan wrote:
> > > On Thu, 30 Aug 2023, Stefano Stabellini wrote:
> > > > On Wed, 30 Aug 2023, Anthony Chan wrote:
> > > > > On Tue, 29 Aug 2023, Stefano Stabellini wrote:
> > > > > > On Tue, 29 Aug 2023, Anthony Chan wrote:
> > > > > > > Hi all,
> > > > > > >
> > > > > > > My name is Tony and I've been researching/developing using
> > > > > > > Xen for potential upcoming uses in our embedded systems. I
> > > > > > > started with Xen using Xilinx tools about a year ago and
> > > > > > > > still have lots to learn about what it can do in the
> > > > > > > embedded space. So far, I've managed to integrate Xen and
> > > > > > > Linux into an existing product that exclusively runs
> > > > > > > bare-metal code on a ZynqMP SoC and migrate some of the
> > > > > > > functionality into custom Linux driver/userspace.
> > > > > > >
> > > > > > > > I'm now looking at low power support, for now at least
> > > > > > > > between Xen (4.16) and a Linux (5.15) dom0. I've tried a
> > > > > > > > few different Linux kernel configs around power management,
> > > > > > > > and each time I try to suspend from the Linux dom0 (via
> > > > > > > > sysfs or systemctl), Xen will watchdog on the dom0 guest.
> > > > > > > > AFAIK, Xen should trap on a 'WFI' from guests, but from what
> > > > > > > > I can tell from debugging the Linux suspend process, it's
> > > > > > > > spinning in a 'suspend-to-idle' loop before it can get to
> > > > > > > > issuing a 'WFI' or using the PSCI interface to notify Xen.
> > > > > > > > I'm beginning to suspect that 'low power' support for
> > > > > > > > embedded arm64 just isn't quite there yet, or am I missing
> > > > > > > > something in the configs?
> > > > > > >
> > > > > > > > I realize this could very well be a Linux 'issue' but I'm
> > > > > > > > checking here first. I know Xen presents a flattened device
> > > > > > > > tree to Linux without CPU idle-state nodes, and maybe this
> > > > > > > > is causing the Linux guest to only use the suspend-to-idle
> > > > > > > > mode? I should mention that I'm booting using the dom0less
> > > > > > > > feature, if that matters.
> > > > > >
> > > > > >
> > > > > > Hi Anthony,
> > > > > >
> > > > > > Assuming you are using the default Xen command line parameters
> > > > > > for Xilinx boards: sched=null vwfi=native, then if the guest
> > > > > > uses WFI, the CPU will execute WFI directly and go into low
> > > > > > power mode.
> > > > > Yes, using these command line params.
> > > > >
> > > > > > > Given the issue you are describing, I am suspecting the guest
> > > > > > > is not issuing WFI: that is simple and known to work. Instead,
> > > > > > > I suspect that Linux might be trying to use PSCI_suspend in a
> > > > > > > way that is not supported or well-implemented by Xen.
> > > > > >
> > > > > > Can you check? You can add a printk in Linux
> > > > > > drivers/firmware/psci/psci.c:__psci_cpu_suspend or in Xen
> > > > > > xen/arch/arm/vpsci.c:do_psci_0_2_cpu_suspend
> > > > > > Instrumented both places, but it doesn't appear to reach either.
> > > > > > In kernel/power/suspend.c, there's a call to s2idle_loop that it's
> > > > > > currently 'stuck' in, and I think it doesn't get to the PSCI
> > > > > > suspend you're referring to until afterwards, when
> > > > > > suspend_ops->enter is called. Unfortunately, without any
> > > > > > idle-states nodes in the FDT, the only suspend state Linux
> > > > > > defaults to is 'suspend to idle'.
> > > >
> > > > The fact that Linux uses "suspend to idle" is not a problem
> > > > because, as I mentioned, WFI and PSCI_suspend are not different on
> > > > Xen. That part is OK.
> > > What if using "suspend to idle" is preventing a WFI/PSCI_suspend?
> > > Which is what I believe I'm currently seeing in my setup. In
> > > kernel/power/suspend.c, suspend_devices_and_enter(), it gets into
> > > this s2idle_loop and, upon resuming from idle, it jumps past the
> > > point where I believe a WFI/PSCI_suspend can happen.
> > >     if (state == PM_SUSPEND_TO_IDLE) {
> > >         s2idle_loop();
> > >         goto Platform_wake;
> > >     }
> >
> > If that is the case, then it looks like a Linux bug. Maybe something
> > along these lines?
> >
> > https://lore.kernel.org/linux-arm-kernel/4665489.GXAFRqVoOG@kreacher/T/#m6edda92d0b5dc09f8e05e7d6db3807501b7249f4
>
> Thanks, I've tried patches that stemmed from that discussion but
> unfortunately they don't resolve the issue. In fact, the s2idle_loop
> branch might not be the problem at all. I experimented with Xen to allow
> the 'idle-states' nodes into the FDT and prevented xen_guest_init on Linux
> from disabling the 'cpuidle' driver (arch/arm/xen/enlighten.c). When I
> trigger a suspend, I can now see another thread (I believe it's the idle
> thread) call into drivers/firmware/psci/psci.c:__psci_cpu_suspend and
> then the Xen counterpart at
> xen/arch/arm/vpsci.c:do_psci_0_2_cpu_suspend.
OK but remember that Xen is not implementing do_psci_0_2_cpu_suspend
correctly at the moment. Either we need to fix the Xen implementation,
or we need to configure Linux so that it calls WFI instead of
__psci_cpu_suspend.

As a test, can you try to apply the attached patch to Xen as a tentative
fix? Or you could change
drivers/firmware/psci/psci.c:__psci_cpu_suspend to call WFI instead of
the PSCI operation (making sure to go to the entry_point instead of
returning).
> The normal 'suspend' thread still goes into the s2idle_loop.
This is potentially a problem: if s2idle_loop is what causes the issue
and one thread is still executing it, then we could still hit the bug.
> Eventually, Xen still watchdogs on the dom0 VM.
diff --git a/xen/arch/arm/vpsci.c b/xen/arch/arm/vpsci.c
index d1615be8a6..4ca1d7c48f 100644
--- a/xen/arch/arm/vpsci.c
+++ b/xen/arch/arm/vpsci.c
@@ -128,6 +128,8 @@ static register_t do_psci_0_2_cpu_suspend(uint32_t power_state,
      */
     vcpu_block_unless_event_pending(v);
+    v->arch.cpu_info->guest_cpu_user_regs.pc = (u64) entry_point;
+    v->arch.cpu_info->guest_cpu_user_regs.x0 = context_id;
     return PSCI_SUCCESS;
 }
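
For reference, here is a small user-space model of the CPU_SUSPEND wake-up
behaviour that the patch above is trying to approximate. It is only a sketch
based on my reading of the PSCI specification: a standby-type request returns
to the caller with a success code, while a powerdown-type request resumes at
entry_point with context_id in x0. The struct, the function names and the
MODEL_* constants are hypothetical and are not Xen or Linux code; the only
detail borrowed from PSCI is the assumption that bit 16 of an original-format
power_state selects the powerdown type.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for this sketch; these are not Xen's real types. */
#define MODEL_PSCI_SUCCESS 0
/* Assumed StateType bit of an original-format power_state (1 = powerdown). */
#define MODEL_POWER_STATE_TYPE_BIT (UINT32_C(1) << 16)

struct model_vcpu_regs {
    uint64_t pc; /* where the vCPU continues executing after wake-up */
    uint64_t x0; /* first argument / return-value register */
};

/* True when the request asks for a powerdown state rather than standby. */
static bool model_is_powerdown(uint32_t power_state)
{
    return (power_state & MODEL_POWER_STATE_TYPE_BIT) != 0;
}

/*
 * Fill in what the guest should observe once the suspended vCPU wakes up.
 * return_pc is the address of the instruction following the PSCI call.
 */
static void model_cpu_suspend_wakeup(struct model_vcpu_regs *regs,
                                     uint32_t power_state,
                                     uint64_t entry_point,
                                     uint64_t context_id,
                                     uint64_t return_pc)
{
    if (model_is_powerdown(power_state)) {
        /* Powerdown: execution restarts at the caller-supplied entry point. */
        regs->pc = entry_point;
        regs->x0 = context_id;
    } else {
        /* Standby: the call simply returns to the caller with success. */
        regs->pc = return_pc;
        regs->x0 = MODEL_PSCI_SUCCESS;
    }
}
```

Calling model_cpu_suspend_wakeup() with bit 16 set in power_state leaves pc at
entry_point and x0 at context_id; with bit 16 clear it leaves pc at return_pc
and x0 at 0. The tentative patch takes the first branch unconditionally, which
should be good enough for a test but is worth keeping in mind.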