On Mon, 11 Sep 2023, Anthony Chan wrote:
> On Wed, 6 Sep 2023, Stefano Stabellini wrote:
> > On Wed, 6 Sep 2023, Anthony Chan wrote:
> > > Thanks, I've tried patches that stemmed from that discussion but
> > > unfortunately they don't resolve the issue.  In fact, the
> > > s2idle_loop branch might not be the problem at all.  I experimented
> > > with Xen to allow the 'idle-states' into the FDT and prevented
> > > xen_guest_init on Linux from disabling the 'cpuidle' driver
> > > (arch/arm/xen/enlighten.c).  When I trigger a suspend, I can now see
> > > another thread (I believe it's the idle thread) call into
> > > drivers/firmware/psci/psci.c:__psci_cpu_suspend and then the Xen
> > > counterpart at xen/arch/arm/vpsci.c:do_psci_0_2_cpu_suspend.
> >
> > OK but remember that Xen is not implementing do_psci_0_2_cpu_suspend
> > correctly at the moment. Either we need to fix the Xen implementation, or we
> > need to configure Linux so that it calls WFI instead of __psci_cpu_suspend.
> >
> > As a test, can you try to apply the attached patch to Xen as a
> > tentative fix?  Or you could change
> > drivers/firmware/psci/psci.c:__psci_cpu_suspend to call WFI instead of
> > the PSCI operation (making sure to go to the entry_point instead of
> > returning).
> 
> Tried the patch and substituting a WFI for a PSCI op, but Xen still
> watchdogs on the VMs in both cases.  I noticed the other Linux generic
> arm 'cpu-idle' driver, which used to issue a WFI/cpu_do_idle, isn't
> usable anymore either.  I'm not sure if Xen may have relied on this
> generic driver to get the WFI.

I was running out of ideas so I went back to look at the watchdog
console log:

(XEN) do_psci_0_2_cpu_suspend
(XEN) Watchdog timer fired for domain 0
(XEN) Hardware Dom0 shutdown: watchdog rebooting machine

Checking the code, it seems that the Xen watchdog is set by
xen/common/sched/core.c:SCHEDOP_watchdog, which is called by
tools/libs/ctrl/xc_domain.c:xc_watchdog.
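
To make the timeout behaviour concrete, here is a toy model in Python
(not Xen code): a watchdog holds a deadline that each kick pushes
forward, and it fires once the deadline passes without a kick. The
class and function names, the 30-second timeout and the 15-second kick
interval are all illustrative assumptions, and the idea that the
kicking process simply stops running during suspend is a hypothesis,
not something confirmed in this thread.

```python
class ToyWatchdog:
    """Toy model of a re-armable watchdog; not Xen's implementation."""

    def __init__(self, timeout):
        self.timeout = timeout  # seconds allowed between kicks (assumed value)
        self.deadline = None    # None means the watchdog was never armed

    def kick(self, now):
        # Arm or re-arm: push the deadline `timeout` seconds into the future.
        self.deadline = now + self.timeout

    def fired(self, now):
        # Fires only if armed and the deadline has passed without a kick.
        return self.deadline is not None and now >= self.deadline


def first_fire_time(timeout, kick_interval, stop_kicking_at, horizon):
    """Return the first second at which the watchdog fires, or None.

    A hypothetical daemon kicks every `kick_interval` seconds until
    `stop_kicking_at`, which stands in for the moment it stops running.
    """
    wd = ToyWatchdog(timeout)
    for now in range(horizon + 1):
        if wd.fired(now):
            return now
        if now < stop_kicking_at and now % kick_interval == 0:
            wd.kick(now)
    return None
```

In this model, first_fire_time(30, 15, 40, 120) returns 60: the last
kick lands at t=30 and the deadline expires 30 seconds later. If the
daemon never kicks at all (stop_kicking_at=0), the watchdog is never
armed and nothing fires.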

xc_watchdog is called by tools/misc/xenwatchdogd.c. Is it possible that
this problem is entirely caused by the daemon xenwatchdogd running in
the background? What happens if you kill xenwatchdogd and try again
without it (or, even better, don't start it at all)?
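
One way to confirm whether the daemon is actually running in dom0 is
to scan /proc for a process with that name. The helper below is a
minimal sketch, assuming a Linux dom0 with a standard /proc layout;
the function name pids_by_name and its proc_root parameter are
hypothetical, introduced only for this example.

```python
import os


def pids_by_name(name, proc_root="/proc"):
    """Return the sorted PIDs whose <proc_root>/<pid>/comm equals `name`."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "cpuinfo"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited or its comm file is unreadable
    return sorted(pids)
```

Calling pids_by_name("xenwatchdogd") in dom0 before triggering the
suspend should return an empty list if the daemon is not running.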
