On Thu, Jan 22, 2026 at 11:31:52AM +0100, Jan Beulich wrote:
> On 22.01.2026 09:50, Roger Pau Monné wrote:
> > On Mon, Nov 17, 2025 at 03:37:45PM +0100, Jan Beulich wrote:
> >> If already we play with the IRQ count, we should do so only if we actually
> >> "consume" the interrupt; normal timer IRQs should not have any adjustment
> >> done.
> >>
> >> Fixes: 353533232730 ("cpuidle: fix the menu governor to enhance IO
> >> performance")
> >> Signed-off-by: Jan Beulich <[email protected]>
> >> ---
> >> _Why_ we do these adjustments (also elsewhere) I don't really know.
> >
> > I think I have an idea of what's going on here. This accounting is
> > used by the idle governor to decide when to go idle. On Linux (where
> > the code is imported from) the governor took into account the inflight
> > IO request state. However that's not available to Xen and instead
> > they decided to mimic the tracking of the IO activity by counting
> > interrupts. I bet then realized the timer interrupt would "skew"
> > those results and make it look like there's IO activity when the
> > system is otherwise mostly idle.
>
> Hmm, yes, that sounds pretty plausible. Except for one aspect: Why would
> it be I/O that the governor would care about?
This is all hypothetical, I don't know the real reasons. I think they
aimed to avoid putting the system in deep idle states if there's IO
gong on, regardless of whether the CPU is otherwise idle. Putting the
system in those deeper idle states would also increase interrupt
latency.
I'm not arguing the initial purpose was correct, just attempting to
make sense of all of this.
> It wants to judge by the
> system being busy, and timer interrupts generally are an indication of
> busyness. Just not broadcast ones. Hence ...
>
> >> --- a/xen/arch/x86/hpet.c
> >> +++ b/xen/arch/x86/hpet.c
> >> @@ -808,13 +808,13 @@ int hpet_broadcast_is_available(void)
> >>
> >> int hpet_legacy_irq_tick(void)
> >> {
> >> - this_cpu(irq_count)--;
> >
> > I think you want to pull this decrease into timer_interrupt() itself,
> > so it does the decrease unconditionally of whether the interrupt is a
> > legacy HPET one or from the PIT?
>
> ... I think moving to timer_interrupt() would actually be wrong.
Hm, I see. It's only HPET broadcast we want to avoid accounting for.
> > By gating the decrease on the interrupt having been originated from
> > the HPET you completely avoid the decrease in the PIT case AFAICT.
> >
> >> -
> >> if ( !hpet_events ||
> >> (hpet_events->flags & (HPET_EVT_DISABLE|HPET_EVT_LEGACY)) !=
> >> HPET_EVT_LEGACY )
> >> return 0;
> >>
> >> + this_cpu(irq_count)--;
> >
> > Also in hpet_interrupt_handler() we might consider only doing the
> > decrease after we ensure it's not a spurious interrupt? We don't seem
> > to decrease irq_count for spurious interrupts elsewhere.
>
> Even a spurious interrupt is only an idle management auxiliary one (i.e.
> really an artifact thereof). It doesn't hint at the system being busy.
Right, I was mislead and somehow assumed the intent was to avoid this
counting for all timer interrupts. Instead is just the HPET broadcast
that not accounted for.
Reviewed-by: Roger Pau Monné <[email protected]>
Thanks, Roger.