On Fri, Aug 11, 2023 at 09:52:15PM +0200, Martin Pieuchot wrote:
> When stopping a machine, with "halt -p" for example, secondary CPUs are
> removed from the scheduler before smr_flush() is called. So there's no
> need for the SMR thread to peg itself to such CPUs. This currently
> isn't a problem because we use per-CPU runqueues but it doesn't work
> with a global one. So the diff below skip halted CPUs. It should also
> speed up rebooting/halting on machine with a huge number of CPUs.
Because SPCF_HALTED does not (?) imply that the CPU has stopped
processing interrupts, this skipping is not safe as is. Interrupt
handlers might access SMR-protected data.
One possible solution is to spin. When smr_grace_wait() sees
SPCF_HALTED, it should probably call cpu_unidle(ci) and spin until
condition READ_ONCE(ci->ci_schedstate.spc_smrgp) == smrgp becomes true.
However, for this to work, sched_idle() needs to invoke smr_idle().
Here is a potential problem since the cpu_idle_{enter,cycle,leave}()
logic is not consistent between architectures.
I think the intent in sched_idle() was that cpu_idle_enter() should
block interrupts so that sched_idle() could check without races if
the CPU can sleep. Now, on some architectures cpu_idle_enter() is
a no-op. These architectures have to check the idle state in their
cpu_idle_cycle() function before pausing the CPU.
To avoid touching architecture-specific code, cpu_is_idle() could
be redefined to
((ci)->ci_schedstate.spc_whichqs == 0 &&
(ci)->ci_schedstate.spc_smrgp == READ_ONCE(smr_grace_period))
Then the loop conditions
while (!cpu_is_idle(curcpu())) {
and
while (spc->spc_whichqs == 0) {
in sched_idle() would have to be changed to
while (spc->spc_whichqs != 0) {
and
while (cpu_is_idle(ci)) {
:(
> Index: kern/kern_smr.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_smr.c,v
> retrieving revision 1.16
> diff -u -p -r1.16 kern_smr.c
> --- kern/kern_smr.c 14 Aug 2022 01:58:27 -0000 1.16
> +++ kern/kern_smr.c 11 Aug 2023 19:43:54 -0000
> @@ -158,6 +158,8 @@ smr_grace_wait(void)
> CPU_INFO_FOREACH(cii, ci) {
> if (!CPU_IS_RUNNING(ci))
> continue;
> + if (ci->ci_schedstate.spc_schedflags & SPCF_HALTED)
> + continue;
> if (READ_ONCE(ci->ci_schedstate.spc_smrgp) == smrgp)
> continue;
> sched_peg_curproc(ci);
>