On 12/08/23(Sat) 11:48, Visa Hankala wrote:
> On Sat, Aug 12, 2023 at 01:29:10PM +0200, Martin Pieuchot wrote:
> > On 12/08/23(Sat) 10:57, Visa Hankala wrote:
> > > On Fri, Aug 11, 2023 at 09:52:15PM +0200, Martin Pieuchot wrote:
> > > > When stopping a machine, with "halt -p" for example, secondary CPUs are
> > > > removed from the scheduler before smr_flush() is called. So there's no
> > > > need for the SMR thread to peg itself to such CPUs. This currently
> > > > isn't a problem because we use per-CPU runqueues but it doesn't work
> > > > with a global one. So the diff below skip halted CPUs. It should also
> > > > speed up rebooting/halting on machine with a huge number of CPUs.
> > >
> > > Because SPCF_HALTED does not (?) imply that the CPU has stopped
> > > processing interrupts, this skipping is not safe as is. Interrupt
> > > handlers might access SMR-protected data.
> >
> > Interesting. This is worse than I expected. It seems we completely
> > forgot about suspend/resume and rebooting when we started pinning
> > interrupts on secondary CPUs, no? Previously sched_stop_secondary_cpus()
> > was enough to ensure no more code would be executed on secondary CPUs,
> > no? Wouldn't it be better to remap interrupts to the primary CPU in
> > those cases? Is it easily doable?
>
> I think device interrupt stopping already happens through
> config_suspend_all().
Indeed. I'm a bit puzzled about the order of operations though. In the
case of reboot/halt the existing order of operations are:
sched_stop_secondary_cpus() <--- remove secondary CPUs from the scheduler
vfs_shutdown()
if_downall()
uvm_swap_finicrypt_all() <--- happens on a single CPU but with
interrupts possibly on secondary CPUs
smr_flush() <--- tells the SMR thread to execute itself
on all CPUs even if they are out of the
scheduler
config_suspend_all() <--- stop interrupts from firing
x86_broadcast_ipi(X86_IPI_HALT) <--- stop secondary CPUs (on x86)
So do we want to keep the existing requirement of being able to execute
a thread on a CPU that has been removed from the scheduler? That's is
what smr_flush() currently needs. I find it surprising but I can add
that as a requirement for the upcoming scheduler. I don't know if other
options are possible or even attractive.