> On Aug 26, 2025, at 10:58 PM, matthew green <[email protected]> wrote:
>
>>> CPU: 0.0% user, 47.0% nice, 3.0% system, 0.1% interrupt, 49.8% idle
>
> can you press 't' at top screen, so it shows both cpus separately?
>
> ie, monitor if it is one cpu stuck, or moves between, etc.?
In his original post, John said he was able to force processes to migrate using
cpuctl, so I doubt one of the CPUs is “stuck”.
It’s been a while since I inspected the scheduler code (my recollection
predates when the “1st class” / SMT logic was added), so I refreshed my memory
over my morning coffee today.
Generally, the scheduler tries to keep an LWP on the CPU it last ran on,
because that improves cache locality. Obviously, under duress, the scheduler
will decide to migrate the LWP to another CPU, but I don’t remember the
criteria (guess I’ll need another cup of coffee).
My wild-a** guess is that the initial CPU selection for the new LWP is not
working as expected. There seems to be some logic specifically around getting
a newly-spawned-by-the-shell process off onto a different CPU
(sched_vforkexec() / LP_TELEPORT). I’m concerned that this is insufficient;
wouldn’t you want the same logic to apply to a new process created by
posix_spawn()? I don’t know that that’s the issue here, but I’m hoping to
provoke a discussion so that people who’ve been in this code more recently than
me can explain how it’s supposed to work, so we can reason about what’s going
wrong for John.
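
For anyone who wants to poke at this from userland, here is a rough sketch of
the two process-creation paths side by side. It assumes that vfork() + exec is
the path sched_vforkexec() is meant to catch, which I have not verified. The
helper names (wait_exit_status, run_via_vfork_exec, run_via_posix_spawn) are
made up for illustration and are not from the tree.

```c
#include <sys/types.h>
#include <sys/wait.h>

#include <spawn.h>
#include <unistd.h>

extern char **environ;

/* Wait for pid; return its exit status, or -1 if it did not exit normally. */
int
wait_exit_status(pid_t pid)
{
	int status;

	if (waitpid(pid, &status, 0) == -1)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/*
 * Path 1: vfork() followed by execv().
 * ASSUMPTION: this is the path the sched_vforkexec() logic applies to.
 */
int
run_via_vfork_exec(const char *path, char *const argv[])
{
	pid_t pid = vfork();

	if (pid == -1)
		return -1;
	if (pid == 0) {
		/* In the vfork child, only exec or _exit is safe. */
		execv(path, argv);
		_exit(127);
	}
	return wait_exit_status(pid);
}

/*
 * Path 2: posix_spawn() with default attributes and file actions.
 * The open question is whether the child gets the same initial CPU
 * selection treatment as in path 1.
 */
int
run_via_posix_spawn(const char *path, char *const argv[])
{
	pid_t pid;

	if (posix_spawn(&pid, path, NULL, NULL, argv, environ) != 0)
		return -1;
	return wait_exit_status(pid);
}
```

Calling each helper from a small main() with a CPU-bound command, while
watching per-CPU usage in top, would show whether the two paths place the
child differently. It does not prove anything about the scheduler by itself.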
-- thorpej