On Mon, Jul 10, 2023 at 07:42:55AM -0600, Theo de Raadt wrote:
> I dare you to write the simplest fix for this, instead of a diff that
> scrolls by.
This patch seems to work. Will need to bang on it for a few more days.
1. Disable gmoninit after sched_stop_scondary_cpus(). The secondary
CPUs have halted, so we aren't racing sysctl(2) on a secondary CPU.
2. Restore gmoninit between resume_mp() and sched_start_secondary_cpus().
The secondary CPUs are out of cpu_hatch(), which is probably where we
are crashing during resume. The secondary CPUs haven't started
scheduling yet, so we aren't racing sysctl(2).
Index: subr_suspend.c
===================================================================
RCS file: /cvs/src/sys/kern/subr_suspend.c,v
retrieving revision 1.15
diff -u -p -r1.15 subr_suspend.c
--- subr_suspend.c 2 Jul 2023 19:02:27 -0000 1.15
+++ subr_suspend.c 10 Jul 2023 14:51:01 -0000
@@ -18,6 +18,7 @@
#include <sys/param.h>
#include <sys/systm.h>
+#include <sys/atomic.h>
#include <sys/buf.h>
#include <sys/clockintr.h>
#include <sys/reboot.h>
@@ -100,6 +101,14 @@ top:
#ifdef MULTIPROCESSOR
sched_stop_secondary_cpus();
KASSERT(CPU_IS_PRIMARY(curcpu()));
+#endif
+#ifdef GPROF
+ extern int gmoninit;
+ int gmon_state = gmoninit;
+ gmoninit = 0;
+ membar_producer();
+#endif
+#ifdef MULTIPROCESSOR
sleep_mp();
#endif
@@ -172,6 +181,12 @@ fail_suspend:
resume_randomness(rndbuf, rndbuflen);
#ifdef MULTIPROCESSOR
resume_mp();
+#endif
+#ifdef GPROF
+ gmoninit = gmon_state;
+ membar_producer();
+#endif
+#ifdef MULTIPROCESSOR
sched_start_secondary_cpus();
#endif
vfs_stall(curproc, 0);