On Sun, Jul 09, 2023 at 08:11:43PM +0200, Claudio Jeker wrote:
> On Sun, Jul 09, 2023 at 12:52:20PM -0500, Scott Cheloha wrote:
> > This patch fixes resume/unhibernate on GPROF kernels where kgmon(8)
> > has activated kernel profiling.
> >
> > I think the problem is that code called from cpu_hatch() does not play
> > nicely with _mcount(), so GPROF kernels crash during resume. I can't
> > point you to which code in particular, but keeping all CPUs out of
> > _mcount() until the primary CPU has completed resume/unhibernate fixes
> > the crash.
> >
> > ok?
>
> To be honest, I'm not sure we need something like this. GPROF is already a
> special case and people running a GPROF kernel should probably stop the
> collection of profile data before suspend/hibernate.
Sorry, I was a little unclear in my original mail.
When I say "has activated kernel profiling" I mean "has *ever*
activated kernel profiling".
Regardless of whether or not profiling is active at the moment we
reach sleep_state(), if kernel profiling has *ever* been activated in
the past, the resume crashes.
When sysctl(2) reaches sysctl_doprof(), gmoninit is set. "Assume that
if we're here it is safe to execute profiling":
/*
 * Return kernel profiling information.
 */
int
sysctl_doprof(int *name, u_int namelen, void *oldp, size_t *oldlenp, void *newp,
    size_t newlen)
{
	CPU_INFO_ITERATOR cii;
	struct cpu_info *ci;
	struct gmonparam *gp = NULL;
	int error, cpuid, op, state;

	/* all sysctl names at this level are name and field */
	if (namelen != 2)
		return (ENOTDIR);		/* overloaded */

	op = name[0];
	cpuid = name[1];

	CPU_INFO_FOREACH(cii, ci) {
		if (cpuid == CPU_INFO_UNIT(ci)) {
			gp = ci->ci_gmon;
			break;
		}
	}

	if (gp == NULL)
		return (EOPNOTSUPP);

	/* Assume that if we're here it is safe to execute profiling. */
	gmoninit = 1;
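For context, that first sysctl(2) is what kgmon(8) triggers when you
start profiling.  Roughly, the userland side amounts to the sketch
below; it is untested, and the mib layout is inferred from the name[]
handling above rather than copied from kgmon(8):

#include <sys/types.h>
#include <sys/sysctl.h>
#include <sys/gmon.h>

int
kern_prof_start(int cpuid)
{
	int mib[4] = { CTL_KERN, KERN_PROF, GPROF_STATE, cpuid };
	int state = GMON_PROF_ON;

	/* The first successful write here leaves gmoninit set for good. */
	return sysctl(mib, 4, NULL, NULL, &state, sizeof(state));
}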
After that first sysctl(2), all CPUs will stop bouncing out of
_mcount() at the gmoninit check and start checking the per-CPU data
structures to decide whether or not to record the arc.  "Do not
profile execution...":
_MCOUNT_DECL(u_long frompc, u_long selfpc) __used;
/* _mcount; may be static, inline, etc */
_MCOUNT_DECL(u_long frompc, u_long selfpc)
{
	u_short *frompcindex;
	struct tostruct *top, *prevtop;
	struct gmonparam *p;
	long toindex;
#ifdef _KERNEL
	int s;

	/*
	 * Do not profile execution if memory for the current CPU
	 * descriptor and profiling buffers has not yet been allocated
	 * or if the CPU we are running on has not yet set its trap
	 * handler.
	 */
	if (gmoninit == 0)
		return;
#ifdef SUSPEND
	if (mcount_disabled)
		return;
#endif
	if ((p = curcpu()->ci_gmon) == NULL)
		return;
#else
	p = &_gmonparam;
#endif
	/*
	 * check that we are profiling
	 * and that we aren't recursively invoked.
	 */
	if (p->state != GMON_PROF_ON)
		return;
#ifdef _KERNEL
	MCOUNT_ENTER;
This patch adds a second check to _mcount(), mcount_disabled, which
has a distinct meaning.
gmoninit means:
	The boot initialization is now done and it is safe to touch the
	per-CPU GPROF data structures.
mcount_disabled says:
	Keep out.
There may be a clever way to merge the two variables, but the simplest
thing I could think of was to just add a second boolean. The current
patch is idiot-proof.
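To illustrate what merging them might look like (purely hypothetical,
not part of the diff, all names invented), a single state word could
replace the two booleans, with _mcount() bailing out unless the state
says "ready":

/*
 * Hypothetical sketch, not part of this patch: fold gmoninit and
 * mcount_disabled into one state word.  All names are invented.
 */
#define MCOUNT_BOOTSTRAP	0	/* per-CPU gmon not yet allocated */
#define MCOUNT_READY		1	/* safe to touch ci_gmon */
#define MCOUNT_SUSPENDED	2	/* suspend/resume in progress */

volatile int mcount_state = MCOUNT_BOOTSTRAP;

/* _mcount() would then have a single early-return gate: */
static inline int
mcount_ok(void)
{
	return (mcount_state == MCOUNT_READY);
}

The diff below sticks with the extra boolean, which leaves gmoninit's
existing meaning alone.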
> Unless someone wants to gprof suspend/hibernate but then doing this will
> result in bad data.
That's way out of scope; I'm not advocating for that.  If you want to
profile sleep_state() you need to do it some other way.
> Another option is to abort zzz/ZZZ if kernel profiling is running.
I don't think this would be a good user experience. Performing the
suspend without question and possibly returning bad profiling results
is better than failing the suspend.
IMHO, if suspend/resume interferes with kgmon(8) yielding accurate
results, we just need to document it in kgmon.8.
I would argue that suspend/resume is an edge case one needs to keep in
mind. You can still profile large parts of the kernel's execution
path.
> In your diff I would remove these asserts:
> KASSERT(CPU_IS_PRIMARY(curcpu()));
> The code does not require the primary cpu there and KASSERTs are not for free.
Sure, dropped.
Index: sys/lib/libkern/mcount.c
===================================================================
RCS file: /cvs/src/sys/lib/libkern/mcount.c,v
retrieving revision 1.14
diff -u -p -r1.14 mcount.c
--- sys/lib/libkern/mcount.c 11 Jan 2022 09:21:34 -0000 1.14
+++ sys/lib/libkern/mcount.c 9 Jul 2023 22:03:22 -0000
@@ -33,6 +33,28 @@
#include <sys/param.h>
#include <sys/gmon.h>
+#ifdef _KERNEL
+#ifdef SUSPEND
+#include <sys/atomic.h>
+
+volatile int mcount_disabled;
+
+void
+mcount_disable(void)
+{
+ mcount_disabled = 1;
+ membar_producer();
+}
+
+void
+mcount_enable(void)
+{
+ mcount_disabled = 0;
+ membar_producer();
+}
+#endif /* SUSPEND */
+#endif /* _KERNEL */
+
/*
* mcount is called on entry to each function compiled with the profiling
* switch set. _mcount(), which is declared in a machine-dependent way
@@ -63,7 +85,10 @@ _MCOUNT_DECL(u_long frompc, u_long selfp
*/
if (gmoninit == 0)
return;
-
+#ifdef SUSPEND
+ if (mcount_disabled)
+ return;
+#endif
if ((p = curcpu()->ci_gmon) == NULL)
return;
#else
Index: sys/kern/subr_suspend.c
===================================================================
RCS file: /cvs/src/sys/kern/subr_suspend.c,v
retrieving revision 1.15
diff -u -p -r1.15 subr_suspend.c
--- sys/kern/subr_suspend.c 2 Jul 2023 19:02:27 -0000 1.15
+++ sys/kern/subr_suspend.c 9 Jul 2023 22:03:22 -0000
@@ -26,6 +26,9 @@
#include <sys/mount.h>
#include <sys/syscallargs.h>
#include <dev/wscons/wsdisplayvar.h>
+#ifdef GPROF
+#include <sys/gmon.h>
+#endif
#ifdef HIBERNATE
#include <sys/hibernate.h>
#endif
@@ -63,6 +66,9 @@ top:
if (sleep_showstate(v, sleepmode))
return EOPNOTSUPP;
+#ifdef GPROF
+ mcount_disable();
+#endif
#if NWSDISPLAY > 0
wsdisplay_suspend();
#endif
@@ -192,6 +198,9 @@ fail_hiballoc:
start_periodic_resettodr();
#if NWSDISPLAY > 0
wsdisplay_resume();
+#endif
+#ifdef GPROF
+ mcount_enable();
#endif
sys_sync(curproc, NULL, NULL);
if (cpu_setperf != NULL)
Index: sys/sys/gmon.h
===================================================================
RCS file: /cvs/src/sys/sys/gmon.h,v
retrieving revision 1.9
diff -u -p -r1.9 gmon.h
--- sys/sys/gmon.h 11 Jan 2022 23:59:55 -0000 1.9
+++ sys/sys/gmon.h 9 Jul 2023 22:03:22 -0000
@@ -158,6 +158,10 @@ struct gmonparam {
#ifdef _KERNEL
extern int gmoninit; /* Is the kernel ready for being profiled? */
+#ifdef SUSPEND
+void mcount_disable(void);
+void mcount_enable(void);
+#endif
#else /* !_KERNEL */
#include <sys/cdefs.h>
Index: lib/libc/gmon/mcount.c
===================================================================
RCS file: /cvs/src/lib/libc/gmon/mcount.c,v
retrieving revision 1.16
diff -u -p -r1.16 mcount.c
--- lib/libc/gmon/mcount.c 11 Jan 2022 09:21:34 -0000 1.16
+++ lib/libc/gmon/mcount.c 9 Jul 2023 22:03:23 -0000
@@ -31,6 +31,28 @@
#include <sys/types.h>
#include <sys/gmon.h>
+#ifdef _KERNEL
+#ifdef SUSPEND
+#include <sys/atomic.h>
+
+volatile int mcount_disabled;
+
+void
+mcount_disable(void)
+{
+ mcount_disabled = 1;
+ membar_producer();
+}
+
+void
+mcount_enable(void)
+{
+ mcount_disabled = 0;
+ membar_producer();
+}
+#endif /* SUSPEND */
+#endif /* _KERNEL */
+
/*
* mcount is called on entry to each function compiled with the profiling
* switch set. _mcount(), which is declared in a machine-dependent way
@@ -61,7 +83,10 @@ _MCOUNT_DECL(u_long frompc, u_long selfp
*/
if (gmoninit == 0)
return;
-
+#ifdef SUSPEND
+ if (mcount_disabled)
+ return;
+#endif
if ((p = curcpu()->ci_gmon) == NULL)
return;
#else