On Thu, Aug 24, 2023 at 07:21:29PM +0200, Martin Pieuchot wrote:
> On 23/08/23(Wed) 18:52, Scott Cheloha wrote:
> > This is the next patch in the clock interrupt reorganization series.
>
> Thanks for your diff. I'm sorry but it is really hard for me to help
> review this diff because there is still no man page for this API+subsystem.
>
> Can we start with that please?
Sure, a first draft of a clockintr_establish.9 manpage is included
below.
We also have a manpage in the tree, clockintr.9. It is a bit out of
date, but it covers the broad strokes of how the driver-facing portion
of the subsystem works.
> > This patch moves the entry points for the interval and profile dt(4)
> > providers from the hardclock(9) to a dedicated clock interrupt
> > callback, dt_prov_profile_intr(), in dev/dt/dt_prov_profile.c.
> >
> > - To preserve current behavior, (1) both provider entrypoints have
> > been moved into a single callback function, (2) the interrupt runs at
> > the same frequency as the hardclock, and (3) the interrupt is
> > staggered to co-occur with the hardclock on a given CPU.
>
> The only behavior that needs to be preserved is the output of dumping
> stacks. That means DT_FA_PROFILE and DT_FA_STATIC certainly need to
> be adapted with this change. You can figure that out by looking at the
> output of /usr/src/share/btrace/kprofile.bt without and with this diff.
>
> Please generate a FlameGraph to make sure they're still the same.
dt_prov_profile_intr() runs at the same stack depth as hardclock(), so
indeed the FlameGraphs are still the same.
> Apart from that I'd prefer if we could skip the mechanical change and
> go straight to what dt(4) needs. Otherwise we will have to re-design
> everything.
I think a mechanical "move the code from point A to point B" patch is
useful. It makes the changes easier to follow when tracing the
revision history in the future.
If you insist on skipping it, though, I think I can live without it.
> If you don't want to do this work, then leave it and tell
> me what you need and what your plan is so I can help you and do it
> myself.
I am very keen to do this, or at least to help with it.
> dt(4) needs a way to schedule two different kinds of periodic timeouts
> with the highest precision possible. It is currently plugged into hardclock
> because there is nothing better.
Yes.
> The current code assumes the periodic entry points are external to dt(4).
> This diff moves them in the middle of dt(4) but keeps the existing flow
> which makes the code very convoluted.
>
> A starting point to understand the added complexity is to see that the
> DT_ENTER() macros are no longer needed if we move the entry points inside
> dt(4).
I did see that. It seems really easy to remove the macros in a
subsequent patch, though.
Again, if you want to do it all in one patch that's OK.
> The first periodic timeout is dt_prov_interval_enter(). It could be
> implemented as a per-PCB timeout_add_nsec(9). The drawback of this
> approach is that it uses too much code in the kernel, which is a problem
> when instrumenting the kernel itself. Every subsystem used by dt(4) is
> impossible to instrument with btrace(8).
I think you can avoid this instrumentation problem by using clockintr,
where the callback functions are run from the hardware interrupt
context, just like hardclock().
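For illustration, a periodic clockintr callback follows the same
pattern as dt_prov_profile_intr() in the patch below.  A minimal
sketch, with made-up names (only clockintr_advance() is real):

	void
	example_intr(struct clockintr *cl, void *frame)
	{
		uint64_t count, i;

		/*
		 * Reschedule first.  clockintr_advance() returns the
		 * number of intervals elapsed since the expiration
		 * time, so a late interrupt can catch up.
		 */
		count = clockintr_advance(cl, example_period_ns);
		for (i = 0; i < count; i++) {
			/* Do the periodic work, e.g. take one sample. */
		}
	}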
> The second periodic timeout is dt_prov_profile_enter(). It is similar
> to the previous one and has to run on every CPU.
>
> Both are currently bound to tick, but we want per-PCB time resolution.
> We can get rid of `dp_nticks' and `dp_maxtick' if we control when the
> timeouts fire.
My current thought is that each PCB could have its own "dp_period" or
"dp_interval", a count of nanoseconds.
> > - Each dt_pcb gets a provider-specific "dp_clockintr" pointer. If the
> > PCB's implementing provider needs a clock interrupt to do its work, it
> > stores the handle in dp_clockintr. The handle, if any, is established
> > during dtpv_alloc() and disestablished during dtpv_dealloc().
>
> Sorry, but as I said I don't understand what a clockintr is. How does it
> fit in the kernel and how is it supposed to be used?
The goal of clockintr is to provide a machine-independent API for
scheduling clock interrupts. You can use it to implement something
like hardclock() or statclock(). We are already using it to implement
these functions, among others.
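To make the flow concrete, here is a rough sketch of the lifecycle
using the interfaces from the draft manpage below; example_intr and
period_ns stand in for a caller's callback and period:

	struct clockintr *cl;

	/* Allocate a handle bound to this CPU's queue. */
	cl = clockintr_establish(&ci->ci_queue, example_intr);
	if (cl == NULL)
		return ENOMEM;

	/* Offset the expiration so CPUs don't fire in lockstep. */
	clockintr_stagger(cl, period_ns, CPU_INFO_UNIT(ci), MAXCPUS);

	/* Arm it: expire at the next multiple of period_ns. */
	clockintr_advance(cl, period_ns);

	/* Stop it without freeing it... */
	clockintr_cancel(cl);

	/* ...or destroy it outright (also cancels pending execution). */
	clockintr_disestablish(cl);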
> Why have it per PCB and not per provider or for the whole driver instead?
> Per-PCB implies that if I run 3 different profilings on a 32-CPU machine
> I now have 96 different clockintrs. Is that what we want?
Yes, I think that sounds fine. If we run into scaling problems we can
always just change the underlying data structure to an RB tree or a
minheap.
> If so, why not simply use timeout(9)? What's the difference?
Some code needs to run from a hardware interrupt context. It's going
to be easier to profile more of the kernel if we're collecting stack
traces during a clock interrupt.
Timeouts run at IPL_SOFTCLOCK. You can profile process context
code... that's it. Timeouts also only run on a single CPU. Per-CPU
timeout wheels are at least a year away, if not longer.
> > One alternative is to start running the clock interrupts when they
> > are allocated in dtpv_alloc() and stop them when they are freed in
> > dtpv_dealloc(). This is wasteful, though: the PCBs are not recording
> > yet, so the interrupts won't perform any useful work until the
> > controlling process enables recording.
> >
> > An additional pair of provider hooks, e.g. "dtpv_record_start" and
> > "dtpv_record_stop", might resolve this.
>
> Another alternative would be to have a single hardclock-like handler for
> dt(4). Sorry, I don't understand the big picture behind clockintr, so I
> can't help.
>
> > - We haven't needed to destroy clock interrupts yet, so the
> > clockintr_disestablish() function used in this patch is new.
>
> So maybe that's fishy. Are they supposed to be destroyed? [...]
Let's start with what we have above and circle back to this.
Index: share/man/man9/Makefile
===================================================================
RCS file: /cvs/src/share/man/man9/Makefile,v
retrieving revision 1.308
diff -u -p -r1.308 Makefile
--- share/man/man9/Makefile 5 Nov 2022 19:29:45 -0000 1.308
+++ share/man/man9/Makefile 26 Aug 2023 01:44:37 -0000
@@ -9,7 +9,7 @@ MAN= aml_evalnode.9 atomic_add_int.9 ato
audio.9 autoconf.9 \
bemtoh32.9 bio_register.9 bintimeadd.9 boot.9 bpf_mtap.9 buffercache.9 \
bufq_init.9 bus_dma.9 bus_space.9 \
- clockintr.9 \
+ clockintr.9 clockintr_establish.9 \
copy.9 cond_init.9 config_attach.9 config_defer.9 counters_alloc.9 \
cpumem_get.9 crypto.9 \
delay.9 disk.9 disklabel.9 dma_alloc.9 dohooks.9 \
Index: share/man/man9/clockintr_establish.9
===================================================================
RCS file: share/man/man9/clockintr_establish.9
diff -N share/man/man9/clockintr_establish.9
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ share/man/man9/clockintr_establish.9 26 Aug 2023 01:44:37 -0000
@@ -0,0 +1,239 @@
+.\" $OpenBSD$
+.\"
+.\" Copyright (c) 2020-2023 Scott Cheloha <[email protected]>
+.\"
+.\" Permission to use, copy, modify, and distribute this software for any
+.\" purpose with or without fee is hereby granted, provided that the above
+.\" copyright notice and this permission notice appear in all copies.
+.\"
+.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+.\"
+.Dd $Mdocdate$
+.Dt CLOCKINTR_ESTABLISH 9
+.Os
+.Sh NAME
+.Nm clockintr_advance ,
+.Nm clockintr_cancel ,
+.Nm clockintr_disestablish ,
+.Nm clockintr_establish ,
+.Nm clockintr_expiration ,
+.Nm clockintr_nsecuptime ,
+.Nm clockintr_schedule ,
+.Nm clockintr_stagger
+.Nd schedule a function for execution from clock interrupt context
+.Sh SYNOPSIS
+.In sys/clockintr.h
+.Ft uint64_t
+.Fo clockintr_advance
+.Fa "struct clockintr *cl"
+.Fa "uint64_t interval"
+.Fc
+.Ft void
+.Fo clockintr_cancel
+.Fa "struct clockintr *cl"
+.Fc
+.Ft void
+.Fo clockintr_disestablish
+.Fa "struct clockintr *cl"
+.Fc
+.Ft struct clockintr *
+.Fo clockintr_establish
+.Fa "struct clockintr_queue *queue"
+.Fa "void (*func)(struct clockintr *, void *)"
+.Fc
+.Ft uint64_t
+.Fo clockintr_expiration
+.Fa "const struct clockintr *cl"
+.Fc
+.Ft uint64_t
+.Fo clockintr_nsecuptime
+.Fa "const struct clockintr *cl"
+.Fc
+.Ft void
+.Fo clockintr_schedule
+.Fa "struct clockintr *cl"
+.Fa "uint64_t abs"
+.Fc
+.Ft void
+.Fo clockintr_stagger
+.Fa "struct clockintr *cl"
+.Fa "uint64_t interval"
+.Fa "u_int numerator"
+.Fa "u_int denominator"
+.Fc
+.Sh DESCRIPTION
+The clock interrupt subsystem schedules functions for asynchronous execution
+in the future from a hardware interrupt context.
+.Pp
+The
+.Fn clockintr_establish
+function allocates a new clock interrupt object
+.Po
+a
+.Dq clockintr
+.Pc
+and binds it to the given clock interrupt
+.Fa queue .
+When the clockintr is executed,
+the callback function
+.Fa func
+will be called from a hardware interrupt context on the CPU in control of the
+.Fa queue .
+The new clockintr is not initially scheduled for execution and has an
+expiration time of zero.
+.Pp
+The
+.Fn clockintr_schedule
+function schedules the given clockintr for execution at or after the
+absolute time
+.Fa abs
+has elapsed on the system uptime clock.
+If the clockintr is already pending,
+its original expiration time is quietly superseded by
+.Fa abs .
+.Pp
+The
+.Fn clockintr_advance
+function reschedules the given clockintr to expire in the future at an offset
+from the current expiration time equal to an integral multiple of the given
+.Fa interval ,
+a non-zero count of nanoseconds.
+This interface is useful for rescheduling periodic clock interrupts.
+.Pp
+The
+.Fn clockintr_cancel
+function cancels any pending execution of the given clockintr.
+Its internal expiration time is unmodified.
+.Pp
+The
+.Fn clockintr_disestablish
+function cancels any pending execution of the given clockintr and frees
+its underlying object.
+It is an error to use a clockintr after it is disestablished.
+It is an error for a callback function to disestablish its own clockintr.
+.Pp
+The
+.Fn clockintr_expiration
+function returns the given clockintr's expiration time.
+.Pp
+The
+.Fn clockintr_nsecuptime
+function returns the system uptime value cached by the clock interrupt
+subsystem.
+The result is reasonably accurate and is produced much more quickly than
+.Xr nsecuptime 9 .
+It is an error to call this function outside of a clockintr callback function.
+.Pp
+The
+.Fn clockintr_stagger
+function resets the given clockintr's expiration time to a fraction of the given
+.Fa interval ,
+a count of nanoseconds.
+The clockintr is not scheduled for execution:
+only its internal expiration time is modified.
+This interface may be used in combination with
+.Fn clockintr_advance
+to minimize the likelihood that a periodic clock interrupt will execute
+simultaneously on more than one CPU.
+It is an error if the
+.Fa numerator
+is greater than or equal to the
+.Fa denominator .
+It is an error to stagger a clockintr that is already scheduled for execution.
+.Sh CONTEXT
+The
+.Fn clockintr_advance ,
+.Fn clockintr_cancel ,
+.Fn clockintr_expiration ,
+.Fn clockintr_schedule ,
+and
+.Fn clockintr_stagger
+functions may be called during autoconf,
+from process context,
+or from interrupt context.
+.Pp
+The
+.Fn clockintr_disestablish
+and
+.Fn clockintr_establish
+functions may be called during autoconf or from process context.
+.Pp
+The
+.Fn clockintr_nsecuptime
+function may only be called during a clockintr callback function.
+.Pp
+When a clockintr is executed,
+the callback function
+.Fa func
+set by
+.Fn clockintr_establish
+is called from the
+.Dv IPL_CLOCK
+interrupt context on the CPU in control of the clockintr's parent
+.Fa queue .
+The function
+.Fa func
+must not block.
+The first argument to
+.Fa func
+is a pointer to a private copy of the underlying clock interrupt object.
+The callback function may use this pointer to reschedule the clockintr.
+The second argument to
+.Fa func
+is a pointer to the current CPU's machine-dependent clockframe object.
+.Sh RETURN VALUES
+The
+.Fn clockintr_advance
+function returns the number of intervals that have elapsed since the clockintr's
+expiration time,
+or zero if the clockintr's expiration time has not yet elapsed.
+.Pp
+The
+.Fn clockintr_establish
+function returns a pointer to a new clock interrupt handle on success,
+or
+.Dv NULL
+if the allocation fails.
+.Pp
+The
+.Fn clockintr_expiration
+and
+.Fn clockintr_nsecuptime
+functions' return values are as described above.
+.Sh CODE REFERENCES
+.Pa sys/kern/kern_clockintr.c
+.Sh SEE ALSO
+.Xr hardclock 9 ,
+.Xr hz 9 ,
+.Xr inittodr 9 ,
+.Xr nanouptime 9 ,
+.Xr spl 9 ,
+.Xr tc_init 9 ,
+.Xr timeout 9
+.Rs
+.%A Steven McCanne
+.%A Chris Torek
+.%T A Randomized Sampling Clock for CPU Utilization Estimation and Code Profiling
+.%B \&In Proc. Winter 1993 USENIX Conference
+.%D 1993
+.%P pp. 387\(en394
+.%I USENIX Association
+.Re
+.Rs
+.%A Richard McDougall
+.%A Jim Mauro
+.%B Solaris Internals: Solaris 10 and OpenSolaris Kernel Architecture
+.%I Prentice Hall
+.%I Sun Microsystems Press
+.%D 2nd Edition, 2007
+.%P pp. 912\(en925
+.Re
+.Sh HISTORY
+The clock interrupt subsystem first appeared in
+.Ox 7.3 .
Index: sys/kern/kern_clockintr.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_clockintr.c,v
retrieving revision 1.32
diff -u -p -r1.32 kern_clockintr.c
--- sys/kern/kern_clockintr.c 21 Aug 2023 17:22:04 -0000 1.32
+++ sys/kern/kern_clockintr.c 26 Aug 2023 01:44:38 -0000
@@ -218,7 +218,7 @@ clockintr_dispatch(void *frame)
{
uint64_t lateness, run = 0, start;
struct cpu_info *ci = curcpu();
- struct clockintr *cl;
+ struct clockintr *cl, *shadow;
struct clockintr_queue *cq = &ci->ci_queue;
u_int ogen;
@@ -257,22 +257,26 @@ clockintr_dispatch(void *frame)
break;
}
clockintr_cancel_locked(cl);
- cq->cq_shadow.cl_expiration = cl->cl_expiration;
+ shadow = &cq->cq_shadow;
+ shadow->cl_expiration = cl->cl_expiration;
+ shadow->cl_func = cl->cl_func;
cq->cq_running = cl;
mtx_leave(&cq->cq_mtx);
- cl->cl_func(&cq->cq_shadow, frame);
+ shadow->cl_func(shadow, frame);
mtx_enter(&cq->cq_mtx);
- cq->cq_running = NULL;
- if (ISSET(cl->cl_flags, CLST_IGNORE_SHADOW)) {
- CLR(cl->cl_flags, CLST_IGNORE_SHADOW);
- CLR(cq->cq_shadow.cl_flags, CLST_SHADOW_PENDING);
- }
- if (ISSET(cq->cq_shadow.cl_flags, CLST_SHADOW_PENDING)) {
- CLR(cq->cq_shadow.cl_flags, CLST_SHADOW_PENDING);
- clockintr_schedule_locked(cl,
- cq->cq_shadow.cl_expiration);
+ if (cq->cq_running != NULL) {
+ cq->cq_running = NULL;
+ if (ISSET(cl->cl_flags, CLST_IGNORE_SHADOW)) {
+ CLR(cl->cl_flags, CLST_IGNORE_SHADOW);
+ CLR(shadow->cl_flags, CLST_SHADOW_PENDING);
+ }
+ if (ISSET(shadow->cl_flags, CLST_SHADOW_PENDING)) {
+ CLR(shadow->cl_flags, CLST_SHADOW_PENDING);
+ clockintr_schedule_locked(cl,
+ shadow->cl_expiration);
+ }
}
run++;
}
@@ -382,6 +386,34 @@ clockintr_cancel_locked(struct clockintr
TAILQ_REMOVE(&cq->cq_pend, cl, cl_plink);
CLR(cl->cl_flags, CLST_PENDING);
+}
+
+void
+clockintr_disestablish(struct clockintr *cl)
+{
+ struct clockintr_queue *cq = cl->cl_queue;
+ int was_next;
+
+ if (cl == &cq->cq_shadow)
+ panic("%s: called during dispatch\n", __func__);
+
+ mtx_enter(&cq->cq_mtx);
+ if (ISSET(cl->cl_flags, CLST_PENDING)) {
+ was_next = cl == TAILQ_FIRST(&cq->cq_pend);
+ clockintr_cancel_locked(cl);
+ if (ISSET(cq->cq_flags, CQ_INTRCLOCK)) {
+ if (was_next && !TAILQ_EMPTY(&cq->cq_pend)) {
+ if (cq == &curcpu()->ci_queue)
+ clockqueue_reset_intrclock(cq);
+ }
+ }
+ }
+ if (cl == cq->cq_running)
+ cq->cq_running = NULL;
+ TAILQ_REMOVE(&cq->cq_est, cl, cl_elink);
+ mtx_leave(&cq->cq_mtx);
+
+ free(cl, M_DEVBUF, sizeof *cl);
}
struct clockintr *
Index: sys/kern/kern_clock.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_clock.c,v
retrieving revision 1.115
diff -u -p -r1.115 kern_clock.c
--- sys/kern/kern_clock.c 23 Aug 2023 01:55:45 -0000 1.115
+++ sys/kern/kern_clock.c 26 Aug 2023 01:44:38 -0000
@@ -49,11 +49,6 @@
#include <sys/sched.h>
#include <sys/timetc.h>
-#include "dt.h"
-#if NDT > 0
-#include <dev/dt/dtvar.h>
-#endif
-
/*
* Clock handling routines.
*
@@ -114,12 +109,6 @@ initclocks(void)
void
hardclock(struct clockframe *frame)
{
-#if NDT > 0
- DT_ENTER(profile, NULL);
- if (CPU_IS_PRIMARY(curcpu()))
- DT_ENTER(interval, NULL);
-#endif
-
/*
* If we are not the primary CPU, we're not allowed to do
* any more work.
Index: sys/sys/clockintr.h
===================================================================
RCS file: /cvs/src/sys/sys/clockintr.h,v
retrieving revision 1.10
diff -u -p -r1.10 clockintr.h
--- sys/sys/clockintr.h 21 Aug 2023 17:22:04 -0000 1.10
+++ sys/sys/clockintr.h 26 Aug 2023 01:44:38 -0000
@@ -128,6 +128,7 @@ void clockintr_trigger(void);
uint64_t clockintr_advance(struct clockintr *, uint64_t);
void clockintr_cancel(struct clockintr *);
+void clockintr_disestablish(struct clockintr *);
struct clockintr *clockintr_establish(struct clockintr_queue *,
void (*)(struct clockintr *, void *));
void clockintr_stagger(struct clockintr *, uint64_t, u_int, u_int);
Index: sys/dev/dt/dt_prov_profile.c
===================================================================
RCS file: /cvs/src/sys/dev/dt/dt_prov_profile.c,v
retrieving revision 1.5
diff -u -p -r1.5 dt_prov_profile.c
--- sys/dev/dt/dt_prov_profile.c 26 Apr 2023 16:53:59 -0000 1.5
+++ sys/dev/dt/dt_prov_profile.c 26 Aug 2023 01:44:38 -0000
@@ -20,6 +20,7 @@
#include <sys/systm.h>
#include <sys/param.h>
#include <sys/atomic.h>
+#include <sys/clockintr.h>
#include <dev/dt/dtvar.h>
@@ -31,6 +32,8 @@ struct dt_probe *dtpp_interval; /* glob
int dt_prov_profile_alloc(struct dt_probe *, struct dt_softc *,
struct dt_pcb_list *, struct dtioc_req *);
+int dt_prov_profile_dealloc(struct dt_probe *, struct dt_softc *,
+ struct dtioc_req *);
int dt_prov_profile_enter(struct dt_provider *, ...);
int dt_prov_interval_enter(struct dt_provider *, ...);
@@ -39,7 +42,7 @@ struct dt_provider dt_prov_profile = {
.dtpv_alloc = dt_prov_profile_alloc,
.dtpv_enter = dt_prov_profile_enter,
.dtpv_leave = NULL,
- .dtpv_dealloc = NULL,
+ .dtpv_dealloc = dt_prov_profile_dealloc,
};
struct dt_provider dt_prov_interval = {
@@ -47,9 +50,11 @@ struct dt_provider dt_prov_interval = {
.dtpv_alloc = dt_prov_profile_alloc,
.dtpv_enter = dt_prov_interval_enter,
.dtpv_leave = NULL,
- .dtpv_dealloc = NULL,
+ .dtpv_dealloc = dt_prov_profile_dealloc,
};
+void dt_prov_profile_intr(struct clockintr *, void *);
+
int
dt_prov_profile_init(void)
{
@@ -85,10 +90,17 @@ dt_prov_profile_alloc(struct dt_probe *d
continue;
dp = dt_pcb_alloc(dtp, sc);
- if (dp == NULL) {
- dt_pcb_purge(plist);
- return ENOMEM;
+ if (dp == NULL)
+ goto purge;
+
+ dp->dp_clockintr = clockintr_establish(&ci->ci_queue,
+ dt_prov_profile_intr);
+ if (dp->dp_clockintr == NULL) {
+ dt_pcb_free(dp);
+ goto purge;
}
+ clockintr_stagger(dp->dp_clockintr, hardclock_period,
+ CPU_INFO_UNIT(ci), MAXCPUS);
dp->dp_maxtick = hz / dtrq->dtrq_rate;
dp->dp_cpuid = ci->ci_cpuid;
@@ -99,6 +111,31 @@ dt_prov_profile_alloc(struct dt_probe *d
}
return 0;
+purge:
+ TAILQ_FOREACH(dp, plist, dp_snext)
+ clockintr_disestablish(dp->dp_clockintr);
+ dt_pcb_purge(plist);
+ return ENOMEM;
+}
+
+int
+dt_prov_profile_dealloc(struct dt_probe *dtp, struct dt_softc *sc,
+ struct dtioc_req *dtrq)
+{
+ struct dt_pcb *dp;
+
+ KASSERT(dtioc_req_isvalid(dtrq));
+ KASSERT(dtp == dtpp_profile || dtp == dtpp_interval);
+
+ TAILQ_FOREACH(dp, &sc->ds_pcbs, dp_snext) {
+ if (dp->dp_dtp == dtp) {
+ if (dp->dp_clockintr != NULL) {
+ clockintr_disestablish(dp->dp_clockintr);
+ dp->dp_clockintr = NULL;
+ }
+ }
+ }
+ return 0;
}
static inline void
@@ -148,4 +185,18 @@ dt_prov_interval_enter(struct dt_provide
}
smr_read_leave();
return 0;
+}
+
+void
+dt_prov_profile_intr(struct clockintr *cl, void *cf)
+{
+ uint64_t count, i;
+ struct cpu_info *ci = curcpu();
+
+ count = clockintr_advance(cl, hardclock_period);
+ for (i = 0; i < count; i++) {
+ DT_ENTER(profile, NULL);
+ if (CPU_IS_PRIMARY(ci))
+ DT_ENTER(interval, NULL);
+ }
}
Index: sys/dev/dt/dt_dev.c
===================================================================
RCS file: /cvs/src/sys/dev/dt/dt_dev.c,v
retrieving revision 1.28
diff -u -p -r1.28 dt_dev.c
--- sys/dev/dt/dt_dev.c 14 Jul 2023 07:07:08 -0000 1.28
+++ sys/dev/dt/dt_dev.c 26 Aug 2023 01:44:39 -0000
@@ -19,6 +19,7 @@
#include <sys/types.h>
#include <sys/systm.h>
#include <sys/param.h>
+#include <sys/clockintr.h>
#include <sys/device.h>
#include <sys/exec_elf.h>
#include <sys/malloc.h>
@@ -82,32 +83,6 @@
#define DPRINTF(x...) /* nothing */
-/*
- * Descriptor associated with each program opening /dev/dt. It is used
- * to keep track of enabled PCBs.
- *
- * Locks used to protect struct members in this file:
- * m per-softc mutex
- * K kernel lock
- */
-struct dt_softc {
- SLIST_ENTRY(dt_softc) ds_next; /* [K] descriptor list */
- int ds_unit; /* [I] D_CLONE unique unit */
- pid_t ds_pid; /* [I] PID of tracing program */
-
- struct mutex ds_mtx;
-
- struct dt_pcb_list ds_pcbs; /* [K] list of enabled PCBs */
- struct dt_evt *ds_bufqueue; /* [K] copy evts to userland */
- size_t ds_bufqlen; /* [K] length of the queue */
- int ds_recording; /* [K] currently recording? */
- int ds_evtcnt; /* [m] # of readable evts */
-
- /* Counters */
- uint64_t ds_readevt; /* [m] # of events read */
- uint64_t ds_dropevt; /* [m] # of events dropped */
-};
-
SLIST_HEAD(, dt_softc) dtdev_list; /* [K] list of open /dev/dt nodes */
/*
@@ -486,6 +461,14 @@ dt_ioctl_record_start(struct dt_softc *s
SMR_SLIST_INSERT_HEAD_LOCKED(&dtp->dtp_pcbs, dp, dp_pnext);
dtp->dtp_recording++;
dtp->dtp_prov->dtpv_recording++;
+
+ /*
+ * XXX Not provider-agnostic. Is there a better place
+ * to do this?
+ */
+ if (dp->dp_clockintr != NULL)
+ clockintr_advance(dp->dp_clockintr, hardclock_period);
+
}
rw_exit_write(&dt_lock);
@@ -514,6 +497,11 @@ dt_ioctl_record_stop(struct dt_softc *sc
dtp->dtp_recording--;
dtp->dtp_prov->dtpv_recording--;
+
+ /* XXX This is not provider-agnostic. */
+ if (dp->dp_clockintr != NULL)
+ clockintr_cancel(dp->dp_clockintr);
+
SMR_SLIST_REMOVE_LOCKED(&dtp->dtp_pcbs, dp, dt_pcb, dp_pnext);
}
rw_exit_write(&dt_lock);
Index: sys/dev/dt/dtvar.h
===================================================================
RCS file: /cvs/src/sys/dev/dt/dtvar.h,v
retrieving revision 1.17
diff -u -p -r1.17 dtvar.h
--- sys/dev/dt/dtvar.h 26 Apr 2023 16:53:59 -0000 1.17
+++ sys/dev/dt/dtvar.h 26 Aug 2023 01:44:41 -0000
@@ -158,12 +158,39 @@ struct dtioc_getaux {
#include <sys/queue.h>
#include <sys/smr.h>
+struct dt_pcb;
+TAILQ_HEAD(dt_pcb_list, dt_pcb);
+
/* Flags that make sense for all providers. */
#define DTEVT_COMMON (DTEVT_EXECNAME|DTEVT_KSTACK|DTEVT_USTACK)
#define M_DT M_DEVBUF /* XXX FIXME */
-struct dt_softc;
+/*
+ * Descriptor associated with each program opening /dev/dt. It is used
+ * to keep track of enabled PCBs.
+ *
+ * Locks used to protect struct members in this file:
+ * m per-softc mutex
+ * K kernel lock
+ */
+struct dt_softc {
+ SLIST_ENTRY(dt_softc) ds_next; /* [K] descriptor list */
+ int ds_unit; /* [I] D_CLONE unique unit */
+ pid_t ds_pid; /* [I] PID of tracing program */
+
+ struct mutex ds_mtx;
+
+ struct dt_pcb_list ds_pcbs; /* [K] list of enabled PCBs */
+ struct dt_evt *ds_bufqueue; /* [K] copy evts to userland */
+ size_t ds_bufqlen; /* [K] length of the queue */
+ int ds_recording; /* [K] currently recording? */
+ int ds_evtcnt; /* [m] # of readable evts */
+
+ /* Counters */
+ uint64_t ds_readevt; /* [m] # of events read */
+ uint64_t ds_dropevt; /* [m] # of events dropped */
+};
int dtioc_req_isvalid(struct dtioc_req *);
@@ -197,6 +224,7 @@ struct dt_pcb {
struct dt_filter dp_filter; /* [I] filter to match */
/* Provider specific fields. */
+ struct clockintr *dp_clockintr; /* [?] clock interrupt handle */
unsigned int dp_cpuid; /* [I] on which CPU */
unsigned int dp_maxtick; /* [I] freq. of profiling */
unsigned int dp_nticks; /* [c] current tick count */
@@ -204,8 +232,6 @@ struct dt_pcb {
/* Counters */
uint64_t dp_dropevt; /* [m] # dropped event */
};
-
-TAILQ_HEAD(dt_pcb_list, dt_pcb);
struct dt_pcb *dt_pcb_alloc(struct dt_probe *, struct dt_softc *);
void dt_pcb_free(struct dt_pcb *);