On 17/09/23(Sun) 11:22, Scott Cheloha wrote:
> v2 is attached.

Thanks.

> Clockintrs now have an argument.  If we pass the PCB as argument, we
> can avoid doing a linear search to find the PCB during the interrupt.
> 
> One thing I'm unsure about is whether I need to add a "barrier" flag
> to clockintr_cancel() and/or clockintr_disestablish().  This would
> cause the caller to block until the clockintr callback function has
> finished executing, which would ensure that it was safe to free the
> PCB.

Please do not reinvent the wheel.  Try to imitate what other kernel APIs
do; look at task_add(9) and timeout_add(9).  Call the functions add/del()
to match the existing APIs, then we can add a clockintr_del_barrier()
later if needed.  Do not introduce functions before we need them.  I hope
we won't need it.

> On Mon, Sep 04, 2023 at 01:39:25PM +0100, Martin Pieuchot wrote:
> > On 25/08/23(Fri) 21:00, Scott Cheloha wrote:
> > > On Thu, Aug 24, 2023 at 07:21:29PM +0200, Martin Pieuchot wrote:
> > [...]
> > > The goal of clockintr is to provide a machine-independent API for
> > > scheduling clock interrupts.  You can use it to implement something
> > > like hardclock() or statclock().  We are already using it to implement
> > > these functions, among others.
> > 
> > After reading all the code and the previous manuals, I understand it as
> > a low-level per-CPU timeout API with nanosecond precision.  Is that it?
> 
> Yes.
> 
> The distinguishing feature is that it is usually wired up to a
> platform backend, so it can deliver the interrupt at the requested
> expiration time with relatively low error.

Why should we prefer it over timeout(9), then?  That's unclear to me
and I'd love to have it clearly documented.

What happened to the manual page?

> > Apart from running periodically on a given CPU an important need for
> > dt(4) is to get a timestamp for every event.  Currently nanotime(9)
> > is used.  This is global to the system.  I saw that ftrace uses different
> > clocks per-CPU which might be something to consider now that we're
> > moving to a per-CPU API.
> > 
> > It's all about cost of the instrumentation.  Note that if we use a
> > different per-CPU timestamp it has to work outside of clockintr because
> > probes can be anywhere.
> > 
> > Regarding clockintr_establish(), I'd rather have it *not* allocated the
> > `clockintr'.  I'd prefer waste some more space in every PCB.  The reason
> > for this is to keep live allocation in dt(4) to dt_pcb_alloc() only to
> > be able to not go through malloc(9) at some point in the future to not
> > interfere with the rest of the kernel.  Is there any downside to this?
> 
> You're asking me to change from callee-allocated clockintrs to
> caller-allocated clockintrs.  I don't want to do this.

Why not?  What is your plan?  Do you want to put many clockintrs in the
kernel?  I don't understand where you're going; can you explain?  From
my point of view this design decision only adds complexity for no
good reason.

> I am hoping to experiment with using a per-CPU pool for clockintrs
> during the next release cycle.  I think keeping all the clockintrs on
> a single page in memory will have cache benefits when reinserting
> clockintrs into the sorted data structure.

I do not understand this wish for micro-optimization.

The API has only one dynamic consumer: dt(4).  I tell you I'd prefer to
allocate the structure outside of your API and you tell me you don't
want to.  You want to use a pool.  Really, I don't understand...  Please
let's design an API for our needs and not based on best practices from
outside.

Or am I completely missing something from the clockintr world conquest
plan?

If you want to play with pools, there is plenty of other stuff you can
do :o)  I just don't understand.

> > Can we have a different hook for the interval provider?
> 
> I think I have added this now?
> 
> "profile" uses dt_prov_profile_intr().
> 
> "interval" uses dt_prov_interval_intr().
> 
> Did you mean a different kind of "hook"?

That's it.  And please remove DT_ENTER(): there's no need for the macro
inside dt(4) itself.  I thought I had already mentioned that.

> > Since we need only one clockintr and we don't care about the CPU
> > should we pick a random one?  Could that be implemented by passing
> > a NULL "struct cpu_info *" pointer to clockintr_establish()?  So
> > multiple "interval" probes would run on different CPUs...
> 
> It would be simpler to just stick to running the "interval" provider
> on the primary CPU.

Well, the primary CPU is already overloaded with interrupts on OpenBSD,
so if we can avoid adding to that for no good reason, I'd prefer it.

> If you expect a large number of "interval" probes to be active at
> once, it might help to choose the CPU round-robin style.  But that
> would be an optimization for a later step.

Let's do that now!  There's no limitation in your API, right?

> > I'd rather keep "struct dt_softc" private to not create too convoluted
> > code.
> 
> I can't figure out how to deallocate the PCBs from dtpv_dealloc()
> without using the softc.  And the dt_softc pointer is in the
> dtpv_dealloc() prototype, so shouldn't the underlying struct be
> visible?
> 
> Look at dt_prov_profile_dealloc().  What am I doing wrong?

The problem is that the dealloc() hook is there to disable probes.
You're now using it to free PCB resources.  The problem goes away
if we do not allocate them dynamically: by that point
clockintr_cancel() has already been called.

This is currently overly complex and confusing.  Please keep it simple.

I dislike being the one saying no.  I feel pressured by unrelated API
changes that you want to push, and now dt(4) has to adapt for no
good reason.

I also don't understand why clockqueue_init() does not take a CPU as
argument.  There's no good reason to expose "struct clockintr_queue" to
the world.  If what we want is a per-CPU scheduling API, then let's pass
a CPU.  That's what matters.

> Index: dev/dt/dt_prov_profile.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/dt/dt_prov_profile.c,v
> retrieving revision 1.5
> diff -u -p -r1.5 dt_prov_profile.c
> --- dev/dt/dt_prov_profile.c  26 Apr 2023 16:53:59 -0000      1.5
> +++ dev/dt/dt_prov_profile.c  17 Sep 2023 16:22:26 -0000
> @@ -20,6 +20,7 @@
>  #include <sys/systm.h>
>  #include <sys/param.h>
>  #include <sys/atomic.h>
> +#include <sys/clockintr.h>
>  
>  #include <dev/dt/dtvar.h>
>  
> @@ -31,6 +32,8 @@ struct dt_probe     *dtpp_interval;         /* glob
>  
>  int  dt_prov_profile_alloc(struct dt_probe *, struct dt_softc *,
>           struct dt_pcb_list *, struct dtioc_req *);
> +int  dt_prov_profile_dealloc(struct dt_probe *, struct dt_softc *,
> +         struct dtioc_req *);
>  int  dt_prov_profile_enter(struct dt_provider *, ...);
>  int  dt_prov_interval_enter(struct dt_provider *, ...);
>  
> @@ -39,7 +42,7 @@ struct dt_provider dt_prov_profile = {
>       .dtpv_alloc     = dt_prov_profile_alloc,
>       .dtpv_enter     = dt_prov_profile_enter,
>       .dtpv_leave     = NULL,
> -     .dtpv_dealloc   = NULL,
> +     .dtpv_dealloc   = dt_prov_profile_dealloc,
>  };
>  
>  struct dt_provider dt_prov_interval = {
> @@ -47,9 +50,12 @@ struct dt_provider dt_prov_interval = {
>       .dtpv_alloc     = dt_prov_profile_alloc,
>       .dtpv_enter     = dt_prov_interval_enter,
>       .dtpv_leave     = NULL,
> -     .dtpv_dealloc   = NULL,
> +     .dtpv_dealloc   = dt_prov_profile_dealloc,
>  };
>  
> +void dt_prov_profile_intr(struct clockintr *, void *, void *);
> +void dt_prov_interval_intr(struct clockintr *, void *, void *);
> +
>  int
>  dt_prov_profile_init(void)
>  {
> @@ -69,6 +75,7 @@ dt_prov_profile_alloc(struct dt_probe *d
>      struct dt_pcb_list *plist, struct dtioc_req *dtrq)
>  {
>       struct dt_pcb *dp;
> +     void (*func)(struct clockintr *, void *, void *);
>       struct cpu_info *ci;
>       CPU_INFO_ITERATOR cii;
>       extern int hz;
> @@ -80,17 +87,26 @@ dt_prov_profile_alloc(struct dt_probe *d
>       if (dtrq->dtrq_rate <= 0 || dtrq->dtrq_rate > hz)
>               return EOPNOTSUPP;
>  
> +     if (dtp == dtpp_interval)
> +             func = dt_prov_interval_intr;
> +     else
> +             func = dt_prov_profile_intr;
> +
>       CPU_INFO_FOREACH(cii, ci) {
>               if (!CPU_IS_PRIMARY(ci) && (dtp == dtpp_interval))
>                       continue;
>  
>               dp = dt_pcb_alloc(dtp, sc);
> -             if (dp == NULL) {
> -                     dt_pcb_purge(plist);
> -                     return ENOMEM;
> +             if (dp == NULL)
> +                     goto purge;
> +
> +             dp->dp_clockintr = clockintr_establish(ci, func, dp);
> +             if (dp->dp_clockintr == NULL) {
> +                     dt_pcb_free(dp);
> +                     goto purge;
>               }
>  
> -             dp->dp_maxtick = hz / dtrq->dtrq_rate;
> +             dp->dp_nsecs = SEC_TO_NSEC(1) / dtrq->dtrq_rate;
>               dp->dp_cpuid = ci->ci_cpuid;
>  
>               dp->dp_filter = dtrq->dtrq_filter;
> @@ -99,6 +115,33 @@ dt_prov_profile_alloc(struct dt_probe *d
>       }
>  
>       return 0;
> +purge:
> +     TAILQ_FOREACH(dp, plist, dp_snext) {
> +             if (dp->dp_clockintr != NULL)
> +                     clockintr_disestablish(dp->dp_clockintr);
> +     }
> +     dt_pcb_purge(plist);
> +     return ENOMEM;
> +}
> +
> +int
> +dt_prov_profile_dealloc(struct dt_probe *dtp, struct dt_softc *sc,
> +    struct dtioc_req *dtrq)
> +{
> +     struct dt_pcb *dp;
> +
> +     KASSERT(dtioc_req_isvalid(dtrq));
> +     KASSERT(dtp == dtpp_profile || dtp == dtpp_interval);
> +
> +     TAILQ_FOREACH(dp, &sc->ds_pcbs, dp_snext) {
> +             if (dp->dp_dtp == dtp) {
> +                     if (dp->dp_clockintr != NULL) {
> +                             clockintr_disestablish(dp->dp_clockintr);
> +                             dp->dp_clockintr = NULL;
> +                     }
> +             }
> +     }
> +     return 0;
>  }
>  
>  static inline void
> @@ -106,46 +149,68 @@ dt_prov_profile_fire(struct dt_pcb *dp)
>  {
>       struct dt_evt *dtev;
>  
> -     if (++dp->dp_nticks < dp->dp_maxtick)
> -             return;
> -
>       dtev = dt_pcb_ring_get(dp, 1);
>       if (dtev == NULL)
>               return;
>       dt_pcb_ring_consume(dp, dtev);
> -     dp->dp_nticks = 0;
>  }
>  
>  int
>  dt_prov_profile_enter(struct dt_provider *dtpv, ...)
>  {
> -     struct cpu_info *ci = curcpu();
> +     uint64_t count, i;
> +     struct clockintr *cl;
>       struct dt_pcb *dp;
> +     va_list ap;
>  
>       KASSERT(dtpv == &dt_prov_profile);
> +     va_start(ap, dtpv);
> +     cl = va_arg(ap, struct clockintr *);
> +     dp = va_arg(ap, struct dt_pcb *);
> +     va_end(ap);
>  
>       smr_read_enter();
> -     SMR_SLIST_FOREACH(dp, &dtpp_profile->dtp_pcbs, dp_pnext) {
> -             if (dp->dp_cpuid != ci->ci_cpuid)
> -                     continue;
> -
> +     count = clockintr_advance(cl, dp->dp_nsecs);
> +     for (i = 0; i < count; i++)
>               dt_prov_profile_fire(dp);
> -     }
>       smr_read_leave();
>       return 0;
>  }
>  
> +void
> +dt_prov_profile_intr(struct clockintr *cl, void *cf, void *arg)
> +{
> +     struct dt_pcb *dp = arg;
> +
> +     DT_ENTER(profile, cl, dp);
> +}
> +
>  int
>  dt_prov_interval_enter(struct dt_provider *dtpv, ...)
>  {
> +     uint64_t count, i;
> +     struct clockintr *cl;
>       struct dt_pcb *dp;
> +     va_list ap;
>  
>       KASSERT(dtpv == &dt_prov_interval);
> +     va_start(ap, dtpv);
> +     cl = va_arg(ap, struct clockintr *);
> +     dp = va_arg(ap, struct dt_pcb *);
> +     va_end(ap);
>  
>       smr_read_enter();
> -     SMR_SLIST_FOREACH(dp, &dtpp_interval->dtp_pcbs, dp_pnext) {
> +     count = clockintr_advance(cl, dp->dp_nsecs);
> +     for (i = 0; i < count; i++)
>               dt_prov_profile_fire(dp);
> -     }
>       smr_read_leave();
>       return 0;
> +}
> +
> +void
> +dt_prov_interval_intr(struct clockintr *cl, void *cf, void *arg)
> +{
> +     struct dt_pcb *dp = arg;
> +
> +     DT_ENTER(interval, cl, dp);
>  }
> Index: dev/dt/dt_dev.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/dt/dt_dev.c,v
> retrieving revision 1.28
> diff -u -p -r1.28 dt_dev.c
> --- dev/dt/dt_dev.c   14 Jul 2023 07:07:08 -0000      1.28
> +++ dev/dt/dt_dev.c   17 Sep 2023 16:22:26 -0000
> @@ -19,6 +19,7 @@
>  #include <sys/types.h>
>  #include <sys/systm.h>
>  #include <sys/param.h>
> +#include <sys/clockintr.h>
>  #include <sys/device.h>
>  #include <sys/exec_elf.h>
>  #include <sys/malloc.h>
> @@ -82,32 +83,6 @@
>  
>  #define DPRINTF(x...) /* nothing */
>  
> -/*
> - * Descriptor associated with each program opening /dev/dt.  It is used
> - * to keep track of enabled PCBs.
> - *
> - *  Locks used to protect struct members in this file:
> - *   m       per-softc mutex
> - *   K       kernel lock
> - */
> -struct dt_softc {
> -     SLIST_ENTRY(dt_softc)    ds_next;       /* [K] descriptor list */
> -     int                      ds_unit;       /* [I] D_CLONE unique unit */
> -     pid_t                    ds_pid;        /* [I] PID of tracing program */
> -
> -     struct mutex             ds_mtx;
> -
> -     struct dt_pcb_list       ds_pcbs;       /* [K] list of enabled PCBs */
> -     struct dt_evt           *ds_bufqueue;   /* [K] copy evts to userland */
> -     size_t                   ds_bufqlen;    /* [K] length of the queue */
> -     int                      ds_recording;  /* [K] currently recording? */
> -     int                      ds_evtcnt;     /* [m] # of readable evts */
> -
> -     /* Counters */
> -     uint64_t                 ds_readevt;    /* [m] # of events read */
> -     uint64_t                 ds_dropevt;    /* [m] # of events dropped */
> -};
> -
>  SLIST_HEAD(, dt_softc) dtdev_list;   /* [K] list of open /dev/dt nodes */
>  
>  /*
> @@ -484,8 +459,12 @@ dt_ioctl_record_start(struct dt_softc *s
>               struct dt_probe *dtp = dp->dp_dtp;
>  
>               SMR_SLIST_INSERT_HEAD_LOCKED(&dtp->dtp_pcbs, dp, dp_pnext);
> +             
>               dtp->dtp_recording++;
>               dtp->dtp_prov->dtpv_recording++;
> +
> +             if (dp->dp_clockintr != NULL)
> +                     clockintr_advance(dp->dp_clockintr, dp->dp_nsecs);
>       }
>       rw_exit_write(&dt_lock);
>  
> @@ -514,6 +493,10 @@ dt_ioctl_record_stop(struct dt_softc *sc
>  
>               dtp->dtp_recording--;
>               dtp->dtp_prov->dtpv_recording--;
> +
> +             if (dp->dp_clockintr != NULL)
> +                     clockintr_cancel(dp->dp_clockintr);
> +
>               SMR_SLIST_REMOVE_LOCKED(&dtp->dtp_pcbs, dp, dt_pcb, dp_pnext);
>       }
>       rw_exit_write(&dt_lock);
> Index: dev/dt/dtvar.h
> ===================================================================
> RCS file: /cvs/src/sys/dev/dt/dtvar.h,v
> retrieving revision 1.17
> diff -u -p -r1.17 dtvar.h
> --- dev/dt/dtvar.h    26 Apr 2023 16:53:59 -0000      1.17
> +++ dev/dt/dtvar.h    17 Sep 2023 16:22:27 -0000
> @@ -158,12 +158,39 @@ struct dtioc_getaux {
>  #include <sys/queue.h>
>  #include <sys/smr.h>
>  
> +struct dt_pcb;
> +TAILQ_HEAD(dt_pcb_list, dt_pcb);
> +
>  /* Flags that make sense for all providers. */
>  #define DTEVT_COMMON (DTEVT_EXECNAME|DTEVT_KSTACK|DTEVT_USTACK)
>  
>  #define M_DT M_DEVBUF /* XXX FIXME */
>  
> -struct dt_softc;
> +/*
> + * Descriptor associated with each program opening /dev/dt.  It is used
> + * to keep track of enabled PCBs.
> + *
> + *  Locks used to protect struct members in this file:
> + *   m       per-softc mutex
> + *   K       kernel lock
> + */
> +struct dt_softc {
> +     SLIST_ENTRY(dt_softc)    ds_next;       /* [K] descriptor list */
> +     int                      ds_unit;       /* [I] D_CLONE unique unit */
> +     pid_t                    ds_pid;        /* [I] PID of tracing program */
> +
> +     struct mutex             ds_mtx;
> +
> +     struct dt_pcb_list       ds_pcbs;       /* [K] list of enabled PCBs */
> +     struct dt_evt           *ds_bufqueue;   /* [K] copy evts to userland */
> +     size_t                   ds_bufqlen;    /* [K] length of the queue */
> +     int                      ds_recording;  /* [K] currently recording? */
> +     int                      ds_evtcnt;     /* [m] # of readable evts */
> +
> +     /* Counters */
> +     uint64_t                 ds_readevt;    /* [m] # of events read */
> +     uint64_t                 ds_dropevt;    /* [m] # of events dropped */
> +};
>  
>  int          dtioc_req_isvalid(struct dtioc_req *);
>  
> @@ -197,15 +224,13 @@ struct dt_pcb {
>       struct dt_filter         dp_filter;     /* [I] filter to match */
>  
>       /* Provider specific fields. */
> +     struct clockintr        *dp_clockintr;  /* [?] clock interrupt handle */
> +     uint64_t                 dp_nsecs;      /* [I] clockintr period */
>       unsigned int             dp_cpuid;      /* [I] on which CPU */
> -     unsigned int             dp_maxtick;    /* [I] freq. of profiling */
> -     unsigned int             dp_nticks;     /* [c] current tick count */
>  
>       /* Counters */
>       uint64_t                 dp_dropevt;    /* [m] # dropped event */
>  };
> -
> -TAILQ_HEAD(dt_pcb_list, dt_pcb);
>  
>  struct dt_pcb        *dt_pcb_alloc(struct dt_probe *, struct dt_softc *);
>  void          dt_pcb_free(struct dt_pcb *);
> Index: kern/kern_clock.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_clock.c,v
> retrieving revision 1.119
> diff -u -p -r1.119 kern_clock.c
> --- kern/kern_clock.c 14 Sep 2023 22:27:09 -0000      1.119
> +++ kern/kern_clock.c 17 Sep 2023 16:22:27 -0000
> @@ -50,11 +50,6 @@
>  #include <sys/sched.h>
>  #include <sys/timetc.h>
>  
> -#include "dt.h"
> -#if NDT > 0
> -#include <dev/dt/dtvar.h>
> -#endif
> -
>  /*
>   * Clock handling routines.
>   *
> @@ -144,12 +139,6 @@ initclocks(void)
>  void
>  hardclock(struct clockframe *frame)
>  {
> -#if NDT > 0
> -     DT_ENTER(profile, NULL);
> -     if (CPU_IS_PRIMARY(curcpu()))
> -             DT_ENTER(interval, NULL);
> -#endif
> -
>       /*
>        * If we are not the primary CPU, we're not allowed to do
>        * any more work.
> Index: kern/kern_clockintr.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_clockintr.c,v
> retrieving revision 1.56
> diff -u -p -r1.56 kern_clockintr.c
> --- kern/kern_clockintr.c     17 Sep 2023 15:24:35 -0000      1.56
> +++ kern/kern_clockintr.c     17 Sep 2023 16:22:27 -0000
> @@ -217,14 +217,17 @@ clockintr_dispatch(void *frame)
>               shadow->cl_func(shadow, frame, shadow->cl_arg);
>  
>               mtx_enter(&cq->cq_mtx);
> -             cq->cq_running = NULL;
> -             if (ISSET(cl->cl_flags, CLST_IGNORE_SHADOW)) {
> -                     CLR(cl->cl_flags, CLST_IGNORE_SHADOW);
> -                     CLR(shadow->cl_flags, CLST_SHADOW_PENDING);
> -             }
> -             if (ISSET(shadow->cl_flags, CLST_SHADOW_PENDING)) {
> -                     CLR(shadow->cl_flags, CLST_SHADOW_PENDING);
> -                     clockqueue_pend_insert(cq, cl, shadow->cl_expiration);
> +             if (cq->cq_running != NULL) {
> +                     cq->cq_running = NULL;
> +                     if (ISSET(cl->cl_flags, CLST_IGNORE_SHADOW)) {
> +                             CLR(cl->cl_flags, CLST_IGNORE_SHADOW);
> +                             CLR(shadow->cl_flags, CLST_SHADOW_PENDING);
> +                     }
> +                     if (ISSET(shadow->cl_flags, CLST_SHADOW_PENDING)) {
> +                             CLR(shadow->cl_flags, CLST_SHADOW_PENDING);
> +                             clockqueue_pend_insert(cq, cl,
> +                                 shadow->cl_expiration);
> +                     }
>               }
>               run++;
>       }
> @@ -330,6 +333,34 @@ clockintr_cancel(struct clockintr *cl)
>       if (cl == cq->cq_running)
>               SET(cl->cl_flags, CLST_IGNORE_SHADOW);
>       mtx_leave(&cq->cq_mtx);
> +}
> +
> +void
> +clockintr_disestablish(struct clockintr *cl)
> +{
> +     struct clockintr_queue *cq = cl->cl_queue;
> +     int was_next;
> +
> +     KASSERT(cl != &cq->cq_shadow);
> +
> +     mtx_enter(&cq->cq_mtx);
> +     if (ISSET(cl->cl_flags, CLST_PENDING)) {
> +             was_next = cl == TAILQ_FIRST(&cq->cq_pend);
> +             clockqueue_pend_delete(cq, cl);
> +             if (ISSET(cq->cq_flags, CQ_INTRCLOCK)) {
> +                     if (was_next && !TAILQ_EMPTY(&cq->cq_pend)) {
> +                             if (cq == &curcpu()->ci_queue)
> +                                     clockqueue_reset_intrclock(cq);
> +                     }
> +             }
> +     }
> +     if (cl == cq->cq_running)
> +             cq->cq_running = NULL;
> +     TAILQ_REMOVE(&cq->cq_all, cl, cl_alink);
> +     cl->cl_queue = NULL;
> +     mtx_leave(&cq->cq_mtx);
> +
> +     free(cl, M_DEVBUF, sizeof *cl);
>  }
>  
>  struct clockintr *
> Index: sys/clockintr.h
> ===================================================================
> RCS file: /cvs/src/sys/sys/clockintr.h,v
> retrieving revision 1.20
> diff -u -p -r1.20 clockintr.h
> --- sys/clockintr.h   17 Sep 2023 15:24:35 -0000      1.20
> +++ sys/clockintr.h   17 Sep 2023 16:22:27 -0000
> @@ -122,6 +122,7 @@ void clockintr_trigger(void);
>  uint64_t clockintr_advance(struct clockintr *, uint64_t);
>  uint64_t clockintr_advance_random(struct clockintr *, uint64_t, uint32_t);
>  void clockintr_cancel(struct clockintr *);
> +void clockintr_disestablish(struct clockintr *);
>  struct clockintr *clockintr_establish(struct cpu_info *,
>      void (*)(struct clockintr *, void *, void *), void *);
>  void clockintr_stagger(struct clockintr *, uint64_t, uint32_t, uint32_t);
