> Date: Mon, 7 Sep 2020 18:50:44 -0500
> From: Scott Cheloha <scottchel...@gmail.com>
> 
> On Sat, Sep 05, 2020 at 01:11:59PM +0200, Mark Kettenis wrote:
> > > Date: Fri, 4 Sep 2020 17:55:39 -0500
> > > From: Scott Cheloha <scottchel...@gmail.com>
> > > 
> > > On Sat, Jul 25, 2020 at 08:46:08PM -0500, Scott Cheloha wrote:
> > > > 
> > > > [...]
> > > > 
> > > > I want to add clock-based timeouts to the kernel because tick-based
> > > > timeouts suffer from a few problems:
> > > > 
> > > > [...]
> > > > 
> > > > Basically, ticks are a poor approximation for the system clock.  We
> > > > should use the real thing where possible.
> > > > 
> > > > [...]
> > > > 
> > > > Thoughts on this approach?  Thoughts on the proposed API?
> > > 
> > > 6 week bump.
> > > 
> > > Attached is a rebased and streamlined diff.
> > > 
> > > Let's try again:
> > > 
> > > This patch adds support for timeouts scheduled against the hardware
> > > timecounter.  I call these "kclock timeouts".  They are distinct from
> > > the current tick-based timeouts because ticks are "software time", not
> > > "real time".
> > 
> > So what's the end game here?  Are these kclock-based timeouts going to
> > replace the tick-based timeouts at some point in the future?  I can
> > see why you want to have both in parallel for a while, but long-term I
> > don't think we want to keep both.
> 
> Ideally we would replace tick-based timeouts entirely with kclock
> timeouts eventually.
> 
> There are a few roadblocks, though:
> 
> 1. The scheduler is tick-based.  If you want to wait until the next
>    tick, the easiest way to do that is with timeout_add(9) or tsleep(9).

I don't think this really matters in most cases.  Keeping the tick as
the base for a scheduling quantum is probably wise for now, but I
don't think it matters that timeouts and tsleeps (especially tsleeps)
are actually synchronized to the scheduling clock.

> 2. Linux has ktimers, which is tick-based.  drm uses it.  Shouldn't
>    we have a tick-based timeout interface for compatibility with them?
>    We could fake it, like FreeBSD does, but doing so is probably more
>    complicated than just keeping support for tick-based timeouts.

You can easily emulate this using an absolute timer that you keep
rescheduling.  I think that is preferable to keeping a completely
separate tick-based timeout system.
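
For illustration, the emulation could look roughly like this, reusing
timeout_at_ts() from the diff below (my_to, my_period, my_deadline, and
my_ktimer_cb are hypothetical names):

	struct timeout my_to;		/* set up with timeout_set_kclock() */
	struct timespec my_period;	/* the tick length to emulate */
	struct timespec my_deadline;	/* next absolute deadline */

	void
	my_ktimer_cb(void *arg)
	{
		/* ... do the periodic work ... */

		/* Advance the deadline by one period and rearm. */
		timespecadd(&my_deadline, &my_period, &my_deadline);
		timeout_at_ts(&my_to, &my_deadline);
	}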

> 3. Scheduling a timeout with timeout_add(9) is fast.  Scheduling a
>    timeout with timeout_in_nsec(9) involves a clock read.  It is slower.
>    It is probably too slow for some code.
> 
> (1) will be overcome if ever the scheduler is no longer tick-based.
> 
> (2) is tricky.  Maybe you or jsg@ have an opinion?

Not really.  But I don't think the Linux ktimers tick at the same rate
as ours, so it shouldn't matter.

> (3) is somewhat easier to fix.  I intend to introduce a TIMEOUT_COARSE
> flag in the future which causes timeout_in_nsec() to call
> getnanouptime(9) instead of nanouptime(9).  Reading the timestamp is
> faster than reading the clock.  You lose accuracy, but any code
> worried about the overhead of reading the clock is probably not very
> concerned with accuracy.

Right.
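
For concreteness, a sketch of how that might look in timeout_in_nsec(),
with TIMEOUT_COARSE as the proposed, not-yet-implemented flag:

	int
	timeout_in_nsec(struct timeout *to, uint64_t nsecs)
	{
		struct timespec deadline, interval, now;

		/* Cheap timestamp read vs. full timecounter read. */
		if (ISSET(to->to_flags, TIMEOUT_COARSE))
			getnanouptime(&now);
		else
			kclock_nanotime(to->to_kclock, &now);
		NSEC_TO_TIMESPEC(nsecs, &interval);
		timespecadd(&now, &interval, &deadline);

		return timeout_at_ts(to, &deadline);
	}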

> > We don't really want to do a wholesale conversion of APIs again I'd
> > say.  So at some point the existing timeout_add_xxx() calls should be
> > implemented in terms of "kclock timeouts".
> 
> We can do this, but we'll still need to change the calls that
> reschedule a periodic timeout to use the dedicated rescheduling
> interface.  Otherwise those periodic timeouts will drift.  They don't
> currently drift because a tick is a very coarse unit of time.  With
> nanosecond resolution we'll get drift.

Periodic timeouts are rare.  At least those that care about drift.

> > This implementation is still tick driven, so it doesn't really provide
> > sub-tick resolution.
> 
> Yes, that's right.  Each timeout maintains nanosecond resolution for
> its expiration time but will only actually run after hardclock(9) runs
> and dumps the timeout to softclock().
> 
> We would need to implement a more flexible clock interrupt scheduler
> to run timeouts in between hardclocks.
> 
> > What does that mean for testing this?  I mean, if we spend a lot of
> > time now verifying that subsystems can tolerate the more fine-grained
> > timeouts, we need to do that again when you switch from having a
> > periodic interrupt driving the wheel to having a scheduled interrupt,
> > don't we?
> 
> Yes.  But both changes can break things.
> 
> I think we should do kclock timeouts before sub-tick timeouts.  The
> former is a lot less disruptive than the latter, as the timeouts still
> run right after the hardclock.
> 
> And you need kclock timeouts to even test sub-tick timeouts anyway.
> 
> > > For now we have one kclock, KCLOCK_UPTIME, which corresponds to
> > > nanouptime(9).  In the future I intend to add support for runtime and
> > > UTC kclocks.
> > 
> > Do we really need that?  I suppose it helps implementing something
> > like clock_nanosleep() with the TIMER_ABSTIME flag for various
> > clock_id values?
> 
> Exactly.
> 
> FreeBSD decided to not support multiple clocks in their revamped
> callout(9).  The result is a bit simpler (one clock) but in order to
> implement absolute CLOCK_REALTIME sleeps for userspace they have this
> flag for each thread that causes the thread to wake up and reschedule
> itself whenever settimeofday(2) happens.
> 
> It's clever, but it seems messy to me.
> 
> I would rather support UTC timeouts as "first class citizens" of the
> timeout subsystem.  Linux's hrtimers API supports UTC timeouts
> explicitly.  I prefer their approach.
> 
> The advantage of real support is that the timeout(9) subsystem will
> transparently handle rebucketing UTC timeouts if the clock jumps
> backwards.  This is nice.

ok, thanks for the explanation.

> > > Why do we want kclock timeouts at all?
> > > 
> > > 1. Kclock timeouts expire at an actual time, not a tick.  They
> > >    have nanosecond resolution and are NTP-sensitive.  Thus, they
> > >    will *never* fire early.
> > 
> > Is there a lot of overhead in these being NTP-sensitive?  I'm asking
> > because for short timeouts you don't really care about NTP
> > corrections.
> 
> No.  Our timecounting system is inherently NTP-sensitive.  All
> timestamps you get from e.g. nanouptime(9) are tweaked according to
> kernel NTP values.  It adds no overhead.
> 
> > > 2. One upshot of nanosecond resolution is that we don't need to
> > >    "round up" to the next tick when scheduling a timeout to prevent
> > >    early execution.  The extra resolution allows us to reduce
> > >    latency in some contexts.
> > > 
> > > 3. Kclock timeouts cover the entire range of the kernel timeline.
> > >    We can remove the "tick loops" like the one in sys_nanosleep().
> > > 
> > > 4. Kclock timeouts are scheduled as absolute deadlines.  This makes
> > >    supporting absolute timeouts trivial, which means we can add support
> > >    for clock_nanosleep(2) and the absolute pthread timeouts to the
> > >    kernel.
> > > 
> > > Kclock timeouts aren't actually used anywhere yet, so merging this
> > > patch will not break anything like last time (CC bluhm@).
> > > 
> > > In a subsequent diff I will put them to use in tsleep_nsec(9) etc.
> > > This will enable those interfaces to block for less than a tick, which
> > > in turn will allow userspace to block for less than a tick in e.g.
> > > futex(2), and poll(2).  pd@ has verified that this fixes the "time
> > > problem" in OpenBSD vmm(4) VMs (CC pd@).
> > 
> > Like I said above, running the timeout is still tick-driven, isn't it?
> 
> Yes, timeouts are still driven by the hardclock(9), which means your
> maximum effective resolution is still 1/hz.
> 
> > This avoids having to wait at least a tick for timeouts that are
> > shorter than a tick, but it means the timeout can still be extended up
> > to a full tick.
> 
> Yes.  Even if you schedule a timeout 1 nanosecond into the future you
> must still wait for the hardclock(9) to fire and dump the timeout to
> softclock() to run.
> 
> > > You initialize kclock timeouts with timeout_set_kclock().  You
> > > schedule them with timeout_in_nsec(), a relative timeout interface
> > > that accepts a count of nanoseconds.  If your timeout is in some
> > > other unit (seconds, milliseconds, whatever) you must convert it
> > > to nanoseconds before scheduling.  Something like this will work:
> > > 
> > >   timeout_in_nsec(&my_timeout, SEC_TO_NSEC(1));
> > > 
> > > There won't be a flavored API supporting every conceivable time unit.
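
Putting the initialization and scheduling calls together, a minimal
usage sketch (my_timeout, my_callback, and my_arg are placeholders;
MSEC_TO_NSEC() is assumed to come from sys/time.h like SEC_TO_NSEC()):

	struct timeout my_timeout;

	timeout_set_kclock(&my_timeout, my_callback, my_arg, 0,
	    KCLOCK_UPTIME);

	/* Fire roughly 10ms from now. */
	timeout_in_nsec(&my_timeout, MSEC_TO_NSEC(10));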
> > 
> > So this is where I get worried.  What is the game plan?  Slowly
> > convert everything from timeout_add_xxx() to timeout_in_nsec()?  Or
> > offer this as a temporary interface for people to test some critical
> > subsystems after which we dump it and simply re-implement
> > timeout_add_xxx() as kclock-based timeouts?
> 
> I first want to put them to use in tsleep_nsec(9), msleep_nsec(9), and
> rwsleep_nsec(9).  These functions are basically never used in a hot
> path so it is a low-risk change.
> 
> After that, I'm not sure.
> 
> One thing that concerns me about the reimplementation approach is that
> there are lots of drivers without maintainers.  Subtly changing the
> way a bunch of code works without actually testing it sounds like a
> recipe for disaster.
> 
> > > In the future I will expose an absolute timeout interface and a
> > > periodic timeout rescheduling interface.  We don't need either of
> > > these interfaces to start, though.
> > 
> > Not sure how useful a periodic timeout rescheduling interface really
> > is if you have an absolute timeout interface.  And isn't
> > timeout_at_ts() already implemented in the diff?
> 
> You could do it by hand with timeout_at_ts(), but the dedicated
> rescheduling interface will be much easier to use:
> 
> - It automatically skips intervals that have already elapsed.
> 
> - You don't need to track your last expiration time by hand.
>   The interface uses state kept in the timeout struct to
>   determine the start of the period.
> 
> - It uses clever math to quickly find the next expiration time
>   instead of naively looping like we do in realitexpire():
> 
>       while (timespeccmp(&abstime, &now, <=))
>               timespecadd(&abstime, &period, &abstime);

ok
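
Presumably the non-looping computation reduces to something like this
in 64-bit nanoseconds (names here are illustrative, not from the diff):

	uint64_t start, period, now, next;

	/*
	 * Count the whole periods that elapsed since the last
	 * expiration and land on the first deadline after "now".
	 */
	next = start + ((now - start) / period + 1) * period;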

> > > Tick-based timeouts and kclock-based timeouts are *not* compatible.
> > > You cannot schedule a kclock timeout with timeout_add(9).  You cannot
> > > schedule a tick-based timeout with timeout_in_nsec(9).  I have added
> > > KASSERTs to prevent this.
> > > 
> > > Scheduling a kclock timeout with timeout_in_nsec() is more costly than
> > > scheduling a tick-based timeout with timeout_add(9) because you have
> > > to read the hardware timecounter.  The cost will vary with your clock:
> > > bad clocks have lots of overhead, good clocks have low-to-no overhead.
> > > The programmer will need to decide if the potential overhead is too
> > > high when employing these timeouts.  In most cases the overhead will
> > > not be a problem.  The network stack is one spot where it might be.
> > 
> > I doubt this will be a problem.  For very small timeouts folks
> > probably should use delay(9).
> > 
> > > Processing the kclock timeout wheel during hardclock(9) adds
> > > negligible overhead to that routine.
> > > 
> > > Processing a kclock timeout during softclock() is roughly 4 times as
> > > expensive as processing a tick-based timeout.  At idle on my 2GHz
> > > amd64 machine tick-based timeouts take ~125 cycles to process while
> > > kclock timeouts take ~500 cycles.  The average cost seems to drop as
> > > more kclock timeouts are processed, though I can't really explain why.
> > 
> > Cache effects?  Some of the overhead may be there because you keep
> > track of "late" timeouts.  But that code isn't really necessary, is it?
> 
> I was going to guess "cache effects" but they are black magic to me,
> so your guess is as good as mine.
> 
> As a small optimization we could move late timeout tracking into a
> TIMEOUT_DEBUG #ifdef.  This will probably be more useful on 32-bit
> systems than anywhere else, though.
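
Something like the following, presumably, wrapping the existing late
check from softclock_process_kclock_timeout() (TIMEOUT_DEBUG is a
hypothetical option name):

	#ifdef TIMEOUT_DEBUG
		if (!new && timespeccmp(&to->to_abstime, &kc->kc_late, <=))
			tostat.tos_late++;
	#endif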
> 
> > > Thoughts?  ok?
> > 
> > Some further nits below.
> 
> Updated patch attached.

The diff looks reasonable to me, but I'd like to discuss the path
forward with some people during the hackathon next week.

> Index: kern/kern_timeout.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_timeout.c,v
> retrieving revision 1.79
> diff -u -p -r1.79 kern_timeout.c
> --- kern/kern_timeout.c       7 Aug 2020 00:45:25 -0000       1.79
> +++ kern/kern_timeout.c       7 Sep 2020 23:46:44 -0000
> @@ -1,4 +1,4 @@
> -/*   $OpenBSD: kern_timeout.c,v 1.79 2020/08/07 00:45:25 cheloha Exp $       */
> +/*   $OpenBSD: kern_timeout.c,v 1.77 2020/08/01 08:40:20 anton Exp $ */
>  /*
>   * Copyright (c) 2001 Thomas Nordin <nor...@openbsd.org>
>   * Copyright (c) 2000-2001 Artur Grabowski <a...@openbsd.org>
> @@ -64,16 +64,27 @@ struct timeoutstat tostat;                /* [T] stati
>   * of the global variable "ticks" when the timeout should be called. There are
>   * four levels with 256 buckets each.
>   */
> -#define BUCKETS 1024
> +#define WHEELCOUNT 4
>  #define WHEELSIZE 256
>  #define WHEELMASK 255
>  #define WHEELBITS 8
> +#define BUCKETS (WHEELCOUNT * WHEELSIZE)
>  
> -struct circq timeout_wheel[BUCKETS]; /* [T] Queues of timeouts */
> +struct circq timeout_wheel[BUCKETS]; /* [T] Tick-based timeouts */
> +struct circq timeout_wheel_kc[BUCKETS];      /* [T] Clock-based timeouts */
>  struct circq timeout_new;            /* [T] New, unscheduled timeouts */
>  struct circq timeout_todo;           /* [T] Due or needs rescheduling */
>  struct circq timeout_proc;           /* [T] Due + needs process context */
>  
> +time_t timeout_level_width[WHEELCOUNT];      /* [I] Wheel level width (seconds) */
> +struct timespec tick_ts;             /* [I] Length of a tick (1/hz secs) */
> +
> +struct kclock {
> +     struct timespec kc_lastscan;    /* [T] Clock time at last wheel scan */
> +     struct timespec kc_late;        /* [T] Late if due prior */
> +     struct timespec kc_offset;      /* [T] Offset from primary kclock */
> +} timeout_kclock[KCLOCK_MAX];
> +
>  #define MASKWHEEL(wheel, time) (((time) >> ((wheel)*WHEELBITS)) & WHEELMASK)
>  
>  #define BUCKET(rel, abs)                                             \
> @@ -155,9 +166,15 @@ struct lock_type timeout_spinlock_type =
>       ((needsproc) ? &timeout_sleeplock_obj : &timeout_spinlock_obj)
>  #endif
>  
> +void kclock_nanotime(int, struct timespec *);
>  void softclock(void *);
>  void softclock_create_thread(void *);
> +void softclock_process_kclock_timeout(struct timeout *, int);
> +void softclock_process_tick_timeout(struct timeout *, int);
>  void softclock_thread(void *);
> +uint32_t timeout_bucket(struct timeout *);
> +uint32_t timeout_maskwheel(uint32_t, const struct timespec *);
> +void timeout_run(struct timeout *);
>  void timeout_proc_barrier(void *);
>  
>  /*
> @@ -207,13 +224,19 @@ timeout_sync_leave(int needsproc)
>  void
>  timeout_startup(void)
>  {
> -     int b;
> +     int b, level;
>  
>       CIRCQ_INIT(&timeout_new);
>       CIRCQ_INIT(&timeout_todo);
>       CIRCQ_INIT(&timeout_proc);
>       for (b = 0; b < nitems(timeout_wheel); b++)
>               CIRCQ_INIT(&timeout_wheel[b]);
> +     for (b = 0; b < nitems(timeout_wheel_kc); b++)
> +             CIRCQ_INIT(&timeout_wheel_kc[b]);
> +
> +     for (level = 0; level < nitems(timeout_level_width); level++)
> +             timeout_level_width[level] = 2 << (level * WHEELBITS);
> +     NSEC_TO_TIMESPEC(tick_nsec, &tick_ts);
>  }
>  
>  void
> @@ -229,25 +252,39 @@ timeout_proc_init(void)
>       kthread_create_deferred(softclock_create_thread, NULL);
>  }
>  
> +static inline void
> +_timeout_set(struct timeout *to, void (*fn)(void *), void *arg, int flags,
> +    int kclock)
> +{
> +     to->to_func = fn;
> +     to->to_arg = arg;
> +     to->to_flags = flags | TIMEOUT_INITIALIZED;
> +     to->to_kclock = kclock;
> +}
> +
>  void
>  timeout_set(struct timeout *new, void (*fn)(void *), void *arg)
>  {
> -     timeout_set_flags(new, fn, arg, 0);
> +     _timeout_set(new, fn, arg, 0, KCLOCK_NONE);
>  }
>  
>  void
>  timeout_set_flags(struct timeout *to, void (*fn)(void *), void *arg, int flags)
>  {
> -     to->to_func = fn;
> -     to->to_arg = arg;
> -     to->to_process = NULL;
> -     to->to_flags = flags | TIMEOUT_INITIALIZED;
> +     _timeout_set(to, fn, arg, flags, KCLOCK_NONE);
>  }
>  
>  void
>  timeout_set_proc(struct timeout *new, void (*fn)(void *), void *arg)
>  {
> -     timeout_set_flags(new, fn, arg, TIMEOUT_PROC);
> +     _timeout_set(new, fn, arg, TIMEOUT_PROC, KCLOCK_NONE);
> +}
> +
> +void
> +timeout_set_kclock(struct timeout *to, void (*fn)(void *), void *arg,
> +    int flags, int kclock)
> +{
> +     _timeout_set(to, fn, arg, flags | TIMEOUT_KCLOCK, kclock);
>  }
>  
>  int
> @@ -257,6 +294,8 @@ timeout_add(struct timeout *new, int to_
>       int ret = 1;
>  
>       KASSERT(ISSET(new->to_flags, TIMEOUT_INITIALIZED));
> +     KASSERT(!ISSET(new->to_flags, TIMEOUT_KCLOCK));
> +     KASSERT(new->to_kclock == KCLOCK_NONE);
>       KASSERT(to_ticks >= 0);
>  
>       mtx_enter(&timeout_mutex);
> @@ -356,6 +395,65 @@ timeout_add_nsec(struct timeout *to, int
>  }
>  
>  int
> +timeout_at_ts(struct timeout *to, const struct timespec *abstime)
> +{
> +     struct timespec old_abstime;
> +     int ret = 1;
> +
> +     KASSERT(ISSET(to->to_flags, TIMEOUT_INITIALIZED | TIMEOUT_KCLOCK));
> +     KASSERT(to->to_kclock != KCLOCK_NONE);
> +
> +     mtx_enter(&timeout_mutex);
> +
> +     old_abstime = to->to_abstime;
> +     to->to_abstime = *abstime;
> +     CLR(to->to_flags, TIMEOUT_TRIGGERED);
> +
> +     if (ISSET(to->to_flags, TIMEOUT_ONQUEUE)) {
> +             if (timespeccmp(abstime, &old_abstime, <)) {
> +                     CIRCQ_REMOVE(&to->to_list);
> +                     CIRCQ_INSERT_TAIL(&timeout_new, &to->to_list);
> +             }
> +             tostat.tos_readded++;
> +             ret = 0;
> +     } else {
> +             SET(to->to_flags, TIMEOUT_ONQUEUE);
> +             CIRCQ_INSERT_TAIL(&timeout_new, &to->to_list);
> +     }
> +#if NKCOV > 0
> +     to->to_process = curproc->p_p;
> +#endif
> +     tostat.tos_added++;
> +     mtx_leave(&timeout_mutex);
> +
> +     return ret;
> +}
> +
> +int
> +timeout_in_nsec(struct timeout *to, uint64_t nsecs)
> +{
> +     struct timespec deadline, interval, now;
> +
> +     kclock_nanotime(to->to_kclock, &now);
> +     NSEC_TO_TIMESPEC(nsecs, &interval);
> +     timespecadd(&now, &interval, &deadline);
> +
> +     return timeout_at_ts(to, &deadline);
> +}
> +
> +void
> +kclock_nanotime(int kclock, struct timespec *now)
> +{
> +     switch (kclock) {
> +     case KCLOCK_UPTIME:
> +             nanouptime(now);
> +             break;
> +     default:
> +             panic("invalid kclock: 0x%x", kclock);
> +     }
> +}
> +
> +int
>  timeout_del(struct timeout *to)
>  {
>       int ret = 0;
> @@ -425,6 +523,47 @@ timeout_proc_barrier(void *arg)
>       cond_signal(c);
>  }
>  
> +uint32_t
> +timeout_bucket(struct timeout *to)
> +{
> +     struct kclock *kc = &timeout_kclock[to->to_kclock];
> +     struct timespec diff;
> +     uint32_t level;
> +
> +     KASSERT(ISSET(to->to_flags, TIMEOUT_KCLOCK));
> +     KASSERT(timespeccmp(&kc->kc_lastscan, &to->to_abstime, <));
> +
> +     timespecsub(&to->to_abstime, &kc->kc_lastscan, &diff);
> +     for (level = 0; level < nitems(timeout_level_width) - 1; level++) {
> +             if (diff.tv_sec < timeout_level_width[level])
> +                     break;
> +     }
> +     return level * WHEELSIZE + timeout_maskwheel(level, &to->to_abstime);
> +}
> +
> +/*
> + * Hash the absolute time into a bucket on a given level of the wheel.
> + *
> + * The complete hash is 32 bits.  The upper 25 bits are seconds, the
> + * lower 7 bits are nanoseconds.  tv_nsec is a positive value less
> + * than one billion so we need to divide it to isolate the desired
> + * bits.  We can't just shift it.
> + *
> + * The level is used to isolate an 8-bit portion of the hash.  The
> + * resulting number indicates which bucket the absolute time belongs
> + * to on the given level of the wheel.
> + */
> +uint32_t
> +timeout_maskwheel(uint32_t level, const struct timespec *abstime)
> +{
> +     uint32_t hi, lo;
> +
> +     hi = abstime->tv_sec << 7;
> +     lo = abstime->tv_nsec / 7812500;
> +
> +     return ((hi | lo) >> (level * WHEELBITS)) & WHEELMASK;
> +}
> +
>  /*
>   * This is called from hardclock() on the primary CPU at the start of
>   * every tick.
> @@ -432,7 +571,15 @@ timeout_proc_barrier(void *arg)
>  void
>  timeout_hardclock_update(void)
>  {
> -     int need_softclock = 1;
> +     struct timespec elapsed, now;
> +     struct kclock *kc;
> +     struct timespec *lastscan;
> +     int b, done, first, i, last, level, need_softclock, off;
> +
> +     kclock_nanotime(KCLOCK_UPTIME, &now);
> +     lastscan = &timeout_kclock[KCLOCK_UPTIME].kc_lastscan;
> +     timespecsub(&now, lastscan, &elapsed);
> +     need_softclock = 1;
>  
>       mtx_enter(&timeout_mutex);
>  
> @@ -446,6 +593,44 @@ timeout_hardclock_update(void)
>               }
>       }
>  
> +     /*
> +      * Dump the buckets that expired while we were away.
> +      *
> +      * If the elapsed time has exceeded a level's limit then we need
> +      * to dump every bucket in the level.  We have necessarily completed
> +      * a lap of that level, too, so we need to process buckets in the
> +      * next level.
> +      *
> +      * Otherwise we need to compare indices: if the index of the first
> +      * expired bucket is greater than that of the last then we have
> +      * completed a lap of the level and need to process buckets in the
> +      * next level.
> +      */
> +     for (level = 0; level < nitems(timeout_level_width); level++) {
> +             first = timeout_maskwheel(level, lastscan);
> +             if (elapsed.tv_sec >= timeout_level_width[level]) {
> +                     last = (first == 0) ? WHEELSIZE - 1 : first - 1;
> +                     done = 0;
> +             } else {
> +                     last = timeout_maskwheel(level, &now);
> +                     done = first <= last;
> +             }
> +             off = level * WHEELSIZE;
> +             for (b = first;; b = (b + 1) % WHEELSIZE) {
> +                     CIRCQ_CONCAT(&timeout_todo, &timeout_wheel_kc[off + b]);
> +                     if (b == last)
> +                             break;
> +             }
> +             if (done)
> +                     break;
> +     }
> +
> +     for (i = 0; i < nitems(timeout_kclock); i++) {
> +             kc = &timeout_kclock[i];
> +             timespecadd(&now, &kc->kc_offset, &kc->kc_lastscan);
> +             timespecsub(&kc->kc_lastscan, &tick_ts, &kc->kc_late);
> +     }
> +
>       if (CIRCQ_EMPTY(&timeout_new) && CIRCQ_EMPTY(&timeout_todo))
>               need_softclock = 0;
>  
> @@ -485,6 +670,51 @@ timeout_run(struct timeout *to)
>       mtx_enter(&timeout_mutex);
>  }
>  
> +void
> +softclock_process_kclock_timeout(struct timeout *to, int new)
> +{
> +     struct kclock *kc = &timeout_kclock[to->to_kclock];
> +     
> +     if (timespeccmp(&to->to_abstime, &kc->kc_lastscan, >)) {
> +             tostat.tos_scheduled++;
> +             if (!new)
> +                     tostat.tos_rescheduled++;
> +             CIRCQ_INSERT_TAIL(&timeout_wheel_kc[timeout_bucket(to)],
> +                 &to->to_list);
> +             return;
> +     }
> +     if (!new && timespeccmp(&to->to_abstime, &kc->kc_late, <=))
> +             tostat.tos_late++;
> +     if (ISSET(to->to_flags, TIMEOUT_PROC)) {
> +             CIRCQ_INSERT_TAIL(&timeout_proc, &to->to_list);
> +             return;
> +     }
> +     timeout_run(to);
> +     tostat.tos_run_softclock++;
> +}
> +
> +void
> +softclock_process_tick_timeout(struct timeout *to, int new)
> +{
> +     int delta = to->to_time - ticks;
> +
> +     if (delta > 0) {
> +             tostat.tos_scheduled++;
> +             if (!new)
> +                     tostat.tos_rescheduled++;
> +             CIRCQ_INSERT_TAIL(&BUCKET(delta, to->to_time), &to->to_list);
> +             return;
> +     }
> +     if (!new && delta < 0)
> +             tostat.tos_late++;
> +     if (ISSET(to->to_flags, TIMEOUT_PROC)) {
> +             CIRCQ_INSERT_TAIL(&timeout_proc, &to->to_list);
> +             return;
> +     }
> +     timeout_run(to);
> +     tostat.tos_run_softclock++;
> +}
> +
>  /*
>   * Timeouts are processed here instead of timeout_hardclock_update()
>   * to avoid doing any more work at IPL_CLOCK than absolutely necessary.
> @@ -494,9 +724,8 @@ timeout_run(struct timeout *to)
>  void
>  softclock(void *arg)
>  {
> -     struct circq *bucket;
>       struct timeout *first_new, *to;
> -     int delta, needsproc, new;
> +     int needsproc, new;
>  
>       first_new = NULL;
>       new = 0;
> @@ -510,28 +739,10 @@ softclock(void *arg)
>               CIRCQ_REMOVE(&to->to_list);
>               if (to == first_new)
>                       new = 1;
> -
> -             /*
> -              * If due run it or defer execution to the thread,
> -              * otherwise insert it into the right bucket.
> -              */
> -             delta = to->to_time - ticks;
> -             if (delta > 0) {
> -                     bucket = &BUCKET(delta, to->to_time);
> -                     CIRCQ_INSERT_TAIL(bucket, &to->to_list);
> -                     tostat.tos_scheduled++;
> -                     if (!new)
> -                             tostat.tos_rescheduled++;
> -                     continue;
> -             }
> -             if (!new && delta < 0)
> -                     tostat.tos_late++;
> -             if (ISSET(to->to_flags, TIMEOUT_PROC)) {
> -                     CIRCQ_INSERT_TAIL(&timeout_proc, &to->to_list);
> -                     continue;
> -             }
> -             timeout_run(to);
> -             tostat.tos_run_softclock++;
> +             if (ISSET(to->to_flags, TIMEOUT_KCLOCK))
> +                     softclock_process_kclock_timeout(to, new);
> +             else
> +                     softclock_process_tick_timeout(to, new);
>       }
>       tostat.tos_softclocks++;
>       needsproc = !CIRCQ_EMPTY(&timeout_proc);
> @@ -630,52 +841,114 @@ timeout_sysctl(void *oldp, size_t *oldle
>  }
>  
>  #ifdef DDB
> +const char *db_kclock(int);
>  void db_show_callout_bucket(struct circq *);
> +void db_show_timeout(struct timeout *, struct circq *);
> +const char *db_timespec(const struct timespec *);
> +
> +const char *
> +db_kclock(int kclock)
> +{
> +     switch (kclock) {
> +     case KCLOCK_UPTIME:
> +             return "uptime";
> +     default:
> +             return "invalid";
> +     }
> +}
> +
> +const char *
> +db_timespec(const struct timespec *ts)
> +{
> +     static char buf[32];
> +     struct timespec tmp, zero;
> +
> +     if (ts->tv_sec >= 0) {
> +             snprintf(buf, sizeof(buf), "%lld.%09ld",
> +                 ts->tv_sec, ts->tv_nsec);
> +             return buf;
> +     }
> +
> +     timespecclear(&zero);
> +     timespecsub(&zero, ts, &tmp);
> +     snprintf(buf, sizeof(buf), "-%lld.%09ld", tmp.tv_sec, tmp.tv_nsec);
> +     return buf;
> +}
>  
>  void
>  db_show_callout_bucket(struct circq *bucket)
>  {
> -     char buf[8];
> -     struct timeout *to;
>       struct circq *p;
> +
> +     CIRCQ_FOREACH(p, bucket)
> +             db_show_timeout(timeout_from_circq(p), bucket);
> +}
> +
> +void
> +db_show_timeout(struct timeout *to, struct circq *bucket)
> +{
> +     struct timespec remaining;
> +     struct kclock *kc;
> +     char buf[8];
>       db_expr_t offset;
> +     struct circq *wheel;
>       char *name, *where;
>       int width = sizeof(long) * 2;
>  
> -     CIRCQ_FOREACH(p, bucket) {
> -             to = timeout_from_circq(p);
> -             db_find_sym_and_offset((vaddr_t)to->to_func, &name, &offset);
> -             name = name ? name : "?";
> -             if (bucket == &timeout_todo)
> -                     where = "softint";
> -             else if (bucket == &timeout_proc)
> -                     where = "thread";
> -             else if (bucket == &timeout_new)
> -                     where = "new";
> -             else {
> -                     snprintf(buf, sizeof(buf), "%3ld/%1ld",
> -                         (bucket - timeout_wheel) % WHEELSIZE,
> -                         (bucket - timeout_wheel) / WHEELSIZE);
> -                     where = buf;
> -             }
> -             db_printf("%9d  %7s  0x%0*lx  %s\n",
> -                 to->to_time - ticks, where, width, (ulong)to->to_arg, name);
> +     db_find_sym_and_offset((vaddr_t)to->to_func, &name, &offset);
> +     name = name ? name : "?";
> +     if (bucket == &timeout_new)
> +             where = "new";
> +     else if (bucket == &timeout_todo)
> +             where = "softint";
> +     else if (bucket == &timeout_proc)
> +             where = "thread";
> +     else {
> +             if (ISSET(to->to_flags, TIMEOUT_KCLOCK))
> +                     wheel = timeout_wheel_kc;
> +             else
> +                     wheel = timeout_wheel;
> +             snprintf(buf, sizeof(buf), "%3ld/%1ld",
> +                 (bucket - wheel) % WHEELSIZE,
> +                 (bucket - wheel) / WHEELSIZE);
> +             where = buf;
> +     }
> +     if (ISSET(to->to_flags, TIMEOUT_KCLOCK)) {
> +             kc = &timeout_kclock[to->to_kclock];
> +             timespecsub(&to->to_abstime, &kc->kc_lastscan, &remaining);
> +             db_printf("%20s  %8s  %7s  0x%0*lx  %s\n",
> +                 db_timespec(&remaining), db_kclock(to->to_kclock), where,
> +                 width, (ulong)to->to_arg, name);
> +     } else {
> +             db_printf("%20d  %8s  %7s  0x%0*lx  %s\n",
> +                 to->to_time - ticks, "ticks", where,
> +                 width, (ulong)to->to_arg, name);
>       }
>  }
>  
>  void
>  db_show_callout(db_expr_t addr, int haddr, db_expr_t count, char *modif)
>  {
> +     struct kclock *kc;
>       int width = sizeof(long) * 2 + 2;
> -     int b;
> -
> -     db_printf("ticks now: %d\n", ticks);
> -     db_printf("%9s  %7s  %*s  func\n", "ticks", "wheel", width, "arg");
> +     int b, i;
>  
> +     db_printf("%20s  %8s\n", "lastscan", "clock");
> +     db_printf("%20d  %8s\n", ticks, "ticks");
> +     for (i = 0; i < nitems(timeout_kclock); i++) {
> +             kc = &timeout_kclock[i];
> +             db_printf("%20s  %8s\n",
> +                 db_timespec(&kc->kc_lastscan), db_kclock(i));
> +     }
> +     db_printf("\n");        
> +     db_printf("%20s  %8s  %7s  %*s  %s\n",
> +         "remaining", "clock", "wheel", width, "arg", "func");
>       db_show_callout_bucket(&timeout_new);
>       db_show_callout_bucket(&timeout_todo);
>       db_show_callout_bucket(&timeout_proc);
>       for (b = 0; b < nitems(timeout_wheel); b++)
>               db_show_callout_bucket(&timeout_wheel[b]);
> +     for (b = 0; b < nitems(timeout_wheel_kc); b++)
> +             db_show_callout_bucket(&timeout_wheel_kc[b]);
>  }
>  #endif
> Index: sys/timeout.h
> ===================================================================
> RCS file: /cvs/src/sys/sys/timeout.h,v
> retrieving revision 1.39
> diff -u -p -r1.39 timeout.h
> --- sys/timeout.h     7 Aug 2020 00:45:25 -0000       1.39
> +++ sys/timeout.h     7 Sep 2020 23:46:44 -0000
> @@ -1,4 +1,4 @@
> -/*   $OpenBSD: timeout.h,v 1.39 2020/08/07 00:45:25 cheloha Exp $    */
> +/*   $OpenBSD: timeout.h,v 1.38 2020/08/01 08:40:20 anton Exp $      */
>  /*
>   * Copyright (c) 2000-2001 Artur Grabowski <a...@openbsd.org>
>   * All rights reserved. 
> @@ -51,6 +51,8 @@
>   * These functions may be called in interrupt context (anything below splhigh).
>   */
>  
> +#include <sys/time.h>
> +
>  struct circq {
>       struct circq *next;             /* next element */
>       struct circq *prev;             /* previous element */
> @@ -58,13 +60,15 @@ struct circq {
>  
>  struct timeout {
>       struct circq to_list;                   /* timeout queue, don't move */
> +     struct timespec to_abstime;             /* absolute time to run at */
>       void (*to_func)(void *);                /* function to call */
>       void *to_arg;                           /* function argument */
> -     int to_time;                            /* ticks on event */
> -     int to_flags;                           /* misc flags */
>  #if 1 /* NKCOV > 0 */
>       struct process *to_process;             /* kcov identifier */
>  #endif
> +     int to_time;                            /* ticks on event */
> +     int to_flags;                           /* misc flags */
> +     int to_kclock;                          /* abstime's kernel clock */
>  };
>  
>  /*
> @@ -74,6 +78,7 @@ struct timeout {
>  #define TIMEOUT_ONQUEUE              0x02    /* on any timeout queue */
>  #define TIMEOUT_INITIALIZED  0x04    /* initialized */
>  #define TIMEOUT_TRIGGERED    0x08    /* running or ran */
> +#define TIMEOUT_KCLOCK               0x10    /* clock-based timeout */
>  
>  struct timeoutstat {
>       uint64_t tos_added;             /* timeout_add*(9) calls */
> @@ -103,25 +108,43 @@ int timeout_sysctl(void *, size_t *, voi
>  #define timeout_initialized(to) ((to)->to_flags & TIMEOUT_INITIALIZED)
>  #define timeout_triggered(to) ((to)->to_flags & TIMEOUT_TRIGGERED)
>  
> -#define TIMEOUT_INITIALIZER_FLAGS(fn, arg, flags) {                  \
> +#define KCLOCK_NONE  (-1)            /* dummy clock for sanity checks */
> +#define KCLOCK_UPTIME        0               /* uptime clock; time since boot */
> +#define KCLOCK_MAX   1
> +
> +#define __TIMEOUT_INITIALIZER(fn, arg, flags, kclock) {              \
>       .to_list = { NULL, NULL },                                      \
> +     .to_abstime = { .tv_sec = 0, .tv_nsec = 0 },                    \
>       .to_func = (fn),                                                \
>       .to_arg = (arg),                                                \
>       .to_time = 0,                                                   \
> -     .to_flags = (flags) | TIMEOUT_INITIALIZED                       \
> +     .to_flags = (flags) | TIMEOUT_INITIALIZED,                      \
> +     .to_kclock = (kclock)                                           \
>  }
>  
> -#define TIMEOUT_INITIALIZER(_f, _a) TIMEOUT_INITIALIZER_FLAGS((_f), (_a), 0)
> +#define TIMEOUT_INITIALIZER_KCLOCK(fn, arg, flags, kclock)           \
> +    __TIMEOUT_INITIALIZER((fn), (arg), (flags) | TIMEOUT_KCLOCK, (kclock))
> +
> +#define TIMEOUT_INITIALIZER_FLAGS(fn, arg, flags)                    \
> +    __TIMEOUT_INITIALIZER((fn), (arg), (flags), KCLOCK_NONE)
> +
> +#define TIMEOUT_INITIALIZER(_f, _a)                                  \
> +    __TIMEOUT_INITIALIZER((_f), (_a), 0, KCLOCK_NONE)
>  
>  void timeout_set(struct timeout *, void (*)(void *), void *);
>  void timeout_set_flags(struct timeout *, void (*)(void *), void *, int);
> +void timeout_set_kclock(struct timeout *, void (*)(void *), void *, int, int);
>  void timeout_set_proc(struct timeout *, void (*)(void *), void *);
> +
>  int timeout_add(struct timeout *, int);
>  int timeout_add_tv(struct timeout *, const struct timeval *);
>  int timeout_add_sec(struct timeout *, int);
>  int timeout_add_msec(struct timeout *, int);
>  int timeout_add_usec(struct timeout *, int);
>  int timeout_add_nsec(struct timeout *, int);
> +
> +int timeout_in_nsec(struct timeout *, uint64_t);
> +
>  int timeout_del(struct timeout *);
>  int timeout_del_barrier(struct timeout *);
>  void timeout_barrier(struct timeout *);
> 
