On Sat, Sep 05, 2020 at 01:11:59PM +0200, Mark Kettenis wrote:
> > Date: Fri, 4 Sep 2020 17:55:39 -0500
> > From: Scott Cheloha <scottchel...@gmail.com>
> > 
> > On Sat, Jul 25, 2020 at 08:46:08PM -0500, Scott Cheloha wrote:
> > > 
> > > [...]
> > > 
> > > I want to add clock-based timeouts to the kernel because tick-based
> > > timeouts suffer from a few problems:
> > > 
> > > [...]
> > > 
> > > Basically, ticks are a poor approximation for the system clock.  We
> > > should use the real thing where possible.
> > > 
> > > [...]
> > > 
> > > Thoughts on this approach?  Thoughts on the proposed API?
> > 
> > 6 week bump.
> > 
> > Attached is a rebased and streamlined diff.
> > 
> > Let's try again:
> > 
> > This patch adds support for timeouts scheduled against the hardware
> > timecounter.  I call these "kclock timeouts".  They are distinct from
> > the current tick-based timeouts because ticks are "software time", not
> > "real time".
> 
> So what's the end game here?  Are these kclock-based timeouts going to
> replace the tick-based timeouts at some point in the future?  I can
> see why you want to have both in parallel for a while, but long-term I
> don't think we want to keep both.

Ideally, kclock timeouts would eventually replace tick-based timeouts
entirely.

There are a few roadblocks, though:

1. The scheduler is tick-based.  If you want to wait until the next
   tick, the easiest way to do that is with timeout_add(9) or tsleep(9).

2. Linux has ktimers, which are tick-based; drm uses them.  Shouldn't
   we have a tick-based timeout interface for compatibility?  We could
   fake it, like FreeBSD does, but doing so is probably more
   complicated than just keeping support for tick-based timeouts.

3. Scheduling a timeout with timeout_add(9) is fast.  Scheduling a
   timeout with timeout_in_nsec(9) involves a clock read.  It is slower.
   It is probably too slow for some code.

(1) will be overcome if ever the scheduler is no longer tick-based.

(2) is tricky.  Maybe you or jsg@ have an opinion?

(3) is somewhat easier to fix.  I intend to introduce a TIMEOUT_COARSE
flag in the future which causes timeout_in_nsec() to call
getnanouptime(9) instead of nanouptime(9).  Reading the cached
timestamp is faster than reading the hardware clock.  You lose
accuracy, but any code worried about the overhead of reading the clock
is probably not very concerned with accuracy anyway.

> We don't really want to do a wholesale conversion of APIs again I'd
> say.  So at some point the existing timeout_add_xxx() calls should be
> implemented in terms of "kclock timeouts".

We can do this, but we'll still need to change the calls that
reschedule a periodic timeout to use the dedicated rescheduling
interface.  Otherwise those periodic timeouts will drift.  They don't
currently drift because a tick is a very coarse unit of time.  With
nanosecond resolution we'll get drift.
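
For example, a handler that reschedules itself relative to "now", as in
the sketch below, folds its own softclock latency into every period.
(The softc and function names are made up for illustration; only
timeout_in_nsec() comes from the attached diff.)

        struct my_softc {
                struct timeout sc_tmo;
                /* ... */
        };

        void
        my_periodic_tick(void *arg)
        {
                struct my_softc *sc = arg;

                /* ... do the periodic work ... */

                /*
                 * We are already some nanoseconds past the intended
                 * deadline by the time we get here, so every period
                 * slips by that amount.  A rescheduling interface
                 * would compute the next deadline from the previous
                 * one instead of from "now".
                 */
                timeout_in_nsec(&sc->sc_tmo, MSEC_TO_NSEC(100));
        }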

> This implementation is still tick driven, so it doesn't really provide
> sub-tick resolution.

Yes, that's right.  Each timeout maintains nanosecond resolution for
its expiration time but will only actually run after hardclock(9) runs
and dumps the timeout to softclock().

We would need to implement a more flexible clock interrupt scheduler
to run timeouts in between hardclocks.

> What does that mean for testing this?  I mean if we spend a lot of time
> now to verify that subsystems can tolerate the more fine-grained timeouts,
> we need to do that again when you switch from having a periodic interrupt
> driving the wheel to having a scheduled interrupt, don't we?

Yes.  But both changes can break things.

I think we should do kclock timeouts before sub-tick timeouts.  The
former is a lot less disruptive than the latter, as the timeouts still
run right after the hardclock.

And you need kclock timeouts to even test sub-tick timeouts anyway.

> > For now we have one kclock, KCLOCK_UPTIME, which corresponds to
> > nanouptime(9).  In the future I intend to add support for runtime and
> > UTC kclocks.
> 
> Do we really need that?  I suppose it helps implementing something
> like clock_nanosleep() with the TIMER_ABSTIME flag for various
> clock_id values?

Exactly.

FreeBSD decided not to support multiple clocks in their revamped
callout(9).  The result is a bit simpler (one clock), but in order to
implement absolute CLOCK_REALTIME sleeps for userspace they keep a
per-thread flag that causes the thread to wake up and reschedule
itself whenever settimeofday(2) is called.

It's clever, but it seems messy to me.

I would rather support UTC timeouts as "first class citizens" of the
timeout subsystem.  Linux's hrtimers API supports UTC timeouts
explicitly.  I prefer their approach.

The advantage of real support is that the timeout(9) subsystem will
transparently rebucket UTC timeouts if the clock jumps backwards.
This is nice.

> > Why do we want kclock timeouts at all?
> > 
> > 1. Kclock timeouts expire at an actual time, not a tick.  They
> >    have nanosecond resolution and are NTP-sensitive.  Thus, they
> >    will *never* fire early.
> 
> Is there a lot of overhead in these being NTP-sensitive?  I'm asking
> because for short timeouts you don't really care about NTP
> corrections.

No.  Our timecounting system is inherently NTP-sensitive.  Every
timestamp you get from e.g. nanouptime(9) is already adjusted for the
kernel's NTP corrections, so the sensitivity adds no extra overhead.

> > 2. One upshot of nanosecond resolution is that we don't need to
> >    "round up" to the next tick when scheduling a timeout to prevent
> >    early execution.  The extra resolution allows us to reduce
> >    latency in some contexts.
> > 
> > 3. Kclock timeouts cover the entire range of the kernel timeline.
> >    We can remove the "tick loops" like the one in sys_nanosleep().
> > 
> > 4. Kclock timeouts are scheduled as absolute deadlines.  This makes
> >    supporting absolute timeouts trivial, which means we can add support
> >    for clock_nanosleep(2) and the absolute pthread timeouts to the
> >    kernel.
> > 
> > Kclock timeouts aren't actually used anywhere yet, so merging this
> > patch will not break anything like last time (CC bluhm@).
> > 
> > In a subsequent diff I will put them to use in tsleep_nsec(9) etc.
> > This will enable those interfaces to block for less than a tick, which
> > in turn will allow userspace to block for less than a tick in e.g.
> > futex(2), and poll(2).  pd@ has verified that this fixes the "time
> > problem" in OpenBSD vmm(4) VMs (CC pd@).
> 
> Like I said above, running the timeout is still tick-driven isn't it?

Yes, timeouts are still driven by the hardclock(9), which means your
maximum effective resolution is still 1/hz.

> This avoids having to wait at least a tick for timeouts that are
> shorter than a tick, but it means the timeout can still be extended up
> to a full tick.

Yes.  Even if you schedule a timeout 1 nanosecond into the future you
must still wait for the next hardclock(9) to fire and dump the timeout
to softclock() before it can run.  With the default hz of 100, for
example, that can mean waiting almost 10 milliseconds.

> > You initialize kclock timeouts with timeout_set_kclock().  You
> > schedule them with timeout_in_nsec(), a relative timeout interface
> > that accepts a count of nanoseconds.  If your timeout is in some
> > other unit (seconds, milliseconds, whatever) you must convert it
> > to nanoseconds before scheduling.  Something like this will work:
> > 
> >     timeout_in_nsec(&my_timeout, SEC_TO_NSEC(1));
> > 
> > There won't be a flavored API supporting every conceivable time unit.
> 
> So this is where I get worried.  What is the game plan?  Slowly
> convert everything from timeout_add_xxx() to timeout_in_nsec()?  Or
> offer this as a temporary interface for people to test some critical
> subsystems after which we dump it and simply re-implement
> timeout_add_xxx() as kclock-based timeouts?

I first want to put them to use in tsleep_nsec(9), msleep_nsec(9), and
rwsleep_nsec(9).  These functions are basically never used in a hot
path so it is a low-risk change.

After that, I'm not sure.

One thing that concerns me about the reimplementation approach is that
there are lots of drivers without maintainers.  Subtly changing the
way a bunch of code works without actually testing it sounds like a
recipe for disaster.

> > In the future I will expose an absolute timeout interface and a
> > periodic timeout rescheduling interface.  We don't need either of
> > these interfaces to start, though.
> 
> Not sure how useful a periodic timeout rescheduling interface really
> is if you have an absolute timeout interface.  And isn't
> timeout_at_ts() already implemented in the diff?

You could do it by hand with timeout_at_ts(), but the dedicated
rescheduling interface will be much easier to use:

- It automatically skips intervals that have already elapsed.

- You don't need to track your last expiration time by hand.
  The interface uses state kept in the timeout struct to
  determine the start of the period.

- It uses clever math to quickly find the next expiration time
  instead of naively looping like we do in realitexpire()
  (a sketch of the idea follows the snippet below):

        while (timespeccmp(&abstime, &now, <=))
                timespecadd(&abstime, &period, &abstime);
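
  Roughly, the "clever math" amounts to the following.  This function
  is only a sketch for illustration; it is not part of the attached
  diff.

        void
        next_expiration(const struct timespec *prev,
            const struct timespec *period, const struct timespec *now,
            struct timespec *next)
        {
                uint64_t prev_ns = TIMESPEC_TO_NSEC(prev);
                uint64_t period_ns = TIMESPEC_TO_NSEC(period);
                uint64_t now_ns = TIMESPEC_TO_NSEC(now);
                uint64_t periods;

                if (now_ns < prev_ns) {
                        *next = *prev;          /* not due yet */
                        return;
                }

                /* Jump over every interval that has already elapsed. */
                periods = (now_ns - prev_ns) / period_ns + 1;
                NSEC_TO_TIMESPEC(prev_ns + periods * period_ns, next);
        }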

> > Tick-based timeouts and kclock-based timeouts are *not* compatible.
> > You cannot schedule a kclock timeout with timeout_add(9).  You cannot
> > schedule a tick-based timeout with timeout_in_nsec(9).  I have added
> > KASSERTs to prevent this.
> > 
> > Scheduling a kclock timeout with timeout_in_nsec() is more costly than
> > scheduling a tick-based timeout with timeout_add(9) because you have
> > to read the hardware timecounter.  The cost will vary with your clock:
> > bad clocks have lots of overhead, good clocks have low-to-no overhead.
> > The programmer will need to decide if the potential overhead is too
> > high when employing these timeouts.  In most cases the overhead will
> > not be a problem.  The network stack is one spot where it might be.
> 
> I doubt this will be a problem.  For very small timeouts folks
> probably should use delay(9).
> 
> > Processing the kclock timeout wheel during hardclock(9) adds
> > negligible overhead to that routine.
> > 
> > Processing a kclock timeout during softclock() is roughly 4 times as
> > expensive as processing a tick-based timeout.  At idle on my 2Ghz
> > amd64 machine tick-based timeouts take ~125 cycles to process while
> > kclock timeouts take ~500 cycles.  The average cost seems to drop as
> > more kclock timeouts are processed, though I can't really explain why.
> 
> Cache effects?  Some of the overhead may be there because you keep
> track of "late" timeouts.  But that code isn't really necessary, is it?

I was going to guess "cache effects" but they are black magic to me,
so your guess is as good as mine.

As a small optimization we could move late timeout tracking into a
TIMEOUT_DEBUG #ifdef.  This will probably be more useful on 32-bit
systems than anywhere else, though.

> > Thoughts?  ok?
> 
> Some further nits below.

Updated patch attached.

Index: kern/kern_timeout.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_timeout.c,v
retrieving revision 1.79
diff -u -p -r1.79 kern_timeout.c
--- kern/kern_timeout.c 7 Aug 2020 00:45:25 -0000       1.79
+++ kern/kern_timeout.c 7 Sep 2020 23:46:44 -0000
@@ -1,4 +1,4 @@
-/*     $OpenBSD: kern_timeout.c,v 1.79 2020/08/07 00:45:25 cheloha Exp $ */
+/*     $OpenBSD: kern_timeout.c,v 1.77 2020/08/01 08:40:20 anton Exp $ */
 /*
  * Copyright (c) 2001 Thomas Nordin <nor...@openbsd.org>
  * Copyright (c) 2000-2001 Artur Grabowski <a...@openbsd.org>
@@ -64,16 +64,27 @@ struct timeoutstat tostat;          /* [T] stati
  * of the global variable "ticks" when the timeout should be called. There are
  * four levels with 256 buckets each.
  */
-#define BUCKETS 1024
+#define WHEELCOUNT 4
 #define WHEELSIZE 256
 #define WHEELMASK 255
 #define WHEELBITS 8
+#define BUCKETS (WHEELCOUNT * WHEELSIZE)
 
-struct circq timeout_wheel[BUCKETS];   /* [T] Queues of timeouts */
+struct circq timeout_wheel[BUCKETS];   /* [T] Tick-based timeouts */
+struct circq timeout_wheel_kc[BUCKETS];        /* [T] Clock-based timeouts */
 struct circq timeout_new;              /* [T] New, unscheduled timeouts */
 struct circq timeout_todo;             /* [T] Due or needs rescheduling */
 struct circq timeout_proc;             /* [T] Due + needs process context */
 
+time_t timeout_level_width[WHEELCOUNT];        /* [I] Wheel level width (seconds) */
+struct timespec tick_ts;               /* [I] Length of a tick (1/hz secs) */
+
+struct kclock {
+       struct timespec kc_lastscan;    /* [T] Clock time at last wheel scan */
+       struct timespec kc_late;        /* [T] Late if due prior */
+       struct timespec kc_offset;      /* [T] Offset from primary kclock */
+} timeout_kclock[KCLOCK_MAX];
+
 #define MASKWHEEL(wheel, time) (((time) >> ((wheel)*WHEELBITS)) & WHEELMASK)
 
 #define BUCKET(rel, abs)                                               \
@@ -155,9 +166,15 @@ struct lock_type timeout_spinlock_type =
        ((needsproc) ? &timeout_sleeplock_obj : &timeout_spinlock_obj)
 #endif
 
+void kclock_nanotime(int, struct timespec *);
 void softclock(void *);
 void softclock_create_thread(void *);
+void softclock_process_kclock_timeout(struct timeout *, int);
+void softclock_process_tick_timeout(struct timeout *, int);
 void softclock_thread(void *);
+uint32_t timeout_bucket(struct timeout *);
+uint32_t timeout_maskwheel(uint32_t, const struct timespec *);
+void timeout_run(struct timeout *);
 void timeout_proc_barrier(void *);
 
 /*
@@ -207,13 +224,19 @@ timeout_sync_leave(int needsproc)
 void
 timeout_startup(void)
 {
-       int b;
+       int b, level;
 
        CIRCQ_INIT(&timeout_new);
        CIRCQ_INIT(&timeout_todo);
        CIRCQ_INIT(&timeout_proc);
        for (b = 0; b < nitems(timeout_wheel); b++)
                CIRCQ_INIT(&timeout_wheel[b]);
+       for (b = 0; b < nitems(timeout_wheel_kc); b++)
+               CIRCQ_INIT(&timeout_wheel_kc[b]);
+
+       for (level = 0; level < nitems(timeout_level_width); level++)
+               timeout_level_width[level] = 2 << (level * WHEELBITS);
+       NSEC_TO_TIMESPEC(tick_nsec, &tick_ts);
 }
 
 void
@@ -229,25 +252,39 @@ timeout_proc_init(void)
        kthread_create_deferred(softclock_create_thread, NULL);
 }
 
+static inline void
+_timeout_set(struct timeout *to, void (*fn)(void *), void *arg, int flags,
+    int kclock)
+{
+       to->to_func = fn;
+       to->to_arg = arg;
+       to->to_flags = flags | TIMEOUT_INITIALIZED;
+       to->to_kclock = kclock;
+}
+
 void
 timeout_set(struct timeout *new, void (*fn)(void *), void *arg)
 {
-       timeout_set_flags(new, fn, arg, 0);
+       _timeout_set(new, fn, arg, 0, KCLOCK_NONE);
 }
 
 void
 timeout_set_flags(struct timeout *to, void (*fn)(void *), void *arg, int flags)
 {
-       to->to_func = fn;
-       to->to_arg = arg;
-       to->to_process = NULL;
-       to->to_flags = flags | TIMEOUT_INITIALIZED;
+       _timeout_set(to, fn, arg, flags, KCLOCK_NONE);
 }
 
 void
 timeout_set_proc(struct timeout *new, void (*fn)(void *), void *arg)
 {
-       timeout_set_flags(new, fn, arg, TIMEOUT_PROC);
+       _timeout_set(new, fn, arg, TIMEOUT_PROC, KCLOCK_NONE);
+}
+
+void
+timeout_set_kclock(struct timeout *to, void (*fn)(void *), void *arg,
+    int flags, int kclock)
+{
+       _timeout_set(to, fn, arg, flags | TIMEOUT_KCLOCK, kclock);
 }
 
 int
@@ -257,6 +294,8 @@ timeout_add(struct timeout *new, int to_
        int ret = 1;
 
        KASSERT(ISSET(new->to_flags, TIMEOUT_INITIALIZED));
+       KASSERT(!ISSET(new->to_flags, TIMEOUT_KCLOCK));
+       KASSERT(new->to_kclock == KCLOCK_NONE);
        KASSERT(to_ticks >= 0);
 
        mtx_enter(&timeout_mutex);
@@ -356,6 +395,65 @@ timeout_add_nsec(struct timeout *to, int
 }
 
 int
+timeout_at_ts(struct timeout *to, const struct timespec *abstime)
+{
+       struct timespec old_abstime;
+       int ret = 1;
+
+       KASSERT(ISSET(to->to_flags, TIMEOUT_INITIALIZED | TIMEOUT_KCLOCK));
+       KASSERT(to->to_kclock != KCLOCK_NONE);
+
+       mtx_enter(&timeout_mutex);
+
+       old_abstime = to->to_abstime;
+       to->to_abstime = *abstime;
+       CLR(to->to_flags, TIMEOUT_TRIGGERED);
+
+       if (ISSET(to->to_flags, TIMEOUT_ONQUEUE)) {
+               if (timespeccmp(abstime, &old_abstime, <)) {
+                       CIRCQ_REMOVE(&to->to_list);
+                       CIRCQ_INSERT_TAIL(&timeout_new, &to->to_list);
+               }
+               tostat.tos_readded++;
+               ret = 0;
+       } else {
+               SET(to->to_flags, TIMEOUT_ONQUEUE);
+               CIRCQ_INSERT_TAIL(&timeout_new, &to->to_list);
+       }
+#if NKCOV > 0
+       to->to_process = curproc->p_p;
+#endif
+       tostat.tos_added++;
+       mtx_leave(&timeout_mutex);
+
+       return ret;
+}
+
+int
+timeout_in_nsec(struct timeout *to, uint64_t nsecs)
+{
+       struct timespec deadline, interval, now;
+
+       kclock_nanotime(to->to_kclock, &now);
+       NSEC_TO_TIMESPEC(nsecs, &interval);
+       timespecadd(&now, &interval, &deadline);
+
+       return timeout_at_ts(to, &deadline);
+}
+
+void
+kclock_nanotime(int kclock, struct timespec *now)
+{
+       switch (kclock) {
+       case KCLOCK_UPTIME:
+               nanouptime(now);
+               break;
+       default:
+               panic("invalid kclock: 0x%x", kclock);
+       }
+}
+
+int
 timeout_del(struct timeout *to)
 {
        int ret = 0;
@@ -425,6 +523,47 @@ timeout_proc_barrier(void *arg)
        cond_signal(c);
 }
 
+uint32_t
+timeout_bucket(struct timeout *to)
+{
+       struct kclock *kc = &timeout_kclock[to->to_kclock];
+       struct timespec diff;
+       uint32_t level;
+
+       KASSERT(ISSET(to->to_flags, TIMEOUT_KCLOCK));
+       KASSERT(timespeccmp(&kc->kc_lastscan, &to->to_abstime, <));
+
+       timespecsub(&to->to_abstime, &kc->kc_lastscan, &diff);
+       for (level = 0; level < nitems(timeout_level_width) - 1; level++) {
+               if (diff.tv_sec < timeout_level_width[level])
+                       break;
+       }
+       return level * WHEELSIZE + timeout_maskwheel(level, &to->to_abstime);
+}
+
+/*
+ * Hash the absolute time into a bucket on a given level of the wheel.
+ *
+ * The complete hash is 32 bits.  The upper 25 bits are seconds, the
+ * lower 7 bits are nanoseconds.  tv_nsec is a positive value less
+ * than one billion so we need to divide it to isolate the desired
+ * bits.  We can't just shift it.
+ *
+ * The level is used to isolate an 8-bit portion of the hash.  The
+ * resulting number indicates which bucket the absolute time belongs
+ * to on the given level of the wheel.
+ */
+uint32_t
+timeout_maskwheel(uint32_t level, const struct timespec *abstime)
+{
+       uint32_t hi, lo;
+
+       hi = abstime->tv_sec << 7;
+       lo = abstime->tv_nsec / 7812500;
+
+       return ((hi | lo) >> (level * WHEELBITS)) & WHEELMASK;
+}
+
 /*
  * This is called from hardclock() on the primary CPU at the start of
  * every tick.
@@ -432,7 +571,15 @@ timeout_proc_barrier(void *arg)
 void
 timeout_hardclock_update(void)
 {
-       int need_softclock = 1;
+       struct timespec elapsed, now;
+       struct kclock *kc;
+       struct timespec *lastscan;
+       int b, done, first, i, last, level, need_softclock, off;
+
+       kclock_nanotime(KCLOCK_UPTIME, &now);
+       lastscan = &timeout_kclock[KCLOCK_UPTIME].kc_lastscan;
+       timespecsub(&now, lastscan, &elapsed);
+       need_softclock = 1;
 
        mtx_enter(&timeout_mutex);
 
@@ -446,6 +593,44 @@ timeout_hardclock_update(void)
                }
        }
 
+       /*
+        * Dump the buckets that expired while we were away.
+        *
+        * If the elapsed time has exceeded a level's limit then we need
+        * to dump every bucket in the level.  We have necessarily completed
+        * a lap of that level, too, so we need to process buckets in the
+        * next level.
+        *
+        * Otherwise we need to compare indices: if the index of the first
+        * expired bucket is greater than that of the last then we have
+        * completed a lap of the level and need to process buckets in the
+        * next level.
+        */
+       for (level = 0; level < nitems(timeout_level_width); level++) {
+               first = timeout_maskwheel(level, lastscan);
+               if (elapsed.tv_sec >= timeout_level_width[level]) {
+                       last = (first == 0) ? WHEELSIZE - 1 : first - 1;
+                       done = 0;
+               } else {
+                       last = timeout_maskwheel(level, &now);
+                       done = first <= last;
+               }
+               off = level * WHEELSIZE;
+               for (b = first;; b = (b + 1) % WHEELSIZE) {
+                       CIRCQ_CONCAT(&timeout_todo, &timeout_wheel_kc[off + b]);
+                       if (b == last)
+                               break;
+               }
+               if (done)
+                       break;
+       }
+
+       for (i = 0; i < nitems(timeout_kclock); i++) {
+               kc = &timeout_kclock[i];
+               timespecadd(&now, &kc->kc_offset, &kc->kc_lastscan);
+               timespecsub(&kc->kc_lastscan, &tick_ts, &kc->kc_late);
+       }
+
        if (CIRCQ_EMPTY(&timeout_new) && CIRCQ_EMPTY(&timeout_todo))
                need_softclock = 0;
 
@@ -485,6 +670,51 @@ timeout_run(struct timeout *to)
        mtx_enter(&timeout_mutex);
 }
 
+void
+softclock_process_kclock_timeout(struct timeout *to, int new)
+{
+       struct kclock *kc = &timeout_kclock[to->to_kclock];
+       
+       if (timespeccmp(&to->to_abstime, &kc->kc_lastscan, >)) {
+               tostat.tos_scheduled++;
+               if (!new)
+                       tostat.tos_rescheduled++;
+               CIRCQ_INSERT_TAIL(&timeout_wheel_kc[timeout_bucket(to)],
+                   &to->to_list);
+               return;
+       }
+       if (!new && timespeccmp(&to->to_abstime, &kc->kc_late, <=))
+               tostat.tos_late++;
+       if (ISSET(to->to_flags, TIMEOUT_PROC)) {
+               CIRCQ_INSERT_TAIL(&timeout_proc, &to->to_list);
+               return;
+       }
+       timeout_run(to);
+       tostat.tos_run_softclock++;
+}
+
+void
+softclock_process_tick_timeout(struct timeout *to, int new)
+{
+       int delta = to->to_time - ticks;
+
+       if (delta > 0) {
+               tostat.tos_scheduled++;
+               if (!new)
+                       tostat.tos_rescheduled++;
+               CIRCQ_INSERT_TAIL(&BUCKET(delta, to->to_time), &to->to_list);
+               return;
+       }
+       if (!new && delta < 0)
+               tostat.tos_late++;
+       if (ISSET(to->to_flags, TIMEOUT_PROC)) {
+               CIRCQ_INSERT_TAIL(&timeout_proc, &to->to_list);
+               return;
+       }
+       timeout_run(to);
+       tostat.tos_run_softclock++;
+}
+
 /*
  * Timeouts are processed here instead of timeout_hardclock_update()
  * to avoid doing any more work at IPL_CLOCK than absolutely necessary.
@@ -494,9 +724,8 @@ timeout_run(struct timeout *to)
 void
 softclock(void *arg)
 {
-       struct circq *bucket;
        struct timeout *first_new, *to;
-       int delta, needsproc, new;
+       int needsproc, new;
 
        first_new = NULL;
        new = 0;
@@ -510,28 +739,10 @@ softclock(void *arg)
                CIRCQ_REMOVE(&to->to_list);
                if (to == first_new)
                        new = 1;
-
-               /*
-                * If due run it or defer execution to the thread,
-                * otherwise insert it into the right bucket.
-                */
-               delta = to->to_time - ticks;
-               if (delta > 0) {
-                       bucket = &BUCKET(delta, to->to_time);
-                       CIRCQ_INSERT_TAIL(bucket, &to->to_list);
-                       tostat.tos_scheduled++;
-                       if (!new)
-                               tostat.tos_rescheduled++;
-                       continue;
-               }
-               if (!new && delta < 0)
-                       tostat.tos_late++;
-               if (ISSET(to->to_flags, TIMEOUT_PROC)) {
-                       CIRCQ_INSERT_TAIL(&timeout_proc, &to->to_list);
-                       continue;
-               }
-               timeout_run(to);
-               tostat.tos_run_softclock++;
+               if (ISSET(to->to_flags, TIMEOUT_KCLOCK))
+                       softclock_process_kclock_timeout(to, new);
+               else
+                       softclock_process_tick_timeout(to, new);
        }
        tostat.tos_softclocks++;
        needsproc = !CIRCQ_EMPTY(&timeout_proc);
@@ -630,52 +841,114 @@ timeout_sysctl(void *oldp, size_t *oldle
 }
 
 #ifdef DDB
+const char *db_kclock(int);
 void db_show_callout_bucket(struct circq *);
+void db_show_timeout(struct timeout *, struct circq *);
+const char *db_timespec(const struct timespec *);
+
+const char *
+db_kclock(int kclock)
+{
+       switch (kclock) {
+       case KCLOCK_UPTIME:
+               return "uptime";
+       default:
+               return "invalid";
+       }
+}
+
+const char *
+db_timespec(const struct timespec *ts)
+{
+       static char buf[32];
+       struct timespec tmp, zero;
+
+       if (ts->tv_sec >= 0) {
+               snprintf(buf, sizeof(buf), "%lld.%09ld",
+                   ts->tv_sec, ts->tv_nsec);
+               return buf;
+       }
+
+       timespecclear(&zero);
+       timespecsub(&zero, ts, &tmp);
+       snprintf(buf, sizeof(buf), "-%lld.%09ld", tmp.tv_sec, tmp.tv_nsec);
+       return buf;
+}
 
 void
 db_show_callout_bucket(struct circq *bucket)
 {
-       char buf[8];
-       struct timeout *to;
        struct circq *p;
+
+       CIRCQ_FOREACH(p, bucket)
+               db_show_timeout(timeout_from_circq(p), bucket);
+}
+
+void
+db_show_timeout(struct timeout *to, struct circq *bucket)
+{
+       struct timespec remaining;
+       struct kclock *kc;
+       char buf[8];
        db_expr_t offset;
+       struct circq *wheel;
        char *name, *where;
        int width = sizeof(long) * 2;
 
-       CIRCQ_FOREACH(p, bucket) {
-               to = timeout_from_circq(p);
-               db_find_sym_and_offset((vaddr_t)to->to_func, &name, &offset);
-               name = name ? name : "?";
-               if (bucket == &timeout_todo)
-                       where = "softint";
-               else if (bucket == &timeout_proc)
-                       where = "thread";
-               else if (bucket == &timeout_new)
-                       where = "new";
-               else {
-                       snprintf(buf, sizeof(buf), "%3ld/%1ld",
-                           (bucket - timeout_wheel) % WHEELSIZE,
-                           (bucket - timeout_wheel) / WHEELSIZE);
-                       where = buf;
-               }
-               db_printf("%9d  %7s  0x%0*lx  %s\n",
-                   to->to_time - ticks, where, width, (ulong)to->to_arg, name);
+       db_find_sym_and_offset((vaddr_t)to->to_func, &name, &offset);
+       name = name ? name : "?";
+       if (bucket == &timeout_new)
+               where = "new";
+       else if (bucket == &timeout_todo)
+               where = "softint";
+       else if (bucket == &timeout_proc)
+               where = "thread";
+       else {
+               if (ISSET(to->to_flags, TIMEOUT_KCLOCK))
+                       wheel = timeout_wheel_kc;
+               else
+                       wheel = timeout_wheel;
+               snprintf(buf, sizeof(buf), "%3ld/%1ld",
+                   (bucket - wheel) % WHEELSIZE,
+                   (bucket - wheel) / WHEELSIZE);
+               where = buf;
+       }
+       if (ISSET(to->to_flags, TIMEOUT_KCLOCK)) {
+               kc = &timeout_kclock[to->to_kclock];
+               timespecsub(&to->to_abstime, &kc->kc_lastscan, &remaining);
+               db_printf("%20s  %8s  %7s  0x%0*lx  %s\n",
+                   db_timespec(&remaining), db_kclock(to->to_kclock), where,
+                   width, (ulong)to->to_arg, name);
+       } else {
+               db_printf("%20d  %8s  %7s  0x%0*lx  %s\n",
+                   to->to_time - ticks, "ticks", where,
+                   width, (ulong)to->to_arg, name);
        }
 }
 
 void
 db_show_callout(db_expr_t addr, int haddr, db_expr_t count, char *modif)
 {
+       struct kclock *kc;
        int width = sizeof(long) * 2 + 2;
-       int b;
-
-       db_printf("ticks now: %d\n", ticks);
-       db_printf("%9s  %7s  %*s  func\n", "ticks", "wheel", width, "arg");
+       int b, i;
 
+       db_printf("%20s  %8s\n", "lastscan", "clock");
+       db_printf("%20d  %8s\n", ticks, "ticks");
+       for (i = 0; i < nitems(timeout_kclock); i++) {
+               kc = &timeout_kclock[i];
+               db_printf("%20s  %8s\n",
+                   db_timespec(&kc->kc_lastscan), db_kclock(i));
+       }
+       db_printf("\n");        
+       db_printf("%20s  %8s  %7s  %*s  %s\n",
+           "remaining", "clock", "wheel", width, "arg", "func");
        db_show_callout_bucket(&timeout_new);
        db_show_callout_bucket(&timeout_todo);
        db_show_callout_bucket(&timeout_proc);
        for (b = 0; b < nitems(timeout_wheel); b++)
                db_show_callout_bucket(&timeout_wheel[b]);
+       for (b = 0; b < nitems(timeout_wheel_kc); b++)
+               db_show_callout_bucket(&timeout_wheel_kc[b]);
 }
 #endif
Index: sys/timeout.h
===================================================================
RCS file: /cvs/src/sys/sys/timeout.h,v
retrieving revision 1.39
diff -u -p -r1.39 timeout.h
--- sys/timeout.h       7 Aug 2020 00:45:25 -0000       1.39
+++ sys/timeout.h       7 Sep 2020 23:46:44 -0000
@@ -1,4 +1,4 @@
-/*     $OpenBSD: timeout.h,v 1.39 2020/08/07 00:45:25 cheloha Exp $    */
+/*     $OpenBSD: timeout.h,v 1.38 2020/08/01 08:40:20 anton Exp $      */
 /*
  * Copyright (c) 2000-2001 Artur Grabowski <a...@openbsd.org>
  * All rights reserved. 
@@ -51,6 +51,8 @@
  * These functions may be called in interrupt context (anything below splhigh).
  */
 
+#include <sys/time.h>
+
 struct circq {
        struct circq *next;             /* next element */
        struct circq *prev;             /* previous element */
@@ -58,13 +60,15 @@ struct circq {
 
 struct timeout {
        struct circq to_list;                   /* timeout queue, don't move */
+       struct timespec to_abstime;             /* absolute time to run at */
        void (*to_func)(void *);                /* function to call */
        void *to_arg;                           /* function argument */
-       int to_time;                            /* ticks on event */
-       int to_flags;                           /* misc flags */
 #if 1 /* NKCOV > 0 */
        struct process *to_process;             /* kcov identifier */
 #endif
+       int to_time;                            /* ticks on event */
+       int to_flags;                           /* misc flags */
+       int to_kclock;                          /* abstime's kernel clock */
 };
 
 /*
@@ -74,6 +78,7 @@ struct timeout {
 #define TIMEOUT_ONQUEUE                0x02    /* on any timeout queue */
 #define TIMEOUT_INITIALIZED    0x04    /* initialized */
 #define TIMEOUT_TRIGGERED      0x08    /* running or ran */
+#define TIMEOUT_KCLOCK         0x10    /* clock-based timeout */
 
 struct timeoutstat {
        uint64_t tos_added;             /* timeout_add*(9) calls */
@@ -103,25 +108,43 @@ int timeout_sysctl(void *, size_t *, voi
 #define timeout_initialized(to) ((to)->to_flags & TIMEOUT_INITIALIZED)
 #define timeout_triggered(to) ((to)->to_flags & TIMEOUT_TRIGGERED)
 
-#define TIMEOUT_INITIALIZER_FLAGS(fn, arg, flags) {                    \
+#define KCLOCK_NONE    (-1)            /* dummy clock for sanity checks */
+#define KCLOCK_UPTIME  0               /* uptime clock; time since boot */
+#define KCLOCK_MAX     1
+
+#define __TIMEOUT_INITIALIZER(fn, arg, flags, kclock) {                \
        .to_list = { NULL, NULL },                                      \
+       .to_abstime = { .tv_sec = 0, .tv_nsec = 0 },                    \
        .to_func = (fn),                                                \
        .to_arg = (arg),                                                \
        .to_time = 0,                                                   \
-       .to_flags = (flags) | TIMEOUT_INITIALIZED                       \
+       .to_flags = (flags) | TIMEOUT_INITIALIZED,                      \
+       .to_kclock = (kclock)                                           \
 }
 
-#define TIMEOUT_INITIALIZER(_f, _a) TIMEOUT_INITIALIZER_FLAGS((_f), (_a), 0)
+#define TIMEOUT_INITIALIZER_KCLOCK(fn, arg, flags, kclock)             \
+    __TIMEOUT_INITIALIZER((fn), (arg), (flags) | TIMEOUT_KCLOCK, (kclock))
+
+#define TIMEOUT_INITIALIZER_FLAGS(fn, arg, flags)                      \
+    __TIMEOUT_INITIALIZER((fn), (arg), (flags), KCLOCK_NONE)
+
+#define TIMEOUT_INITIALIZER(_f, _a)                                    \
+    __TIMEOUT_INITIALIZER((_f), (_a), 0, KCLOCK_NONE)
 
 void timeout_set(struct timeout *, void (*)(void *), void *);
 void timeout_set_flags(struct timeout *, void (*)(void *), void *, int);
+void timeout_set_kclock(struct timeout *, void (*)(void *), void *, int, int);
 void timeout_set_proc(struct timeout *, void (*)(void *), void *);
+
 int timeout_add(struct timeout *, int);
 int timeout_add_tv(struct timeout *, const struct timeval *);
 int timeout_add_sec(struct timeout *, int);
 int timeout_add_msec(struct timeout *, int);
 int timeout_add_usec(struct timeout *, int);
 int timeout_add_nsec(struct timeout *, int);
+
+int timeout_in_nsec(struct timeout *, uint64_t);
+
 int timeout_del(struct timeout *);
 int timeout_del_barrier(struct timeout *);
 void timeout_barrier(struct timeout *);
