> Date: Mon, 7 Sep 2020 18:50:44 -0500 > From: Scott Cheloha <scottchel...@gmail.com> > > On Sat, Sep 05, 2020 at 01:11:59PM +0200, Mark Kettenis wrote: > > > Date: Fri, 4 Sep 2020 17:55:39 -0500 > > > From: Scott Cheloha <scottchel...@gmail.com> > > > > > > On Sat, Jul 25, 2020 at 08:46:08PM -0500, Scott Cheloha wrote: > > > > > > > > [...] > > > > > > > > I want to add clock-based timeouts to the kernel because tick-based > > > > timeouts suffer from a few problems: > > > > > > > > [...] > > > > > > > > Basically, ticks are a poor approximation for the system clock. We > > > > should use the real thing where possible. > > > > > > > > [...] > > > > > > > > Thoughts on this approach? Thoughts on the proposed API? > > > > > > 6 week bump. > > > > > > Attached is a rebased and streamlined diff. > > > > > > Let's try again: > > > > > > This patch adds support for timeouts scheduled against the hardware > > > timecounter. I call these "kclock timeouts". They are distinct from > > > the current tick-based timeouts because ticks are "software time", not > > > "real time". > > > > So what's the end game here? Are these kclock-based timeouts going to > > replace the tick-based timeouts at some point in the future? I can > > see why you want to have both in parallel for a while, but long-term I > > don't think we want to keep both. > > Ideally we would replace tick-based timeouts entirely with kclock > timeouts eventually. > > There are a few roadblocks, though: > > 1. The scheduler is tick-based. If you want to wait until the next > tick, the easiest way to do that is with timeout_add(9) or tsleep(9).
I don't think this really matters in most cases. Keeping the tick as the base for a scheduling quantum is probably wise for now, but I don't think it matters that timeouts and tsleeps (especially tsleeps) are actually synchronized to the scheduling clock. > 2. Linux has ktimers, which is tick-based. drm uses it. Shouldn't > we have a tick-based timeout interface for compatibility with them? > We could fake it, like FreeBSD does, but doing so is probably more > complicated than just keeping support for tick-based timeouts. You can easily emulate this using an absolute timer that you keep rescheduling. I think that is preferable to keeping a complete separate tick-based timeout system. > 3. Scheduling a timeout with timeout_add(9) is fast. Scheduling a > timeout with timeout_in_nsec(9) involves a clock read. It is slower. > It is probably too slow for some code. > > (1) will be overcome if ever the scheduler is no longer tick-based. > > (2) is tricky. Maybe you or jsg@ have an opinion? Not really. But I don't think the Linux ktimers tick at the same rate as ours so I don't think it matters. > (3) is somewhat easier to fix. I intend to introduce a TIMEOUT_COARSE > flag in the future which causes timeout_in_nsec() to call > getnanouptime(9) instead of nanouptime(9). Reading the timestamp is > faster than reading the clock. You lose accuracy, but any code > worried about the overhead of reading the clock is probably not very > concerned with accuracy. Right. > > We don't really want to do a wholesale conversion of APIs again I'd > > say. So at some point the existing timeout_add_xxx() calls should be > > implemented in terms of "kclock timeouts". > > We can do this, but we'll still need to change the calls that > reschedule a periodic timeout to use the dedicated rescheduling > interface. Otherwise those periodic timeouts will drift. They don't > currently drift because a tick is a very coarse unit of time. With > nanosecond resolution we'll get drift. 
Periodic timeouts are rare. At least those that care about drift. > > This implementation is still tick-driven, so it doesn't really provide > > sub-tick resolution. > > Yes, that's right. Each timeout maintains nanosecond resolution for > its expiration time but will only actually run after hardclock(9) runs > and dumps the timeout to softclock(). > > We would need to implement a more flexible clock interrupt scheduler > to run timeouts in between hardclocks. > > > What does that mean for testing this? I mean if we spend a lot of time > > now to verify that subsystems can tolerate the more fine-grained timeouts, > > we need to do that again when you switch from having a periodic interrupt driving > > the wheel to having a scheduled interrupt, isn't it? > > Yes. But both changes can break things. > > I think we should do kclock timeouts before sub-tick timeouts. The > former is a lot less disruptive than the latter, as the timeouts still > run right after the hardclock. > > And you need kclock timeouts to even test sub-tick timeouts anyway. > > > > For now we have one kclock, KCLOCK_UPTIME, which corresponds to > > > nanouptime(9). In the future I intend to add support for runtime and > > > UTC kclocks. > > > > Do we really need that? I suppose it helps implement something > > like clock_nanosleep() with the TIMER_ABSTIME flag for various > > clock_id values? > > Exactly. > > FreeBSD decided not to support multiple clocks in their revamped > callout(9). The result is a bit simpler (one clock) but in order to > implement absolute CLOCK_REALTIME sleeps for userspace they have this > flag for each thread that causes the thread to wake up and reschedule > itself whenever settimeofday(2) happens. > > It's clever, but it seems messy to me. > > I would rather support UTC timeouts as "first class citizens" of the > timeout subsystem. Linux's hrtimers API supports UTC timeouts > explicitly. I prefer their approach.
> > The advantage of real support is that the timeout(9) subsystem will > transparently handle rebucketing UTC timeouts if the clock jumps > backwards. This is nice. ok, thanks for the explanation. > > > Why do we want kclock timeouts at all? > > > > > > 1. Kclock timeouts expire at an actual time, not a tick. They > > > have nanosecond resolution and are NTP-sensitive. Thus, they > > > will *never* fire early. > > > > Is there a lot of overhead in these being NTP-sensitive? I'm asking > > because for short timeouts you don't really care about NTP > > corrections. > > No. Our timecounting system is inherently NTP-sensitive. All > timestamps you get from e.g. nanouptime(9) are tweaked according to > kernel NTP values. It adds no overhead. > > > > 2. One upshot of nanosecond resolution is that we don't need to > > > "round up" to the next tick when scheduling a timeout to prevent > > > early execution. The extra resolution allows us to reduce > > > latency in some contexts. > > > > > > 3. Kclock timeouts cover the entire range of the kernel timeline. > > > We can remove the "tick loops" like the one in sys_nanosleep(). > > > > > > 4. Kclock timeouts are scheduled as absolute deadlines. This makes > > > supporting absolute timeouts trivial, which means we can add support > > > for clock_nanosleep(2) and the absolute pthread timeouts to the > > > kernel. > > > > > > Kclock timeouts aren't actually used anywhere yet, so merging this > > > patch will not break anything like last time (CC bluhm@). > > > > > > In a subsequent diff I will put them to use in tsleep_nsec(9) etc. > > > This will enable those interfaces to block for less than a tick, which > > > in turn will allow userspace to block for less than a tick in e.g. > > > futex(2) and poll(2). pd@ has verified that this fixes the "time > > > problem" in OpenBSD vmm(4) VMs (CC pd@). > > > > Like I said above, running the timeout is still tick-driven, isn't it?
> > Yes, timeouts are still driven by the hardclock(9), which means your > maximum effective resolution is still 1/hz. > > > This avoids having to wait at least a tick for timeouts that are > > shorter than a tick, but it means the timeout can still be extended up > > to a full tick. > > Yes. Even if you schedule a timeout 1 nanosecond into the future you > must still wait for the hardclock(9) to fire and dump the timeout to > softclock() to run. > > > > You initialize kclock timeouts with timeout_set_kclock(). You > > > schedule them with timeout_in_nsec(), a relative timeout interface > > > that accepts a count of nanoseconds. If your timeout is in some > > > other unit (seconds, milliseconds, whatever) you must convert it > > > to nanoseconds before scheduling. Something like this will work: > > > > > > timeout_in_nsec(&my_timeout, SEC_TO_NSEC(1)); > > > > > > There won't be a flavored API supporting every conceivable time unit. > > > > So this is where I get worried. What is the game plan? Slowly > > convert everything from timeout_add_xxx() to timeout_in_nsec()? Or > > offer this as a temporary interface for people to test some critical > > subsystems after which we dump it and simply re-implement > > timeout_add_xxx() as kclock-based timeouts? > > I first want to put them to use in tsleep_nsec(9), msleep_nsec(9), and > rwsleep_nsec(9). These functions are basically never used in a hot > path so it is a low-risk change. > > After that, I'm not sure. > > One thing that concerns me about the reimplementation approach is that > there are lots of drivers without maintainers. Subtly changing the > way a bunch of code works without actually testing it sounds like a > recipe for disaster. > > > > In the future I will expose an absolute timeout interface and a > > > periodic timeout rescheduling interface. We don't need either of > > > these interfaces to start, though. 
> > > > Not sure how useful a periodic timeout rescheduling interface really > > is if you have an absolute timeout interface. And isn't > > timeout_at_ts() already implemented in the diff? > > You could do it by hand with timeout_at_ts(), but the dedicated > rescheduling interface will be much easier to use: > > - It automatically skips intervals that have already elapsed. > > - You don't need to track your last expiration time by hand. > The interface uses state kept in the timeout struct to > determine the start of the period. > > - It uses clever math to quickly find the next expiration time > instead of naively looping like we do in realitexpire(): > > while (timespeccmp(&abstime, &now, <=)) > timespecadd(&abstime, &period, &abstime); ok > > > Tick-based timeouts and kclock-based timeouts are *not* compatible. > > > You cannot schedule a kclock timeout with timeout_add(9). You cannot > > > schedule a tick-based timeout with timeout_in_nsec(9). I have added > > > KASSERTs to prevent this. > > > > > > Scheduling a kclock timeout with timeout_in_nsec() is more costly than > > > scheduling a tick-based timeout with timeout_add(9) because you have > > > to read the hardware timecounter. The cost will vary with your clock: > > > bad clocks have lots of overhead, good clocks have low-to-no overhead. > > > The programmer will need to decide if the potential overhead is too > > > high when employing these timeouts. In most cases the overhead will > > > not be a problem. The network stack is one spot where it might be. > > > > I doubt this will be a problem. For very small timeouts folks > > probably should use delay(9). > > > > > Processing the kclock timeout wheel during hardclock(9) adds > > > negligible overhead to that routine. > > > > > > Processing a kclock timeout during softclock() is roughly 4 times as > > > expensive as processing a tick-based timeout. 
At idle on my 2GHz > > > amd64 machine tick-based timeouts take ~125 cycles to process while > > > kclock timeouts take ~500 cycles. The average cost seems to drop as > > > more kclock timeouts are processed, though I can't really explain why. > > > > Cache effects? Some of the overhead may be there because you keep > > track of "late" timeouts. But that code isn't really necessary, is it? > > I was going to guess "cache effects" but they are black magic to me, > so your guess is as good as mine. > > As a small optimization we could move late timeout tracking into a > TIMEOUT_DEBUG #ifdef. This will probably be more useful on 32-bit > systems than anywhere else, though. > > > > Thoughts? ok? > > > > Some further nits below. > > Updated patch attached. The diff looks reasonable to me, but I'd like to discuss the path forward with some people during the hackathon next week. > Index: kern/kern_timeout.c > =================================================================== > RCS file: /cvs/src/sys/kern/kern_timeout.c,v > retrieving revision 1.79 > diff -u -p -r1.79 kern_timeout.c > --- kern/kern_timeout.c 7 Aug 2020 00:45:25 -0000 1.79 > +++ kern/kern_timeout.c 7 Sep 2020 23:46:44 -0000 > @@ -1,4 +1,4 @@ > -/* $OpenBSD: kern_timeout.c,v 1.79 2020/08/07 00:45:25 cheloha Exp $ > */ > +/* $OpenBSD: kern_timeout.c,v 1.77 2020/08/01 08:40:20 anton Exp $ */ > /* > * Copyright (c) 2001 Thomas Nordin <nor...@openbsd.org> > * Copyright (c) 2000-2001 Artur Grabowski <a...@openbsd.org> > @@ -64,16 +64,27 @@ struct timeoutstat tostat; /* [T] stati > * of the global variable "ticks" when the timeout should be called. There > are > * four levels with 256 buckets each.
> */ > -#define BUCKETS 1024 > +#define WHEELCOUNT 4 > #define WHEELSIZE 256 > #define WHEELMASK 255 > #define WHEELBITS 8 > +#define BUCKETS (WHEELCOUNT * WHEELSIZE) > > -struct circq timeout_wheel[BUCKETS]; /* [T] Queues of timeouts */ > +struct circq timeout_wheel[BUCKETS]; /* [T] Tick-based timeouts */ > +struct circq timeout_wheel_kc[BUCKETS]; /* [T] Clock-based timeouts */ > struct circq timeout_new; /* [T] New, unscheduled timeouts */ > struct circq timeout_todo; /* [T] Due or needs rescheduling */ > struct circq timeout_proc; /* [T] Due + needs process context */ > > +time_t timeout_level_width[WHEELCOUNT]; /* [I] Wheel level width > (seconds) */ > +struct timespec tick_ts; /* [I] Length of a tick (1/hz secs) */ > + > +struct kclock { > + struct timespec kc_lastscan; /* [T] Clock time at last wheel scan */ > + struct timespec kc_late; /* [T] Late if due prior */ > + struct timespec kc_offset; /* [T] Offset from primary kclock */ > +} timeout_kclock[KCLOCK_MAX]; > + > #define MASKWHEEL(wheel, time) (((time) >> ((wheel)*WHEELBITS)) & WHEELMASK) > > #define BUCKET(rel, abs) \ > @@ -155,9 +166,15 @@ struct lock_type timeout_spinlock_type = > ((needsproc) ? 
&timeout_sleeplock_obj : &timeout_spinlock_obj) > #endif > > +void kclock_nanotime(int, struct timespec *); > void softclock(void *); > void softclock_create_thread(void *); > +void softclock_process_kclock_timeout(struct timeout *, int); > +void softclock_process_tick_timeout(struct timeout *, int); > void softclock_thread(void *); > +uint32_t timeout_bucket(struct timeout *); > +uint32_t timeout_maskwheel(uint32_t, const struct timespec *); > +void timeout_run(struct timeout *); > void timeout_proc_barrier(void *); > > /* > @@ -207,13 +224,19 @@ timeout_sync_leave(int needsproc) > void > timeout_startup(void) > { > - int b; > + int b, level; > > CIRCQ_INIT(&timeout_new); > CIRCQ_INIT(&timeout_todo); > CIRCQ_INIT(&timeout_proc); > for (b = 0; b < nitems(timeout_wheel); b++) > CIRCQ_INIT(&timeout_wheel[b]); > + for (b = 0; b < nitems(timeout_wheel_kc); b++) > + CIRCQ_INIT(&timeout_wheel_kc[b]); > + > + for (level = 0; level < nitems(timeout_level_width); level++) > + timeout_level_width[level] = 2 << (level * WHEELBITS); > + NSEC_TO_TIMESPEC(tick_nsec, &tick_ts); > } > > void > @@ -229,25 +252,39 @@ timeout_proc_init(void) > kthread_create_deferred(softclock_create_thread, NULL); > } > > +static inline void > +_timeout_set(struct timeout *to, void (*fn)(void *), void *arg, int flags, > + int kclock) > +{ > + to->to_func = fn; > + to->to_arg = arg; > + to->to_flags = flags | TIMEOUT_INITIALIZED; > + to->to_kclock = kclock; > +} > + > void > timeout_set(struct timeout *new, void (*fn)(void *), void *arg) > { > - timeout_set_flags(new, fn, arg, 0); > + _timeout_set(new, fn, arg, 0, KCLOCK_NONE); > } > > void > timeout_set_flags(struct timeout *to, void (*fn)(void *), void *arg, int > flags) > { > - to->to_func = fn; > - to->to_arg = arg; > - to->to_process = NULL; > - to->to_flags = flags | TIMEOUT_INITIALIZED; > + _timeout_set(to, fn, arg, flags, KCLOCK_NONE); > } > > void > timeout_set_proc(struct timeout *new, void (*fn)(void *), void *arg) > { > - 
timeout_set_flags(new, fn, arg, TIMEOUT_PROC); > + _timeout_set(new, fn, arg, TIMEOUT_PROC, KCLOCK_NONE); > +} > + > +void > +timeout_set_kclock(struct timeout *to, void (*fn)(void *), void *arg, > + int flags, int kclock) > +{ > + _timeout_set(to, fn, arg, flags | TIMEOUT_KCLOCK, kclock); > } > > int > @@ -257,6 +294,8 @@ timeout_add(struct timeout *new, int to_ > int ret = 1; > > KASSERT(ISSET(new->to_flags, TIMEOUT_INITIALIZED)); > + KASSERT(!ISSET(new->to_flags, TIMEOUT_KCLOCK)); > + KASSERT(new->to_kclock == KCLOCK_NONE); > KASSERT(to_ticks >= 0); > > mtx_enter(&timeout_mutex); > @@ -356,6 +395,65 @@ timeout_add_nsec(struct timeout *to, int > } > > int > +timeout_at_ts(struct timeout *to, const struct timespec *abstime) > +{ > + struct timespec old_abstime; > + int ret = 1; > + > + KASSERT(ISSET(to->to_flags, TIMEOUT_INITIALIZED | TIMEOUT_KCLOCK)); > + KASSERT(to->to_kclock != KCLOCK_NONE); > + > + mtx_enter(&timeout_mutex); > + > + old_abstime = to->to_abstime; > + to->to_abstime = *abstime; > + CLR(to->to_flags, TIMEOUT_TRIGGERED); > + > + if (ISSET(to->to_flags, TIMEOUT_ONQUEUE)) { > + if (timespeccmp(abstime, &old_abstime, <)) { > + CIRCQ_REMOVE(&to->to_list); > + CIRCQ_INSERT_TAIL(&timeout_new, &to->to_list); > + } > + tostat.tos_readded++; > + ret = 0; > + } else { > + SET(to->to_flags, TIMEOUT_ONQUEUE); > + CIRCQ_INSERT_TAIL(&timeout_new, &to->to_list); > + } > +#if NKCOV > 0 > + to->to_process = curproc->p_p; > +#endif > + tostat.tos_added++; > + mtx_leave(&timeout_mutex); > + > + return ret; > +} > + > +int > +timeout_in_nsec(struct timeout *to, uint64_t nsecs) > +{ > + struct timespec deadline, interval, now; > + > + kclock_nanotime(to->to_kclock, &now); > + NSEC_TO_TIMESPEC(nsecs, &interval); > + timespecadd(&now, &interval, &deadline); > + > + return timeout_at_ts(to, &deadline); > +} > + > +void > +kclock_nanotime(int kclock, struct timespec *now) > +{ > + switch (kclock) { > + case KCLOCK_UPTIME: > + nanouptime(now); > + break; > + default: > +
panic("invalid kclock: 0x%x", kclock); > + } > +} > + > +int > timeout_del(struct timeout *to) > { > int ret = 0; > @@ -425,6 +523,47 @@ timeout_proc_barrier(void *arg) > cond_signal(c); > } > > +uint32_t > +timeout_bucket(struct timeout *to) > +{ > + struct kclock *kc = &timeout_kclock[to->to_kclock]; > + struct timespec diff; > + uint32_t level; > + > + KASSERT(ISSET(to->to_flags, TIMEOUT_KCLOCK)); > + KASSERT(timespeccmp(&kc->kc_lastscan, &to->to_abstime, <)); > + > + timespecsub(&to->to_abstime, &kc->kc_lastscan, &diff); > + for (level = 0; level < nitems(timeout_level_width) - 1; level++) { > + if (diff.tv_sec < timeout_level_width[level]) > + break; > + } > + return level * WHEELSIZE + timeout_maskwheel(level, &to->to_abstime); > +} > + > +/* > + * Hash the absolute time into a bucket on a given level of the wheel. > + * > + * The complete hash is 32 bits. The upper 25 bits are seconds, the > + * lower 7 bits are nanoseconds. tv_nsec is a positive value less > + * than one billion so we need to divide it to isolate the desired > + * bits. We can't just shift it. > + * > + * The level is used to isolate an 8-bit portion of the hash. The > + * resulting number indicates which bucket the absolute time belongs > + * to on the given level of the wheel. > + */ > +uint32_t > +timeout_maskwheel(uint32_t level, const struct timespec *abstime) > +{ > + uint32_t hi, lo; > + > + hi = abstime->tv_sec << 7; > + lo = abstime->tv_nsec / 7812500; > + > + return ((hi | lo) >> (level * WHEELBITS)) & WHEELMASK; > +} > + > /* > * This is called from hardclock() on the primary CPU at the start of > * every tick. 
> @@ -432,7 +571,15 @@ timeout_proc_barrier(void *arg) > void > timeout_hardclock_update(void) > { > - int need_softclock = 1; > + struct timespec elapsed, now; > + struct kclock *kc; > + struct timespec *lastscan; > + int b, done, first, i, last, level, need_softclock, off; > + > + kclock_nanotime(KCLOCK_UPTIME, &now); > + lastscan = &timeout_kclock[KCLOCK_UPTIME].kc_lastscan; > + timespecsub(&now, lastscan, &elapsed); > + need_softclock = 1; > > mtx_enter(&timeout_mutex); > > @@ -446,6 +593,44 @@ timeout_hardclock_update(void) > } > } > > + /* > + * Dump the buckets that expired while we were away. > + * > + * If the elapsed time has exceeded a level's limit then we need > + * to dump every bucket in the level. We have necessarily completed > + * a lap of that level, too, so we need to process buckets in the > + * next level. > + * > + * Otherwise we need to compare indices: if the index of the first > + * expired bucket is greater than that of the last then we have > + * completed a lap of the level and need to process buckets in the > + * next level. > + */ > + for (level = 0; level < nitems(timeout_level_width); level++) { > + first = timeout_maskwheel(level, lastscan); > + if (elapsed.tv_sec >= timeout_level_width[level]) { > + last = (first == 0) ? 
WHEELSIZE - 1 : first - 1; > + done = 0; > + } else { > + last = timeout_maskwheel(level, &now); > + done = first <= last; > + } > + off = level * WHEELSIZE; > + for (b = first;; b = (b + 1) % WHEELSIZE) { > + CIRCQ_CONCAT(&timeout_todo, &timeout_wheel_kc[off + b]); > + if (b == last) > + break; > + } > + if (done) > + break; > + } > + > + for (i = 0; i < nitems(timeout_kclock); i++) { > + kc = &timeout_kclock[i]; > + timespecadd(&now, &kc->kc_offset, &kc->kc_lastscan); > + timespecsub(&kc->kc_lastscan, &tick_ts, &kc->kc_late); > + } > + > if (CIRCQ_EMPTY(&timeout_new) && CIRCQ_EMPTY(&timeout_todo)) > need_softclock = 0; > > @@ -485,6 +670,51 @@ timeout_run(struct timeout *to) > mtx_enter(&timeout_mutex); > } > > +void > +softclock_process_kclock_timeout(struct timeout *to, int new) > +{ > + struct kclock *kc = &timeout_kclock[to->to_kclock]; > + > + if (timespeccmp(&to->to_abstime, &kc->kc_lastscan, >)) { > + tostat.tos_scheduled++; > + if (!new) > + tostat.tos_rescheduled++; > + CIRCQ_INSERT_TAIL(&timeout_wheel_kc[timeout_bucket(to)], > + &to->to_list); > + return; > + } > + if (!new && timespeccmp(&to->to_abstime, &kc->kc_late, <=)) > + tostat.tos_late++; > + if (ISSET(to->to_flags, TIMEOUT_PROC)) { > + CIRCQ_INSERT_TAIL(&timeout_proc, &to->to_list); > + return; > + } > + timeout_run(to); > + tostat.tos_run_softclock++; > +} > + > +void > +softclock_process_tick_timeout(struct timeout *to, int new) > +{ > + int delta = to->to_time - ticks; > + > + if (delta > 0) { > + tostat.tos_scheduled++; > + if (!new) > + tostat.tos_rescheduled++; > + CIRCQ_INSERT_TAIL(&BUCKET(delta, to->to_time), &to->to_list); > + return; > + } > + if (!new && delta < 0) > + tostat.tos_late++; > + if (ISSET(to->to_flags, TIMEOUT_PROC)) { > + CIRCQ_INSERT_TAIL(&timeout_proc, &to->to_list); > + return; > + } > + timeout_run(to); > + tostat.tos_run_softclock++; > +} > + > /* > * Timeouts are processed here instead of timeout_hardclock_update() > * to avoid doing any more work at IPL_CLOCK 
than absolutely necessary. > @@ -494,9 +724,8 @@ timeout_run(struct timeout *to) > void > softclock(void *arg) > { > - struct circq *bucket; > struct timeout *first_new, *to; > - int delta, needsproc, new; > + int needsproc, new; > > first_new = NULL; > new = 0; > @@ -510,28 +739,10 @@ softclock(void *arg) > CIRCQ_REMOVE(&to->to_list); > if (to == first_new) > new = 1; > - > - /* > - * If due run it or defer execution to the thread, > - * otherwise insert it into the right bucket. > - */ > - delta = to->to_time - ticks; > - if (delta > 0) { > - bucket = &BUCKET(delta, to->to_time); > - CIRCQ_INSERT_TAIL(bucket, &to->to_list); > - tostat.tos_scheduled++; > - if (!new) > - tostat.tos_rescheduled++; > - continue; > - } > - if (!new && delta < 0) > - tostat.tos_late++; > - if (ISSET(to->to_flags, TIMEOUT_PROC)) { > - CIRCQ_INSERT_TAIL(&timeout_proc, &to->to_list); > - continue; > - } > - timeout_run(to); > - tostat.tos_run_softclock++; > + if (ISSET(to->to_flags, TIMEOUT_KCLOCK)) > + softclock_process_kclock_timeout(to, new); > + else > + softclock_process_tick_timeout(to, new); > } > tostat.tos_softclocks++; > needsproc = !CIRCQ_EMPTY(&timeout_proc); > @@ -630,52 +841,114 @@ timeout_sysctl(void *oldp, size_t *oldle > } > > #ifdef DDB > +const char *db_kclock(int); > void db_show_callout_bucket(struct circq *); > +void db_show_timeout(struct timeout *, struct circq *); > +const char *db_timespec(const struct timespec *); > + > +const char * > +db_kclock(int kclock) > +{ > + switch (kclock) { > + case KCLOCK_UPTIME: > + return "uptime"; > + default: > + return "invalid"; > + } > +} > + > +const char * > +db_timespec(const struct timespec *ts) > +{ > + static char buf[32]; > + struct timespec tmp, zero; > + > + if (ts->tv_sec >= 0) { > + snprintf(buf, sizeof(buf), "%lld.%09ld", > + ts->tv_sec, ts->tv_nsec); > + return buf; > + } > + > + timespecclear(&zero); > + timespecsub(&zero, ts, &tmp); > + snprintf(buf, sizeof(buf), "-%lld.%09ld", tmp.tv_sec, tmp.tv_nsec); > + 
return buf; > +} > > void > db_show_callout_bucket(struct circq *bucket) > { > - char buf[8]; > - struct timeout *to; > struct circq *p; > + > + CIRCQ_FOREACH(p, bucket) > + db_show_timeout(timeout_from_circq(p), bucket); > +} > + > +void > +db_show_timeout(struct timeout *to, struct circq *bucket) > +{ > + struct timespec remaining; > + struct kclock *kc; > + char buf[8]; > db_expr_t offset; > + struct circq *wheel; > char *name, *where; > int width = sizeof(long) * 2; > > - CIRCQ_FOREACH(p, bucket) { > - to = timeout_from_circq(p); > - db_find_sym_and_offset((vaddr_t)to->to_func, &name, &offset); > - name = name ? name : "?"; > - if (bucket == &timeout_todo) > - where = "softint"; > - else if (bucket == &timeout_proc) > - where = "thread"; > - else if (bucket == &timeout_new) > - where = "new"; > - else { > - snprintf(buf, sizeof(buf), "%3ld/%1ld", > - (bucket - timeout_wheel) % WHEELSIZE, > - (bucket - timeout_wheel) / WHEELSIZE); > - where = buf; > - } > - db_printf("%9d %7s 0x%0*lx %s\n", > - to->to_time - ticks, where, width, (ulong)to->to_arg, name); > + db_find_sym_and_offset((vaddr_t)to->to_func, &name, &offset); > + name = name ? 
name : "?"; > + if (bucket == &timeout_new) > + where = "new"; > + else if (bucket == &timeout_todo) > + where = "softint"; > + else if (bucket == &timeout_proc) > + where = "thread"; > + else { > + if (ISSET(to->to_flags, TIMEOUT_KCLOCK)) > + wheel = timeout_wheel_kc; > + else > + wheel = timeout_wheel; > + snprintf(buf, sizeof(buf), "%3ld/%1ld", > + (bucket - wheel) % WHEELSIZE, > + (bucket - wheel) / WHEELSIZE); > + where = buf; > + } > + if (ISSET(to->to_flags, TIMEOUT_KCLOCK)) { > + kc = &timeout_kclock[to->to_kclock]; > + timespecsub(&to->to_abstime, &kc->kc_lastscan, &remaining); > + db_printf("%20s %8s %7s 0x%0*lx %s\n", > + db_timespec(&remaining), db_kclock(to->to_kclock), where, > + width, (ulong)to->to_arg, name); > + } else { > + db_printf("%20d %8s %7s 0x%0*lx %s\n", > + to->to_time - ticks, "ticks", where, > + width, (ulong)to->to_arg, name); > } > } > > void > db_show_callout(db_expr_t addr, int haddr, db_expr_t count, char *modif) > { > + struct kclock *kc; > int width = sizeof(long) * 2 + 2; > - int b; > - > - db_printf("ticks now: %d\n", ticks); > - db_printf("%9s %7s %*s func\n", "ticks", "wheel", width, "arg"); > + int b, i; > > + db_printf("%20s %8s\n", "lastscan", "clock"); > + db_printf("%20d %8s\n", ticks, "ticks"); > + for (i = 0; i < nitems(timeout_kclock); i++) { > + kc = &timeout_kclock[i]; > + db_printf("%20s %8s\n", > + db_timespec(&kc->kc_lastscan), db_kclock(i)); > + } > + db_printf("\n"); > + db_printf("%20s %8s %7s %*s %s\n", > + "remaining", "clock", "wheel", width, "arg", "func"); > db_show_callout_bucket(&timeout_new); > db_show_callout_bucket(&timeout_todo); > db_show_callout_bucket(&timeout_proc); > for (b = 0; b < nitems(timeout_wheel); b++) > db_show_callout_bucket(&timeout_wheel[b]); > + for (b = 0; b < nitems(timeout_wheel_kc); b++) > + db_show_callout_bucket(&timeout_wheel_kc[b]); > } > #endif > Index: sys/timeout.h > =================================================================== > RCS file: 
/cvs/src/sys/sys/timeout.h,v > retrieving revision 1.39 > diff -u -p -r1.39 timeout.h > --- sys/timeout.h 7 Aug 2020 00:45:25 -0000 1.39 > +++ sys/timeout.h 7 Sep 2020 23:46:44 -0000 > @@ -1,4 +1,4 @@ > -/* $OpenBSD: timeout.h,v 1.39 2020/08/07 00:45:25 cheloha Exp $ */ > +/* $OpenBSD: timeout.h,v 1.38 2020/08/01 08:40:20 anton Exp $ */ > /* > * Copyright (c) 2000-2001 Artur Grabowski <a...@openbsd.org> > * All rights reserved. > @@ -51,6 +51,8 @@ > * These functions may be called in interrupt context (anything below > splhigh). > */ > > +#include <sys/time.h> > + > struct circq { > struct circq *next; /* next element */ > struct circq *prev; /* previous element */ > @@ -58,13 +60,15 @@ struct circq { > > struct timeout { > struct circq to_list; /* timeout queue, don't move */ > + struct timespec to_abstime; /* absolute time to run at */ > void (*to_func)(void *); /* function to call */ > void *to_arg; /* function argument */ > - int to_time; /* ticks on event */ > - int to_flags; /* misc flags */ > #if 1 /* NKCOV > 0 */ > struct process *to_process; /* kcov identifier */ > #endif > + int to_time; /* ticks on event */ > + int to_flags; /* misc flags */ > + int to_kclock; /* abstime's kernel clock */ > }; > > /* > @@ -74,6 +78,7 @@ struct timeout { > #define TIMEOUT_ONQUEUE 0x02 /* on any timeout queue */ > #define TIMEOUT_INITIALIZED 0x04 /* initialized */ > #define TIMEOUT_TRIGGERED 0x08 /* running or ran */ > +#define TIMEOUT_KCLOCK 0x10 /* clock-based timeout */ > > struct timeoutstat { > uint64_t tos_added; /* timeout_add*(9) calls */ > @@ -103,25 +108,43 @@ int timeout_sysctl(void *, size_t *, voi > #define timeout_initialized(to) ((to)->to_flags & TIMEOUT_INITIALIZED) > #define timeout_triggered(to) ((to)->to_flags & TIMEOUT_TRIGGERED) > > -#define TIMEOUT_INITIALIZER_FLAGS(fn, arg, flags) { \ > +#define KCLOCK_NONE (-1) /* dummy clock for sanity checks */ > +#define KCLOCK_UPTIME 0 /* uptime clock; time since > boot */ > +#define KCLOCK_MAX 1 > + > +#define 
__TIMEOUT_INITIALIZER(fn, arg, flags, kclock) { > \ > .to_list = { NULL, NULL }, \ > + .to_abstime = { .tv_sec = 0, .tv_nsec = 0 }, \ > .to_func = (fn), \ > .to_arg = (arg), \ > .to_time = 0, \ > - .to_flags = (flags) | TIMEOUT_INITIALIZED \ > + .to_flags = (flags) | TIMEOUT_INITIALIZED, \ > + .to_kclock = (kclock) \ > } > > -#define TIMEOUT_INITIALIZER(_f, _a) TIMEOUT_INITIALIZER_FLAGS((_f), (_a), 0) > +#define TIMEOUT_INITIALIZER_KCLOCK(fn, arg, flags, kclock) \ > + __TIMEOUT_INITIALIZER((fn), (arg), (flags) | TIMEOUT_KCLOCK, (kclock)) > + > +#define TIMEOUT_INITIALIZER_FLAGS(fn, arg, flags) \ > + __TIMEOUT_INITIALIZER((fn), (arg), (flags), KCLOCK_NONE) > + > +#define TIMEOUT_INITIALIZER(_f, _a) \ > + __TIMEOUT_INITIALIZER((_f), (_a), 0, KCLOCK_NONE) > > void timeout_set(struct timeout *, void (*)(void *), void *); > void timeout_set_flags(struct timeout *, void (*)(void *), void *, int); > +void timeout_set_kclock(struct timeout *, void (*)(void *), void *, int, > int); > void timeout_set_proc(struct timeout *, void (*)(void *), void *); > + > int timeout_add(struct timeout *, int); > int timeout_add_tv(struct timeout *, const struct timeval *); > int timeout_add_sec(struct timeout *, int); > int timeout_add_msec(struct timeout *, int); > int timeout_add_usec(struct timeout *, int); > int timeout_add_nsec(struct timeout *, int); > + > +int timeout_in_nsec(struct timeout *, uint64_t); > + > int timeout_del(struct timeout *); > int timeout_del_barrier(struct timeout *); > void timeout_barrier(struct timeout *); >