On Sat, Aug 05, 2023 at 01:33:05AM -0400, A Tammy wrote:
>
> On 8/5/23 00:49, Scott Cheloha wrote:
> > On Sat, Aug 05, 2023 at 12:17:48AM -0400, aisha wrote:
> >> On 22/09/10 01:53PM, Visa Hankala wrote:
> >>> On Wed, Aug 31, 2022 at 04:48:37PM -0400, aisha wrote:
> >>>> I've added a patch which adds support for NOTE_{,U,M,N}SECONDS for
> >>>> EVFILT_TIMER in the kqueue interface.
> >>> It sort of makes sense to add an option to specify timeouts in
> >>> sub-millisecond precision. It feels like complete overengineering
> >>> to add multiple time units at the level of the kernel interface.
> >>> However, it looks like FreeBSD and NetBSD have already done this,
> >>> following macOS' lead...
> >>>
> >>>> I've also added NOTE_ABSTIME but haven't done any actual
> >>>> implementation there, as I am not sure how the `data` field should
> >>>> be interpreted (is it absolute time in seconds since the epoch?).
> >>> I think FreeBSD and NetBSD take NOTE_ABSTIME as time since the epoch.
> >>>
> >>> Below is a revised patch that takes into account some corner cases.
> >>> It tries to be API-compatible with FreeBSD and NetBSD. I have adjusted
> >>> the NOTE_{,M,U,N}SECONDS flags so that they are enum-like.
> >>>
> >>> The manual page bits are from NetBSD.
> >>>
> >>> It is quite late to introduce a feature like this within this release
> >>> cycle. Until now, the timer code has ignored the fflags field. There
> >>> might be pieces of software that are careless with struct kevent and
> >>> that would break as a result of this patch. Programs that are widely
> >>> used on different BSDs are probably fine already, though.
> >>
> >> Sorry, I had forgotten about this patch for a long time!!! I've been
> >> running with this for a while now and it's been working nicely.
> >
> > Where is this being used in ports? I think having "one of each" for
> > seconds, milliseconds, microseconds, and nanoseconds is (as visa
> > noted) way, way over-the-top.
>
> I was using it with a port that I sent out a while ago but never got
> into tree (was before I joined the project) -
> https://marc.info/?l=openbsd-ports&m=165715874509440&w=2
If nothing in ports is using this, I am squeamish about adding it.
Once we add it, we're stuck maintaining it, warts and all.
If www/workflow were in the tree, I could see the upside. Is it in
ports?
It looks like workflow actually wants timerfd(2) from Linux and is
simulating timerfd(2) with EVFILT_TIMER and NOTE_NSECONDS:
https://github.com/sogou/workflow/blob/80b3dfbad2264bcd79ba37811c66421490e337d2/src/kernel/poller.c#L227
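For reference, the kqueue side of that simulation looks roughly like
this (an untested sketch, not lifted from workflow; the 100ms period
is arbitrary):

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <err.h>

int
main(void)
{
	struct kevent kev, ev;
	int kq;

	if ((kq = kqueue()) == -1)
		err(1, "kqueue");

	/* Periodic timer: fire every 100,000,000 nsec (100ms). */
	EV_SET(&kev, 1, EVFILT_TIMER, EV_ADD, NOTE_NSECONDS,
	    100000000LL, NULL);
	if (kevent(kq, &kev, 1, NULL, 0, NULL) == -1)
		err(1, "kevent");

	for (;;) {
		if (kevent(kq, NULL, 0, &ev, 1, NULL) == -1)
			err(1, "kevent");
		/* ev.data counts expirations since last service. */
	}
}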
I think timerfd(2) is the superior interface here. It keeps the POSIX
interval timer semantics without all the signal delivery baggage. It
also supports multiple clocks and starting a periodic timeout from an
absolute starting time.
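For comparison, a rough sketch of the Linux interface (not a proposal
for our API; the clock and period here are arbitrary):

#include <sys/timerfd.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <err.h>

int
main(void)
{
	struct itimerspec its = {
		.it_value = { .tv_sec = 0, .tv_nsec = 100000000 },
		.it_interval = { .tv_sec = 0, .tv_nsec = 100000000 },
	};
	uint64_t expirations;
	int fd;

	/* Pick a clock: CLOCK_MONOTONIC, CLOCK_REALTIME, ... */
	if ((fd = timerfd_create(CLOCK_MONOTONIC, 0)) == -1)
		err(1, "timerfd_create");
	/* TFD_TIMER_ABSTIME would arm it from an absolute time. */
	if (timerfd_settime(fd, 0, &its, NULL) == -1)
		err(1, "timerfd_settime");
	for (;;) {
		/* read(2) blocks, then yields the expiration count. */
		if (read(fd, &expirations, sizeof(expirations)) == -1)
			err(1, "read");
	}
}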
So, if the goal is "add www/workflow to ports", adding timerfd(2) might
be the right thing.
> I also agree with it being over the top, but that's the way it is in
> NetBSD/FreeBSD. I'm also fine with breaking compatibility and only
> keeping nano; no preference either way.
Well, if we're going to add it (if), we should add all of it. The
vast majority of the code is not conversion code: if we add support
for NOTE_NSECONDS, adding support for the other units is trivial, and
there is value in being fully compatible with other implementations.
> > The original EVFILT_TIMER supported only milliseconds, yes. Given
> > that it debuted in the late 90s, I think that was a bad choice. But
> > when milliseconds were insufficiently precise, the obvious thing would
> > be to add support for nanoseconds... and then stop.
> >
> > The decision to use the UTC clock with no option to select a different
> > clockid_t for NOTE_ABSTIME is also unfortunate.
>
> Yes, furthermore this was very unclear, as I couldn't find it in the
> man pages of either NetBSD or FreeBSD.
>
> > Grumble.
> >
> >> I had an unrelated question inlined.
> >>
> >> [...]
> >>> static void
> >>> -filt_timer_timeout_add(struct knote *kn)
> >>> +filt_timeradd(struct knote *kn, struct timespec *ts)
> >>> {
> >>> - struct timeval tv;
> >>> + struct timespec expiry, now;
> >>> struct timeout *to = kn->kn_hook;
> >>> int tticks;
> >>>
> >>> - tv.tv_sec = kn->kn_sdata / 1000;
> >>> - tv.tv_usec = (kn->kn_sdata % 1000) * 1000;
> >>> - tticks = tvtohz(&tv);
> >>> - /* Remove extra tick from tvtohz() if timeout has fired before. */
> >>> + if (kn->kn_sfflags & NOTE_ABSTIME) {
> >>> + nanotime(&now);
> >>> + if (timespeccmp(ts, &now, >)) {
> >>> + timespecsub(ts, &now, &expiry);
> >>> + /* XXX timeout_at_ts */
> >>> + timeout_add(to, tstohz(&expiry));
> > visa:
> >
> > we should use timeout_abs_ts() here. I need to adjust it, though.
> >
> >>> + } else {
> >>> + /* Expire immediately. */
> >>> + filt_timerexpire(kn);
> >>> + }
> >>> + return;
> >>> + }
> >>> +
> >>> + tticks = tstohz(ts);
> >>> + /* Remove extra tick from tstohz() if timeout has fired before. */
> >>> if (timeout_triggered(to))
> >>> tticks--;
> >> I always wondered why one tick was removed. Is one tick really
> >> that important? And does a timeout firing only cost one tick?
> > When you convert a duration to a count of ticks with tstohz(), it adds
> > an extra tick to the result to keep you from undershooting your
> > timeout. You start counting your timeout at the start of the *next*
> > tick, otherwise the timeout might fire early. However, after the
> > timeout has expired once, you no longer need the extra tick because
> > you can (more or less) assume that the timeout is running at the start
> > of the new tick.
> >
> > I know that sounds a little fuzzy, but in practice it works.
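> >
> > To put numbers on it (my own example, assuming hz=100, i.e. 10ms
> > ticks): tstohz() turns a 20ms timeout into 2 ticks plus the 1-tick
> > guard, so timeout_add() schedules it 3 ticks out and it fires 20-30ms
> > after it was armed, depending on where in the current tick you
> > started. Once the timeout has fired and is rearmed, the guard tick is
> > subtracted again, so the period stays close to the requested 20ms.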
>
> Haha, these are the kind of weird idiosyncrasies that are fun to know
> about. Thank you very much for the explanation! :D
>
> So I went around looking at how large a tick really is, and it seems
> like we get it through kern.clockrate?? (from man tick)
>
> aisha@fwall ~ $ sysctl kern.clockrate
> kern.clockrate=tick = 10000, hz = 100, profhz = 1000, stathz = 100
>
> so presumably each tick is 1/10000 of a second (is this right?), [...]
kern.clockrate's "tick" member represents the number of microseconds
in a hardclock tick. It's just 1,000,000 / hz, so with hz=100 each
tick is 10,000 microseconds, i.e. 1/100 of a second (not 1/10000).
> and things are getting scheduled in terms of ticks, so how is it even
> possible to get nanosecond level accuracy there?
We have a nanosecond resolution timeout API, but it isn't super useful
yet because the timeout layer doesn't use the clock interrupt API. I
am hoping to add this in the next release cycle.
> From more looking around it seems like at least x86 has the TSC, which
> provides better resolution (presumably similar things exist for other
> archs), but I don't see it being used anywhere here in an obvious
> fashion. man pctr doesn't mention it being used for time measurement.
Every practical OpenBSD platform has access to a nice clock:
fixed-frequency, high resolution (1us or better), and high precision
(reads are fast).
--
Here is a revised patch:
- Only validate inputs in filt_timervalidate(). Do the input conversion
in a separate routine, filt_timer_sdata_to_nsecs().
- Schedule the timeout in filt_timerstart(). Return zero if the absolute
time has already expired and the timeout was not scheduled. The caller
can then call filt_timerexpire().
This duplicates some code across filt_timerattach() and filt_timermodify(),
but I think it's a little less magical: filt_timerstart() does *one* thing
and leaves error handling to the caller.
- If the input isn't an absolute timeout, we need to round sdata up
from 0 to 1. This is what FreeBSD does.
I think this is bad behavior. A periodic timeout of zero is meaningless.
The sensible thing would be to reject the input with EINVAL. But I didn't
design the API, so that ship has sailed.
- Use the high resolution timeout API instead of the tick-based API.
In particular, we can use the UTC clock for absolute timeouts, just like
FreeBSD does.
- In filt_timerexpire(), use timeout_advance() to count any expirations
we missed due to processing delays. A worked example follows this
summary.
The UTC timeout support in kern_timeout.c is a rough draft. There's a
lot going on in there. But if we included it we would be more compatible
with FreeBSD.
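To illustrate the timeout_advance() accounting with made-up numbers:
suppose a periodic timer has a 100ms interval and its expiry at t=1.0s
is serviced late, at t=1.35s. If I have timespec_advance_nsec() right,
it advances the next expiration to t=1.4s, the most imminent point past
"now", and returns a count of 4: the interval ending at t=1.0s plus the
three that lapsed while we were delayed. filt_timerexpire() then adds
all four to kn_data at once.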
Index: sys/event.h
===================================================================
RCS file: /cvs/src/sys/sys/event.h,v
retrieving revision 1.69
diff -u -p -r1.69 event.h
--- sys/event.h 10 Feb 2023 14:34:17 -0000 1.69
+++ sys/event.h 8 Aug 2023 15:38:39 -0000
@@ -122,6 +122,13 @@ struct kevent {
/* data/hint flags for EVFILT_DEVICE, shared with userspace */
#define NOTE_CHANGE 0x00000001 /* device change event */
+/* additional flags for EVFILT_TIMER */
+#define NOTE_MSECONDS 0x00000000 /* data is milliseconds */
+#define NOTE_SECONDS 0x00000001 /* data is seconds */
+#define NOTE_USECONDS 0x00000002 /* data is microseconds */
+#define NOTE_NSECONDS 0x00000003 /* data is nanoseconds */
+#define NOTE_ABSTIME 0x00000010 /* timeout is absolute */
+
/*
* This is currently visible to userland to work around broken
* programs which pull in <sys/proc.h> or <sys/selinfo.h>.
Index: kern/kern_event.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_event.c,v
retrieving revision 1.196
diff -u -p -r1.196 kern_event.c
--- kern/kern_event.c 11 Apr 2023 00:45:09 -0000 1.196
+++ kern/kern_event.c 8 Aug 2023 15:38:39 -0000
@@ -449,55 +449,127 @@ filt_proc(struct knote *kn, long hint)
return (kn->kn_fflags != 0);
}
-static void
-filt_timer_timeout_add(struct knote *kn)
+#define NOTE_TIMER_UNITMASK \
+ (NOTE_SECONDS | NOTE_MSECONDS | NOTE_USECONDS | NOTE_NSECONDS)
+
+static int
+filt_timervalidate(int flags, int64_t sdata)
+{
+ if (flags & ~(NOTE_TIMER_UNITMASK | NOTE_ABSTIME))
+ return (EINVAL);
+
+ switch (flags & NOTE_TIMER_UNITMASK) {
+ case NOTE_SECONDS:
+ case NOTE_MSECONDS:
+ case NOTE_USECONDS:
+ case NOTE_NSECONDS:
+ break;
+ default:
+ return (EINVAL);
+ }
+
+ if (sdata < 0)
+ return (EINVAL);
+
+ return (0);
+}
+
+static uint64_t
+filt_timer_sdata_to_nsecs(const struct knote *kn)
+{
+ int unit = kn->kn_sfflags & NOTE_TIMER_UNITMASK;
+
+ switch (unit) {
+ case NOTE_SECONDS:
+ return SEC_TO_NSEC(kn->kn_sdata);
+ case NOTE_MSECONDS:
+ return MSEC_TO_NSEC(kn->kn_sdata);
+ case NOTE_USECONDS:
+ return USEC_TO_NSEC(kn->kn_sdata);
+ case NOTE_NSECONDS:
+ return kn->kn_sdata;
+ default:
+ panic("%s: invalid EVFILT_TIMER unit: %d", __func__, unit);
+ }
+}
+
+/*
+ * Attempt to schedule the timeout. Returns zero if the timeout is
+ * not scheduled because the absolute time has already expired.
+ */
+static int
+filt_timerstart(struct knote *kn)
{
- struct timeval tv;
+ struct timespec expiry, now, timeout;
struct timeout *to = kn->kn_hook;
- int tticks;
- tv.tv_sec = kn->kn_sdata / 1000;
- tv.tv_usec = (kn->kn_sdata % 1000) * 1000;
- tticks = tvtohz(&tv);
- /* Remove extra tick from tvtohz() if timeout has fired before. */
- if (timeout_triggered(to))
- tticks--;
- timeout_add(to, (tticks > 0) ? tticks : 1);
+ NSEC_TO_TIMESPEC(filt_timer_sdata_to_nsecs(kn), &timeout);
+ if (kn->kn_sfflags & NOTE_ABSTIME) {
+ nanotime(&now);
+ if (timespeccmp(&timeout, &now, <=))
+ return 0;
+ expiry = timeout;
+ timeout_set_flags(to, filt_timerexpire, kn, KCLOCK_UTC, 0);
+ } else {
+ nanouptime(&now);
+ timespecadd(&now, &timeout, &expiry);
+ timeout_set_flags(to, filt_timerexpire, kn, KCLOCK_UPTIME, 0);
+ }
+ timeout_abs_ts(to, &expiry);
+ return 1;
}
void
filt_timerexpire(void *knx)
{
+ uint64_t count;
struct knote *kn = knx;
struct kqueue *kq = kn->kn_kq;
+ struct timeout *to = kn->kn_hook;
- kn->kn_data++;
+ /*
+ * One-shot timers and absolute timers expire only once.
+ * Periodic timers, on the other hand, may expire faster
+ * than we can service them. timeout_advance() reschedules
+ * a periodic timer while computing how many times the timer
+ * expired.
+ */
+ if ((kn->kn_flags & EV_ONESHOT) || (kn->kn_sfflags & NOTE_ABSTIME))
+ count = 1;
+ else
+ timeout_advance(to, filt_timer_sdata_to_nsecs(kn), &count);
+ kn->kn_data += count;
mtx_enter(&kq->kq_lock);
knote_activate(kn);
mtx_leave(&kq->kq_lock);
-
- if ((kn->kn_flags & EV_ONESHOT) == 0)
- filt_timer_timeout_add(kn);
}
-
/*
- * data contains amount of time to sleep, in milliseconds
+ * data contains a timeout. fflags clarifies what the timeout means.
*/
int
filt_timerattach(struct knote *kn)
{
struct timeout *to;
+ int error;
+
+ error = filt_timervalidate(kn->kn_sfflags, kn->kn_sdata);
+ if (error != 0)
+ return (error);
if (kq_ntimeouts > kq_timeoutmax)
return (ENOMEM);
kq_ntimeouts++;
- kn->kn_flags |= EV_CLEAR; /* automatically set */
- to = malloc(sizeof(*to), M_KEVENT, M_WAITOK);
- timeout_set(to, filt_timerexpire, kn);
+ if ((kn->kn_sfflags & NOTE_ABSTIME) == 0) {
+ kn->kn_flags |= EV_CLEAR; /* automatically set */
+ if (kn->kn_sdata == 0)
+ kn->kn_sdata = 1;
+ }
+ to = malloc(sizeof(*to), M_KEVENT, M_WAITOK | M_ZERO);
kn->kn_hook = to;
- filt_timer_timeout_add(kn);
+ if (!filt_timerstart(kn))
+ filt_timerexpire(kn);
return (0);
}
@@ -505,11 +577,11 @@ filt_timerattach(struct knote *kn)
void
filt_timerdetach(struct knote *kn)
{
- struct timeout *to;
+ struct timeout *to = kn->kn_hook;
- to = (struct timeout *)kn->kn_hook;
timeout_del_barrier(to);
free(to, M_KEVENT, sizeof(*to));
+ kn->kn_hook = NULL;
kq_ntimeouts--;
}
@@ -518,6 +590,14 @@ filt_timermodify(struct kevent *kev, str
{
struct kqueue *kq = kn->kn_kq;
struct timeout *to = kn->kn_hook;
+ int error;
+
+ error = filt_timervalidate(kev->fflags, kev->data);
+ if (error != 0) {
+ kev->flags |= EV_ERROR;
+ kev->data = error;
+ return (0);
+ }
/* Reset the timer. Any pending events are discarded. */
@@ -531,9 +611,13 @@ filt_timermodify(struct kevent *kev, str
kn->kn_data = 0;
knote_assign(kev, kn);
- /* Reinit timeout to invoke tick adjustment again. */
- timeout_set(to, filt_timerexpire, kn);
- filt_timer_timeout_add(kn);
+ if ((kn->kn_sfflags & NOTE_ABSTIME) == 0) {
+ kn->kn_flags |= EV_CLEAR; /* automatically set */
+ if (kn->kn_sdata == 0)
+ kn->kn_sdata = 1;
+ }
+ if (!filt_timerstart(kn))
+ filt_timerexpire(kn);
return (0);
}
@@ -551,7 +635,6 @@ filt_timerprocess(struct knote *kn, stru
return (active);
}
-
/*
* filt_seltrue:
Index: sys/timeout.h
===================================================================
RCS file: /cvs/src/sys/sys/timeout.h,v
retrieving revision 1.47
diff -u -p -r1.47 timeout.h
--- sys/timeout.h 31 Dec 2022 16:06:24 -0000 1.47
+++ sys/timeout.h 8 Aug 2023 15:38:39 -0000
@@ -27,6 +27,7 @@
#ifndef _SYS_TIMEOUT_H_
#define _SYS_TIMEOUT_H_
+#include <sys/queue.h>
#include <sys/time.h>
struct circq {
@@ -36,6 +37,7 @@ struct circq {
struct timeout {
struct circq to_list; /* timeout queue, don't move */
+ TAILQ_ENTRY(timeout) to_utc_link; /* UTC queue link */
struct timespec to_abstime; /* absolute time to run at */
void (*to_func)(void *); /* function to call */
void *to_arg; /* function argument */
@@ -85,10 +87,12 @@ int timeout_sysctl(void *, size_t *, voi
#define KCLOCK_NONE (-1) /* dummy clock for sanity checks */
#define KCLOCK_UPTIME 0 /* uptime clock; time since boot */
-#define KCLOCK_MAX 1
+#define KCLOCK_UTC 1 /* UTC clock; time since unix epoch */
+#define KCLOCK_MAX 2
#define TIMEOUT_INITIALIZER_FLAGS(_fn, _arg, _kclock, _flags) { \
.to_list = { NULL, NULL }, \
+ .to_utc_link = { NULL, NULL }, \
.to_abstime = { .tv_sec = 0, .tv_nsec = 0 }, \
.to_func = (_fn), \
.to_arg = (_arg), \
@@ -112,6 +116,7 @@ int timeout_add_usec(struct timeout *, i
int timeout_add_nsec(struct timeout *, int);
int timeout_abs_ts(struct timeout *, const struct timespec *);
+int timeout_advance(struct timeout *, uint64_t, uint64_t *);
int timeout_del(struct timeout *);
int timeout_del_barrier(struct timeout *);
@@ -119,6 +124,7 @@ void timeout_barrier(struct timeout *);
void timeout_adjust_ticks(int);
void timeout_hardclock_update(void);
+void timeout_reset_kclock_offset(int, const struct timespec *);
void timeout_startup(void);
#endif /* _KERNEL */
Index: kern/kern_timeout.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_timeout.c,v
retrieving revision 1.95
diff -u -p -r1.95 kern_timeout.c
--- kern/kern_timeout.c 29 Jul 2023 06:52:08 -0000 1.95
+++ kern/kern_timeout.c 8 Aug 2023 15:38:39 -0000
@@ -75,6 +75,7 @@ struct circq timeout_wheel_kc[BUCKETS];
struct circq timeout_new; /* [T] New, unscheduled timeouts */
struct circq timeout_todo; /* [T] Due or needs rescheduling */
struct circq timeout_proc; /* [T] Due + needs process context */
+TAILQ_HEAD(, timeout) timeout_utc; /* [T] UTC-based timeouts */
time_t timeout_level_width[WHEELCOUNT]; /* [I] Wheel level width (seconds) */
struct timespec tick_ts; /* [I] Length of a tick (1/hz secs) */
@@ -166,15 +167,22 @@ struct lock_type timeout_spinlock_type =
((needsproc) ? &timeout_sleeplock_obj : &timeout_spinlock_obj)
#endif
+void kclock_nanotime(int, struct timespec *);
void softclock(void *);
void softclock_create_thread(void *);
void softclock_process_kclock_timeout(struct timeout *, int);
void softclock_process_tick_timeout(struct timeout *, int);
void softclock_thread(void *);
+int timeout_abs_ts_locked(struct timeout *, const struct timespec *);
void timeout_barrier_timeout(void *);
uint32_t timeout_bucket(const struct timeout *);
+void timeout_dequeue(struct timeout *);
+void timeout_enqueue(struct circq *, struct timeout *);
uint32_t timeout_maskwheel(uint32_t, const struct timespec *);
void timeout_run(struct timeout *);
+uint64_t timespec_advance_nsec(struct timespec *, uint64_t,
+ const struct timespec *);
+void u64_sat_add(uint64_t *, uint64_t, uint64_t);
/*
* The first thing in a struct timeout is its struct circq, so we
@@ -228,6 +236,7 @@ timeout_startup(void)
CIRCQ_INIT(&timeout_new);
CIRCQ_INIT(&timeout_todo);
CIRCQ_INIT(&timeout_proc);
+ TAILQ_INIT(&timeout_utc);
for (b = 0; b < nitems(timeout_wheel); b++)
CIRCQ_INIT(&timeout_wheel[b]);
for (b = 0; b < nitems(timeout_wheel_kc); b++)
@@ -252,6 +261,25 @@ timeout_proc_init(void)
}
void
+timeout_reset_kclock_offset(int kclock, const struct timespec *offset)
+{
+ struct kclock *kc = &timeout_kclock[kclock];
+ struct timeout *to;
+
+ KASSERT(kclock == KCLOCK_UTC);
+
+ mtx_enter(&timeout_mutex);
+ if (kclock == KCLOCK_UTC && timespeccmp(&kc->kc_offset, offset, <)) {
+ TAILQ_FOREACH(to, &timeout_utc, to_utc_link) {
+ CIRCQ_REMOVE(&to->to_list);
+ CIRCQ_INSERT_TAIL(&timeout_todo, &to->to_list);
+ }
+ }
+ kc->kc_offset = *offset;
+ mtx_leave(&timeout_mutex);
+}
+
+void
timeout_set(struct timeout *new, void (*fn)(void *), void *arg)
{
timeout_set_flags(new, fn, arg, KCLOCK_NONE, 0);
@@ -273,6 +301,28 @@ timeout_set_proc(struct timeout *new, vo
timeout_set_flags(new, fn, arg, KCLOCK_NONE, TIMEOUT_PROC);
}
+void
+timeout_dequeue(struct timeout *to)
+{
+ KASSERT(ISSET(to->to_flags, TIMEOUT_ONQUEUE));
+
+ CIRCQ_REMOVE(&to->to_list);
+ if (to->to_kclock == KCLOCK_UTC)
+ TAILQ_REMOVE(&timeout_utc, to, to_utc_link);
+ CLR(to->to_flags, TIMEOUT_ONQUEUE);
+}
+
+void
+timeout_enqueue(struct circq *queue, struct timeout *to)
+{
+ KASSERT(!ISSET(to->to_flags, TIMEOUT_ONQUEUE));
+
+ CIRCQ_INSERT_TAIL(queue, &to->to_list);
+ if (to->to_kclock == KCLOCK_UTC)
+ TAILQ_INSERT_TAIL(&timeout_utc, to, to_utc_link);
+ SET(to->to_flags, TIMEOUT_ONQUEUE);
+}
+
int
timeout_add(struct timeout *new, int to_ticks)
{
@@ -297,14 +347,13 @@ timeout_add(struct timeout *new, int to_
*/
if (ISSET(new->to_flags, TIMEOUT_ONQUEUE)) {
if (new->to_time - ticks < old_time - ticks) {
- CIRCQ_REMOVE(&new->to_list);
- CIRCQ_INSERT_TAIL(&timeout_new, &new->to_list);
+ timeout_dequeue(new);
+ timeout_enqueue(&timeout_new, new);
}
tostat.tos_readded++;
ret = 0;
} else {
- SET(new->to_flags, TIMEOUT_ONQUEUE);
- CIRCQ_INSERT_TAIL(&timeout_new, &new->to_list);
+ timeout_enqueue(&timeout_new, new);
}
#if NKCOV > 0
if (!kcov_cold)
@@ -383,13 +432,23 @@ timeout_add_nsec(struct timeout *to, int
int
timeout_abs_ts(struct timeout *to, const struct timespec *abstime)
{
- struct timespec old_abstime;
- int ret = 1;
+ int status;
mtx_enter(&timeout_mutex);
+ status = timeout_abs_ts_locked(to, abstime);
+ mtx_leave(&timeout_mutex);
+ return status;
+}
+
+int
+timeout_abs_ts_locked(struct timeout *to, const struct timespec *abstime)
+{
+ struct timespec old_abstime;
+ int ret = 1;
+ MUTEX_ASSERT_LOCKED(&timeout_mutex);
KASSERT(ISSET(to->to_flags, TIMEOUT_INITIALIZED));
- KASSERT(to->to_kclock != KCLOCK_NONE);
+ KASSERT(to->to_kclock > KCLOCK_NONE && to->to_kclock < KCLOCK_MAX);
old_abstime = to->to_abstime;
to->to_abstime = *abstime;
@@ -397,14 +456,13 @@ timeout_abs_ts(struct timeout *to, const
if (ISSET(to->to_flags, TIMEOUT_ONQUEUE)) {
if (timespeccmp(abstime, &old_abstime, <)) {
- CIRCQ_REMOVE(&to->to_list);
- CIRCQ_INSERT_TAIL(&timeout_new, &to->to_list);
+ timeout_dequeue(to);
+ timeout_enqueue(&timeout_new, to);
}
tostat.tos_readded++;
ret = 0;
} else {
- SET(to->to_flags, TIMEOUT_ONQUEUE);
- CIRCQ_INSERT_TAIL(&timeout_new, &to->to_list);
+ timeout_enqueue(&timeout_new, to);
}
#if NKCOV > 0
if (!kcov_cold)
@@ -412,9 +470,26 @@ timeout_abs_ts(struct timeout *to, const
#endif
tostat.tos_added++;
+ return ret;
+}
+
+int
+timeout_advance(struct timeout *to, uint64_t intvl, uint64_t *ocount)
+{
+ struct timespec next, now;
+ uint64_t count;
+ int status;
+
+ mtx_enter(&timeout_mutex);
+ kclock_nanotime(to->to_kclock, &now);
+ next = to->to_abstime;
+ count = timespec_advance_nsec(&next, intvl, &now);
+ status = timeout_abs_ts_locked(to, &next);
mtx_leave(&timeout_mutex);
- return ret;
+ if (ocount != NULL)
+ *ocount = count;
+ return status;
}
int
@@ -424,8 +499,7 @@ timeout_del(struct timeout *to)
mtx_enter(&timeout_mutex);
if (ISSET(to->to_flags, TIMEOUT_ONQUEUE)) {
- CIRCQ_REMOVE(&to->to_list);
- CLR(to->to_flags, TIMEOUT_ONQUEUE);
+ timeout_dequeue(to);
tostat.tos_cancelled++;
ret = 1;
}
@@ -468,11 +542,10 @@ timeout_barrier(struct timeout *to)
mtx_enter(&timeout_mutex);
barrier.to_time = ticks;
- SET(barrier.to_flags, TIMEOUT_ONQUEUE);
if (procflag)
- CIRCQ_INSERT_TAIL(&timeout_proc, &barrier.to_list);
+ timeout_enqueue(&timeout_proc, &barrier);
else
- CIRCQ_INSERT_TAIL(&timeout_todo, &barrier.to_list);
+ timeout_enqueue(&timeout_todo, &barrier);
mtx_leave(&timeout_mutex);
@@ -496,19 +569,18 @@ uint32_t
timeout_bucket(const struct timeout *to)
{
struct timespec diff, shifted_abstime;
- struct kclock *kc;
+ struct kclock *kc = &timeout_kclock[to->to_kclock];
uint32_t level;
- KASSERT(to->to_kclock == KCLOCK_UPTIME);
- kc = &timeout_kclock[to->to_kclock];
-
+ KASSERT(to->to_kclock > KCLOCK_NONE && to->to_kclock < KCLOCK_MAX);
KASSERT(timespeccmp(&kc->kc_lastscan, &to->to_abstime, <));
+
timespecsub(&to->to_abstime, &kc->kc_lastscan, &diff);
for (level = 0; level < nitems(timeout_level_width) - 1; level++) {
if (diff.tv_sec < timeout_level_width[level])
break;
}
- timespecadd(&to->to_abstime, &kc->kc_offset, &shifted_abstime);
+ timespecsub(&to->to_abstime, &kc->kc_offset, &shifted_abstime);
return level * WHEELSIZE + timeout_maskwheel(level, &shifted_abstime);
}
@@ -620,7 +692,6 @@ timeout_run(struct timeout *to)
MUTEX_ASSERT_LOCKED(&timeout_mutex);
- CLR(to->to_flags, TIMEOUT_ONQUEUE);
SET(to->to_flags, TIMEOUT_TRIGGERED);
fn = to->to_func;
@@ -652,14 +723,13 @@ softclock_process_kclock_timeout(struct
tostat.tos_scheduled++;
if (!new)
tostat.tos_rescheduled++;
- CIRCQ_INSERT_TAIL(&timeout_wheel_kc[timeout_bucket(to)],
- &to->to_list);
+ timeout_enqueue(&timeout_wheel_kc[timeout_bucket(to)], to);
return;
}
if (!new && timespeccmp(&to->to_abstime, &kc->kc_late, <=))
tostat.tos_late++;
if (ISSET(to->to_flags, TIMEOUT_PROC)) {
- CIRCQ_INSERT_TAIL(&timeout_proc, &to->to_list);
+ timeout_enqueue(&timeout_proc, to);
return;
}
timeout_run(to);
@@ -675,13 +745,13 @@ softclock_process_tick_timeout(struct ti
tostat.tos_scheduled++;
if (!new)
tostat.tos_rescheduled++;
- CIRCQ_INSERT_TAIL(&BUCKET(delta, to->to_time), &to->to_list);
+ timeout_enqueue(&BUCKET(delta, to->to_time), to);
return;
}
if (!new && delta < 0)
tostat.tos_late++;
if (ISSET(to->to_flags, TIMEOUT_PROC)) {
- CIRCQ_INSERT_TAIL(&timeout_proc, &to->to_list);
+ timeout_enqueue(&timeout_proc, to);
return;
}
timeout_run(to);
@@ -697,11 +767,8 @@ softclock_process_tick_timeout(struct ti
void
softclock(void *arg)
{
- struct timeout *first_new, *to;
- int needsproc, new;
-
- first_new = NULL;
- new = 0;
+ struct timeout *first_new = NULL, *to;
+ int needsproc, new = 0;
mtx_enter(&timeout_mutex);
if (!CIRCQ_EMPTY(&timeout_new))
@@ -709,7 +776,7 @@ softclock(void *arg)
CIRCQ_CONCAT(&timeout_todo, &timeout_new);
while (!CIRCQ_EMPTY(&timeout_todo)) {
to = timeout_from_circq(CIRCQ_FIRST(&timeout_todo));
- CIRCQ_REMOVE(&to->to_list);
+ timeout_dequeue(to);
if (to == first_new)
new = 1;
if (to->to_kclock != KCLOCK_NONE)
@@ -758,7 +825,7 @@ softclock_thread(void *arg)
mtx_enter(&timeout_mutex);
while (!CIRCQ_EMPTY(&timeout_proc)) {
to = timeout_from_circq(CIRCQ_FIRST(&timeout_proc));
- CIRCQ_REMOVE(&to->to_list);
+ timeout_dequeue(to);
timeout_run(to);
tostat.tos_run_thread++;
}
@@ -768,6 +835,108 @@ softclock_thread(void *arg)
splx(s);
}
+void
+kclock_nanotime(int kclock, struct timespec *now)
+{
+ switch (kclock) {
+ case KCLOCK_UPTIME:
+ nanouptime(now);
+ return;
+ case KCLOCK_UTC:
+ nanotime(now);
+ return;
+ default:
+ panic("%s: invalid kclock: %d", __func__, kclock);
+ }
+}
+
+void
+u64_sat_add(uint64_t *sum, uint64_t a, uint64_t b)
+{
+ if (a + b < a)
+ *sum = UINT64_MAX;
+ else
+ *sum = a + b;
+}
+
+/*
+ * Given an interval timer with a period of intvl that last expired
+ * at absolute time abs, find the timer's next expiration time and
+ * write it back to abs. If abs has not yet expired, abs is not
+ * modified.
+ *
+ * Returns the number of intervals that have elapsed. If the number
+ * of elapsed intervals would overflow a 64-bit integer, UINT64_MAX is
+ * returned. Note that abs marks the end of the first interval: if abs
+ * has not expired, zero intervals have elapsed.
+ */
+uint64_t
+timespec_advance_nsec(struct timespec *abs, uint64_t intvl,
+ const struct timespec *now)
+{
+ struct timespec base, diff, minbase, next, intvl_product;
+ struct timespec intvl_product_max, intvl_ts;
+ uint64_t count = 0, quo;
+
+ /* Unusual case: abs has not expired, no intervals have elapsed. */
+ if (timespeccmp(now, abs, <)) {
+ if (intvl == 0)
+ panic("%s: intvl is zero", __func__);
+ return 0;
+ }
+
+ /* Typical case: abs has expired and only one interval has elapsed. */
+ NSEC_TO_TIMESPEC(intvl, &intvl_ts);
+ timespecadd(abs, &intvl_ts, &next);
+ if (timespeccmp(now, &next, <)) {
+ *abs = next;
+ return 1;
+ }
+
+ /*
+ * Annoying case: two or more intervals have elapsed.
+ *
+ * Find a base within interval-product range of the current time.
+ * Under normal circumstances abs will already be within range,
+ * but for sake of correctness we handle cases where enormous
+ * expanses of time have passed between abs and now.
+ */
+ quo = UINT64_MAX / intvl;
+ NSEC_TO_TIMESPEC(quo * intvl, &intvl_product_max);
+ timespecsub(now, &intvl_product_max, &minbase);
+ base = *abs;
+ if (__predict_false(timespeccmp(&base, &minbase, <))) {
+ while (timespeccmp(&base, &minbase, <)) {
+ timespecadd(&base, &intvl_product_max, &base);
+ u64_sat_add(&count, count, quo);
+ }
+ }
+
+ /*
+ * We have a base within range. Now find the interval-product
+ * that, when added to the base, gets us just past the current time
+ * to the most imminent expiration point.
+ *
+ * If the product would overflow a 64-bit integer we advance the
+ * base by one interval and retry. This can happen at most once.
+ *
+ * The next expiration is then the sum of the base and the
+ * interval-product.
+ */
+ for (;;) {
+ timespecsub(now, &base, &diff);
+ quo = TIMESPEC_TO_NSEC(&diff) / intvl;
+ if (__predict_true(intvl * quo <= UINT64_MAX - intvl))
+ break;
+ timespecadd(&base, &intvl_ts, &base);
+ u64_sat_add(&count, count, quo);
+ }
+ NSEC_TO_TIMESPEC(intvl * (quo + 1), &intvl_product);
+ timespecadd(&base, &intvl_product, abs);
+ u64_sat_add(&count, count, quo + 1);
+ return count;
+}
+
#ifndef SMALL_KERNEL
void
timeout_adjust_ticks(int adj)
@@ -791,8 +960,8 @@ timeout_adjust_ticks(int adj)
/* when moving a timeout forward need to reinsert it */
if (to->to_time - ticks < adj)
to->to_time = new_ticks;
- CIRCQ_REMOVE(&to->to_list);
- CIRCQ_INSERT_TAIL(&timeout_todo, &to->to_list);
+ timeout_dequeue(to);
+ timeout_enqueue(&timeout_todo, to);
}
}
ticks = new_ticks;
@@ -824,6 +993,8 @@ db_kclock(int kclock)
switch (kclock) {
case KCLOCK_UPTIME:
return "uptime";
+ case KCLOCK_UTC:
+ return "utc";
default:
return "invalid";
}