This patch enables the pool cache feature on the knote pool to reduce the overhead of knote management.
Profiling done by mpi@ and bluhm@ indicates that the potentially needless
allocation of knotes in kqueue_register() causes slowdown with
kqueue-based poll(2) and select(2). One approach to fixing this is to
reverse the function's initial guess about the knote: try without
allocating first, then allocate and retry if the knote is missing from
the kqueue and EV_ADD is given. Another option is to cache free knotes
so that the shared knote pool is accessed less frequently.

The following diff takes the second approach. The caching is implemented
simply by enabling the pool cache feature on the knote pool. This makes
use of existing code and does not complicate kqueue_register(). The
feature also helps if there is heavy knote churn.

I think the most substantial part of the diff is that it extends pool
cache usage beyond mbufs. Is this change acceptable?

Note that the cache is not particularly useful without kqueue-based
poll(2) and select(2). The pool view of systat(1) shows that there are
pools that would benefit more from caching than knote_pool, at least in
terms of request frequencies. The relative frequencies depend on system
workload, though. Kqpoll would definitely make the knote pool more
heavily used.

Index: kern/init_main.c
===================================================================
RCS file: src/sys/kern/init_main.c,v
retrieving revision 1.306
diff -u -p -r1.306 init_main.c
--- kern/init_main.c	8 Feb 2021 10:51:01 -0000	1.306
+++ kern/init_main.c	31 May 2021 16:50:17 -0000
@@ -71,6 +71,7 @@
 #include <sys/msg.h>
 #endif
 #include <sys/domain.h>
+#include <sys/event.h>
 #include <sys/msgbuf.h>
 #include <sys/mbuf.h>
 #include <sys/pipe.h>
@@ -148,7 +149,6 @@ void	crypto_init(void);
 void	db_ctf_init(void);
 void	prof_init(void);
 void	init_exec(void);
-void	kqueue_init(void);
 void	futex_init(void);
 void	taskq_init(void);
 void	timeout_proc_init(void);
@@ -432,7 +432,9 @@ main(void *framep)
 	prof_init();
 #endif
 
-	mbcpuinit();	/* enable per cpu mbuf data */
+	/* Enable per-CPU data. */
+	mbcpuinit();
+	kqueue_init_percpu();
 	uvm_init_percpu();
 
 	/* init exec and emul */
Index: kern/kern_event.c
===================================================================
RCS file: src/sys/kern/kern_event.c,v
retrieving revision 1.163
diff -u -p -r1.163 kern_event.c
--- kern/kern_event.c	22 Apr 2021 15:30:12 -0000	1.163
+++ kern/kern_event.c	31 May 2021 16:50:17 -0000
@@ -231,6 +231,12 @@ kqueue_init(void)
 	    PR_WAITOK, "knotepl", NULL);
 }
 
+void
+kqueue_init_percpu(void)
+{
+	pool_cache_init(&knote_pool);
+}
+
 int
 filt_fileattach(struct knote *kn)
 {
Index: sys/event.h
===================================================================
RCS file: src/sys/sys/event.h,v
retrieving revision 1.54
diff -u -p -r1.54 event.h
--- sys/event.h	24 Feb 2021 14:59:52 -0000	1.54
+++ sys/event.h	31 May 2021 16:50:18 -0000
@@ -292,6 +292,8 @@ extern void	knote_fdclose(struct proc *p
 extern void	knote_processexit(struct proc *);
 extern void	knote_modify(const struct kevent *, struct knote *);
 extern void	knote_submit(struct knote *, struct kevent *);
+extern void	kqueue_init(void);
+extern void	kqueue_init_percpu(void);
 extern int	kqueue_register(struct kqueue *kq, struct kevent *kev,
 		    struct proc *p);
 extern int	kqueue_scan(struct kqueue_scan_state *, int, struct kevent *,