Here's a straw proposal for an MI API to allow a kthread to use any vector or floating-point unit on the CPU -- call it the `FPU' for brevity.
The MI concept of `the FPU' encompasses _all_ vector or floating-point units that userland threads would have access to, so we don't have to complicate it by distinguishing, e.g., the crypto registers from the floating-point registers on Cavium -- it's all or nothing.

1. New kthread flag KTHREAD_FPU. Any kthread created with this flag will have its FPU state saved and restored like a userland thread.

   The implementation would be a new lwp l_pflag, say LP_SYSTEM_FPU. MD FPU traps which currently panic on LP_SYSTEM lwps will panic only if LP_SYSTEM && !LP_SYSTEM_FPU.

2. New functions

	s = kthread_fpu_enter();
	...
	kthread_fpu_exit(s);

   Between the two calls, the calling kthread behaves as if it had been created with KTHREAD_FPU. kthread_fpu_enter/exit pairs nest.

   kthread_fpu_exit additionally zeroes the FPU registers to avoid leaking secrets through Spectre-class vulnerabilities in case an adversary can control speculative FPU execution before the next FPU-changing context switch.

3. New workqueue flag WQ_FPU passes KTHREAD_FPU to all the workqueue's internal kthreads.

Threadpools do not get a new flag -- a job can call kthread_fpu_enter/exit in its job function instead, since different threadpool jobs by design share kthreads with one another.

There may also be MD functions like x86 fpu_kern_enter to use the FPU with preemption disabled. They may be limited to a single type of FPU or vector unit, e.g. just Cavium crypto but not MIPS floating-point. These functions can avoid disabling preemption -- and avoid zeroing the FPU registers -- in FPU-enabled kthreads.

That way, for example, you can use (say) an AES encryption routine aes_enc as a subroutine anywhere in the kernel, and an MD definition of aes_enc can internally use AES-NI with the appropriate MD fpu_kern_enter -- but it's a little cheaper to use aes_enc in an FPU-enabled kthread. This gave a modest measurable boost to cgd(4) throughput in my preliminary experiments.

Thoughts?
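
To make the nesting in item 2 and the trap rule in item 1 concrete, here is a rough userland mock in C. It is only a sketch, not the proposed kernel code. The flag values, struct mock_lwp, mock_curlwp, the fpu_regs array and fpu_trap_should_panic are hypothetical stand-ins invented for illustration. Zeroing only at the outermost kthread_fpu_exit is also an assumption about how the nesting would work.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag values; real bits would be assigned in the kernel. */
#define LP_SYSTEM	0x01
#define LP_SYSTEM_FPU	0x02

/* Mock of the per-lwp state that matters for this sketch. */
struct mock_lwp {
	int l_pflag;
	unsigned char fpu_regs[16];	/* stand-in for the FPU register file */
};

/* Stand-in for curlwp: a kthread created without KTHREAD_FPU. */
struct mock_lwp mock_curlwp = { .l_pflag = LP_SYSTEM };

/* Enable FPU use; return the previous LP_SYSTEM_FPU bit so calls nest. */
int
kthread_fpu_enter(void)
{
	struct mock_lwp *l = &mock_curlwp;
	int s = l->l_pflag & LP_SYSTEM_FPU;

	l->l_pflag |= LP_SYSTEM_FPU;
	return s;
}

/* Undo the matching kthread_fpu_enter, given its return value. */
void
kthread_fpu_exit(int s)
{
	struct mock_lwp *l = &mock_curlwp;

	assert(l->l_pflag & LP_SYSTEM_FPU);
	if (s)
		return;	/* FPU was already enabled: leave it as it was */

	/* Outermost exit (assumed): zero the registers, drop the flag. */
	memset(l->fpu_regs, 0, sizeof(l->fpu_regs));
	l->l_pflag &= ~LP_SYSTEM_FPU;
}

/* Item 1: an MD FPU trap panics only if LP_SYSTEM && !LP_SYSTEM_FPU. */
bool
fpu_trap_should_panic(const struct mock_lwp *l)
{
	return (l->l_pflag & LP_SYSTEM) && !(l->l_pflag & LP_SYSTEM_FPU);
}
```

In this mock, an inner enter/exit pair gets a nonzero s and leaves both the flag and the registers alone; only the exit paired with the first enter clears them. A kthread created with KTHREAD_FPU would start with LP_SYSTEM_FPU set, so every kthread_fpu_exit in it would take the early return.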