> Date: Tue, 16 Jun 2020 11:20:05 +1000
> From: David Gwynne <[email protected]>
>
> there's been discussions for years (and even some diffs!) about how we
> should let drivers establish interrupts on multiple cpus.
>
> the simple approach is to let every driver look at the number of cpus in
> a box and just pin an interrupt on it, which is what pretty much
> everyone else started with, but we have never seemed to get past
> bikeshedding about. from what i can tell, the principal objections to
> this are:
>
> 1. interrupts will tend to land on low numbered cpus.
>
> ie, if drivers try to establish n interrupts on m cpus, they'll
> start at cpu 0 and go to cpu n, which means cpu 0 will end up with more
> interrupts than cpu m-1. apparently this is terrible, even though
> currently we have all the interrupts on cpu0 anyway and the world
> hasnt ended.
>
> 2. some cpus shouldn't be used for interrupts.
>
> why a cpu should or shouldn't be used for interrupts can be pretty
> arbitrary, but in practical terms i'm going to borrow from the scheduler
> and say that we shouldn't run work on hyperthreads. discussions about
> big.little configs and so on can wait.
When hw.smt=0 we should definitely not have interrupts on the parked
CPUs.
> 3. making all the drivers make the same decisions about the above is
> a lot of maintenance overhead.
>
> either we will have a bunch of inconsistencies, or we'll have a lot
> of untested commits to keep everything the same.
>
> my proposed solution to the above is this diff to provide the intrmap
> api. drivers that want to establish multiple interrupts ask the api for
> a set of cpus it can use, and the api considers the above issues when
> generating a set of cpus for the driver to use. drivers then establish
> interrupts on cpus with the info provided by the map.
>
> it is based on the if_ringmap api in dragonflybsd, but generalised so it
> could be used by something like nvme(4) in the future.
>
> jmatthew@ and i have been working on implementing a
> pci_intr_establish_cpu() api on a few archs, and tweaking some drivers
> to see if it works out well, and so far the conclusion is "yes, yes it
> does".
>
> the best example so far is vmx hacked up to establish interrupts on
> multiple cpus using the api. i changed the interrupt name string so it
> includes the ring and cpuid it is establishing on. each vmx is also
> limited to 8 rings overall, no matter how big the system is.
>
> on a machine with 2 vmx interfaces, 16 "real" CPUs, and no hyperthreads,
> the mappings look like this:
>
> dlg@kbuild ~$ vmstat -zi | grep vmx
> irq114/vmx0 0 0
> irq115/vmx0:0:0 207 5
> irq116/vmx0:1:15 22 0
> irq117/vmx0:2:14 11 0
> irq118/vmx0:3:13 24 0
> irq119/vmx0:4:12 39 0
> irq120/vmx0:5:11 1 0
> irq121/vmx0:6:10 12 0
> irq122/vmx0:7:9 7 0
> irq126/vmx1 0 0
> irq127/vmx1:0:8 0 0
> irq128/vmx1:1:7 0 0
> irq129/vmx1:2:6 0 0
> irq130/vmx1:3:5 0 0
> irq131/vmx1:4:4 0 0
> irq132/vmx1:5:3 0 0
> irq133/vmx1:6:2 0 0
> irq134/vmx1:7:1 0 0
>
> if you move it to 8 cores and 16 threads:
>
> dlg@kbuild ~$ sysctl hw.{ncpu,ncpufound,ncpuonline}
> hw.ncpu=16
> hw.ncpufound=16
> hw.ncpuonline=8
> dlg@kbuild ~$ vmstat -zi | grep vmx
> irq114/vmx0 0 0
> irq115/vmx0:0:0 40 0
> irq116/vmx0:1:14 15 0
> irq117/vmx0:2:12 33 0
> irq118/vmx0:3:10 64 0
> irq119/vmx0:4:8 23 0
> irq120/vmx0:5:6 32 0
> irq121/vmx0:6:4 137 1
> irq122/vmx0:7:2 245 3
> irq126/vmx1 0 0
> irq127/vmx1:0:0 0 0
> irq128/vmx1:1:14 0 0
> irq129/vmx1:2:12 0 0
> irq130/vmx1:3:10 0 0
> irq131/vmx1:4:8 0 0
> irq132/vmx1:5:6 0 0
> irq133/vmx1:6:4 0 0
> irq134/vmx1:7:2 0 0
> dlg@kbuild ~$ dmesg | grep smt
> cpu0: smt 0, core 0, package 0
> cpu1: smt 1, core 0, package 0
> cpu2: smt 0, core 1, package 0
> cpu3: smt 1, core 1, package 0
> cpu4: smt 0, core 2, package 0
> cpu5: smt 1, core 2, package 0
> cpu6: smt 0, core 3, package 0
> cpu7: smt 1, core 3, package 0
> cpu8: smt 0, core 4, package 0
> cpu9: smt 1, core 4, package 0
> cpu10: smt 0, core 5, package 0
> cpu11: smt 1, core 5, package 0
> cpu12: smt 0, core 6, package 0
> cpu13: smt 1, core 6, package 0
> cpu14: smt 0, core 7, package 0
> cpu15: smt 1, core 7, package 0
>
> in the first case you can see it spreading the vmx interfaces over the
> cpus. in the latter, there's not enough real cpus so it stacks the
> interrupts.
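For anyone trying to follow how the two tables come about, here is a
userland sketch of the grid selection and offset logic from
intrmap_create() and intrmap_set_grid() in the diff. The sketch_*
names are mine and not part of the diff. The two cpumap arrays are
simply read off the vmstat output above, so their order is an
assumption about how CPU_INFO_FOREACH walks the CPUs on that machine.

```c
#include <assert.h>

/* Pick the grid the same way the loop in intrmap_create() does. */
static unsigned int
sketch_grid(unsigned int ncpu, unsigned int nintrs)
{
	unsigned int i, grid = 0, prev_grid = ncpu;

	for (i = 0; i < ncpu; i++) {
		if (ncpu % (i + 1) != 0)
			continue;

		grid = ncpu / (i + 1);
		if (nintrs > grid) {
			grid = prev_grid;
			break;
		}

		if (nintrs > ncpu / (i + 2))
			break;
		prev_grid = grid;
	}

	return (grid);
}

/* Slot in the usable-CPU set for a ring, as in intrmap_set_grid(). */
static unsigned int
sketch_slot(unsigned int ncpu, unsigned int grid, unsigned int unit,
    unsigned int ring)
{
	return ((grid * unit) % ncpu + ring);
}

/* Usable CPUs in the order implied by the vmstat output (assumed). */
static const unsigned int cpus16[] = {
	0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1
};
static const unsigned int cpus8[] = { 0, 14, 12, 10, 8, 6, 4, 2 };

/* CPU id for ring `ring` of device unit `unit` asking for `nintrs`. */
static unsigned int
sketch_cpu(const unsigned int *cpumap, unsigned int ncpu,
    unsigned int nintrs, unsigned int unit, unsigned int ring)
{
	unsigned int grid = sketch_grid(ncpu, nintrs);

	return (cpumap[sketch_slot(ncpu, grid, unit, ring)]);
}
```

With 16 usable CPUs and 8 rings the grid is 8, so unit 1 starts at
slot 8 and the two interfaces do not overlap. With 8 usable CPUs the
grid is still 8, the offset for unit 1 wraps to 0, and both units land
on the same slots, which is the stacking seen in the second table.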
>
> jmatthew@ and i have the following question we can't resolve
> ourselves: should the api provide struct cpu_info pointers instead
> of numbered cpu ids?
I'd say yes. Numbered CPU IDs are always a bit vague as hardware and
software numbering schemes don't necessarily agree. The approach
chosen here depends on the cpu_info structs being populated, so there
is no benefit in using CPU IDs in the API.
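Roughly what I have in mind, as a userland sketch with a stand-in for
struct cpu_info. Everything prefixed sketch_ is made up for
illustration and is not part of the diff; the real struct cpu_info is
machine-dependent and has far more in it.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct cpu_info; only what the sketch needs. */
struct sketch_cpu_info {
	unsigned int ci_cpuid;
	unsigned int ci_smt_id;
};

/* The map holds pointers directly, with no CPU numbers in the middle. */
struct sketch_intrmap {
	unsigned int im_count;
	struct sketch_cpu_info **im_cpus;	/* one pointer per interrupt */
};

/* Collect pointers to CPUs that are not SMT siblings (ci_smt_id == 0). */
static unsigned int
sketch_usable(struct sketch_cpu_info *all, unsigned int n,
    struct sketch_cpu_info **out)
{
	unsigned int i, count = 0;

	for (i = 0; i < n; i++) {
		if (all[i].ci_smt_id > 0)
			continue;
		out[count++] = &all[i];
	}

	return (count);
}

/* Return the CPU itself rather than a number the caller must look up. */
static struct sketch_cpu_info *
sketch_intrmap_cpu(const struct sketch_intrmap *im, unsigned int index)
{
	if (index >= im->im_count)
		return (NULL);

	return (im->im_cpus[index]);
}
```

The caller would then hand the returned pointer straight to whatever
establishes the interrupt, without a lookup by number.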
> our experience so far is that pci_intr_establish_cpuid() immediately
> maps the id to a pointer anyway, and intrmap iterates over struct
> cpu_info pointers to build the list of ids, so we could just remove the
> numbers in the middle. pci_intr_establish_cpu() could take a cpu_info
> pointer, and intrmap could provide cpu_info pointers.
>
> the only caveat to this i can think of is if we need to establish
> interrupts before cpus are attached, which might be useful on arm
> archs. we can also change this in the tree.
I don't think CPU IDs are going to help resolve issues with
architectures where CPUs attach late. I think we'll end up in a
situation where architectures that want to distribute interrupts over
CPUs will have to attach (and possibly spin up) CPUs early.
There is no fundamental reason why we can't attach CPUs early on arm64.
> if it's not obvious, im kind of sick of talking about this stuff,
> so i'd rather shut up and hack on multiq support in the tree as
> much as possible.
>
> ok?
ok kettenis@, preferably with the change to return struct cpu_info pointers.
> Index: share/man/man9/intrmap_create.9
> ===================================================================
> RCS file: share/man/man9/intrmap_create.9
> diff -N share/man/man9/intrmap_create.9
> --- /dev/null 1 Jan 1970 00:00:00 -0000
> +++ share/man/man9/intrmap_create.9 16 Jun 2020 00:13:50 -0000
> @@ -0,0 +1,125 @@
> +.\" $OpenBSD$
> +.\"
> +.\" Copyright (c) 2020 David Gwynne <[email protected]>
> +.\"
> +.\" Permission to use, copy, modify, and distribute this software for any
> +.\" purpose with or without fee is hereby granted, provided that the above
> +.\" copyright notice and this permission notice appear in all copies.
> +.\"
> +.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
> +.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> +.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> +.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> +.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> +.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> +.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> +.\"
> +.Dd $Mdocdate: June 16 2020 $
> +.Dt INTRMAP_CREATE 9
> +.Os
> +.Sh NAME
> +.Nm intrmap_create ,
> +.Nm intrmap_destroy ,
> +.Nm intrmap_count ,
> +.Nm intrmap_cpu
> +.Nd interrupt to CPU mapping API
> +.Sh SYNOPSIS
> +.In sys/intrmap.h
> +.Ft struct intrmap *
> +.Fo intrmap_create
> +.Fa "const struct device *dv"
> +.Fa "unsigned int nintr"
> +.Fa "unsigned int maxintr"
> +.Fa "unsigned int flags"
> +.Fc
> +.Ft void
> +.Fn intrmap_destroy "struct intrmap *im"
> +.Ft unsigned int
> +.Fn intrmap_count "struct intrmap *im"
> +.Ft unsigned int
> +.Fn intrmap_cpu "struct intrmap *im" "unsigned int index"
> +.Sh DESCRIPTION
> +The interrupt to CPU mapping API supports the use of multiple CPUs
> +by hardware drivers.
> +Drivers that can use multiple interrupts use the API to request a
> +set of CPUs that they can establish those interrupts on.
> +The API limits the requested number of interrupts to what is available
> +on the system, and attempts to distribute the requested interrupts
> +over those CPUs.
> +On some platforms the API will filter the set of available CPUs.
> +.\" to avoid hyperthreads, basically.
> +.Pp
> +.Fn intrmap_create
> +allocates an interrupt map data structure for use by the driver
> +identified by
> +.Fa dv .
> +The number of interrupts the hardware supports is specified via the
> +.Fa nintr
> +argument.
> +The driver supplies the maximum number of interrupts it can support
> +via
> +.Fa maxintr ,
> +which, along with the number of available CPUs at the time the
> +function is called, is used as a constraint on the number of requested
> +interrupts.
> +.Fa nintr
> +may be zero to use the driver limit as the number of requested
> +interrupts.
> +The
> +.Fa flags
> +argument may have the following defines OR'ed together:
> +.Bl -tag -width xxx -offset indent
> +.It Dv INTRMAP_POWEROF2
> +The hardware only supports a power of 2 number of interrupts, so
> +constrain the number of supplied interrupts after the system and
> +driver limits are applied.
> +.El
> +.Pp
> +.Fn intrmap_destroy
> +frees the memory associated with the interrupt map data structure
> +passed via
> +.Fa im .
> +.Pp
> +.Fn intrmap_count
> +returns the number of interrupts that the driver can establish
> +according to the
> +.Fa im
> +interrupt map.
> +.Pp
> +.Fn intrmap_cpu
> +returns which CPU the interrupt specified in
> +.Fa index
> +should be established on according to the
> +.Fa im
> +interrupt map.
> +Interrupts are numbered from 0 to one less than the value returned by
> +.Fn intrmap_count .
> +.Sh CONTEXT
> +.Fn intrmap_create ,
> +.Fn intrmap_destroy ,
> +.Fn intrmap_count ,
> +and
> +.Fn intrmap_cpu
> +can be called during autoconf, or from process context.
> +.Sh RETURN VALUES
> +.Fn intrmap_create
> +returns a pointer to an interrupt mapping structure on success, or
> +.Dv NULL
> +on failure.
> +.Pp
> +.Fn intrmap_count
> +returns the number of interrupts that were allocated for the driver
> +to use.
> +.Pp
> +.Fn intrmap_cpu
> +returns an identifier for the CPU that the interrupt should be
> +established on.
> +.\" .Sh SEE ALSO
> +.\" .Xr pci_intr_establish_cpuid 9
> +.Sh HISTORY
> +The interrupt mapping API is based on the if_ringmap API in
> +.Dx .
> +It was ported to
> +.Ox 6.8
> +by
> +.An David Gwynne Aq Mt [email protected] .
> Index: share/man/man9/Makefile
> ===================================================================
> RCS file: /cvs/src/share/man/man9/Makefile,v
> retrieving revision 1.300
> diff -u -p -r1.300 Makefile
> --- share/man/man9/Makefile 5 Jun 2020 02:24:12 -0000 1.300
> +++ share/man/man9/Makefile 16 Jun 2020 00:13:50 -0000
> @@ -20,7 +20,8 @@ MAN= aml_evalnode.9 atomic_add_int.9 ato
> ieee80211_node.9 ieee80211_output.9 ieee80211_proto.9 \
> ieee80211_radiotap.9 if_addrhook_add.9 if_get.9 if_rxr_init.9 \
> ifiq_input.9 ifq_enqueue.9 \
> - ifq_deq_begin.9 imax.9 iic.9 intro.9 inittodr.9 intr_barrier.9 \
> + ifq_deq_begin.9 imax.9 iic.9 intro.9 inittodr.9 \
> + intr_barrier.9 intrmap_create.9 \
> KASSERT.9 km_alloc.9 knote.9 kthread.9 ktrace.9 \
> lim_cur.9 loadfirmware.9 log.9 \
> malloc.9 membar_sync.9 memcmp.9 mbuf.9 mbuf_tags.9 md5.9 mi_switch.9 \
> Index: sys/sys/intrmap.h
> ===================================================================
> RCS file: sys/sys/intrmap.h
> diff -N sys/sys/intrmap.h
> --- /dev/null 1 Jan 1970 00:00:00 -0000
> +++ sys/sys/intrmap.h 16 Jun 2020 00:13:50 -0000
> @@ -0,0 +1,38 @@
> +/* $OpenBSD$ */
> +
> +/*
> + * Copyright (c) 2020 David Gwynne <[email protected]>
> + *
> + * Permission to use, copy, modify, and distribute this software for any
> + * purpose with or without fee is hereby granted, provided that the above
> + * copyright notice and this permission notice appear in all copies.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
> + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> + */
> +
> +#ifndef _SYS_INTRMAP_H_
> +#define _SYS_INTRMAP_H_
> +
> +struct intrmap;
> +
> +#define INTRMAP_POWEROF2 (1 << 0)
> +
> +struct intrmap *intrmap_create(const struct device *,
> + unsigned int, unsigned int, unsigned int);
> +void intrmap_destroy(struct intrmap *);
> +
> +void intrmap_match(const struct device *,
> + struct intrmap *, struct intrmap *);
> +void intrmap_align(const struct device *,
> + struct intrmap *, struct intrmap *);
> +
> +unsigned int intrmap_count(const struct intrmap *);
> +unsigned int intrmap_cpu(const struct intrmap *, unsigned int);
> +
> +#endif /* _SYS_INTRMAP_H_ */
> Index: sys/kern/kern_intrmap.c
> ===================================================================
> RCS file: sys/kern/kern_intrmap.c
> diff -N sys/kern/kern_intrmap.c
> --- /dev/null 1 Jan 1970 00:00:00 -0000
> +++ sys/kern/kern_intrmap.c 16 Jun 2020 00:13:50 -0000
> @@ -0,0 +1,347 @@
> +/* $OpenBSD$ */
> +
> +/*
> + * Copyright (c) 1980, 1986, 1993
> + * The Regents of the University of California. All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + * notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + * notice, this list of conditions and the following disclaimer in the
> + * documentation and/or other materials provided with the distribution.
> + * 3. Neither the name of the University nor the names of its contributors
> + * may be used to endorse or promote products derived from this software
> + * without specific prior written permission.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + *
> + * @(#)if.c 8.3 (Berkeley) 1/4/94
> + * $FreeBSD: src/sys/net/if.c,v 1.185 2004/03/13 02:35:03 brooks Exp $
> + */
> +
> +/*
> + * This code is adapted from the if_ringmap code in DragonflyBSD,
> + * but generalised for use by all types of devices, not just network
> + * cards.
> + */
> +
> +#include <sys/param.h>
> +#include <sys/systm.h>
> +#include <sys/device.h>
> +#include <sys/malloc.h>
> +#include <sys/rwlock.h>
> +
> +#include <sys/intrmap.h>
> +
> +struct intrmap_cpus {
> + struct refcnt ic_refs;
> + unsigned int ic_count;
> + unsigned int *ic_cpumap;
> +};
> +
> +struct intrmap {
> + unsigned int im_count;
> + unsigned int im_grid;
> + struct intrmap_cpus *
> + im_cpus;
> + unsigned int *im_cpumap;
> +};
> +
> +/*
> + * The CPUs that should be used for interrupts may be a subset of all CPUs.
> + */
> +
> +struct rwlock intrmap_lock = RWLOCK_INITIALIZER("intrcpus");
> +struct intrmap_cpus *intrmap_cpus = NULL;
> +int intrmap_ncpu = 0;
> +
> +static void
> +intrmap_cpus_put(struct intrmap_cpus *ic)
> +{
> + if (ic == NULL)
> + return;
> +
> + if (refcnt_rele(&ic->ic_refs)) {
> + free(ic->ic_cpumap, M_DEVBUF,
> + ic->ic_count * sizeof(*ic->ic_cpumap));
> + free(ic, M_DEVBUF, sizeof(*ic));
> + }
> +}
> +
> +static struct intrmap_cpus *
> +intrmap_cpus_get(void)
> +{
> + struct intrmap_cpus *oic = NULL;
> + struct intrmap_cpus *ic;
> +
> + rw_enter_write(&intrmap_lock);
> + if (intrmap_ncpu != ncpus) {
> + unsigned int icpus = 0;
> + unsigned int *cpumap;
> + CPU_INFO_ITERATOR cii;
> + struct cpu_info *ci;
> +
> + /*
> + * there's a new "version" of the set of CPUs available, so
> + * we need to figure out which ones we can use for interrupts.
> + */
> +
> + cpumap = mallocarray(ncpus, sizeof(*cpumap),
> + M_DEVBUF, M_WAITOK);
> +
> + CPU_INFO_FOREACH(cii, ci) {
> +#ifdef __HAVE_CPU_TOPOLOGY
> + if (ci->ci_smt_id > 0)
> + continue;
> +#endif
> + cpumap[icpus++] = CPU_INFO_UNIT(ci);
> + }
> +
> + if (icpus < ncpus) {
> + /* this is mostly about free(9) needing a size */
> + unsigned int *icpumap = mallocarray(icpus,
> + sizeof(*icpumap), M_DEVBUF, M_WAITOK);
> + memcpy(icpumap, cpumap, icpus * sizeof(*icpumap));
> + free(cpumap, M_DEVBUF, ncpus * sizeof(*cpumap));
> + cpumap = icpumap;
> + }
> +
> + ic = malloc(sizeof(*ic), M_DEVBUF, M_WAITOK);
> + refcnt_init(&ic->ic_refs);
> + ic->ic_count = icpus;
> + ic->ic_cpumap = cpumap;
> +
> + oic = intrmap_cpus;
> + intrmap_cpus = ic; /* give this ref to the global. */
> + } else
> + ic = intrmap_cpus;
> +
> + refcnt_take(&ic->ic_refs); /* take a ref for the caller */
> + rw_exit_write(&intrmap_lock);
> +
> + intrmap_cpus_put(oic);
> +
> + return (ic);
> +}
> +
> +static int
> +intrmap_nintrs(const struct intrmap_cpus *ic, unsigned int nintrs,
> + unsigned int maxintrs)
> +{
> + KASSERTMSG(maxintrs > 0, "invalid maximum interrupt count %u",
> + maxintrs);
> +
> + if (nintrs == 0 || nintrs > maxintrs)
> + nintrs = maxintrs;
> + if (nintrs > ic->ic_count)
> + nintrs = ic->ic_count;
> + return (nintrs);
> +}
> +
> +static void
> +intrmap_set_grid(struct intrmap *im, unsigned int unit, unsigned int grid)
> +{
> + unsigned int i, offset;
> + unsigned int *cpumap = im->im_cpumap;
> + const struct intrmap_cpus *ic = im->im_cpus;
> +
> + KASSERTMSG(grid > 0, "invalid intrmap grid %u", grid);
> + KASSERTMSG(grid >= im->im_count, "invalid intrmap grid %u, count %u",
> + grid, im->im_count);
> + im->im_grid = grid;
> +
> + offset = (grid * unit) % ic->ic_count;
> + for (i = 0; i < im->im_count; i++) {
> + cpumap[i] = offset + i;
> + KASSERTMSG(cpumap[i] < ic->ic_count,
> + "invalid cpumap[%u] = %u, offset %u (ncpu %d)", i,
> + cpumap[i], offset, ic->ic_count);
> + }
> +}
> +
> +struct intrmap *
> +intrmap_create(const struct device *dv,
> + unsigned int nintrs, unsigned int maxintrs, unsigned int flags)
> +{
> + struct intrmap *im;
> + unsigned int unit = dv->dv_unit;
> + unsigned int i, grid = 0, prev_grid;
> + struct intrmap_cpus *ic;
> +
> + ic = intrmap_cpus_get();
> +
> + nintrs = intrmap_nintrs(ic, nintrs, maxintrs);
> + if (ISSET(flags, INTRMAP_POWEROF2))
> + nintrs = 1 << (fls(nintrs) - 1);
> + im = malloc(sizeof(*im), M_DEVBUF, M_WAITOK | M_ZERO);
> + im->im_count = nintrs;
> + im->im_cpus = ic;
> + im->im_cpumap = mallocarray(nintrs, sizeof(*im->im_cpumap), M_DEVBUF,
> + M_WAITOK | M_ZERO);
> +
> + prev_grid = ic->ic_count;
> + for (i = 0; i < ic->ic_count; i++) {
> + if (ic->ic_count % (i + 1) != 0)
> + continue;
> +
> + grid = ic->ic_count / (i + 1);
> + if (nintrs > grid) {
> + grid = prev_grid;
> + break;
> + }
> +
> + if (nintrs > ic->ic_count / (i + 2))
> + break;
> + prev_grid = grid;
> + }
> + intrmap_set_grid(im, unit, grid);
> +
> + return (im);
> +}
> +
> +void
> +intrmap_destroy(struct intrmap *im)
> +{
> + free(im->im_cpumap, M_DEVBUF, im->im_count * sizeof(*im->im_cpumap));
> + intrmap_cpus_put(im->im_cpus);
> + free(im, M_DEVBUF, sizeof(*im));
> +}
> +
> +/*
> + * Align the two ringmaps.
> + *
> + * e.g. 8 netisrs, rm0 contains 4 rings, rm1 contains 2 rings.
> + *
> + * Before:
> + *
> + * CPU 0 1 2 3 4 5 6 7
> + * NIC_RX n0 n1 n2 n3
> + * NIC_TX N0 N1
> + *
> + * After:
> + *
> + * CPU 0 1 2 3 4 5 6 7
> + * NIC_RX n0 n1 n2 n3
> + * NIC_TX N0 N1
> + */
> +void
> +intrmap_align(const struct device *dv,
> + struct intrmap *im0, struct intrmap *im1)
> +{
> + unsigned int unit = dv->dv_unit;
> +
> + KASSERT(im0->im_cpus == im1->im_cpus);
> +
> + if (im0->im_grid > im1->im_grid)
> + intrmap_set_grid(im1, unit, im0->im_grid);
> + else if (im0->im_grid < im1->im_grid)
> + intrmap_set_grid(im0, unit, im1->im_grid);
> +}
> +
> +void
> +intrmap_match(const struct device *dv,
> + struct intrmap *im0, struct intrmap *im1)
> +{
> + unsigned int unit = dv->dv_unit;
> + const struct intrmap_cpus *ic;
> + unsigned int subset_grid, cnt, divisor, mod, offset, i;
> + struct intrmap *subset_im, *im;
> + unsigned int old_im0_grid, old_im1_grid;
> +
> + KASSERT(im0->im_cpus == im1->im_cpus);
> + if (im0->im_grid == im1->im_grid)
> + return;
> +
> + /* Save grid for later use */
> + old_im0_grid = im0->im_grid;
> + old_im1_grid = im1->im_grid;
> +
> + intrmap_align(dv, im0, im1);
> +
> + /*
> + * Re-shuffle rings to get more even distribution.
> + *
> + * e.g. 12 netisrs, rm0 contains 4 rings, rm1 contains 2 rings.
> + *
> + * CPU 0 1 2 3 4 5 6 7 8 9 10 11
> + *
> + * NIC_RX a0 a1 a2 a3 b0 b1 b2 b3 c0 c1 c2 c3
> + * NIC_TX A0 A1 B0 B1 C0 C1
> + *
> + * NIC_RX d0 d1 d2 d3 e0 e1 e2 e3 f0 f1 f2 f3
> + * NIC_TX D0 D1 E0 E1 F0 F1
> + */
> +
> + if (im0->im_count >= (2 * old_im1_grid)) {
> + cnt = im0->im_count;
> + subset_grid = old_im1_grid;
> + subset_im = im1;
> + im = im0;
> + } else if (im1->im_count > (2 * old_im0_grid)) {
> + cnt = im1->im_count;
> + subset_grid = old_im0_grid;
> + subset_im = im0;
> + im = im1;
> + } else {
> + /* No space to shuffle. */
> + return;
> + }
> +
> + ic = im0->im_cpus;
> +
> + mod = cnt / subset_grid;
> + KASSERT(mod >= 2);
> + divisor = ic->ic_count / im->im_grid;
> + offset = ((unit / divisor) % mod) * subset_grid;
> +
> + for (i = 0; i < subset_im->im_count; i++) {
> + subset_im->im_cpumap[i] += offset;
> + KASSERTMSG(subset_im->im_cpumap[i] < ic->ic_count,
> + "match: invalid cpumap[%d] = %d, offset %d",
> + i, subset_im->im_cpumap[i], offset);
> + }
> +#ifdef DIAGNOSTIC
> + for (i = 0; i < subset_im->im_count; i++) {
> + unsigned int j;
> +
> + for (j = 0; j < im->im_count; j++) {
> + if (im->im_cpumap[j] == subset_im->im_cpumap[i])
> + break;
> + }
> + KASSERTMSG(j < im->im_count,
> + "subset cpumap[%u] = %u not found in superset",
> + i, subset_im->im_cpumap[i]);
> + }
> +#endif
> +}
> +
> +unsigned int
> +intrmap_count(const struct intrmap *im)
> +{
> + return (im->im_count);
> +}
> +
> +unsigned int
> +intrmap_cpu(const struct intrmap *im, unsigned int ring)
> +{
> + const struct intrmap_cpus *ic = im->im_cpus;
> + unsigned int icpu;
> + KASSERTMSG(ring < im->im_count, "invalid ring %u", ring);
> + icpu = im->im_cpumap[ring];
> + KASSERTMSG(icpu < ic->ic_count, "invalid interrupt cpu %u for ring %u"
> + " (intrmap %p)", icpu, ring, im);
> + return (ic->ic_cpumap[icpu]);
> +}
> Index: sys/conf/files
> ===================================================================
> RCS file: /cvs/src/sys/conf/files,v
> retrieving revision 1.686
> diff -u -p -r1.686 files
> --- sys/conf/files 15 Apr 2020 09:26:49 -0000 1.686
> +++ sys/conf/files 16 Jun 2020 00:13:50 -0000
> @@ -20,6 +20,7 @@ define i2cbus {}
> define gpiobus {}
> define onewirebus {}
> define video {}
> +define intrmap
>
> # filesystem firmware loading attribute
> define firmload
> @@ -691,6 +692,7 @@ file kern/kern_resource.c
> file kern/kern_pledge.c
> file kern/kern_unveil.c
> file kern/kern_sched.c
> +file kern/kern_intrmap.c intrmap
> file kern/kern_sensors.c
> file kern/kern_sig.c
> file kern/kern_smr.c
>
>
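As a footnote to the nintr/maxintr/INTRMAP_POWEROF2 text in the manual
page, this is how I read the clamping done by intrmap_nintrs() and
intrmap_create(), written as a userland sketch. sketch_nintrs() and
its pow2 argument are names made up for the example, and the loop
stands in for the kernel's fls() so it builds outside the kernel.

```c
#include <assert.h>

/*
 * Number of interrupts a driver ends up with: the hardware count
 * (0 means "use the driver limit"), capped by the driver limit and
 * by the number of usable CPUs, then optionally rounded down to a
 * power of two.  Assumes maxintrs > 0 and ncpu > 0.
 */
static unsigned int
sketch_nintrs(unsigned int ncpu, unsigned int nintrs,
    unsigned int maxintrs, int pow2)
{
	if (nintrs == 0 || nintrs > maxintrs)
		nintrs = maxintrs;
	if (nintrs > ncpu)
		nintrs = ncpu;

	if (pow2) {
		unsigned int p = 1;

		/* largest power of two that is <= nintrs */
		while (p <= nintrs / 2)
			p <<= 1;
		nintrs = p;
	}

	return (nintrs);
}
```

So a driver limit of 8 on a 4 CPU machine gives 4, and a limit of 6
with the power-of-2 flag set also gives 4.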