On Thu, Jan 23, 2020 at 11:28:50AM +0100, Martin Pieuchot wrote:
> I'd like to make progress towards interrupting multiple CPUs in order to
> one day make use of multiple queues in some network drivers.  The road
> towards that goal is consequent and I'd like to proceed in steps to make
> it easier to squash bugs.  I'm currently thinking of the following steps:
> 
>  1. Is my interrupt handler safe to be executed on CPU != CPU0?
> 
>  2. Is it safe to execute this handler on two or more CPUs at the same
>     time?
> 
>  3. How does interrupting multiple CPUs influence packet processing in
>     the softnet thread?  Is any knowledge required (CPU affinity?) to
>     have an optimum processing when multiple softnet threads are used?
> 
>  4. How to split traffic in one incoming NIC between multiple processing
>     units?
> 
> This new journey comes with the requirement of being able to interrupt
> an arbitrary CPU.  For that we need a new API.  Patrick gave me the
> diff below during u2k20 and I'd like to use it to start a discussion.
> 
> We currently have 6 drivers using pci_intr_map_msix().  Since we want to
> be able to specify a CPU should we introduce a new function like in the
> diff below or do we prefer to add a new argument (cpuid_t?) to this one?
> This change in itself should already allow us to proceed with the first
> item of the above list.
> 
> Then we need a way to read the MSI-X control table size using the define
> PCI_MSIX_CTL_TBLSIZE() below.  This can be done in MI, we might also
> want to print that information in dmesg, some maybe cache it in pci(4)?
> 
> Does somebody has a better/stronger/magic way to achieve this goal?

I worked on this in mcx(4) a while ago, but I didn't manage to get the nic's
RSS hashing working at the time so all I really achieved was moving the one
rx interrupt to some other cpu.  I think I got tx interrupts across multiple
cpus without any problems, but that's not the interesting bit.

The approach I took (diff below) was to, for msi-x vectors marked MPSAFE,
try to put vector n on the nth online cpu.  This means the driver doesn't
have to think the details, it just asks for a bunch of interrupts and they
end up spread across the system.  I'd probably just want a function that
tells me the most vectors I can usefully use for a given device, which would
combine the table size for the device, the number of cpus online, and for nic
drivers, the number of softnet threads.

The driver would then either set up rings to match the number of vectors, or
map vectors to however many rings it has accordingly.  For the drivers I've
worked on (mcx, ixl, bnxt, iavf), there's no coordination required between the
vectors, so running them concurrently on multiple cpus is safe.  All we need
is to set up multiple rings and RSS hashing.


Index: amd64/acpi_machdep.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/acpi_machdep.c,v
retrieving revision 1.89
diff -u -p -r1.89 acpi_machdep.c
--- amd64/acpi_machdep.c        20 Dec 2019 07:49:31 -0000      1.89
+++ amd64/acpi_machdep.c        23 Jan 2020 22:45:17 -0000
@@ -194,7 +194,7 @@ acpi_intr_establish(int irq, int flags, 
 
        type = (flags & LR_EXTIRQ_MODE) ? IST_EDGE : IST_LEVEL;
        return (intr_establish(-1, (struct pic *)apic, map->ioapic_pin,
-           type, level, handler, arg, what));
+           type, level, handler, arg, what, 0));
 #else
        return NULL;
 #endif
Index: amd64/intr.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/intr.c,v
retrieving revision 1.52
diff -u -p -r1.52 intr.c
--- amd64/intr.c        25 Mar 2019 18:45:27 -0000      1.52
+++ amd64/intr.c        23 Jan 2020 22:45:17 -0000
@@ -220,7 +220,7 @@ intr_allocate_slot_cpu(struct cpu_info *
  */
 int
 intr_allocate_slot(struct pic *pic, int legacy_irq, int pin, int level,
-    struct cpu_info **cip, int *index, int *idt_slot)
+    struct cpu_info **cip, int *index, int *idt_slot, int cpuhint)
 {
        CPU_INFO_ITERATOR cii;
        struct cpu_info *ci;
@@ -282,24 +282,33 @@ duplicate:
        } else {
 other:
                /*
-                * Otherwise, look for a free slot elsewhere. Do the primary
-                * CPU first.
-                */
-               ci = &cpu_info_primary;
-               error = intr_allocate_slot_cpu(ci, pic, pin, &slot);
-               if (error == 0)
-                       goto found;
-
-               /*
-                * ..now try the others.
+                * Find an online cpu to allocate a slot on.
+                * Skip 'cpuhint' candidates to spread interrupts across cpus,
+                * but keep checking following candidates after that if we 
didn't
+                * find a slot.
                 */
+               cpuhint %= sysctl_hwncpuonline();
                CPU_INFO_FOREACH(cii, ci) {
-                       if (CPU_IS_PRIMARY(ci))
+                       if (!cpu_is_online(ci))
                                continue;
+
+                       if (cpuhint > 0) {
+                               cpuhint--;
+                               continue;
+                       }
+
                        error = intr_allocate_slot_cpu(ci, pic, pin, &slot);
                        if (error == 0)
                                goto found;
                }
+
+               /*
+                * If we didn't find a slot, try the primary again just in case.
+                */
+               ci = &cpu_info_primary;
+               error = intr_allocate_slot_cpu(ci, pic, pin, &slot);
+               if (error == 0)
+                       goto found;
                return EBUSY;
 found:
                idtvec = idt_vec_alloc(APIC_LEVEL(level), IDT_INTR_HIGH);
@@ -323,7 +332,7 @@ int intr_shared_edge;
 
 void *
 intr_establish(int legacy_irq, struct pic *pic, int pin, int type, int level,
-    int (*handler)(void *), void *arg, const char *what)
+    int (*handler)(void *), void *arg, const char *what, int cpuhint)
 {
        struct intrhand **p, *q, *ih;
        struct cpu_info *ci;
@@ -346,7 +355,7 @@ intr_establish(int legacy_irq, struct pi
        KASSERT(level <= IPL_TTY || level >= IPL_CLOCK || flags & IPL_MPSAFE);
 
        error = intr_allocate_slot(pic, legacy_irq, pin, level, &ci, &slot,
-           &idt_vec);
+           &idt_vec, cpuhint);
        if (error != 0) {
                printf("failed to allocate interrupt slot for PIC %s pin %d\n",
                    pic->pic_dev.dv_xname, pin);
@@ -419,7 +428,7 @@ intr_establish(int legacy_irq, struct pi
        ih->ih_pin = pin;
        ih->ih_cpu = ci;
        ih->ih_slot = slot;
-       evcount_attach(&ih->ih_count, what, &source->is_idtvec);
+       evcount_attach(&ih->ih_count, what, &source->is_evdata);
 
        *p = ih;
 
@@ -436,6 +445,8 @@ intr_establish(int legacy_irq, struct pi
                ci->ci_isources[slot]->is_recurse = stubp->ist_recurse;
                setgate(&idt[idt_vec], stubp->ist_entry, 0, SDT_SYS386IGT,
                    SEL_KPL, GSEL(GCODE_SEL, SEL_KPL));
+
+               source->is_evdata = (ci->ci_cpuid << 24) | idt_vec;
        }
 
        pic->pic_addroute(pic, ci, pin, idt_vec, type);
Index: include/intr.h
===================================================================
RCS file: /cvs/src/sys/arch/amd64/include/intr.h,v
retrieving revision 1.31
diff -u -p -r1.31 intr.h
--- include/intr.h      21 Dec 2018 01:51:07 -0000      1.31
+++ include/intr.h      23 Jan 2020 22:45:17 -0000
@@ -75,6 +75,7 @@ struct intrsource {
        int is_flags;                   /* see below */
        int is_type;                    /* level, edge */
        int is_idtvec;
+       int is_evdata;                  /* cpu id << 24 | idtvec */
        int is_minlevel;
 };
 
@@ -200,9 +201,9 @@ int x86_nmi(void);
 void intr_calculatemasks(struct cpu_info *);
 int intr_allocate_slot_cpu(struct cpu_info *, struct pic *, int, int *);
 int intr_allocate_slot(struct pic *, int, int, int, struct cpu_info **, int *,
-           int *);
+           int *, int);
 void *intr_establish(int, struct pic *, int, int, int, int (*)(void *),
-           void *, const char *);
+           void *, const char *, int);
 void intr_disestablish(struct intrhand *);
 int intr_handler(struct intrframe *, struct intrhand *);
 void cpu_intr_init(struct cpu_info *);
Index: isa/isa_machdep.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/isa/isa_machdep.c,v
retrieving revision 1.29
diff -u -p -r1.29 isa_machdep.c
--- isa/isa_machdep.c   14 Oct 2017 04:44:43 -0000      1.29
+++ isa/isa_machdep.c   23 Jan 2020 22:45:17 -0000
@@ -313,7 +313,7 @@ isa_intr_establish(isa_chipset_tag_t ic,
        KASSERT(pic);
 
        return intr_establish(irq, pic, pin, type, level, ih_fun,
-           ih_arg, ih_what);
+           ih_arg, ih_what, 0);
 }
 
 /*
Index: pci/pci_machdep.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/pci/pci_machdep.c,v
retrieving revision 1.73
diff -u -p -r1.73 pci_machdep.c
--- pci/pci_machdep.c   7 Sep 2019 13:46:19 -0000       1.73
+++ pci/pci_machdep.c   23 Jan 2020 22:45:17 -0000
@@ -734,11 +734,17 @@ pci_intr_establish(pci_chipset_tag_t pc,
 
        if (ih.line & APIC_INT_VIA_MSG) {
                return intr_establish(-1, &msi_pic, tag, IST_PULSE, level,
-                   func, arg, what);
+                   func, arg, what, 0);
        }
        if (ih.line & APIC_INT_VIA_MSGX) {
+               int cpuhint = 0;
+
+               /* this maps vector 0 to primary cpu, others elsewhere */
+               if (level & IPL_MPSAFE)
+                       cpuhint = PCI_MSIX_VEC(ih.tag);
+
                return intr_establish(-1, &msix_pic, tag, IST_PULSE, level,
-                   func, arg, what);
+                   func, arg, what, cpuhint);
        }
 
        pci_decompose_tag(pc, ih.tag, &bus, &dev, NULL);
@@ -764,7 +770,7 @@ pci_intr_establish(pci_chipset_tag_t pc,
        }
 #endif
 
-       return intr_establish(irq, pic, pin, IST_LEVEL, level, func, arg, what);
+       return intr_establish(irq, pic, pin, IST_LEVEL, level, func, arg, what, 
0);
 }
 
 void

Reply via email to