Hi,

I have implemented MSI/MSI-X support on top of my IRQ affinity code.
# Here is the mail about the IRQ affinity code:
#     http://mail-index.netbsd.org/tech-kern/2014/09/12/msg017653.html
Here is the implementation:
    https://github.com/knakahara/netbsd-src/tree/rfc/msi-msix

and here are the patches:
    (1) http://knakahara.github.io/patches/netbsd/msi-msix-support-01-irq-affinity.patch
        IRQ affinity code with some bug fixes
    (2) http://knakahara.github.io/patches/netbsd/msi-msix-support-02-main.patch
        main MSI/MSI-X support code
    (3) http://knakahara.github.io/patches/netbsd/msi-msix-support-03-fix-build-failure.patch
        tiny patch to fix build failure

Using these APIs, if_vmx can use multiqueue as shown below.
# hikaru@n.o implemented the if_vmx multiqueue code, thanks.
#     https://github.com/knakahara/netbsd-src/tree/k-nakahara-msi-msix-proto2-test-vmx

====================
# intrctl list
interrupt name  CPU#00(+)   CPU#01(+)
ioapic0 pin 9           0*          0   unknown
ioapic0 pin 1           0*          0   unknown
ioapic0 pin 12          0*          0   unknown
ioapic0 pin 14          0*          0   unknown
ioapic0 pin 15          6*          0   unknown
ioapic0 pin 17      82321*          0   unknown
ioapic0 pin 16         17*          0   unknown
msix0 vec 0         11935*          0   vmx0: tx 0
msix0 vec 1             0*          0   vmx0: tx 1 (*1)
msix0 vec 2         14895*          0   vmx0: rx 0
msix0 vec 3          1904*          0   vmx0: rx 1
msix0 vec 4             0*          0   vmx0: link
ioapic0 pin 19          0*          0   unknown
ioapic0 pin 7           0*          0   unknown
ioapic0 pin 4           0*          0   unknown
ioapic0 pin 3           0*          0   unknown
ioapic0 pin 6           0*          0   unknown
====================

(*1) This if_vmx implementation uses multiqueue only on the receive
     side. if_vmx creates and establishes "tx 1", but does not use it.

Of course, the affinity of MSI/MSI-X interrupts can be set just like
that of normal interrupts.

====================
# sh intrctl affinity -i 'msix0 vec 2' -c 1
# sh intrctl affinity -i 'msix0 vec 3' -c 1
(send and receive some files)
# intrctl list
interrupt name  CPU#00(+)   CPU#01(+)
ioapic0 pin 9           0*          0   unknown
ioapic0 pin 1           0*          0   unknown
ioapic0 pin 12          0*          0   unknown
ioapic0 pin 14          0*          0   unknown
ioapic0 pin 15          6*          0   unknown
ioapic0 pin 17      82668*          0   unknown
ioapic0 pin 16         49*          0   unknown
msix0 vec 0         14772*          0   vmx0: tx 0
msix0 vec 1             0*          0   vmx0: tx 1
msix0 vec 2         15024        2010*  vmx0: rx 0
msix0 vec 3          1905        1089*  vmx0: rx 1
msix0 vec 4             0*          0   vmx0: link
ioapic0 pin 19          0*          0   unknown
ioapic0 pin 7           0*          0   unknown
ioapic0 pin 4           0*          0   unknown
ioapic0 pin 3           0*          0   unknown
ioapic0 pin 6           0*          0   unknown
====================

Furthermore, I have written a simple (but not brief) addition to the
pci_intr(9) manual. The manual text is at the end of this mail.

Could you comment on the specification and implementation?

Thanks,

========== MSI/MSI-X API manual ==========
PCI_INTR(9)

SYNOPSIS
     /* existing */
     int
     pci_intr_map(const struct pci_attach_args *pa, pci_intr_handle_t *ih);

     const char *
     pci_intr_string(pci_chipset_t *pc, pci_intr_handle_t ih, char *buf,
         size_t len);

     void *
     pci_intr_establish(pci_chipset_t *pc, pci_intr_handle_t ih, int ipl,
         int (*intrhand)(void *), void *intrarg);

     void
     pci_intr_disestablish(pci_chipset_t *pc, void *ih);

     /******************************************************************************/
     /* new APIs for normal interrupt */
     int
     pci_intr_alloc(const struct pci_attach_args *pa, pci_intr_handle_t **pih);

     void
     pci_intr_release(pci_intr_handle_t *pih);

     /******************************************************************************/
     /* new APIs for MSI */
     int
     pci_msi_count(struct pci_attach_args *pa);

     int
     pci_msi_alloc(struct pci_attach_args *pa, pci_intr_handle_t **ihps,
         int *count);

     int
     pci_msi_alloc_exact(struct pci_attach_args *pa, pci_intr_handle_t **ihps,
         int count);

     void
     pci_msi_release(pci_intr_handle_t **pihs, int count);

     void *
     pci_msi_establish(pci_chipset_tag_t pc, pci_intr_handle_t ih, int level,
         int (*func)(void *), void *arg);

     void *
     pci_msi_establish_xname(pci_chipset_tag_t pc, pci_intr_handle_t ih,
         int level, int (*func)(void *), void *arg, const char *xname);

     void
     pci_msi_disestablish(pci_chipset_tag_t pc, void *cookie);

     /******************************************************************************/
     /* new APIs for MSI-X */
     int
     pci_msix_count(struct pci_attach_args *pa);

     int
     pci_msix_alloc(struct pci_attach_args *pa, pci_intr_handle_t **ihps,
         int *count);

     int
     pci_msix_alloc_exact(struct pci_attach_args *pa, pci_intr_handle_t **ihps,
         int count);

     void
     pci_msix_release(pci_intr_handle_t **pihs, int count);

     void *
     pci_msix_establish(pci_chipset_tag_t pc, pci_intr_handle_t ih, int level,
         int (*func)(void *), void *arg, const char *xname);

     void *
     pci_msix_establish_xname(pci_chipset_tag_t pc, pci_intr_handle_t ih,
         int level, int (*func)(void *), void *arg, const char *xname);

     void
     pci_msix_disestablish(pci_chipset_tag_t pc, void *cookie);

     int
     pci_msix_remap(pci_intr_handle_t *pihs, int count);

     /******************************************************************************/
     /* for all interrupt */
     void
     pci_any_intr_disestablish(pci_chipset_tag_t, void *);

     void
     pci_any_intr_release(pci_intr_handle_t **, int);

DESCRIPTION FOR MSI/MSI-X
     The pci_msi and pci_msix functions exist to allow device drivers
     machine-independent access to PCI MSI/MSI-X. The functions described
     in this page are typically declared in a port's
     <machine/pci_machdep.h> header file; however, drivers should
     generally include <dev/pci/pcivar.h> to get other PCI-specific
     declarations as well.

     If a driver wishes to establish an MSI/MSI-X handler for the device,
     it should pass the struct pci_attach_args * to the
     pci_msi{,x}_alloc() or pci_msi{,x}_alloc_exact() function, which
     returns zero on success, and nonzero on failure.
     The function allocates an array of pci_intr_handle_t and sets each
     pci_intr_handle_t pointed at by its second argument to a
     machine-dependent value which identifies a particular MSI/MSI-X
     vector.

     If the driver wishes to refer to the interrupt source in an attach
     or error message, it should use the value returned by
     pci_intr_string(), as for normal interrupts. pci_intr_string()
     accepts both normal interrupt handles and MSI/MSI-X handles.

     Subsequently, when the driver is prepared to receive interrupts, it
     should call pci_msi{,x}_establish() to actually establish the
     handler; when the MSI/MSI-X vector interrupts, func will be called
     with the single argument arg, and will run at the interrupt
     priority level given by level.

     The return value of pci_msi{,x}_establish() may be saved and passed
     to pci_msi{,x}_disestablish() to disable the interrupt handler when
     the driver is no longer interested in interrupts from the device.
     After pci_msi{,x}_disestablish(), the device driver must call
     pci_msi{,x}_release() to release the resources.

     In addition, a device driver that wants to handle normal interrupts
     and MSI/MSI-X in the same way should use pci_intr_alloc() and
     pci_intr_release() instead of pci_intr_map(). pci_intr_alloc()
     allocates a pci_intr_handle_t just as pci_msi{,x}_alloc() does. A
     driver that uses pci_intr_alloc() can use
     pci_any_intr_disestablish() and pci_any_intr_release(). Of course,
     device drivers which do not use MSI/MSI-X can keep using
     pci_intr_map() as before.

FUNCTION
     int
     pci_intr_alloc(const struct pci_attach_args *pa, pci_intr_handle_t **pih);
         "pa" is the pci_attach_args passed to the device driver's attach
         function. "pih" is a pointer to pci_intr_handle_t *. The
         pci_intr_handle_t is allocated in pci_intr_alloc(), so device
         drivers must call pci_intr_release() or pci_any_intr_release()
         to free it.

     void
     pci_intr_release(pci_intr_handle_t *pih);
         "pih" is a pointer to the pci_intr_handle_t whose resources are
         to be released.
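
As an illustration of how a driver might choose between the three
allocation paths, here is a minimal userland sketch. It is not kernel
code and not part of the patches: struct mock_dev and the mock_*()
functions are hypothetical stand-ins for pci_msix_alloc(),
pci_msi_alloc() and pci_intr_alloc(), modelled only on the conventions
in this manual (zero on success, non-zero on failure, "count" may be
decreased, MSI counts are powers of 2). The MSI-X, then MSI, then
normal interrupt fallback order is an assumption, not something the
API requires.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a device; not a NetBSD type. */
struct mock_dev {
	int msix_vectors;	/* what pci_msix_count() would report */
	int msi_vectors;	/* what pci_msi_count() would report */
};

enum intr_kind { INTR_NONE, INTR_INTX, INTR_MSI, INTR_MSIX };

/* Mock of pci_msix_alloc(): any count > 0 is allowed. */
static int
mock_msix_alloc(const struct mock_dev *dev, int *count)
{
	if (dev->msix_vectors == 0)
		return 1;			/* device has no MSI-X */
	if (*count > dev->msix_vectors)
		*count = dev->msix_vectors;	/* "count" is decreased */
	return 0;
}

/* Mock of pci_msi_alloc(): the granted count stays a power of 2. */
static int
mock_msi_alloc(const struct mock_dev *dev, int *count)
{
	int limit, n = 1;

	if (dev->msi_vectors == 0)
		return 1;			/* device has no MSI */
	limit = *count < dev->msi_vectors ? *count : dev->msi_vectors;
	while (n * 2 <= limit)			/* round down to 2^k */
		n *= 2;
	*count = n;
	return 0;
}

/* Mock of pci_intr_alloc(): one normal interrupt, always available here. */
static int
mock_intr_alloc(const struct mock_dev *dev)
{
	(void)dev;
	return 0;
}

/*
 * Try MSI-X, then MSI, then a normal interrupt.  On entry *count is
 * the number of vectors wanted (>= 1); on return it is the number
 * actually granted.
 */
static enum intr_kind
pick_interrupt(const struct mock_dev *dev, int *count)
{
	int want = *count;

	assert(want >= 1);
	if (mock_msix_alloc(dev, count) == 0)
		return INTR_MSIX;
	*count = want;
	if (mock_msi_alloc(dev, count) == 0)
		return INTR_MSI;
	*count = 1;
	if (mock_intr_alloc(dev) == 0)
		return INTR_INTX;
	*count = 0;
	return INTR_NONE;
}
```

In a real driver each branch would keep the returned
pci_intr_handle_t array and later pair the matching establish,
disestablish and release calls, or use pci_any_intr_disestablish()
and pci_any_intr_release() as described above.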

     /******************************************************************************/
     /* for MSI */
     int
     pci_msi_count(struct pci_attach_args *pa);
         Returns the maximum number of MSI vectors supported by the
         device, in other words the hardware limit of MSI vectors. If the
         device does not support MSI, returns zero.

     int
     pci_msi_alloc(struct pci_attach_args *pa, pci_intr_handle_t **ihps,
         int *count);
         This function allocates and sets pci_intr_handle_t. "ihps" is a
         pointer to the array of pci_intr_handle_t allocated by this
         function. "count" points to the number of vectors wanted by the
         device driver. If there are not enough resources, "count" may be
         decreased on return. This function returns zero on success, and
         non-zero on failure. Due to the PCI specification, "count" must
         be a power of 2. Even if "count" is decreased, it stays within
         this constraint.

     int
     pci_msi_alloc_exact(struct pci_attach_args *pa, pci_intr_handle_t **ihps,
         int count);
         This function is similar to pci_msi_alloc(). The only difference
         is that "count" is never decreased.

     void
     pci_msi_release(pci_intr_handle_t **pihs, int count);
         "pihs" is a pointer to the array of pci_intr_handle_t whose
         resources are to be released. "count" is the number of allocated
         handles.

     void *
     pci_msi_establish(pci_chipset_tag_t pc, pci_intr_handle_t ih, int level,
         int (*func)(void *), void *arg);
         This function is similar to pci_intr_establish(). The return
         value and arguments are the same as for pci_intr_establish().
         The only difference is that "ih" must be an MSI handle. If "ih"
         is a normal interrupt handle, this function fails.

     void *
     pci_msi_establish_xname(pci_chipset_tag_t pc, pci_intr_handle_t ih,
         int level, int (*func)(void *), void *arg, const char *xname);
         This function is similar to pci_msi_establish(). The only
         difference is that "xname" is used as the MSI vector name.

     void
     pci_msi_disestablish(pci_chipset_tag_t pc, void *cookie);
         This function is similar to pci_intr_disestablish(). The return
         value and arguments are the same as for pci_intr_disestablish().
         The only difference is that "cookie" must belong to an MSI
         handler. If "cookie" belongs to a normal interrupt handler, this
         function fails.

     /******************************************************************************/
     /* for MSI-X */
     int
     pci_msix_count(struct pci_attach_args *pa);
         This function is similar to pci_msi_count(). The only difference
         is that it returns the maximum number of MSI-X vectors.

     int
     pci_msix_alloc(struct pci_attach_args *pa, pci_intr_handle_t **ihps,
         int *count);
         This function is similar to pci_msi_alloc(). There are two
         differences:
             - it allocates handles for MSI-X
             - "count" can be any number greater than zero

     int
     pci_msix_alloc_exact(struct pci_attach_args *pa, pci_intr_handle_t **ihps,
         int count);
         This function is similar to pci_msi_alloc_exact(). There are two
         differences:
             - it allocates handles for MSI-X
             - "count" can be any number greater than zero

     void
     pci_msix_release(pci_intr_handle_t **pihs, int count);
         This function is a wrapper around pci_msi_release().

     void *
     pci_msix_establish(pci_chipset_tag_t pc, pci_intr_handle_t ih, int level,
         int (*func)(void *), void *arg, const char *xname);
         This function is similar to pci_msi_establish(). The only
         difference is that "ih" must be an MSI-X handle. This function
         uses the device's MSI-X vector table contiguously, in order from
         entry 0. If a device driver wants to use the MSI-X vector table
         non-contiguously, it should use pci_msix_remap().

     void *
     pci_msix_establish_xname(pci_chipset_tag_t pc, pci_intr_handle_t ih,
         int level, int (*func)(void *), void *arg, const char *xname);
         This function is similar to pci_msi_establish_xname(). The only
         difference is that "ih" must be an MSI-X handle.

     void
     pci_msix_disestablish(pci_chipset_tag_t pc, void *cookie);
         This function is similar to pci_msi_disestablish(). The only
         difference is that "cookie" must belong to an MSI-X handler.

     int
     pci_msix_remap(pci_intr_handle_t *pihs, int count);
         This function remaps MSI-X vector table entries. "pihs" is an
         array of pci_intr_handle_t for MSI-X. "count" is the total
         number of table entries after remapping.
         This function returns zero on success, and returns non-zero on
         failure without changing the MSI-X vector table.

         For example, if a device driver wants to remap as follows,

                    before                             after
         | index | combined handler |      | index | combined handler |
         +-------+------------------+      +-------+------------------+
         |   0   | pihs[0]          |      |   0   | pihs[3]          |
         +-------+------------------+      +-------+------------------+
         |   1   | pihs[1]          |  ->  |   1   | (not used)       |
         +-------+------------------+      +-------+------------------+
         |   2   | pihs[2]          |      |   2   | (not used)       |
         +-------+------------------+      +-------+------------------+
         |   3   | pihs[3]          |      |   3   | (not used)       |
         +-------+------------------+      +-------+------------------+
                                           |   4   | pihs[0]          |
                                           +-------+------------------+
                                           |   5   | (not used)       |
                                           +-------+------------------+
                                           |   6   | pihs[1]          |
                                           +-------+------------------+
                                           // pihs[2] is disestablished

         the device driver should use this function like this.

====================
pci_intr_handle_t after[7];
int ret;

after[0] = before[3];
after[1] = MSI_INT_MSIX_INVALID; // marks an unused entry
after[2] = MSI_INT_MSIX_INVALID;
after[3] = MSI_INT_MSIX_INVALID;
after[4] = before[0];
after[5] = MSI_INT_MSIX_INVALID;
after[6] = before[1];

ret = pci_msix_remap(after, 7);
if (ret != 0) {
	// error handling; the MSI-X vector table is unchanged
} else {
	// before[2] is no longer in the table, so tear down its handler.
	// "cookie2" is a placeholder name for the value returned when the
	// handler for before[2] was established.
	pci_msix_disestablish(pc, cookie2);
}
====================
========== MSI/MSI-X API manual ==========

--
//////////////////////////////////////////////////////////////////////
Internet Initiative Japan Inc.

Device Engineering Section,
Core Product Development Department,
Product Division,
Technology Unit

Kengo NAKAHARA <k-nakah...@iij.ad.jp>