Re: [Xen-devel] [PATCH v2 1/4] pci: Do not ignore device's PXM information
>>> On 07.01.15 at 16:34, wrote: > On 01/07/2015 10:07 AM, Jan Beulich wrote: > On 07.01.15 at 15:47, wrote: >>> On 07/01/15 14:42, Boris Ostrovsky wrote: I kept this field as an int to be able to store NUMA_NO_NODE which I thought to be (int)-1. But now I see that NUMA_NO_NODE is, in fact, 0xff but is promoted to (int)-1 by pxm_to_node(). Given that there is a number of tests for NUMA_NO_NODE and not for (int)-1, should we then make pxm_to_node() return u8 as well? >>> I noticed this as well, and found it quite counter intuitive. >>> >>> I would suggest fixing NUMA_NO_NODE to -1 and removing some of the >>> type-punning. >> I have to admit that I see no value in wasting 4 bytes for something >> that for the foreseeable future won't exceed 1 byte. > > The downside of going to u8 is that we'd be limiting number of nodes to > 254, which is somewhat awkward. OTOH we already do this by testing > nodeID against 0xff in various places. With NODES_SHIFT being 6 and hence MAX_NUMNODES being 0x40, we can't reach 254 right now anyway. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] pci: Do not ignore device's PXM information
>>> On 07.01.15 at 16:31, wrote: > On 01/07/2015 10:06 AM, Jan Beulich wrote: >> Of course an additional question would be whether the node wouldn't better go into struct arch_pci_dev - that depends on whether we expect ARM to be using NUMA... >>> Since we have CPU topology in common code I thought this would be >>> arch-independent as well. >> Not sure what you're referring to here: What common piece of data >> stores the node of a particular CPU? cpu_to_node[] clearly is x86- >> specific. > > I meant that we have common interface (sysctl) for querying topology and > PCI is a common IO bus (unlike CPUs) so I figured I'd keep node in > common part as well. Makes little sense: ARM stubs out cpu_to_node(), so a similar mechanism could be used to hide the fact that some arch-es may have no node associated with a PCI device. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] pci: Do not ignore device's PXM information
On 01/07/2015 10:06 AM, Jan Beulich wrote: Of course an additional question would be whether the node wouldn't better go into struct arch_pci_dev - that depends on whether we expect ARM to be using NUMA... Since we have CPU topology in common code I thought this would be arch-independent as well. Not sure what you're referring to here: What common piece of data stores the node of a particular CPU? cpu_to_node[] clearly is x86- specific. I meant that we have common interface (sysctl) for querying topology and PCI is a common IO bus (unlike CPUs) so I figured I'd keep node in common part as well. -boris ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] pci: Do not ignore device's PXM information
On 01/07/2015 10:07 AM, Jan Beulich wrote: On 07.01.15 at 15:47, wrote: On 07/01/15 14:42, Boris Ostrovsky wrote: I kept this field as an int to be able to store NUMA_NO_NODE which I thought to be (int)-1. But now I see that NUMA_NO_NODE is, in fact, 0xff but is promoted to (int)-1 by pxm_to_node(). Given that there is a number of tests for NUMA_NO_NODE and not for (int)-1, should we then make pxm_to_node() return u8 as well? I noticed this as well, and found it quite counter intuitive. I would suggest fixing NUMA_NO_NODE to -1 and removing some of the type-punning. I have to admit that I see no value in wasting 4 bytes for something that for the foreseeable future won't exceed 1 byte. The downside of going to u8 is that we'd be limiting number of nodes to 254, which is somewhat awkward. OTOH we already do this by testing nodeID against 0xff in various places. -boris ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] pci: Do not ignore device's PXM information
>>> On 07.01.15 at 15:47, wrote: > On 07/01/15 14:42, Boris Ostrovsky wrote: >> I kept this field as an int to be able to store NUMA_NO_NODE which I >> thought to be (int)-1. >> >> But now I see that NUMA_NO_NODE is, in fact, 0xff but is promoted to >> (int)-1 by pxm_to_node(). Given that there is a number of tests for >> NUMA_NO_NODE and not for (int)-1, should we then make pxm_to_node() >> return u8 as well? > > I noticed this as well, and found it quite counter intuitive. > > I would suggest fixing NUMA_NO_NODE to -1 and removing some of the > type-punning. I have to admit that I see no value in wasting 4 bytes for something that for the foreseeable future won't exceed 1 byte. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] pci: Do not ignore device's PXM information
>>> On 07.01.15 at 15:42, wrote: > On 01/07/2015 04:06 AM, Jan Beulich wrote: > On 06.01.15 at 03:18, wrote: >>> --- a/xen/include/xen/pci.h >>> +++ b/xen/include/xen/pci.h >>> @@ -56,6 +56,8 @@ struct pci_dev { >>> >>> u8 phantom_stride; >>> >>> +int node; /* NUMA node */ >> I thought I asked about this on v1 already: Does this really need to be >> an int, when commonly node numbers are stored in u8/unsigned char? >> Shrinking the field size would prevent the structure size from growing... > > I kept this field as an int to be able to store NUMA_NO_NODE which I > thought to be (int)-1. > > But now I see that NUMA_NO_NODE is, in fact, 0xff but is promoted to > (int)-1 by pxm_to_node(). Given that there is a number of tests for > NUMA_NO_NODE and not for (int)-1, should we then make pxm_to_node() > return u8 as well? I think that would make sense, together with fixing up one of the three callers in VT-d code (from alloc_pgtable_maddr()); the other two look correct already. >> Of course an additional question would be whether the node wouldn't >> better go into struct arch_pci_dev - that depends on whether we >> expect ARM to be using NUMA... > > Since we have CPU topology in common code I thought this would be > arch-independent as well. Not sure what you're referring to here: What common piece of data stores the node of a particular CPU? cpu_to_node[] clearly is x86- specific. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] pci: Do not ignore device's PXM information
On 07/01/15 14:42, Boris Ostrovsky wrote: > On 01/07/2015 04:06 AM, Jan Beulich wrote: > On 06.01.15 at 03:18, wrote: >>> @@ -618,7 +620,22 @@ ret_t do_physdev_op(int cmd, >>> XEN_GUEST_HANDLE_PARAM(void) arg) >>> } >>> else >>> pdev_info.is_virtfn = 0; >>> -ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info); >>> + >>> +if ( add.flags & XEN_PCI_DEV_PXM ) >>> +{ >>> +uint32_t pxm; >>> +int optarr_off = offsetof(struct >>> physdev_pci_device_add, optarr) / >> unsigned int or size_t. >> >>> --- a/xen/include/xen/pci.h >>> +++ b/xen/include/xen/pci.h >>> @@ -56,6 +56,8 @@ struct pci_dev { >>> u8 phantom_stride; >>> +int node; /* NUMA node */ >> I thought I asked about this on v1 already: Does this really need to be >> an int, when commonly node numbers are stored in u8/unsigned char? >> Shrinking the field size would prevent the structure size from >> growing... > > > I kept this field as an int to be able to store NUMA_NO_NODE which I > thought to be (int)-1. > > But now I see that NUMA_NO_NODE is, in fact, 0xff but is promoted to > (int)-1 by pxm_to_node(). Given that there is a number of tests for > NUMA_NO_NODE and not for (int)-1, should we then make pxm_to_node() > return u8 as well? I noticed this as well, and found it quite counter intuitive. I would suggest fixing NUMA_NO_NODE to -1 and removing some of the type-punning. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] pci: Do not ignore device's PXM information
On 01/07/2015 04:06 AM, Jan Beulich wrote: On 06.01.15 at 03:18, wrote: @@ -618,7 +620,22 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) } else pdev_info.is_virtfn = 0; -ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info); + +if ( add.flags & XEN_PCI_DEV_PXM ) +{ +uint32_t pxm; +int optarr_off = offsetof(struct physdev_pci_device_add, optarr) / unsigned int or size_t. --- a/xen/include/xen/pci.h +++ b/xen/include/xen/pci.h @@ -56,6 +56,8 @@ struct pci_dev { u8 phantom_stride; +int node; /* NUMA node */ I thought I asked about this on v1 already: Does this really need to be an int, when commonly node numbers are stored in u8/unsigned char? Shrinking the field size would prevent the structure size from growing... I kept this field as an int to be able to store NUMA_NO_NODE which I thought to be (int)-1. But now I see that NUMA_NO_NODE is, in fact, 0xff but is promoted to (int)-1 by pxm_to_node(). Given that there is a number of tests for NUMA_NO_NODE and not for (int)-1, should we then make pxm_to_node() return u8 as well? Of course an additional question would be whether the node wouldn't better go into struct arch_pci_dev - that depends on whether we expect ARM to be using NUMA... Since we have CPU topology in common code I thought this would be arch-independent as well. -boris ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] pci: Do not ignore device's PXM information
>>> On 06.01.15 at 03:18, wrote: > @@ -618,7 +620,22 @@ ret_t do_physdev_op(int cmd, > XEN_GUEST_HANDLE_PARAM(void) arg) > } > else > pdev_info.is_virtfn = 0; > -ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info); > + > +if ( add.flags & XEN_PCI_DEV_PXM ) > +{ > +uint32_t pxm; > +int optarr_off = offsetof(struct physdev_pci_device_add, optarr) > / unsigned int or size_t. > --- a/xen/include/xen/pci.h > +++ b/xen/include/xen/pci.h > @@ -56,6 +56,8 @@ struct pci_dev { > > u8 phantom_stride; > > +int node; /* NUMA node */ I thought I asked about this on v1 already: Does this really need to be an int, when commonly node numbers are stored in u8/unsigned char? Shrinking the field size would prevent the structure size from growing... Of course an additional question would be whether the node wouldn't better go into struct arch_pci_dev - that depends on whether we expect ARM to be using NUMA... Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] pci: Do not ignore device's PXM information
>>> On 06.01.15 at 12:55, wrote: > On 06/01/15 02:18, Boris Ostrovsky wrote: > >> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h >> index 5f295f3..a5eb81c 100644 >> --- a/xen/include/xen/pci.h >> +++ b/xen/include/xen/pci.h >> @@ -56,6 +56,8 @@ struct pci_dev { >> >> u8 phantom_stride; >> >> +int node; /* NUMA node */ >> + >> enum pdev_type { >> DEV_TYPE_PCI_UNKNOWN, >> DEV_TYPE_PCIe_ENDPOINT, >> @@ -102,7 +104,8 @@ void setup_hwdom_pci_devices(struct domain *, >> int pci_release_devices(struct domain *d); >> int pci_add_segment(u16 seg); >> const unsigned long *pci_get_ro_map(u16 seg); >> -int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *); >> +int pci_add_device(u16 seg, u8 bus, u8 devfn, >> + const struct pci_dev_info *, int); > > Please use parameter names in definitions. For the added parameter - yes. For the pre-existing pointer one I don't see a strong need to do so (and there was no name there before). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] pci: Do not ignore device's PXM information
On 06/01/15 02:18, Boris Ostrovsky wrote: > diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h > index 5f295f3..a5eb81c 100644 > --- a/xen/include/xen/pci.h > +++ b/xen/include/xen/pci.h > @@ -56,6 +56,8 @@ struct pci_dev { > > u8 phantom_stride; > > +int node; /* NUMA node */ > + > enum pdev_type { > DEV_TYPE_PCI_UNKNOWN, > DEV_TYPE_PCIe_ENDPOINT, > @@ -102,7 +104,8 @@ void setup_hwdom_pci_devices(struct domain *, > int pci_release_devices(struct domain *d); > int pci_add_segment(u16 seg); > const unsigned long *pci_get_ro_map(u16 seg); > -int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *); > +int pci_add_device(u16 seg, u8 bus, u8 devfn, > + const struct pci_dev_info *, int); Please use parameter names in definitions. ~Andrew > int pci_remove_device(u16 seg, u8 bus, u8 devfn); > int pci_ro_device(int seg, int bus, int devfn); > void arch_pci_ro_device(int seg, int bdf); ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 1/4] pci: Do not ignore device's PXM information
If ACPI provides PXM data for IO devices then dom0 will pass it to hypervisor during PHYSDEVOP_pci_device_add call. This information, however, is currently ignored. We will store this information (in the form of nodeID) in pci_dev structure so that we can provide it, for example, to the toolstack when it adds support (in the following patches) for querying the hypervisor about device topology We will also print it when user requests device information dump. Signed-off-by: Boris Ostrovsky --- xen/arch/x86/physdev.c| 23 --- xen/drivers/passthrough/pci.c | 13 + xen/include/public/physdev.h |6 ++ xen/include/xen/pci.h |5 - 4 files changed, 39 insertions(+), 8 deletions(-) diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c index 6b3201b..62e768e 100644 --- a/xen/arch/x86/physdev.c +++ b/xen/arch/x86/physdev.c @@ -565,7 +565,8 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) if ( copy_from_guest(&manage_pci, arg, 1) != 0 ) break; -ret = pci_add_device(0, manage_pci.bus, manage_pci.devfn, NULL); +ret = pci_add_device(0, manage_pci.bus, manage_pci.devfn, + NULL, NUMA_NO_NODE); break; } @@ -597,13 +598,14 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) pdev_info.physfn.devfn = manage_pci_ext.physfn.devfn; ret = pci_add_device(0, manage_pci_ext.bus, manage_pci_ext.devfn, - &pdev_info); + &pdev_info, NUMA_NO_NODE); break; } case PHYSDEVOP_pci_device_add: { struct physdev_pci_device_add add; struct pci_dev_info pdev_info; +int node; ret = -EFAULT; if ( copy_from_guest(&add, arg, 1) != 0 ) @@ -618,7 +620,22 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) } else pdev_info.is_virtfn = 0; -ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info); + +if ( add.flags & XEN_PCI_DEV_PXM ) +{ +uint32_t pxm; +int optarr_off = offsetof(struct physdev_pci_device_add, optarr) / + sizeof(add.optarr[0]); + +if ( copy_from_guest_offset(&pxm, arg, optarr_off, 1) ) +break; + +node = pxm_to_node(pxm); +} +else +node = NUMA_NO_NODE; + +ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info, node); break; } diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index 78c6977..3002129 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -568,7 +568,8 @@ static void pci_enable_acs(struct pci_dev *pdev) pci_conf_write16(seg, bus, dev, func, pos + PCI_ACS_CTRL, ctrl); } -int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info) +int pci_add_device(u16 seg, u8 bus, u8 devfn, + const struct pci_dev_info *info, int node) { struct pci_seg *pseg; struct pci_dev *pdev; @@ -586,7 +587,8 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info) pdev = pci_get_pdev(seg, info->physfn.bus, info->physfn.devfn); spin_unlock(&pcidevs_lock); if ( !pdev ) -pci_add_device(seg, info->physfn.bus, info->physfn.devfn, NULL); +pci_add_device(seg, info->physfn.bus, info->physfn.devfn, + NULL, node); pdev_type = "virtual function"; } else @@ -609,6 +611,8 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info) if ( !pdev ) goto out; +pdev->node = node; + if ( info ) pdev->info = *info; else if ( !pdev->vf_rlen[0] ) @@ -1191,10 +1195,11 @@ static int _dump_pci_devices(struct pci_seg *pseg, void *arg) list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list ) { -printk("%04x:%02x:%02x.%u - dom %-3d - MSIs < ", +printk("%04x:%02x:%02x.%u - dom %-3d - node %-3d - MSIs < ", pseg->nr, pdev->bus, PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn), - pdev->domain ? pdev->domain->domain_id : -1); + pdev->domain ? pdev->domain->domain_id : -1, + (pdev->node != NUMA_NO_NODE) ? pdev->node : -1); list_for_each_entry ( msi, &pdev->msi_list, list ) printk("%d ", msi->irq); printk(">\n"); diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h index d547928..309346b 100644 --- a/xen/include/public/physdev.h +++ b/xen/include/public/physdev.h @@ -293,6 +293,12 @@ struct physdev_pci_device_add { uint8_t bus; uint8_t devfn; } physfn; + +/* + * Optional parameters array. + * First element ([0]) is PXM domain associa