Re: [Xen-devel] [PATCH v2 1/4] pci: Do not ignore device's PXM information

2015-01-07 Thread Jan Beulich
>>> On 07.01.15 at 16:34,  wrote:
> On 01/07/2015 10:07 AM, Jan Beulich wrote:
> On 07.01.15 at 15:47,  wrote:
>>> On 07/01/15 14:42, Boris Ostrovsky wrote:
 I kept this field as an int to be able to store NUMA_NO_NODE which I
 thought to be (int)-1.

 But now I see that NUMA_NO_NODE is, in fact, 0xff but is promoted to
 (int)-1 by pxm_to_node(). Given that there is a number of tests for
 NUMA_NO_NODE and not for (int)-1, should we then make pxm_to_node()
 return u8 as well?
>>> I noticed this as well, and found it quite counter intuitive.
>>>
>>> I would suggest fixing NUMA_NO_NODE to -1 and removing some of the
>>> type-punning.
>> I have to admit that I see no value in wasting 4 bytes for something
>> that for the foreseeable future won't exceed 1 byte.
> 
> The downside of going to u8 is that we'd be limiting number of nodes to 
> 254, which is somewhat awkward. OTOH we already do this by testing 
> nodeID against 0xff in various places.

With NODES_SHIFT being 6 and hence MAX_NUMNODES being 0x40,
we can't reach 254 right now anyway.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/4] pci: Do not ignore device's PXM information

2015-01-07 Thread Jan Beulich
>>> On 07.01.15 at 16:31,  wrote:
> On 01/07/2015 10:06 AM, Jan Beulich wrote:
>>
 Of course an additional question would be whether the node wouldn't
 better go into struct arch_pci_dev - that depends on whether we
 expect ARM to be using NUMA...
>>> Since we have CPU topology in common code I thought this would be
>>> arch-independent as well.
>> Not sure what you're referring to here: What common piece of data
>> stores the node of a particular CPU? cpu_to_node[] clearly is x86-
>> specific.
> 
> I meant that we have common interface (sysctl) for querying topology and 
> PCI is a common IO bus (unlike CPUs) so I figured I'd keep node in 
> common part as well.

Makes little sense: ARM stubs out cpu_to_node(), so a similar
mechanism could be used to hide the fact that some arch-es may
have no node associated with a PCI device.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/4] pci: Do not ignore device's PXM information

2015-01-07 Thread Boris Ostrovsky

On 01/07/2015 10:06 AM, Jan Beulich wrote:



Of course an additional question would be whether the node wouldn't
better go into struct arch_pci_dev - that depends on whether we
expect ARM to be using NUMA...

Since we have CPU topology in common code I thought this would be
arch-independent as well.

Not sure what you're referring to here: What common piece of data
stores the node of a particular CPU? cpu_to_node[] clearly is x86-
specific.


I meant that we have common interface (sysctl) for querying topology and 
PCI is a common IO bus (unlike CPUs) so I figured I'd keep node in 
common part as well.


-boris


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/4] pci: Do not ignore device's PXM information

2015-01-07 Thread Boris Ostrovsky

On 01/07/2015 10:07 AM, Jan Beulich wrote:

On 07.01.15 at 15:47,  wrote:

On 07/01/15 14:42, Boris Ostrovsky wrote:

I kept this field as an int to be able to store NUMA_NO_NODE which I
thought to be (int)-1.

But now I see that NUMA_NO_NODE is, in fact, 0xff but is promoted to
(int)-1 by pxm_to_node(). Given that there is a number of tests for
NUMA_NO_NODE and not for (int)-1, should we then make pxm_to_node()
return u8 as well?

I noticed this as well, and found it quite counter intuitive.

I would suggest fixing NUMA_NO_NODE to -1 and removing some of the
type-punning.

I have to admit that I see no value in wasting 4 bytes for something
that for the foreseeable future won't exceed 1 byte.


The downside of going to u8 is that we'd be limiting number of nodes to 
254, which is somewhat awkward. OTOH we already do this by testing 
nodeID against 0xff in various places.


-boris

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/4] pci: Do not ignore device's PXM information

2015-01-07 Thread Jan Beulich
>>> On 07.01.15 at 15:47,  wrote:
> On 07/01/15 14:42, Boris Ostrovsky wrote:
>> I kept this field as an int to be able to store NUMA_NO_NODE which I
>> thought to be (int)-1.
>>
>> But now I see that NUMA_NO_NODE is, in fact, 0xff but is promoted to
>> (int)-1 by pxm_to_node(). Given that there is a number of tests for
>> NUMA_NO_NODE and not for (int)-1, should we then make pxm_to_node()
>> return u8 as well?
> 
> I noticed this as well, and found it quite counter intuitive.
> 
> I would suggest fixing NUMA_NO_NODE to -1 and removing some of the
> type-punning.

I have to admit that I see no value in wasting 4 bytes for something
that for the foreseeable future won't exceed 1 byte.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/4] pci: Do not ignore device's PXM information

2015-01-07 Thread Jan Beulich
>>> On 07.01.15 at 15:42,  wrote:
> On 01/07/2015 04:06 AM, Jan Beulich wrote:
> On 06.01.15 at 03:18,  wrote:
>>> --- a/xen/include/xen/pci.h
>>> +++ b/xen/include/xen/pci.h
>>> @@ -56,6 +56,8 @@ struct pci_dev {
>>>   
>>>   u8 phantom_stride;
>>>   
>>> +int node; /* NUMA node */
>> I thought I asked about this on v1 already: Does this really need to be
>> an int, when commonly node numbers are stored in u8/unsigned char?
>> Shrinking the field size would prevent the structure size from growing...
> 
> I kept this field as an int to be able to store NUMA_NO_NODE which I 
> thought to be (int)-1.
> 
> But now I see that NUMA_NO_NODE is, in fact, 0xff but is promoted to 
> (int)-1 by pxm_to_node(). Given that there is a number of tests for 
> NUMA_NO_NODE and not for (int)-1, should we then make pxm_to_node() 
> return u8 as well?

I think that would make sense, together with fixing up one of the three
callers in VT-d code (from alloc_pgtable_maddr()); the other two look
correct already.

>> Of course an additional question would be whether the node wouldn't
>> better go into struct arch_pci_dev - that depends on whether we
>> expect ARM to be using NUMA...
> 
> Since we have CPU topology in common code I thought this would be 
> arch-independent as well.

Not sure what you're referring to here: What common piece of data
stores the node of a particular CPU? cpu_to_node[] clearly is x86-
specific.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/4] pci: Do not ignore device's PXM information

2015-01-07 Thread Andrew Cooper
On 07/01/15 14:42, Boris Ostrovsky wrote:
> On 01/07/2015 04:06 AM, Jan Beulich wrote:
> On 06.01.15 at 03:18,  wrote:
>>> @@ -618,7 +620,22 @@ ret_t do_physdev_op(int cmd,
>>> XEN_GUEST_HANDLE_PARAM(void) arg)
>>>   }
>>>   else
>>>   pdev_info.is_virtfn = 0;
>>> -ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info);
>>> +
>>> +if ( add.flags & XEN_PCI_DEV_PXM )
>>> +{
>>> +uint32_t pxm;
>>> +int optarr_off = offsetof(struct
>>> physdev_pci_device_add, optarr) /
>> unsigned int or size_t.
>>
>>> --- a/xen/include/xen/pci.h
>>> +++ b/xen/include/xen/pci.h
>>> @@ -56,6 +56,8 @@ struct pci_dev {
>>> u8 phantom_stride;
>>>   +int node; /* NUMA node */
>> I thought I asked about this on v1 already: Does this really need to be
>> an int, when commonly node numbers are stored in u8/unsigned char?
>> Shrinking the field size would prevent the structure size from
>> growing...
>
>
> I kept this field as an int to be able to store NUMA_NO_NODE which I
> thought to be (int)-1.
>
> But now I see that NUMA_NO_NODE is, in fact, 0xff but is promoted to
> (int)-1 by pxm_to_node(). Given that there is a number of tests for
> NUMA_NO_NODE and not for (int)-1, should we then make pxm_to_node()
> return u8 as well?

I noticed this as well, and found it quite counter intuitive.

I would suggest fixing NUMA_NO_NODE to -1 and removing some of the
type-punning.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/4] pci: Do not ignore device's PXM information

2015-01-07 Thread Boris Ostrovsky

On 01/07/2015 04:06 AM, Jan Beulich wrote:

On 06.01.15 at 03:18,  wrote:

@@ -618,7 +620,22 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
  }
  else
  pdev_info.is_virtfn = 0;
-ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info);
+
+if ( add.flags & XEN_PCI_DEV_PXM )
+{
+uint32_t pxm;
+int optarr_off = offsetof(struct physdev_pci_device_add, optarr) /

unsigned int or size_t.


--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -56,6 +56,8 @@ struct pci_dev {
  
  u8 phantom_stride;
  
+int node; /* NUMA node */

I thought I asked about this on v1 already: Does this really need to be
an int, when commonly node numbers are stored in u8/unsigned char?
Shrinking the field size would prevent the structure size from growing...



I kept this field as an int to be able to store NUMA_NO_NODE which I 
thought to be (int)-1.


But now I see that NUMA_NO_NODE is, in fact, 0xff but is promoted to 
(int)-1 by pxm_to_node(). Given that there is a number of tests for 
NUMA_NO_NODE and not for (int)-1, should we then make pxm_to_node() 
return u8 as well?




Of course an additional question would be whether the node wouldn't
better go into struct arch_pci_dev - that depends on whether we
expect ARM to be using NUMA...



Since we have CPU topology in common code I thought this would be 
arch-independent as well.


-boris

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/4] pci: Do not ignore device's PXM information

2015-01-07 Thread Jan Beulich
>>> On 06.01.15 at 03:18,  wrote:
> @@ -618,7 +620,22 @@ ret_t do_physdev_op(int cmd, 
> XEN_GUEST_HANDLE_PARAM(void) arg)
>  }
>  else
>  pdev_info.is_virtfn = 0;
> -ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info);
> +
> +if ( add.flags & XEN_PCI_DEV_PXM )
> +{
> +uint32_t pxm;
> +int optarr_off = offsetof(struct physdev_pci_device_add, optarr) 
> /

unsigned int or size_t.

> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -56,6 +56,8 @@ struct pci_dev {
>  
>  u8 phantom_stride;
>  
> +int node; /* NUMA node */

I thought I asked about this on v1 already: Does this really need to be
an int, when commonly node numbers are stored in u8/unsigned char?
Shrinking the field size would prevent the structure size from growing...

Of course an additional question would be whether the node wouldn't
better go into struct arch_pci_dev - that depends on whether we
expect ARM to be using NUMA...

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/4] pci: Do not ignore device's PXM information

2015-01-07 Thread Jan Beulich
>>> On 06.01.15 at 12:55,  wrote:
> On 06/01/15 02:18, Boris Ostrovsky wrote:
> 
>> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
>> index 5f295f3..a5eb81c 100644
>> --- a/xen/include/xen/pci.h
>> +++ b/xen/include/xen/pci.h
>> @@ -56,6 +56,8 @@ struct pci_dev {
>>  
>>  u8 phantom_stride;
>>  
>> +int node; /* NUMA node */
>> +
>>  enum pdev_type {
>>  DEV_TYPE_PCI_UNKNOWN,
>>  DEV_TYPE_PCIe_ENDPOINT,
>> @@ -102,7 +104,8 @@ void setup_hwdom_pci_devices(struct domain *,
>>  int pci_release_devices(struct domain *d);
>>  int pci_add_segment(u16 seg);
>>  const unsigned long *pci_get_ro_map(u16 seg);
>> -int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *);
>> +int pci_add_device(u16 seg, u8 bus, u8 devfn,
>> +   const struct pci_dev_info *, int);
> 
> Please use parameter names in definitions.

For the added parameter - yes. For the pre-existing pointer one I don't
see a strong need to do so (and there was no name there before).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/4] pci: Do not ignore device's PXM information

2015-01-06 Thread Andrew Cooper
On 06/01/15 02:18, Boris Ostrovsky wrote:

> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
> index 5f295f3..a5eb81c 100644
> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -56,6 +56,8 @@ struct pci_dev {
>  
>  u8 phantom_stride;
>  
> +int node; /* NUMA node */
> +
>  enum pdev_type {
>  DEV_TYPE_PCI_UNKNOWN,
>  DEV_TYPE_PCIe_ENDPOINT,
> @@ -102,7 +104,8 @@ void setup_hwdom_pci_devices(struct domain *,
>  int pci_release_devices(struct domain *d);
>  int pci_add_segment(u16 seg);
>  const unsigned long *pci_get_ro_map(u16 seg);
> -int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *);
> +int pci_add_device(u16 seg, u8 bus, u8 devfn,
> +   const struct pci_dev_info *, int);

Please use parameter names in definitions.

~Andrew

>  int pci_remove_device(u16 seg, u8 bus, u8 devfn);
>  int pci_ro_device(int seg, int bus, int devfn);
>  void arch_pci_ro_device(int seg, int bdf);



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 1/4] pci: Do not ignore device's PXM information

2015-01-05 Thread Boris Ostrovsky
If ACPI provides PXM data for IO devices then dom0 will pass it to
hypervisor during PHYSDEVOP_pci_device_add call. This information,
however, is currently ignored.

We will store this information (in the form of nodeID) in pci_dev
structure so that we can provide it, for example, to the toolstack
when it adds support (in the following patches) for querying the
hypervisor about device topology

We will also print it when user requests device information dump.

Signed-off-by: Boris Ostrovsky 
---
 xen/arch/x86/physdev.c|   23 ---
 xen/drivers/passthrough/pci.c |   13 +
 xen/include/public/physdev.h  |6 ++
 xen/include/xen/pci.h |5 -
 4 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 6b3201b..62e768e 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -565,7 +565,8 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 if ( copy_from_guest(&manage_pci, arg, 1) != 0 )
 break;
 
-ret = pci_add_device(0, manage_pci.bus, manage_pci.devfn, NULL);
+ret = pci_add_device(0, manage_pci.bus, manage_pci.devfn,
+ NULL, NUMA_NO_NODE);
 break;
 }
 
@@ -597,13 +598,14 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 pdev_info.physfn.devfn = manage_pci_ext.physfn.devfn;
 ret = pci_add_device(0, manage_pci_ext.bus,
  manage_pci_ext.devfn,
- &pdev_info);
+ &pdev_info, NUMA_NO_NODE);
 break;
 }
 
 case PHYSDEVOP_pci_device_add: {
 struct physdev_pci_device_add add;
 struct pci_dev_info pdev_info;
+int node;
 
 ret = -EFAULT;
 if ( copy_from_guest(&add, arg, 1) != 0 )
@@ -618,7 +620,22 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 }
 else
 pdev_info.is_virtfn = 0;
-ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info);
+
+if ( add.flags & XEN_PCI_DEV_PXM )
+{
+uint32_t pxm;
+int optarr_off = offsetof(struct physdev_pci_device_add, optarr) /
+ sizeof(add.optarr[0]);
+
+if ( copy_from_guest_offset(&pxm, arg, optarr_off, 1) )
+break;
+
+node = pxm_to_node(pxm);
+}
+else
+node = NUMA_NO_NODE;
+
+ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info, node);
 break;
 }
 
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 78c6977..3002129 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -568,7 +568,8 @@ static void pci_enable_acs(struct pci_dev *pdev)
 pci_conf_write16(seg, bus, dev, func, pos + PCI_ACS_CTRL, ctrl);
 }
 
-int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info)
+int pci_add_device(u16 seg, u8 bus, u8 devfn,
+   const struct pci_dev_info *info, int node)
 {
 struct pci_seg *pseg;
 struct pci_dev *pdev;
@@ -586,7 +587,8 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct 
pci_dev_info *info)
 pdev = pci_get_pdev(seg, info->physfn.bus, info->physfn.devfn);
 spin_unlock(&pcidevs_lock);
 if ( !pdev )
-pci_add_device(seg, info->physfn.bus, info->physfn.devfn, NULL);
+pci_add_device(seg, info->physfn.bus, info->physfn.devfn,
+   NULL, node);
 pdev_type = "virtual function";
 }
 else
@@ -609,6 +611,8 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct 
pci_dev_info *info)
 if ( !pdev )
 goto out;
 
+pdev->node = node;
+
 if ( info )
 pdev->info = *info;
 else if ( !pdev->vf_rlen[0] )
@@ -1191,10 +1195,11 @@ static int _dump_pci_devices(struct pci_seg *pseg, void 
*arg)
 
 list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
 {
-printk("%04x:%02x:%02x.%u - dom %-3d - MSIs < ",
+printk("%04x:%02x:%02x.%u - dom %-3d - node %-3d - MSIs < ",
pseg->nr, pdev->bus,
PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn),
-   pdev->domain ? pdev->domain->domain_id : -1);
+   pdev->domain ? pdev->domain->domain_id : -1,
+   (pdev->node != NUMA_NO_NODE) ? pdev->node : -1);
 list_for_each_entry ( msi, &pdev->msi_list, list )
printk("%d ", msi->irq);
 printk(">\n");
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index d547928..309346b 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -293,6 +293,12 @@ struct physdev_pci_device_add {
 uint8_t bus;
 uint8_t devfn;
 } physfn;
+
+/*
+ * Optional parameters array.
+ * First element ([0]) is PXM domain associa