Re: [Xen-devel] [PATCH 5/5] AMD IOMMU: widen NUMA nodes to be allocated from

2015-03-05 Thread Andrew Cooper
On 26/02/15 13:56, Jan Beulich wrote:
> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
> @@ -158,12 +158,12 @@ static inline unsigned long region_to_pa
>  return (PAGE_ALIGN(addr + size) - (addr & PAGE_MASK)) >> PAGE_SHIFT;
>  }
>  
> -static inline struct page_info* alloc_amd_iommu_pgtable(void)
> +static inline struct page_info *alloc_amd_iommu_pgtable(struct domain *d)
>  {
>  struct page_info *pg;
>  void *vaddr;
>  
> -pg = alloc_domheap_page(NULL, 0);
> +pg = alloc_domheap_page(d, MEMF_no_owner);

Same comment as with the VT-d side of things.  This should be based on
the proximity information of the IOMMU, not of the owning domain.
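
For illustration, a minimal sketch of the allocation being asked for,
assuming the IOMMU's NUMA node were tracked somewhere (no per-IOMMU
node field existed at the time; MEMF_node() is Xen's existing way to
steer an anonymous allocation towards a node, and the usual Xen
headers are assumed):

    /* Sketch only: allocate the pagetable page on the IOMMU's own node,
     * ignoring the owning domain.  'node' is hypothetical per-IOMMU
     * state that would have to be recorded at IOMMU detection time. */
    static inline struct page_info *alloc_amd_iommu_pgtable(unsigned int node)
    {
        return alloc_domheap_page(NULL, MEMF_node(node));
    }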

~Andrew

>  if ( pg == NULL )
>  return 0;
>  vaddr = __map_domain_page(pg);



Re: [Xen-devel] [PATCH 5/5] AMD IOMMU: widen NUMA nodes to be allocated from

2015-03-05 Thread Jan Beulich
>>> On 05.03.15 at 18:30, Andrew Cooper wrote:
> On 26/02/15 13:56, Jan Beulich wrote:
>> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
>> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
>> @@ -158,12 +158,12 @@ static inline unsigned long region_to_pa
>>  return (PAGE_ALIGN(addr + size) - (addr & PAGE_MASK)) >> PAGE_SHIFT;
>>  }
>>  
>> -static inline struct page_info* alloc_amd_iommu_pgtable(void)
>> +static inline struct page_info *alloc_amd_iommu_pgtable(struct domain *d)
>>  {
>>  struct page_info *pg;
>>  void *vaddr;
>>  
>> -pg = alloc_domheap_page(NULL, 0);
>> +pg = alloc_domheap_page(d, MEMF_no_owner);
> 
> Same comment as with the VT-d side of things.  This should be based on
> the proximity information of the IOMMU, not of the owning domain.

I think I buy this argument on the VT-d side (under the assumption
that there's going to be at least one IOMMU per node), but I'm not
sure here: The most modern AMD box I have has just a single
IOMMU for the 4 nodes it reports.

Jan




Re: [Xen-devel] [PATCH 5/5] AMD IOMMU: widen NUMA nodes to be allocated from

2015-03-06 Thread Andrew Cooper
On 06/03/2015 07:50, Jan Beulich wrote:
>>>> On 05.03.15 at 18:30, Andrew Cooper wrote:
>> On 26/02/15 13:56, Jan Beulich wrote:
>>> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
>>> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
>>> @@ -158,12 +158,12 @@ static inline unsigned long region_to_pa
>>>  return (PAGE_ALIGN(addr + size) - (addr & PAGE_MASK)) >> PAGE_SHIFT;
>>>  }
>>>  
>>> -static inline struct page_info* alloc_amd_iommu_pgtable(void)
>>> +static inline struct page_info *alloc_amd_iommu_pgtable(struct domain *d)
>>>  {
>>>  struct page_info *pg;
>>>  void *vaddr;
>>>  
>>> -pg = alloc_domheap_page(NULL, 0);
>>> +pg = alloc_domheap_page(d, MEMF_no_owner);
>> Same comment as with the VT-d side of things.  This should be based on
>> the proximity information of the IOMMU, not of the owning domain.
> I think I buy this argument on the VT-d side (under the assumption
> that there's going to be at least one IOMMU per node), but I'm not
> sure here: The most modern AMD box I have has just a single
> IOMMU for the 4 nodes it reports.

It is not possible for an IOMMU to cover multiple NUMA nodes' worth of
IO, because of the position it has to sit relative to the IO root ports
and QPI/HT links.

In AMD systems, the IOMMUs live in the northbridges, meaning one per
NUMA node (as it is the northbridges which contain the HyperTransport
links).

The BIOS/firmware will only report IOMMUs from northbridges which have
IO connected to their IO HyperTransport link (most systems in the wild
have all IO hanging off one or two NUMA nodes).  On the other hand, I
have an AMD system with 8 IOMMUs in use.

In Intel systems, there is one IOMMU for each socket (to cover the
on-chip root ports and GPU if applicable) and one IOMMU in the IOH/PCH
(depending on generation) to cover the legacy IO.


In all cases, the IOMMUs are local to a single NUMA node, and would
benefit from having the control pages and pagetables allocated in local RAM.
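
As a sketch of how that preference could be expressed in either
vendor's driver, again assuming a hypothetical per-IOMMU 'iommu_node'
recorded at detection time: MEMF_exact_node makes the first attempt
strict, and the plain retry falls back to any node rather than failing
outright.

    /* Sketch: prefer the IOMMU's own node for control pages and
     * pagetables, degrading gracefully if that node has no free memory. */
    static struct page_info *alloc_iommu_local_page(unsigned int iommu_node)
    {
        struct page_info *pg =
            alloc_domheap_page(NULL, MEMF_node(iommu_node) | MEMF_exact_node);

        return pg ?: alloc_domheap_page(NULL, 0);
    }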

~Andrew



Re: [Xen-devel] [PATCH 5/5] AMD IOMMU: widen NUMA nodes to be allocated from

2015-03-09 Thread Suravee Suthikulpanit

On 3/6/2015 6:15 AM, Andrew Cooper wrote:
> On 06/03/2015 07:50, Jan Beulich wrote:
>> I think I buy this argument on the VT-d side (under the assumption
>> that there's going to be at least one IOMMU per node), but I'm not
>> sure here: The most modern AMD box I have has just a single
>> IOMMU for the 4 nodes it reports.
>
> It is not possible for an IOMMU to cover multiple NUMA nodes' worth of
> IO, because of the position it has to sit relative to the IO root ports
> and QPI/HT links.
>
> In AMD systems, the IOMMUs live in the northbridges, meaning one per
> NUMA node (as it is the northbridges which contain the HyperTransport
> links).
>
> The BIOS/firmware will only report IOMMUs from northbridges which have
> IO connected to their IO HyperTransport link (most systems in the wild
> have all IO hanging off one or two NUMA nodes).  On the other hand, I
> have an AMD system with 8 IOMMUs in use.

Actually, a single IOMMU could handle multiple nodes. For example, in
the scenario of a multi-chip-module (MCM) setup, there could be at least
2-4 nodes sharing one IOMMU, depending on how the platform vendor
configures the system. In server platforms, the IOMMU is in the AMD
northbridge chipset (e.g. SR56xx). This website has an example of such
a system configuration
(http://www.qdpma.com/systemarchitecture/SystemArchitecture_Opteron.html).

For AMD IOMMU, the IVRS table specifies the PCI bus/device ranges to be
handled by each IOMMU. This probably should be considered here.
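
To illustrate, a sketch of how that IVRS-derived mapping could feed
the allocation: find_iommu_for_device() is the existing lookup in
Xen's AMD IOMMU code, while the 'node' field on struct amd_iommu is
hypothetical (nothing recorded an IOMMU's node at the time).

    /* Sketch: steer a pagetable allocation towards the node of the
     * IOMMU serving the device, as found via the IVRS device ranges.
     * iommu->node is a hypothetical field. */
    static struct page_info *alloc_pgtable_near_iommu(int seg, int bdf)
    {
        struct amd_iommu *iommu = find_iommu_for_device(seg, bdf);

        return alloc_domheap_page(NULL,
                                  iommu ? MEMF_node(iommu->node) : 0);
    }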




> In Intel systems, there is one IOMMU for each socket (to cover the
> on-chip root ports and GPU if applicable) and one IOMMU in the IOH/PCH
> (depending on generation) to cover the legacy IO.
>
> In all cases, the IOMMUs are local to a single NUMA node, and would
> benefit from having the control pages and pagetables allocated in
> local RAM.



As stated above, this is not the case for AMD IOMMU.

Thanks,

Suravee



Re: [Xen-devel] [PATCH 5/5] AMD IOMMU: widen NUMA nodes to be allocated from

2015-03-09 Thread Andrew Cooper
On 09/03/15 15:42, Suravee Suthikulpanit wrote:
> On 3/6/2015 6:15 AM, Andrew Cooper wrote:
>> In AMD systems, the IOMMUs live in the northbridges, meaning one per
>> NUMA node (as it is the northbridges which contain the HyperTransport
>> links).
>>
>> The BIOS/firmware will only report IOMMUs from northbridges which have
>> IO connected to their IO HyperTransport link (most systems in the wild
>> have all IO hanging off one or two NUMA nodes).  On the other hand, I
>> have an AMD system with 8 IOMMUs in use.
>
> Actually, a single IOMMU could handle multiple nodes. For example, in
> the scenario of a multi-chip-module (MCM) setup, there could be at least
> 2-4 nodes sharing one IOMMU, depending on how the platform vendor
> configures the system. In server platforms, the IOMMU is in the AMD
> northbridge chipset (e.g. SR56xx). This website has an example of such
> a system configuration
> (http://www.qdpma.com/systemarchitecture/SystemArchitecture_Opteron.html).

Ok - I was basing my example on the last layout I had the manual for,
which I believe was Bulldozer.

However, my point still stands that there is an IOMMU between any IO and
RAM.  An individual IOMMU will always benefit from having its IO
pagetables on the local NUMA node, rather than the NUMA node(s) which
the domain owning the device is running on.

>
> For AMD IOMMU, the IVRS table specifies the PCI bus/device ranges to
> be handled by each IOMMU. This probably should be considered here.

Presumably a PCI transaction must never get onto the HT bus without
having already undergone translation, or there can be no guarantee that
it would be routed via the IOMMU?  Or are you saying that there are
cases where a transaction will enter the HT bus, route sideways to an
IOMMU, undergo translation, then route back onto the HT bus to the
target RAM/processor?

~Andrew




Re: [Xen-devel] [PATCH 5/5] AMD IOMMU: widen NUMA nodes to be allocated from

2015-03-09 Thread Suravee Suthikulpanit

On 3/9/2015 12:26 PM, Andrew Cooper wrote:
> On 09/03/15 15:42, Suravee Suthikulpanit wrote:
>> Actually, a single IOMMU could handle multiple nodes. [...]
>
> Ok - I was basing my example on the last layout I had the manual for,
> which I believe was Bulldozer.
>
> However, my point still stands that there is an IOMMU between any IO and
> RAM.  An individual IOMMU will always benefit from having its IO
> pagetables on the local NUMA node, rather than the NUMA node(s) which
> the domain owning the device is running on.

I agree that having the IO page tables on the NUMA node that is closest 
to the IOMMU would be beneficial.  However, I am not sure at the moment 
that this information could be easily determined. I think ACPI _PXM for 
devices should be able to provide this information, but this is optional 
and often not available.
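
If such a proximity domain were available, consuming it would be the
easy part. A sketch, assuming pxm_to_node() (Xen's SRAT helper for
translating an ACPI proximity domain to a node) reports failure as a
negative value, and that some yet-to-be-designed plumbing delivers
'pxm' for an IOMMU:

    /* Sketch: allocate near a firmware-provided proximity domain,
     * with no node preference when the PXM is unknown or unmapped. */
    static struct page_info *alloc_near_pxm(int pxm)
    {
        int node = pxm_to_node(pxm);

        return alloc_domheap_page(NULL, node >= 0 ? MEMF_node(node) : 0);
    }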




>> For AMD IOMMU, the IVRS table specifies the PCI bus/device ranges to
>> be handled by each IOMMU. This probably should be considered here.
>
> Presumably a PCI transaction must never get onto the HT bus without
> having already undergone translation, or there can be no guarantee that
> it would be routed via the IOMMU?  Or are you saying that there are
> cases where a transaction will enter the HT bus, route sideways to an
> IOMMU, undergo translation, then route back onto the HT bus to the
> target RAM/processor?



The IOMMU sits between PCI devices (downstream) and HT (upstream); all
DMA transactions from downstream must go through the IOMMU. On the
other hand, the I/O page translation is handled by the IOMMU, and that
is separate traffic from the downstream devices' DMA transactions.


Suravee




Re: [Xen-devel] [PATCH 5/5] AMD IOMMU: widen NUMA nodes to be allocated from

2015-03-10 Thread Jan Beulich
>>> On 09.03.15 at 20:02, Suravee Suthikulpanit wrote:
> I agree that having the IO page tables on the NUMA node that is closest 
> to the IOMMU would be beneficial.

And I already withdrew this patch and the corresponding VT-d one.

> However, I am not sure at the moment 
> that this information could be easily determined. I think ACPI _PXM for 
> devices should be able to provide this information, but this is optional 
> and often not available.

And even if it were available, it would be too late at least for Dom0's
allocations (as it requires Dom0's ACPI interpreter to dig out this
detail).
The best we could do in that case would be to try to replace the
existing tables. Or assume Dom0 is being placed suitably by the
dom0_nodes= option. Or add yet another option.

Jan




Re: [Xen-devel] [PATCH 5/5] AMD IOMMU: widen NUMA nodes to be allocated from

2015-03-10 Thread Boris Ostrovsky

On 03/10/2015 03:35 AM, Jan Beulich wrote:

>>>> On 09.03.15 at 20:02, Suravee Suthikulpanit wrote:
>> I agree that having the IO page tables on the NUMA node that is closest
>> to the IOMMU would be beneficial.
>
> And I already withdrew this patch and the corresponding VT-d one.
>
>> However, I am not sure at the moment
>> that this information could be easily determined. I think ACPI _PXM for
>> devices should be able to provide this information, but this is optional
>> and often not available.
>
> And even if it were available, it would be too late at least for Dom0's
> allocations (as it requires Dom0's ACPI interpreter to dig out this
> detail). The best we could do in that case would be to try to replace
> the existing tables. Or assume Dom0 is being placed suitably by the
> dom0_nodes= option. Or add yet another option.


There is a node ID register on each northbridge (D18F0x60). You would
have to figure out how to map it to _PXMs, though.
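
For reference, a sketch of reading that register: the northbridge for
node i appears at PCI 00:18+i.0, and D18F0x60 is its Node ID register,
with the node ID in the low bits (the BKDG has the exact layout).
Mapping the result to a _PXM or Xen node is still the unsolved part.

    /* Sketch: read the hardware node ID of northbridge 'nb' (D18F0x60). */
    static unsigned int nb_node_id(unsigned int nb)
    {
        return pci_conf_read32(0, 0, 0x18 + nb, 0, 0x60) & 0x7;
    }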


-boris

