[Xen-devel] [PATCH] MAINTAINERS: remove myself from SVM and AMD IOMMU

2019-08-15 Thread Woods, Brian
From: Brian Woods 

I will no longer be working at AMD and am removing myself.

Signed-off-by: Brian Woods 
---
 MAINTAINERS | 2 --
 1 file changed, 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 77413e0..251bfe2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -146,14 +146,12 @@ F:tools/libacpi/
 
 AMD IOMMU
 M: Suravee Suthikulpanit 
-R: Brian Woods 
 S: Maintained
 F: xen/drivers/passthrough/amd/
 
 AMD SVM
 M: Boris Ostrovsky 
 M: Suravee Suthikulpanit 
-R: Brian Woods 
 S: Supported
 F: xen/arch/x86/hvm/svm/
 F: xen/arch/x86/cpu/vpmu_amd.c
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 00/10] AMD IOMMU: further improvements

2019-08-15 Thread Woods, Brian
On Tue, Aug 06, 2019 at 03:05:36PM +0200, Jan Beulich wrote:
> Only the first patch here is left from v4, everything else is new,
> yet still related. The main goal is to reduce the huge memory
> overhead that we've noticed. On the way there a number of other
> things were once again noticed. Unfortunately before I was able to
> also test the last two patches there, my Rome box broke again.
> Hence these two patches have been tested on a (less affected)
> Fam15 system only.
> 
> 01: miscellaneous DTE handling adjustments
> 02: drop stray "else"
> 03: don't free shared IRT multiple times
> 04: introduce a "valid" flag for IVRS mappings
> 05: let callers of amd_iommu_alloc_intremap_table() handle errors
> 06: don't blindly allocate interrupt remapping tables
> 07: make phantom functions share interrupt remapping tables
> 08: x86/PCI: read MSI-X table entry count early
> 09: replace INTREMAP_ENTRIES
> 10: restrict interrupt remapping table sizes
> 
> Full set of patches once again attached here due to still unresolved
> email issues over here.
> 
> Jan
> 

I don't think I have enough time left here to review these, but I've
tested them via PCI device passthrough on an AMD Rome system.

-- 
Brian Woods
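
For scale, a rough back-of-the-envelope sketch of the overhead this series
attacks, using only constants quoted later in this digest (an interrupt
remapping table order of 0xB, i.e. 2048 entries, and the 16-byte union
irte128):

    /* Hypothetical sizing sketch; the per-segment device count is the
     * worst case Jan describes elsewhere in this digest (every function
     * of every device on all 256 buses), not a measured figure. */
    #define INTREMAP_ENTRIES  (1u << 0xB)                         /* 2048 */
    #define IRTE128_BYTES     16u                                 /* 2 * uint64_t */
    #define TABLE_BYTES       (INTREMAP_ENTRIES * IRTE128_BYTES)  /* 32 KiB */
    #define BDFS_PER_SEGMENT  (256u * 32u * 8u)                   /* 65536 */
    /* 65536 * 32 KiB = 2 GiB if a full-size table is blindly allocated
     * per device, which is what patches 06 and 10 above avoid. */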


Re: [Xen-devel] [PATCH] x86/AMD-Vi: Fold exit paths of {enable, disable}_iommu()

2019-08-12 Thread Woods, Brian
On Mon, Aug 12, 2019 at 06:52:05PM +0100, Andy Cooper wrote:
> ... to avoid having multiple spin_unlock_irqrestore() calls.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Brian Woods 

> ---
> CC: Jan Beulich 
> CC: Wei Liu 
> CC: Roger Pau Monné 
> CC: Boris Ostrovsky 
> CC: Suravee Suthikulpanit 
> CC: Brian Woods 
> 
> Interestingly GCC 6.3 managed to fold disable_iommu() automatically.  There is
> some partial folding for enable_iommu() (insofar as there is only a single
> call to _spin_unlock_irqrestore emitted), but this delta yields
> 
>   add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-20 (-20)
>   Function old new   delta
>   enable_iommu18441824 -20
>   Total: Before=3340299, After=3340279, chg -0.00%
> 
> which means that something wasn't done automatically.
> 
> Noticed while investigating the S3 regression.
> ---
>  xen/drivers/passthrough/amd/iommu_init.c | 17 +++--
>  1 file changed, 7 insertions(+), 10 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/amd/iommu_init.c 
> b/xen/drivers/passthrough/amd/iommu_init.c
> index bb9f33e264..bb5a3e57c9 100644
> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -899,11 +899,8 @@ static void enable_iommu(struct amd_iommu *iommu)
>  
>  spin_lock_irqsave(&iommu->lock, flags);
>  
> -if ( iommu->enabled )
> -{
> -spin_unlock_irqrestore(&iommu->lock, flags);
> -return;
> -}
> +if ( unlikely(iommu->enabled) )
> +goto out;
>  
>  amd_iommu_erratum_746_workaround(iommu);
>  
> @@ -957,6 +954,8 @@ static void enable_iommu(struct amd_iommu *iommu)
>  amd_iommu_flush_all_caches(iommu);
>  
>  iommu->enabled = 1;
> +
> + out:
>  spin_unlock_irqrestore(&iommu->lock, flags);
>  }
>  
> @@ -966,11 +965,8 @@ static void disable_iommu(struct amd_iommu *iommu)
>  
>  spin_lock_irqsave(&iommu->lock, flags);
>  
> -if ( !iommu->enabled )
> -{
> -spin_unlock_irqrestore(&iommu->lock, flags);
> -return;
> -}
> +if ( unlikely(!iommu->enabled) )
> +goto out;
>  
>  if ( !iommu->ctrl.int_cap_xt_en )
>  amd_iommu_msi_enable(iommu, IOMMU_CONTROL_DISABLED);
> @@ -988,6 +984,7 @@ static void disable_iommu(struct amd_iommu *iommu)
>  
>  iommu->enabled = 0;
>  
> + out:
>  spin_unlock_irqrestore(&iommu->lock, flags);
>  }
>  
> -- 
> 2.11.0
> 
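
The change boils down to the classic single-exit locking pattern: instead of
unlocking and returning in each early-exit arm, both functions now fall
through to one label that does the unlock. A minimal standalone sketch with
hypothetical names, not the actual Xen code:

    static void locked_op(struct foo *f)
    {
        unsigned long flags;

        spin_lock_irqsave(&f->lock, flags);

        if ( unlikely(f->done) )
            goto out;                  /* no duplicated unlock + return */

        /* ... the actual work ... */
        f->done = true;

     out:
        spin_unlock_irqrestore(&f->lock, flags);
    }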

-- 
Brian Woods


Re: [Xen-devel] [PATCH v5 02/10] AMD/IOMMU: drop stray "else"

2019-08-08 Thread Woods, Brian
On Tue, Aug 06, 2019 at 03:08:11PM +0200, Jan Beulich wrote:
> The blank line between it and the prior if() clearly indicates that this
> was meant to be a standalone if().
> 
> Signed-off-by: Jan Beulich 

Acked-by: Brian Woods 

> ---
> v5: New.
> 
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -166,8 +166,8 @@ static int __init iov_detect(void)
>  if ( !iommu_enable && !iommu_intremap )
>  return 0;
> -else if ( (init_done ? amd_iommu_init_interrupt()
> - : amd_iommu_init(false)) != 0 )
> +if ( (init_done ? amd_iommu_init_interrupt()
> +: amd_iommu_init(false)) != 0 )
>  {
>  printk("AMD-Vi: Error initialization\n");
>  return -ENODEV;
> 

-- 
Brian Woods


Re: [Xen-devel] [PATCH v5 01/10] AMD/IOMMU: miscellaneous DTE handling adjustments

2019-08-06 Thread Woods, Brian
On Tue, Aug 06, 2019 at 03:07:48PM +0200, Jan Beulich wrote:
> First and foremost switch boolean fields to bool. Adjust a few related
> function parameters as well. Then
> - in amd_iommu_set_intremap_table() don't use literal numbers,
> - in iommu_dte_add_device_entry() use a compound literal instead of many
>   assignments,
> - in amd_iommu_setup_domain_device()
>   - eliminate a pointless local variable,
>   - use || instead of && when deciding whether to clear an entry,
>   - clear the I field without any checking of ATS / IOTLB state,
> - leave reserved fields unnamed.
> 
> Signed-off-by: Jan Beulich 
> Acked-by: Andrew Cooper 

Ignore my ack on the old patch that was part of the other series (I was
still catching up).

Acked-by: Brian Woods 

> ---
> v5: IOMMU_INTREMAP_LENGTH -> IOMMU_INTREMAP_ORDER. Adjust comment.
> v4: New.
> 
> --- a/xen/drivers/passthrough/amd/iommu_intr.c
> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> @@ -69,8 +69,7 @@ union irte_cptr {
>  const union irte128 *ptr128;
>  } __transparent__;
> -#define INTREMAP_LENGTH 0xB
> -#define INTREMAP_ENTRIES (1 << INTREMAP_LENGTH)
> +#define INTREMAP_ENTRIES (1 << IOMMU_INTREMAP_ORDER)
>  struct ioapic_sbdf ioapic_sbdf[MAX_IO_APICS];
>  struct hpet_sbdf hpet_sbdf;
> --- a/xen/drivers/passthrough/amd/iommu_map.c
> +++ b/xen/drivers/passthrough/amd/iommu_map.c
> @@ -101,51 +101,52 @@ static unsigned int set_iommu_pte_presen
>  void amd_iommu_set_root_page_table(struct amd_iommu_dte *dte,
> uint64_t root_ptr, uint16_t domain_id,
> -   uint8_t paging_mode, uint8_t valid)
> +   uint8_t paging_mode, bool valid)
>  {
>  dte->domain_id = domain_id;
>  dte->pt_root = paddr_to_pfn(root_ptr);
> -dte->iw = 1;
> -dte->ir = 1;
> +dte->iw = true;
> +dte->ir = true;
>  dte->paging_mode = paging_mode;
> -dte->tv = 1;
> +dte->tv = true;
>  dte->v = valid;
>  }
>  void __init amd_iommu_set_intremap_table(
> -struct amd_iommu_dte *dte, uint64_t intremap_ptr, uint8_t int_valid)
> +struct amd_iommu_dte *dte, uint64_t intremap_ptr, bool valid)
>  {
>  dte->it_root = intremap_ptr >> 6;
> -dte->int_tab_len = 0xb; /* 2048 entries */
> -dte->int_ctl = 2; /* fixed and arbitrated interrupts remapped */
> -dte->ig = 0; /* unmapped interrupt results io page faults */
> -dte->iv = int_valid;
> +dte->int_tab_len = IOMMU_INTREMAP_ORDER;
> +dte->int_ctl = IOMMU_DEV_TABLE_INT_CONTROL_TRANSLATED;
> +dte->ig = false; /* unmapped interrupts result in i/o page faults */
> +dte->iv = valid;
>  }
>  void __init iommu_dte_add_device_entry(struct amd_iommu_dte *dte,
> -   struct ivrs_mappings *ivrs_dev)
> +   const struct ivrs_mappings *ivrs_dev)
>  {
>  uint8_t flags = ivrs_dev->device_flags;
> -memset(dte, 0, sizeof(*dte));
> -
> -dte->init_pass = MASK_EXTR(flags, ACPI_IVHD_INIT_PASS);
> -dte->ext_int_pass = MASK_EXTR(flags, ACPI_IVHD_EINT_PASS);
> -dte->nmi_pass = MASK_EXTR(flags, ACPI_IVHD_NMI_PASS);
> -dte->lint0_pass = MASK_EXTR(flags, ACPI_IVHD_LINT0_PASS);
> -dte->lint1_pass = MASK_EXTR(flags, ACPI_IVHD_LINT1_PASS);
> -dte->sys_mgt = MASK_EXTR(flags, ACPI_IVHD_SYSTEM_MGMT);
> -dte->ex = ivrs_dev->dte_allow_exclusion;
> +*dte = (struct amd_iommu_dte){
> +.init_pass = flags & ACPI_IVHD_INIT_PASS,
> +.ext_int_pass = flags & ACPI_IVHD_EINT_PASS,
> +.nmi_pass = flags & ACPI_IVHD_NMI_PASS,
> +.lint0_pass = flags & ACPI_IVHD_LINT0_PASS,
> +.lint1_pass = flags & ACPI_IVHD_LINT1_PASS,
> +.ioctl = IOMMU_DEV_TABLE_IO_CONTROL_ABORTED,
> +.sys_mgt = MASK_EXTR(flags, ACPI_IVHD_SYSTEM_MGMT),
> +.ex = ivrs_dev->dte_allow_exclusion,
> +};
>  }
>  void iommu_dte_set_guest_cr3(struct amd_iommu_dte *dte, uint16_t dom_id,
> - uint64_t gcr3_mfn, uint8_t gv, uint8_t glx)
> + uint64_t gcr3_mfn, bool gv, uint8_t glx)
>  {
>  #define GCR3_MASK(hi, lo) (((1ul << ((hi) + 1)) - 1) & ~((1ul << (lo)) - 1))
>  #define GCR3_SHIFT(lo) ((lo) - PAGE_SHIFT)
>  /* I bit must be set when gcr3 is enabled */
> -dte->i = 1;
> +dte->i = true;
>  dte->gcr3_trp_14_12 = (gcr3_mfn & GCR3_MASK(14, 12)) >> GCR3_SHIFT(12);
>  dte->gcr3_trp_30_15 = (gcr3_mfn & GCR3_MASK(30, 15)) >> GCR3_SHIFT(15);
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -93,7 +93,6 @@ static void amd_iommu_setup_domain_devic
>  struct amd_iommu_dte *table, *dte;
>  unsigned long flags;
>  int req_id, valid = 1;
> -int dte_i = 0;
>  u8 bus = pdev->bus;
>  const struct domain_iommu *hd = dom_iommu(domain);
> @@ -103,9 +102,6 @@ static void amd_iommu_setup_domain_devic
>  if ( iommu_hwdom_passthrough && is_hardware_domain(domain) 
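
One detail implicit in the compound-literal conversion above: C guarantees
that members omitted from an initializer list are zero-filled, so the single
assignment subsumes the dropped memset(). A minimal sketch with a
hypothetical struct, not the real DTE layout:

    struct dte_like { unsigned int a:1, b:3, c:1; };

    void set_entry(struct dte_like *d)
    {
        /* b is not named, so it becomes 0: the same net effect as
         * memset(d, 0, sizeof(*d)) followed by the two stores */
        *d = (struct dte_like){ .a = 1, .c = 1 };
    }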

Re: [Xen-devel] [PATCH v4 12/12] AMD/IOMMU: miscellaneous DTE handling adjustments

2019-08-06 Thread Woods, Brian
On Thu, Jul 25, 2019 at 01:33:50PM +, Jan Beulich wrote:
> First and foremost switch boolean fields to bool. Adjust a few related
> function parameters as well. Then
> - in amd_iommu_set_intremap_table() don't use literal numbers,
> - in iommu_dte_add_device_entry() use a compound literal instead of many
>assignments,
> - in amd_iommu_setup_domain_device()
>- eliminate a pointless local variable,
>- use || instead of && when deciding whether to clear an entry,
>- clear the I field without any checking of ATS / IOTLB state,
> - leave reserved fields unnamed.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Brian Woods 

> ---
> v4: New.
> 
> --- a/xen/drivers/passthrough/amd/iommu_intr.c
> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> @@ -69,8 +69,7 @@ union irte_cptr {
>   const union irte128 *ptr128;
>   } __transparent__;
>   
> -#define INTREMAP_LENGTH 0xB
> -#define INTREMAP_ENTRIES (1 << INTREMAP_LENGTH)
> +#define INTREMAP_ENTRIES (1 << IOMMU_INTREMAP_LENGTH)
>   
>   struct ioapic_sbdf ioapic_sbdf[MAX_IO_APICS];
>   struct hpet_sbdf hpet_sbdf;
> --- a/xen/drivers/passthrough/amd/iommu_map.c
> +++ b/xen/drivers/passthrough/amd/iommu_map.c
> @@ -101,51 +101,52 @@ static unsigned int set_iommu_pte_presen
>   
>   void amd_iommu_set_root_page_table(struct amd_iommu_dte *dte,
>  uint64_t root_ptr, uint16_t domain_id,
> -   uint8_t paging_mode, uint8_t valid)
> +   uint8_t paging_mode, bool valid)
>   {
>   dte->domain_id = domain_id;
>   dte->pt_root = paddr_to_pfn(root_ptr);
> -dte->iw = 1;
> -dte->ir = 1;
> +dte->iw = true;
> +dte->ir = true;
>   dte->paging_mode = paging_mode;
> -dte->tv = 1;
> +dte->tv = true;
>   dte->v = valid;
>   }
>   
>   void __init amd_iommu_set_intremap_table(
> -struct amd_iommu_dte *dte, uint64_t intremap_ptr, uint8_t int_valid)
> +struct amd_iommu_dte *dte, uint64_t intremap_ptr, bool valid)
>   {
>   dte->it_root = intremap_ptr >> 6;
> -dte->int_tab_len = 0xb; /* 2048 entries */
> -dte->int_ctl = 2; /* fixed and arbitrated interrupts remapped */
> -dte->ig = 0; /* unmapped interrupt results io page faults */
> -dte->iv = int_valid;
> +dte->int_tab_len = IOMMU_INTREMAP_LENGTH;
> +dte->int_ctl = IOMMU_DEV_TABLE_INT_CONTROL_TRANSLATED;
> +dte->ig = false; /* unmapped interrupts result in i/o page faults */
> +dte->iv = valid;
>   }
>   
>   void __init iommu_dte_add_device_entry(struct amd_iommu_dte *dte,
> -   struct ivrs_mappings *ivrs_dev)
> +   const struct ivrs_mappings *ivrs_dev)
>   {
>   uint8_t flags = ivrs_dev->device_flags;
>   
> -memset(dte, 0, sizeof(*dte));
> -
> -dte->init_pass = MASK_EXTR(flags, ACPI_IVHD_INIT_PASS);
> -dte->ext_int_pass = MASK_EXTR(flags, ACPI_IVHD_EINT_PASS);
> -dte->nmi_pass = MASK_EXTR(flags, ACPI_IVHD_NMI_PASS);
> -dte->lint0_pass = MASK_EXTR(flags, ACPI_IVHD_LINT0_PASS);
> -dte->lint1_pass = MASK_EXTR(flags, ACPI_IVHD_LINT1_PASS);
> -dte->sys_mgt = MASK_EXTR(flags, ACPI_IVHD_SYSTEM_MGMT);
> -dte->ex = ivrs_dev->dte_allow_exclusion;
> +*dte = (struct amd_iommu_dte){
> +.init_pass = flags & ACPI_IVHD_INIT_PASS,
> +.ext_int_pass = flags & ACPI_IVHD_EINT_PASS,
> +.nmi_pass = flags & ACPI_IVHD_NMI_PASS,
> +.lint0_pass = flags & ACPI_IVHD_LINT0_PASS,
> +.lint1_pass = flags & ACPI_IVHD_LINT1_PASS,
> +.ioctl = IOMMU_DEV_TABLE_IO_CONTROL_ABORTED,
> +.sys_mgt = MASK_EXTR(flags, ACPI_IVHD_SYSTEM_MGMT),
> +.ex = ivrs_dev->dte_allow_exclusion,
> +};
>   }
>   
>   void iommu_dte_set_guest_cr3(struct amd_iommu_dte *dte, uint16_t dom_id,
> - uint64_t gcr3_mfn, uint8_t gv, uint8_t glx)
> + uint64_t gcr3_mfn, bool gv, uint8_t glx)
>   {
>   #define GCR3_MASK(hi, lo) (((1ul << ((hi) + 1)) - 1) & ~((1ul << (lo)) - 1))
>   #define GCR3_SHIFT(lo) ((lo) - PAGE_SHIFT)
>   
>   /* I bit must be set when gcr3 is enabled */
> -dte->i = 1;
> +dte->i = true;
>   
>   dte->gcr3_trp_14_12 = (gcr3_mfn & GCR3_MASK(14, 12)) >> GCR3_SHIFT(12);
>   dte->gcr3_trp_30_15 = (gcr3_mfn & GCR3_MASK(30, 15)) >> GCR3_SHIFT(15);
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -93,7 +93,6 @@ static void amd_iommu_setup_domain_devic
>   struct amd_iommu_dte *table, *dte;
>   unsigned long flags;
>   int req_id, valid = 1;
> -int dte_i = 0;
>   u8 bus = pdev->bus;
>   const struct domain_iommu *hd = dom_iommu(domain);
>   
> @@ -103,9 +102,6 @@ static void amd_iommu_setup_domain_devic
>   if ( iommu_hwdom_passthrough && is_hardware_domain(domain) )
>   valid = 0;
>   
> -if ( ats_enabled )
> -dte_i = 1;
> -
>   /* 

Re: [Xen-devel] [PATCH] passthrough/amd: Drop "IOMMU not found" message

2019-08-06 Thread Woods, Brian
On Mon, Aug 05, 2019 at 05:44:30PM +0100, Andy Cooper wrote:
> Since c/s 9fa94e10585 "x86/ACPI: also parse AMD IOMMU tables early", this
> function is unconditionally called in all cases where a DMAR ACPI table
> doesn't exist.
> 
> As a consequence, "AMD-Vi: IOMMU not found!" is printed in all cases where an
> IOMMU isn't present, even on non-AMD systems.  Drop the message - it isn't
> terribly interesting anyway, and is now misleading in a number of common
> cases.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Brian Woods 

> ---
> CC: Jan Beulich 
> CC: Wei Liu 
> CC: Roger Pau Monné 
> CC: Boris Ostrovsky 
> CC: Suravee Suthikulpanit 
> CC: Brian Woods 
> ---
>  xen/drivers/passthrough/amd/pci_amd_iommu.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c 
> b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> index b3e1933b53..3bcfcc8404 100644
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -155,7 +155,6 @@ int __init acpi_ivrs_init(void)
>  
>  if ( (amd_iommu_detect_acpi() !=0) || (iommu_found() == 0) )
>  {
> -printk("AMD-Vi: IOMMU not found!\n");
>  iommu_intremap = 0;
>  return -ENODEV;
>  }
> -- 
> 2.11.0
> 

-- 
Brian Woods


Re: [Xen-devel] [PATCH v4 11/12] AMD/IOMMU: don't needlessly log headers when dumping IRTs

2019-07-30 Thread Woods, Brian
On Thu, Jul 25, 2019 at 01:33:24PM +, Jan Beulich wrote:
> Log SBDF headers only when there are actual IRTEs to log. This is
> particularly important for the total volume of output when the ACPI
> tables describe far more than just the existing devices. On my Rome
> system so far there was one line for every function of every device on
> all 256 buses of segment 0, with extremely few exceptions (like the
> IOMMUs themselves).
> 
> Also only log one of the "per-device" or "shared" overall headers.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Brian Woods 

> ---
> v4: New.
> 
> --- a/xen/drivers/passthrough/amd/iommu_intr.c
> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> @@ -883,7 +883,8 @@ int __init amd_setup_hpet_msi(struct msi
>   }
>   
>   static void dump_intremap_table(const struct amd_iommu *iommu,
> -union irte_cptr tbl)
> +union irte_cptr tbl,
> +const struct ivrs_mappings *ivrs_mapping)
>   {
>   unsigned int count;
>   
> @@ -892,19 +893,25 @@ static void dump_intremap_table(const st
>   
>   for ( count = 0; count < INTREMAP_ENTRIES; count++ )
>   {
> -if ( iommu->ctrl.ga_en )
> -{
> -if ( !tbl.ptr128[count].raw[0] && !tbl.ptr128[count].raw[1] )
> +if ( iommu->ctrl.ga_en
> + ? !tbl.ptr128[count].raw[0] && !tbl.ptr128[count].raw[1]
> + : !tbl.ptr32[count].raw )
>   continue;
> +
> +if ( ivrs_mapping )
> +{
> +printk("  %04x:%02x:%02x:%u:\n", iommu->seg,
> +   PCI_BUS(ivrs_mapping->dte_requestor_id),
> +   PCI_SLOT(ivrs_mapping->dte_requestor_id),
> +   PCI_FUNC(ivrs_mapping->dte_requestor_id));
> +ivrs_mapping = NULL;
> +}
> +
> +if ( iommu->ctrl.ga_en )
>   printk("IRTE[%03x] %016lx_%016lx\n",
>  count, tbl.ptr128[count].raw[1], 
> tbl.ptr128[count].raw[0]);
> -}
>   else
> -{
> -if ( !tbl.ptr32[count].raw )
> -continue;
>   printk("IRTE[%03x] %08x\n", count, tbl.ptr32[count].raw);
> -}
>   }
>   }
>   
> @@ -916,13 +923,8 @@ static int dump_intremap_mapping(const s
>   if ( !ivrs_mapping )
>   return 0;
>   
> -printk("  %04x:%02x:%02x:%u:\n", iommu->seg,
> -   PCI_BUS(ivrs_mapping->dte_requestor_id),
> -   PCI_SLOT(ivrs_mapping->dte_requestor_id),
> -   PCI_FUNC(ivrs_mapping->dte_requestor_id));
> -
>   spin_lock_irqsave(&(ivrs_mapping->intremap_lock), flags);
> -dump_intremap_table(iommu, ivrs_mapping->intremap_table);
> +dump_intremap_table(iommu, ivrs_mapping->intremap_table, ivrs_mapping);
>   spin_unlock_irqrestore(&(ivrs_mapping->intremap_lock), flags);
>   
>   process_pending_softirqs();
> @@ -932,17 +934,22 @@ static int dump_intremap_mapping(const s
>   
>   static void dump_intremap_tables(unsigned char key)
>   {
> -unsigned long flags;
> -
> -printk("--- Dumping Per-dev IOMMU Interrupt Remapping Table ---\n");
> +if ( !shared_intremap_table )
> +{
> +printk("--- Dumping Per-dev IOMMU Interrupt Remapping Table ---\n");
>   
> -iterate_ivrs_entries(dump_intremap_mapping);
> +iterate_ivrs_entries(dump_intremap_mapping);
> +}
> +else
> +{
> +unsigned long flags;
>   
> -printk("--- Dumping Shared IOMMU Interrupt Remapping Table ---\n");
> +printk("--- Dumping Shared IOMMU Interrupt Remapping Table ---\n");
>   
> -spin_lock_irqsave(&shared_intremap_lock, flags);
> -dump_intremap_table(list_first_entry(&amd_iommu_head, struct amd_iommu,
> - list),
> -shared_intremap_table);
> -spin_unlock_irqrestore(&shared_intremap_lock, flags);
> +spin_lock_irqsave(&shared_intremap_lock, flags);
> +dump_intremap_table(list_first_entry(&amd_iommu_head, struct amd_iommu,
> + list),
> +shared_intremap_table, NULL);
> +spin_unlock_irqrestore(&shared_intremap_lock, flags);
> +}
>   }
> 

-- 
Brian Woods


Re: [Xen-devel] [PATCH v4 05/12] AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format

2019-07-30 Thread Woods, Brian
On Thu, Jul 25, 2019 at 01:31:02PM +, Jan Beulich wrote:
> This is in preparation of actually enabling x2APIC mode, which requires
> this wider IRTE format to be used.
> 
> A specific remark regarding the first hunk changing
> amd_iommu_ioapic_update_ire(): This bypass was introduced for XSA-36,
> i.e. by 94d4a1119d ("AMD,IOMMU: Clean up old entries in remapping
> tables when creating new one"). Other code introduced by that change has
> meanwhile disappeared or further changed, and I wonder if - rather than
> adding an x2apic_enabled check to the conditional - the bypass couldn't
> be deleted altogether. For now the goal is to affect the non-x2APIC
> paths as little as possible.
> 
> Take the liberty and use the new "fresh" flag to suppress an unneeded
> flush in update_intremap_entry_from_ioapic().
> 
> Signed-off-by: Jan Beulich 

Acked-by: Brian Woods 

> ---
> v4: Re-base. Do away with standalone struct irte_full. Use smp_wmb().
> v3: Avoid unrelated type changes in update_intremap_entry_from_ioapic().
>  Drop irte_mode enum and variable. Convert INTREMAP_TABLE_ORDER into
>  a static helper. Comment barrier() uses. Switch boolean bitfields to
>  bool.
> v2: Add cast in get_full_dest(). Re-base over changes earlier in the
>  series. Don't use cmpxchg16b. Use barrier() instead of wmb().
> ---
> Note that AMD's doc says Lowest Priority ("Arbitrated" by their naming)
> mode is unavailable in x2APIC mode, but they've confirmed this to be a
> mistake on their part.
> 
> --- a/xen/drivers/passthrough/amd/iommu_intr.c
> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> @@ -39,12 +39,36 @@ union irte32 {
>   } flds;
>   };
>   
> +union irte128 {
> +uint64_t raw[2];
> +struct {
> +bool remap_en:1;
> +bool sup_io_pf:1;
> +unsigned int int_type:3;
> +bool rq_eoi:1;
> +bool dm:1;
> +bool guest_mode:1; /* MBZ */
> +unsigned int dest_lo:24;
> +unsigned int :32;
> +unsigned int vector:8;
> +unsigned int :24;
> +unsigned int :24;
> +unsigned int dest_hi:8;
> +} full;
> +};
> +
>   union irte_ptr {
>   void *ptr;
>   union irte32 *ptr32;
> +union irte128 *ptr128;
>   };
>   
> -#define INTREMAP_TABLE_ORDER    1
> +union irte_cptr {
> +const void *ptr;
> +const union irte32 *ptr32;
> +const union irte128 *ptr128;
> +} __transparent__;
> +
>   #define INTREMAP_LENGTH 0xB
>   #define INTREMAP_ENTRIES (1 << INTREMAP_LENGTH)
>   
> @@ -57,6 +81,13 @@ unsigned int nr_ioapic_sbdf;
>   
>   static void dump_intremap_tables(unsigned char key);
>   
> +static unsigned int __init intremap_table_order(const struct amd_iommu 
> *iommu)
> +{
> +return iommu->ctrl.ga_en
> +   ? get_order_from_bytes(INTREMAP_ENTRIES * sizeof(union irte128))
> +   : get_order_from_bytes(INTREMAP_ENTRIES * sizeof(union irte32));
> +}
> +
>   unsigned int ioapic_id_to_index(unsigned int apic_id)
>   {
>   unsigned int idx;
> @@ -131,7 +162,10 @@ static union irte_ptr get_intremap_entry
>   
>   ASSERT(table.ptr && (index < INTREMAP_ENTRIES));
>   
> -table.ptr32 += index;
> +if ( iommu->ctrl.ga_en )
> +table.ptr128 += index;
> +else
> +table.ptr32 += index;
>   
>   return table;
>   }
> @@ -141,7 +175,22 @@ static void free_intremap_entry(const st
>   {
>   union irte_ptr entry = get_intremap_entry(iommu, bdf, index);
>   
> -ACCESS_ONCE(entry.ptr32->raw) = 0;
> +if ( iommu->ctrl.ga_en )
> +{
> +ACCESS_ONCE(entry.ptr128->raw[0]) = 0;
> +/*
> + * Low half (containing RemapEn) needs to be cleared first.  Note 
> that
> + * strictly speaking smp_wmb() isn't enough, as conceptually it 
> expands
> + * to just barrier() when !CONFIG_SMP.  But wmb() would be more than 
> we
> + * need, since the IOMMU is a cache-coherent entity on the bus.  And
> + * given that we don't allow CONFIG_SMP to be turned off, the SMP
> + * variant will do.
> + */
> +smp_wmb();
> +entry.ptr128->raw[1] = 0;
> +}
> +else
> +ACCESS_ONCE(entry.ptr32->raw) = 0;
>   
>   __clear_bit(index, get_ivrs_mappings(iommu->seg)[bdf].intremap_inuse);
>   }
> @@ -151,17 +200,44 @@ static void update_intremap_entry(const
> unsigned int vector, unsigned int 
> int_type,
> unsigned int dest_mode, unsigned int dest)
>   {
> -union irte32 irte = {
> -.flds = {
> -.remap_en = true,
> -.int_type = int_type,
> -.dm = dest_mode,
> -.dest = dest,
> -.vector = vector,
> -},
> -};
> +if ( iommu->ctrl.ga_en )
> +{
> +union irte128 irte = {
> +.full = {
> +.remap_en = true,
> +.int_type = int_type,
> +.dm = dest_mode,
> +.dest_lo = 
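
The rule behind the barrier comments above, condensed: the half of the IRTE
holding RemapEn must make the entry invalid before, and valid only after,
the other half is consistent. A summary sketch assembled from this patch and
from patch 10/12 later in this digest, not the verbatim code:

    /* teardown: clear the RemapEn-carrying low half first */
    ACCESS_ONCE(entry.ptr128->raw[0]) = 0;
    smp_wmb();
    entry.ptr128->raw[1] = 0;

    /* setup: populate the high half first, enable only at the end */
    entry.ptr128->raw[1] = irte.raw[1];
    smp_wmb();
    ACCESS_ONCE(entry.ptr128->raw[0]) = irte.raw[0];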

Re: [Xen-devel] [PATCH v4 01/12] AMD/IOMMU: use bit field for extended feature register

2019-07-30 Thread Woods, Brian
On Thu, Jul 25, 2019 at 01:29:16PM +, Jan Beulich wrote:
> This also takes care of several of the shift values wrongly having been
> specified as hex rather than dec.
> 
> Take the opportunity and
> - replace a readl() pair by a single readq(),
> - add further fields.
> 
> Signed-off-by: Jan Beulich 
> Acked-by: Andrew Cooper 

Acked-by: Brian Woods 

> ---
> v4: Drop stray/leftover #undef.
> v3: Another attempt at deriving masks from bitfields, hopefully better
>  liked by clang (mine was fine even with the v2 variant).
> v2: Correct sats_sup position and name. Re-base over new earlier patch.
> 
> --- a/xen/drivers/passthrough/amd/iommu_detect.c
> +++ b/xen/drivers/passthrough/amd/iommu_detect.c
> @@ -60,49 +60,76 @@ static int __init get_iommu_capabilities
>   
>   void __init get_iommu_features(struct amd_iommu *iommu)
>   {
> -u32 low, high;
> -int i = 0 ;
>   const struct amd_iommu *first;
> -static const char *__initdata feature_str[] = {
> -"- Prefetch Pages Command",
> -"- Peripheral Page Service Request",
> -"- X2APIC Supported",
> -"- NX bit Supported",
> -"- Guest Translation",
> -"- Reserved bit [5]",
> -"- Invalidate All Command",
> -"- Guest APIC supported",
> -"- Hardware Error Registers",
> -"- Performance Counters",
> -NULL
> -};
> -
>   ASSERT( iommu->mmio_base );
>   
>   if ( !iommu_has_cap(iommu, PCI_CAP_EFRSUP_SHIFT) )
>   {
> -iommu->features = 0;
> +iommu->features.raw = 0;
>   return;
>   }
>   
> -low = readl(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET);
> -high = readl(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET + 4);
> -
> -iommu->features = ((u64)high << 32) | low;
> +iommu->features.raw =
> +readq(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET);
>   
>   /* Don't log the same set of features over and over. */
>   first = list_first_entry(&amd_iommu_head, struct amd_iommu, list);
> -if ( iommu != first && iommu->features == first->features )
> +if ( iommu != first && iommu->features.raw == first->features.raw )
>   return;
>   
>   printk("AMD-Vi: IOMMU Extended Features:\n");
>   
> -while ( feature_str[i] )
> +#define FEAT(fld, str) do {\
> +if ( --((union amd_iommu_ext_features){}).flds.fld > 1 )   \
> +printk( "- " str ": %#x\n", iommu->features.flds.fld); \
> +else if ( iommu->features.flds.fld )   \
> +printk( "- " str "\n");\
> +} while ( false )
> +
> +FEAT(pref_sup,   "Prefetch Pages Command");
> +FEAT(ppr_sup,"Peripheral Page Service Request");
> +FEAT(xt_sup, "x2APIC");
> +FEAT(nx_sup, "NX bit");
> +FEAT(gappi_sup,  "Guest APIC Physical Processor Interrupt");
> +FEAT(ia_sup, "Invalidate All Command");
> +FEAT(ga_sup, "Guest APIC");
> +FEAT(he_sup, "Hardware Error Registers");
> +FEAT(pc_sup, "Performance Counters");
> +FEAT(hats,   "Host Address Translation Size");
> +
> +if ( iommu->features.flds.gt_sup )
>   {
> -if ( amd_iommu_has_feature(iommu, i) )
> -printk( " %s\n", feature_str[i]);
> -i++;
> +FEAT(gats,   "Guest Address Translation Size");
> +FEAT(glx_sup,"Guest CR3 Root Table Level");
> +FEAT(pas_max,"Maximum PASID");
>   }
> +
> +FEAT(smif_sup,   "SMI Filter Register");
> +FEAT(smif_rc,"SMI Filter Register Count");
> +FEAT(gam_sup,"Guest Virtual APIC Modes");
> +FEAT(dual_ppr_log_sup,   "Dual PPR Log");
> +FEAT(dual_event_log_sup, "Dual Event Log");
> +FEAT(sats_sup,   "Secure ATS");
> +FEAT(us_sup, "User / Supervisor Page Protection");
> +FEAT(dev_tbl_seg_sup,"Device Table Segmentation");
> +FEAT(ppr_early_of_sup,   "PPR Log Overflow Early Warning");
> +FEAT(ppr_auto_rsp_sup,   "PPR Automatic Response");
> +FEAT(marc_sup,   "Memory Access Routing and Control");
> +FEAT(blk_stop_mrk_sup,   "Block StopMark Message");
> +FEAT(perf_opt_sup ,  "Performance Optimization");
> +FEAT(msi_cap_mmio_sup,   "MSI Capability MMIO Access");
> +FEAT(gio_sup,"Guest I/O Protection");
> +FEAT(ha_sup, "Host Access");
> +FEAT(eph_sup,"Enhanced PPR Handling");
> +FEAT(attr_fw_sup,"Attribute Forward");
> +FEAT(hd_sup, "Host Dirty");
> +FEAT(inv_iotlb_type_sup, "Invalidate IOTLB Type");
> +FEAT(viommu_sup, "Virtualized IOMMU");
> +FEAT(vm_guard_io_sup,"VMGuard I/O Support");
> +FEAT(vm_table_size,  "VM Table Size");
> +FEAT(ga_update_dis_sup,  "Guest Access Bit Update Disable");
> +
> 
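
The FEAT() macro's "--((union amd_iommu_ext_features){}).flds.fld > 1" test
deserves a gloss: decrementing a zero-initialized bitfield of a compound
literal wraps it to the field's maximum value, so the comparison is true
exactly for multi-bit fields (logged with their value) and false for
single-bit flags (logged by name only). A standalone sketch with a
hypothetical union:

    union feats {
        unsigned int raw;
        struct {
            unsigned int flag:1;     /* max value 1 */
            unsigned int width:3;    /* max value 7 */
        } flds;
    };

    #define IS_WIDE(fld) (--((union feats){}).flds.fld > 1)

    /* IS_WIDE(flag)  == 0: 0 - 1 wraps to 1 in a 1-bit field */
    /* IS_WIDE(width) == 1: 0 - 1 wraps to 7 in a 3-bit field */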

Re: [Xen-devel] [PATCH v4 10/12] AMD/IOMMU: correct IRTE updating

2019-07-25 Thread Woods, Brian
On Thu, Jul 25, 2019 at 01:33:02PM +, Jan Beulich wrote:
> Flushing didn't get done along the lines of what the specification says.
> Mark entries to be updated as not remapped (which will result in
> interrupt requests to get target aborted, but the interrupts should be
> masked anyway at that point in time), issue the flush, and only then
> write the new entry.
> 
> In update_intremap_entry_from_msi_msg() also fold the duplicate initial
> lock determination and acquire into just a single instance.
> 
> Signed-off-by: Jan Beulich 
> Acked-by: Andrew Cooper 

Acked-by: Brian Woods 

> ---
> RFC: Putting the flush invocations in loops isn't overly nice, but I
>   don't think this can really be abused, since callers up the stack
>   hold further locks. Nevertheless I'd like to ask for better
>   suggestions.
> ---
> v4: Re-base.
> v3: Remove stale parts of description. Re-base.
> v2: Parts morphed into earlier patch.
> 
> --- a/xen/drivers/passthrough/amd/iommu_intr.c
> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> @@ -213,15 +213,13 @@ static void update_intremap_entry(const
>   },
>   };
>   
> -ACCESS_ONCE(entry.ptr128->raw[0]) = 0;
> +ASSERT(!entry.ptr128->full.remap_en);
> +entry.ptr128->raw[1] = irte.raw[1];
>   /*
> - * Low half, in particular RemapEn, needs to be cleared first.  See
> + * High half needs to be set before low one (containing RemapEn).  
> See
>* comment in free_intremap_entry() regarding the choice of barrier.
>*/
>   smp_wmb();
> -entry.ptr128->raw[1] = irte.raw[1];
> -/* High half needs to be set before low one (containing RemapEn). */
> -smp_wmb();
>   ACCESS_ONCE(entry.ptr128->raw[0]) = irte.raw[0];
>   }
>   else
> @@ -296,6 +294,20 @@ static int update_intremap_entry_from_io
>   }
>   
>   entry = get_intremap_entry(iommu, req_id, offset);
> +
> +/* The RemapEn fields match for all formats. */
> +while ( iommu->enabled && entry.ptr32->flds.remap_en )
> +{
> +entry.ptr32->flds.remap_en = false;
> +spin_unlock(lock);
> +
> +spin_lock(&iommu->lock);
> +amd_iommu_flush_intremap(iommu, req_id);
> +spin_unlock(&iommu->lock);
> +
> +spin_lock(lock);
> +}
> +
>   if ( fresh )
>   /* nothing */;
>   else if ( !lo_update )
> @@ -325,13 +337,6 @@ static int update_intremap_entry_from_io
>   
>   spin_unlock_irqrestore(lock, flags);
>   
> -if ( iommu->enabled && !fresh )
> -{
> -spin_lock_irqsave(&iommu->lock, flags);
> -amd_iommu_flush_intremap(iommu, req_id);
> -spin_unlock_irqrestore(&iommu->lock, flags);
> -}
> -
>   set_rte_index(rte, offset);
>   
>   return 0;
> @@ -587,19 +592,27 @@ static int update_intremap_entry_from_ms
>   req_id = get_dma_requestor_id(iommu->seg, bdf);
>   alias_id = get_intremap_requestor_id(iommu->seg, bdf);
>   
> +lock = get_intremap_lock(iommu->seg, req_id);
> +spin_lock_irqsave(lock, flags);
> +
>   if ( msg == NULL )
>   {
> -lock = get_intremap_lock(iommu->seg, req_id);
> -spin_lock_irqsave(lock, flags);
>   for ( i = 0; i < nr; ++i )
>   free_intremap_entry(iommu, req_id, *remap_index + i);
>   spin_unlock_irqrestore(lock, flags);
> -goto done;
> -}
>   
> -lock = get_intremap_lock(iommu->seg, req_id);
> +if ( iommu->enabled )
> +{
> +spin_lock_irqsave(&iommu->lock, flags);
> +amd_iommu_flush_intremap(iommu, req_id);
> +if ( alias_id != req_id )
> +amd_iommu_flush_intremap(iommu, alias_id);
> +spin_unlock_irqrestore(&iommu->lock, flags);
> +}
> +
> +return 0;
> +}
>   
> -spin_lock_irqsave(lock, flags);
>   dest_mode = (msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT) & 0x1;
>   delivery_mode = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x1;
>   vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) & MSI_DATA_VECTOR_MASK;
> @@ -623,6 +636,22 @@ static int update_intremap_entry_from_ms
>   }
>   
>   entry = get_intremap_entry(iommu, req_id, offset);
> +
> +/* The RemapEn fields match for all formats. */
> +while ( iommu->enabled && entry.ptr32->flds.remap_en )
> +{
> +entry.ptr32->flds.remap_en = false;
> +spin_unlock(lock);
> +
> +spin_lock(&iommu->lock);
> +amd_iommu_flush_intremap(iommu, req_id);
> +if ( alias_id != req_id )
> +amd_iommu_flush_intremap(iommu, alias_id);
> +spin_unlock(&iommu->lock);
> +
> +spin_lock(lock);
> +}
> +
>   update_intremap_entry(iommu, entry, vector, delivery_mode, dest_mode, 
> dest);
>   spin_unlock_irqrestore(lock, flags);
>   
> @@ -642,16 +671,6 @@ static int update_intremap_entry_from_ms
>  get_ivrs_mappings(iommu->seg)[alias_id].intremap_table);
>   }
>   
> -done:
> -
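
The update protocol the patch implements, condensed from the hunks above
into three steps (a summary sketch, not the exact code):

    /* 1. invalidate: clear RemapEn so new requests get target aborted
     *    (the interrupt is expected to be masked at this point) */
    entry.ptr32->flds.remap_en = false;
    /* 2. flush the stale IRTE, including the alias requestor ID */
    amd_iommu_flush_intremap(iommu, req_id);
    /* 3. only then write the replacement entry */
    update_intremap_entry(iommu, entry, vector, delivery_mode,
                          dest_mode, dest);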

Re: [Xen-devel] [PATCH v4 06/13] x86/IOMMU: don't restrict IRQ affinities to online CPUs

2019-07-24 Thread Woods, Brian
On Tue, Jul 16, 2019 at 07:40:57AM +, Jan Beulich wrote:
> In line with "x86/IRQ: desc->affinity should strictly represent the
> requested value" the internally used IRQ(s) also shouldn't be restricted
> to online ones. Make set_desc_affinity() (set_msi_affinity() then does
> by implication) cope with a NULL mask being passed (just like
> assign_irq_vector() does), and have IOMMU code pass NULL instead of
> _online_map (when, for VT-d, there's no NUMA node information
> available).
> 
> Signed-off-by: Jan Beulich 

Acked-by: Brian Woods 

> ---
> v4: New.
> 
> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -796,18 +796,26 @@ unsigned int set_desc_affinity(struct ir
>   unsigned long flags;
>   cpumask_t dest_mask;
>   
> -if (!cpumask_intersects(mask, &cpu_online_map))
> +if ( mask && !cpumask_intersects(mask, _online_map) )
>   return BAD_APICID;
>   
>   spin_lock_irqsave(&vector_lock, flags);
> -ret = _assign_irq_vector(desc, mask);
> +ret = _assign_irq_vector(desc, mask ?: TARGET_CPUS);
>   spin_unlock_irqrestore(&vector_lock, flags);
>   
> -if (ret < 0)
> +if ( ret < 0 )
>   return BAD_APICID;
>   
> -cpumask_copy(desc->affinity, mask);
> -cpumask_and(&dest_mask, mask, desc->arch.cpu_mask);
> +if ( mask )
> +{
> +cpumask_copy(desc->affinity, mask);
> +cpumask_and(&dest_mask, mask, desc->arch.cpu_mask);
> +}
> +else
> +{
> +cpumask_setall(desc->affinity);
> +cpumask_copy(&dest_mask, desc->arch.cpu_mask);
> +}
>   cpumask_and(&dest_mask, &dest_mask, &cpu_online_map);
>   
>   return cpu_mask_to_apicid(&dest_mask);
> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -888,7 +888,7 @@ static void enable_iommu(struct amd_iomm
>   
>   desc = irq_to_desc(iommu->msi.irq);
>   spin_lock(&desc->lock);
> -set_msi_affinity(desc, &cpu_online_map);
> +set_msi_affinity(desc, NULL);
>   spin_unlock(&desc->lock);
>   
>   amd_iommu_msi_enable(iommu, IOMMU_CONTROL_ENABLED);
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -2133,11 +2133,11 @@ static void adjust_irq_affinity(struct a
>   const struct acpi_rhsa_unit *rhsa = drhd_to_rhsa(drhd);
>   unsigned int node = rhsa ? pxm_to_node(rhsa->proximity_domain)
>: NUMA_NO_NODE;
> -const cpumask_t *cpumask = &cpu_online_map;
> +const cpumask_t *cpumask = NULL;
>   struct irq_desc *desc;
>   
>   if ( node < MAX_NUMNODES && node_online(node) &&
> - cpumask_intersects(&node_to_cpumask(node), cpumask) )
> + cpumask_intersects(&node_to_cpumask(node), &cpu_online_map) )
>   cpumask = &node_to_cpumask(node);
>   
>   desc = irq_to_desc(drhd->iommu->msi.irq);
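
One idiom above worth noting: "mask ?: TARGET_CPUS" uses GCC's conditional
expression extension, where "a ?: b" means "a ? a : b" with a evaluated only
once. A sketch, assuming TARGET_CPUS resolves to the online CPU map as the
surrounding logic suggests:

    const cpumask_t *pick_mask(const cpumask_t *mask)
    {
        return mask ?: &cpu_online_map;  /* fall back when NULL is passed */
    }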

-- 
Brian Woods


Re: [Xen-devel] [PATCH v3 1/2] x86/mm: Clean IOMMU flags from p2m-pt code

2019-07-24 Thread Woods, Brian
On Tue, Jul 16, 2019 at 12:01:11PM +, Alexandru Stefan ISAILA wrote:
> At this moment IOMMU pt sharing is disabled by commit [1].
> 
> This patch aims to clear the IOMMU hap share support as it will not be
> used in the future. By doing this the IOMMU bits used in pte[52:58] can
> be used in other ways.
> 
> [1] c2ba3db31ef2d9f1e40e7b6c16cf3be3d671d555
> 
> Suggested-by: George Dunlap 
> Signed-off-by: Alexandru Isaila 

Acked-by: Brian Woods 

> ---
> Changes since V1:
>   - Rework commit message
>   - Reflow comments
>   - Move flags init to declaration in p2m_type_to_flags.
> ---
>  xen/arch/x86/mm/p2m-pt.c | 96 +++-
>  1 file changed, 5 insertions(+), 91 deletions(-)
> 
> diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
> index cafc9f299b..3a0a500d66 100644
> --- a/xen/arch/x86/mm/p2m-pt.c
> +++ b/xen/arch/x86/mm/p2m-pt.c
> @@ -24,7 +24,6 @@
>   * along with this program; If not, see <http://www.gnu.org/licenses/>.
>   */
>  
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -36,15 +35,13 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  
>  #include "mm-locks.h"
>  
>  /*
>   * We may store INVALID_MFN in PTEs.  We need to clip this to avoid trampling
> - * over higher-order bits (NX, p2m type, IOMMU flags).  We seem to not need
> - * to unclip on the read path, as callers are concerned only with p2m type in
> - * such cases.
> + * over higher-order bits (NX, p2m type). We seem to not need to unclip on 
> the
> + * read path, as callers are concerned only with p2m type in such cases.
>   */
>  #define p2m_l1e_from_pfn(pfn, flags)\
>  l1e_from_pfn((pfn) & (PADDR_MASK >> PAGE_SHIFT), (flags))
> @@ -71,13 +68,7 @@ static unsigned long p2m_type_to_flags(const struct 
> p2m_domain *p2m,
> mfn_t mfn,
> unsigned int level)
>  {
> -unsigned long flags;
> -/*
> - * AMD IOMMU: When we share p2m table with iommu, bit 9 - bit 11 will be
> - * used for iommu hardware to encode next io page level. Bit 59 - bit 62
> - * are used for iommu flags, We could not use these bits to store p2m 
> types.
> - */
> -flags = (unsigned long)(t & 0x7f) << 12;
> +unsigned long flags = (unsigned long)(t & 0x7f) << 12;
>  
>  switch(t)
>  {
> @@ -165,16 +156,6 @@ p2m_free_entry(struct p2m_domain *p2m, l1_pgentry_t 
> *p2m_entry, int page_order)
>  // Returns 0 on error.
>  //
>  
> -/* AMD IOMMU: Convert next level bits and r/w bits into 24 bits p2m flags */
> -#define iommu_nlevel_to_flags(nl, f) ((((nl) & 0x7) << 9 )|(((f) & 0x3) << 
> 21))
> -
> -static void p2m_add_iommu_flags(l1_pgentry_t *p2m_entry,
> -unsigned int nlevel, unsigned int flags)
> -{
> -if ( iommu_hap_pt_share )
> -l1e_add_flags(*p2m_entry, iommu_nlevel_to_flags(nlevel, flags));
> -}
> -
>  /* Returns: 0 for success, -errno for failure */
>  static int
>  p2m_next_level(struct p2m_domain *p2m, void **table,
> @@ -203,7 +184,6 @@ p2m_next_level(struct p2m_domain *p2m, void **table,
>  
>  new_entry = l1e_from_mfn(mfn, P2M_BASE_FLAGS | _PAGE_RW);
>  
> -p2m_add_iommu_flags(&new_entry, level, 
> IOMMUF_readable|IOMMUF_writable);
>  rc = p2m->write_p2m_entry(p2m, gfn, p2m_entry, new_entry, level + 1);
>  if ( rc )
>  goto error;
> @@ -242,13 +222,6 @@ p2m_next_level(struct p2m_domain *p2m, void **table,
>  
>  l1_entry = map_domain_page(mfn);
>  
> -/* Inherit original IOMMU permissions, but update Next Level. */
> -if ( iommu_hap_pt_share )
> -{
> -flags &= ~iommu_nlevel_to_flags(~0, 0);
> -flags |= iommu_nlevel_to_flags(level - 1, 0);
> -}
> -
>  for ( i = 0; i < (1u << PAGETABLE_ORDER); i++ )
>  {
>  new_entry = l1e_from_pfn(pfn | (i << ((level - 1) * 
> PAGETABLE_ORDER)),
> @@ -264,8 +237,6 @@ p2m_next_level(struct p2m_domain *p2m, void **table,
>  unmap_domain_page(l1_entry);
>  
>  new_entry = l1e_from_mfn(mfn, P2M_BASE_FLAGS | _PAGE_RW);
> -p2m_add_iommu_flags(&new_entry, level,
> -IOMMUF_readable|IOMMUF_writable);
>  rc = p2m->write_p2m_entry(p2m, gfn, p2m_entry, new_entry,
>level + 1);
>  if ( rc )
> @@ -470,9 +441,6 @@ static int do_recalc(struct p2m_domain *p2m, unsigned 
> long gfn)
>  }
>  
>  e = l1e_from_pfn(mfn, flags);
> -p2m_add_iommu_flags(&e, level,
> -(nt == p2m_ram_rw)
> -? IOMMUF_readable|IOMMUF_writable : 0);
>  ASSERT(!needs_recalc(l1, e));
>  }
>  else
> @@ -540,18 +508,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, gfn_t gfn_, 
> mfn_t mfn,
>  l2_pgentry_t l2e_content;
>  l3_pgentry_t l3e_content;
>  int rc;
> -unsigned int 
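
For reference, the encoding being retired packed the IOMMU next-level value
into bits 9-11 of the flags value and the r/w permissions into bits 21-22,
alongside the p2m type at bits 12 and up. A worked instance of the removed
macro, assuming IOMMUF_readable = 1 and IOMMUF_writable = 2:

    /* iommu_nlevel_to_flags(1, IOMMUF_readable | IOMMUF_writable): */
    ((1 & 0x7) << 9) | ((3 & 0x3) << 21)  /* = 0x200 | 0x600000 = 0x600200 */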

Re: [Xen-devel] [PATCH v3 2/2] passthrough/amd: Clean iommu_hap_pt_share enabled code

2019-07-24 Thread Woods, Brian
On Tue, Jul 16, 2019 at 12:01:15PM +, Alexandru Stefan ISAILA wrote:
> At this moment IOMMU pt sharing is disabled by commit [1].
> 
> This patch cleans the unreachable code garded by iommu_hap_pt_share.
> 
> [1] c2ba3db31ef2d9f1e40e7b6c16cf3be3d671d555
> 
> Signed-off-by: Alexandru Isaila 

Acked-by: Brian Woods 

> ---
>  xen/drivers/passthrough/amd/iommu_map.c   | 28 ---
>  xen/drivers/passthrough/amd/pci_amd_iommu.c   |  4 ---
>  xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |  3 --
>  3 files changed, 35 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/amd/iommu_map.c 
> b/xen/drivers/passthrough/amd/iommu_map.c
> index cbf00e9e72..90cc7075c2 100644
> --- a/xen/drivers/passthrough/amd/iommu_map.c
> +++ b/xen/drivers/passthrough/amd/iommu_map.c
> @@ -364,9 +364,6 @@ int amd_iommu_map_page(struct domain *d, dfn_t dfn, mfn_t 
> mfn,
>  int rc;
>  unsigned long pt_mfn[7];
>  
> -if ( iommu_use_hap_pt(d) )
> -return 0;
> -
>  memset(pt_mfn, 0, sizeof(pt_mfn));
>  
>  spin_lock(&hd->arch.mapping_lock);
> @@ -420,9 +417,6 @@ int amd_iommu_unmap_page(struct domain *d, dfn_t dfn,
>  unsigned long pt_mfn[7];
>  struct domain_iommu *hd = dom_iommu(d);
>  
> -if ( iommu_use_hap_pt(d) )
> -return 0;
> -
>  memset(pt_mfn, 0, sizeof(pt_mfn));
>  
>  spin_lock(&hd->arch.mapping_lock);
> @@ -558,28 +552,6 @@ int amd_iommu_reserve_domain_unity_map(struct domain 
> *domain,
>  return rt;
>  }
>  
> -/* Share p2m table with iommu. */
> -void amd_iommu_share_p2m(struct domain *d)
> -{
> -struct domain_iommu *hd = dom_iommu(d);
> -struct page_info *p2m_table;
> -mfn_t pgd_mfn;
> -
> -pgd_mfn = pagetable_get_mfn(p2m_get_pagetable(p2m_get_hostp2m(d)));
> -p2m_table = mfn_to_page(pgd_mfn);
> -
> -if ( hd->arch.root_table != p2m_table )
> -{
> -free_amd_iommu_pgtable(hd->arch.root_table);
> -hd->arch.root_table = p2m_table;
> -
> -/* When sharing p2m with iommu, paging mode = 4 */
> -hd->arch.paging_mode = 4;
> -AMD_IOMMU_DEBUG("Share p2m table with iommu: p2m table = %#lx\n",
> -mfn_x(pgd_mfn));
> -}
> -}
> -
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c 
> b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> index 4afbcd1609..be076210b6 100644
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -396,9 +396,6 @@ static void deallocate_iommu_page_tables(struct domain *d)
>  {
>  struct domain_iommu *hd = dom_iommu(d);
>  
> -if ( iommu_use_hap_pt(d) )
> -return;
> -
>  spin_lock(&hd->arch.mapping_lock);
>  if ( hd->arch.root_table )
>  {
> @@ -566,7 +563,6 @@ static const struct iommu_ops __initconstrel _iommu_ops = 
> {
>  .setup_hpet_msi = amd_setup_hpet_msi,
>  .suspend = amd_iommu_suspend,
>  .resume = amd_iommu_resume,
> -.share_p2m = amd_iommu_share_p2m,
>  .crash_shutdown = amd_iommu_crash_shutdown,
>  .dump_p2m_table = amd_dump_p2m_table,
>  };
> diff --git a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h 
> b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
> index e0d5d23978..b832f564a7 100644
> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
> @@ -66,9 +66,6 @@ int __must_check amd_iommu_flush_iotlb_pages(struct domain 
> *d, dfn_t dfn,
>   unsigned int flush_flags);
>  int __must_check amd_iommu_flush_iotlb_all(struct domain *d);
>  
> -/* Share p2m table with iommu */
> -void amd_iommu_share_p2m(struct domain *d);
> -
>  /* device table functions */
>  int get_dma_requestor_id(uint16_t seg, uint16_t bdf);
>  void amd_iommu_set_intremap_table(struct amd_iommu_dte *dte,
> -- 
> 2.17.1
> 

-- 
Brian Woods


Re: [Xen-devel] [PATCH v3 14/14] AMD/IOMMU: process softirqs while dumping IRTs

2019-07-19 Thread Woods, Brian
On Tue, Jul 16, 2019 at 04:41:21PM +, Jan Beulich wrote:
> When there are sufficiently many devices listed in the ACPI tables (no
> matter if they actually exist), output may take way longer than the
> watchdog would like.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Brian Woods 

> ---
> v3: New.
> ---
> TBD: Seeing the volume of output I wonder whether we should further
>   suppress logging headers of devices which have no active entry
>   (i.e. emit the header only upon finding the first IRTE worth
>   logging). And while minor for the total volume of output I'm
>   also unconvinced logging both a "per device" header line and a
>   "shared" one makes sense, when only one of the two can actually
>   be followed by actual contents.
> 
> --- a/xen/drivers/passthrough/amd/iommu_intr.c
> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> @@ -22,6 +22,7 @@
>   #include 
>   #include 
>   #include 
> +#include <xen/softirq.h>
>   
>   struct irte_basic {
>   bool remap_en:1;
> @@ -917,6 +918,8 @@ static int dump_intremap_mapping(const s
>   dump_intremap_table(iommu, ivrs_mapping->intremap_table);
>   spin_unlock_irqrestore(&(ivrs_mapping->intremap_lock), flags);
>   
> +process_pending_softirqs();
> +
>   return 0;
>   }
>   
> 

-- 
Brian Woods


Re: [Xen-devel] [PATCH v3 12/14] AMD/IOMMU: enable x2APIC mode when available

2019-07-19 Thread Woods, Brian
On Tue, Jul 16, 2019 at 04:40:33PM +, Jan Beulich wrote:
> In order for the CPUs to use x2APIC mode, the IOMMU(s) first need to be
> switched into suitable state.
> 
> The post-AP-bringup IRQ affinity adjustment is done also for the non-
> x2APIC case, matching what VT-d does.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Brian Woods 

> ---
> v3: Set GAEn (and other control register bits) earlier. Also clear the
>  bits enabled here in amd_iommu_init_cleanup(). Re-base. Pass NULL
>  CPU mask to set_{x2apic,msi}_affinity().
> v2: Drop cpu_has_cx16 check. Add comment.
> ---
> TBD: Instead of the system_state check in iov_enable_xt() the function
>   could also zap its own hook pointer, at which point it could also
>   become __init. This would, however, require that either
>   resume_x2apic() be bound to ignore iommu_enable_x2apic() errors
>   forever, or that iommu_enable_x2apic() be slightly re-arranged to
>   not return -EOPNOTSUPP when finding a NULL hook during resume.
> 
> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -834,6 +834,30 @@ static bool_t __init set_iommu_interrupt
>   return 1;
>   }
>   
> +int iov_adjust_irq_affinities(void)
> +{
> +const struct amd_iommu *iommu;
> +
> +if ( !iommu_enabled )
> +return 0;
> +
> +for_each_amd_iommu ( iommu )
> +{
> +struct irq_desc *desc = irq_to_desc(iommu->msi.irq);
> +unsigned long flags;
> +
> +spin_lock_irqsave(&desc->lock, flags);
> +if ( iommu->ctrl.int_cap_xt_en )
> +set_x2apic_affinity(desc, NULL);
> +else
> +set_msi_affinity(desc, NULL);
> +spin_unlock_irqrestore(&desc->lock, flags);
> +}
> +
> +return 0;
> +}
> +__initcall(iov_adjust_irq_affinities);
> +
>   /*
>* Family15h Model 10h-1fh erratum 746 (IOMMU Logging May Stall 
> Translations)
>* Workaround:
> @@ -1047,7 +1071,7 @@ static void * __init allocate_ppr_log(st
>   IOMMU_PPR_LOG_DEFAULT_ENTRIES, "PPR Log");
>   }
>   
> -static int __init amd_iommu_init_one(struct amd_iommu *iommu)
> +static int __init amd_iommu_init_one(struct amd_iommu *iommu, bool intr)
>   {
>   if ( allocate_cmd_buffer(iommu) == NULL )
>   goto error_out;
> @@ -1058,7 +1082,7 @@ static int __init amd_iommu_init_one(str
>   if ( iommu->features.flds.ppr_sup && !allocate_ppr_log(iommu) )
>   goto error_out;
>   
> -if ( !set_iommu_interrupt_handler(iommu) )
> +if ( intr && !set_iommu_interrupt_handler(iommu) )
>   goto error_out;
>   
>   /* To make sure that device_table.buffer has been successfully 
> allocated */
> @@ -1087,8 +1111,16 @@ static void __init amd_iommu_init_cleanu
>   list_for_each_entry_safe ( iommu, next, &amd_iommu_head, list )
>   {
>   list_del(&iommu->list);
> +
> +iommu->ctrl.ga_en = 0;
> +iommu->ctrl.xt_en = 0;
> +iommu->ctrl.int_cap_xt_en = 0;
> +
>   if ( iommu->enabled )
>   disable_iommu(iommu);
> +else if ( iommu->mmio_base )
> +writeq(iommu->ctrl.raw,
> +   iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
>   
>   deallocate_ring_buffer(&iommu->cmd_buffer);
>   deallocate_ring_buffer(&iommu->event_log);
> @@ -1290,7 +1322,7 @@ static int __init amd_iommu_prepare_one(
>   return 0;
>   }
>   
> -int __init amd_iommu_init(void)
> +int __init amd_iommu_prepare(bool xt)
>   {
>   struct amd_iommu *iommu;
>   int rc = -ENODEV;
> @@ -1305,9 +1337,14 @@ int __init amd_iommu_init(void)
>   if ( unlikely(acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_MSI) )
>   goto error_out;
>   
> +/* Have we been here before? */
> +if ( ivhd_type )
> +return 0;
> +
>   rc = amd_iommu_get_supported_ivhd_type();
>   if ( rc < 0 )
>   goto error_out;
> +BUG_ON(!rc);
>   ivhd_type = rc;
>   
>   rc = amd_iommu_get_ivrs_dev_entries();
> @@ -1323,9 +1360,37 @@ int __init amd_iommu_init(void)
>   rc = amd_iommu_prepare_one(iommu);
>   if ( rc )
>   goto error_out;
> +
> +rc = -ENODEV;
> +if ( xt && (!iommu->features.flds.ga_sup || 
> !iommu->features.flds.xt_sup) )
> +goto error_out;
> +}
> +
> +for_each_amd_iommu ( iommu )
> +{
> +/* NB: There's no need to actually write these out right here. */
> +iommu->ctrl.ga_en |= xt;
> +iommu->ctrl.xt_en = xt;
> +iommu->ctrl.int_cap_xt_en = xt;
>   }
>   
>   rc = amd_iommu_update_ivrs_mapping_acpi();
> +
> + error_out:
> +if ( rc )
> +{
> +amd_iommu_init_cleanup();
> +ivhd_type = 0;
> +}
> +
> +return rc;
> +}
> +
> +int __init amd_iommu_init(bool xt)
> +{
> +struct amd_iommu *iommu;
> +int rc = amd_iommu_prepare(xt);
> +
>   if ( rc )
>   goto error_out;
>   
> @@ -1351,7 +1416,12 @@ int __init 

Re: [Xen-devel] [PATCH v3 11/14] AMD/IOMMU: adjust setup of internal interrupt for x2APIC mode

2019-07-19 Thread Woods, Brian
On Tue, Jul 16, 2019 at 04:39:58PM +, Jan Beulich wrote:
> In order to be able to express all possible destinations we need to make
> use of this non-MSI-capability based mechanism. The new IRQ controller
> structure can re-use certain MSI functions, though.
> 
> For now general and PPR interrupts still share a single vector, IRQ, and
> hence handler.
> 
> Signed-off-by: Jan Beulich 
> Reviewed-by: Andrew Cooper 

Acked-by: Brian Woods 

> ---
> v3: Re-base.
> 
> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -472,6 +472,44 @@ static hw_irq_controller iommu_maskable_
>   .set_affinity = set_msi_affinity,
>   };
>   
> +static void set_x2apic_affinity(struct irq_desc *desc, const cpumask_t *mask)
> +{
> +struct amd_iommu *iommu = desc->action->dev_id;
> +unsigned int dest = set_desc_affinity(desc, mask);
> +union amd_iommu_x2apic_control ctrl = {};
> +unsigned long flags;
> +
> +if ( dest == BAD_APICID )
> +return;
> +
> +msi_compose_msg(desc->arch.vector, NULL, &iommu->msi.msg);
> +iommu->msi.msg.dest32 = dest;
> +
> +ctrl.dest_mode = MASK_EXTR(iommu->msi.msg.address_lo,
> +   MSI_ADDR_DESTMODE_MASK);
> +ctrl.int_type = MASK_EXTR(iommu->msi.msg.data,
> +  MSI_DATA_DELIVERY_MODE_MASK);
> +ctrl.vector = desc->arch.vector;
> +ctrl.dest_lo = dest;
> +ctrl.dest_hi = dest >> 24;
> +
> +spin_lock_irqsave(&iommu->lock, flags);
> +writeq(ctrl.raw, iommu->mmio_base + IOMMU_XT_INT_CTRL_MMIO_OFFSET);
> +writeq(ctrl.raw, iommu->mmio_base + IOMMU_XT_PPR_INT_CTRL_MMIO_OFFSET);
> +spin_unlock_irqrestore(&iommu->lock, flags);
> +}
> +
> +static hw_irq_controller iommu_x2apic_type = {
> +.typename = "IOMMU-x2APIC",
> +.startup  = irq_startup_none,
> +.shutdown = irq_shutdown_none,
> +.enable   = irq_enable_none,
> +.disable  = irq_disable_none,
> +.ack  = ack_nonmaskable_msi_irq,
> +.end  = end_nonmaskable_msi_irq,
> +.set_affinity = set_x2apic_affinity,
> +};
> +
>   static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[])
>   {
>   u16 domain_id, device_id, flags;
> @@ -726,8 +764,6 @@ static void iommu_interrupt_handler(int
>   static bool_t __init set_iommu_interrupt_handler(struct amd_iommu *iommu)
>   {
>   int irq, ret;
> -hw_irq_controller *handler;
> -u16 control;
>   
>   irq = create_irq(NUMA_NO_NODE);
>   if ( irq <= 0 )
> @@ -747,20 +783,43 @@ static bool_t __init set_iommu_interrupt
>   PCI_SLOT(iommu->bdf), PCI_FUNC(iommu->bdf));
>   return 0;
>   }
> -control = pci_conf_read16(iommu->seg, PCI_BUS(iommu->bdf),
> -  PCI_SLOT(iommu->bdf), PCI_FUNC(iommu->bdf),
> -  iommu->msi.msi_attrib.pos + PCI_MSI_FLAGS);
> -iommu->msi.msi.nvec = 1;
> -if ( is_mask_bit_support(control) )
> -{
> -iommu->msi.msi_attrib.maskbit = 1;
> -iommu->msi.msi.mpos = msi_mask_bits_reg(iommu->msi.msi_attrib.pos,
> -is_64bit_address(control));
> +handler = &iommu_maskable_msi_type;
> +
> +if ( iommu->ctrl.int_cap_xt_en )
> +{
> +struct irq_desc *desc = irq_to_desc(irq);
> +
> +iommu->msi.msi_attrib.pos = MSI_TYPE_IOMMU;
> +iommu->msi.msi_attrib.maskbit = 0;
> +iommu->msi.msi_attrib.is_64 = 1;
> +
> +desc->msi_desc = &iommu->msi;
> +desc->handler = _x2apic_type;
> +
> +ret = 0;
>   }
>   else
> -handler = &iommu_msi_type;
> -ret = __setup_msi_irq(irq_to_desc(irq), &iommu->msi, handler);
> +{
> +hw_irq_controller *handler;
> +u16 control;
> +
> +control = pci_conf_read16(iommu->seg, PCI_BUS(iommu->bdf),
> +  PCI_SLOT(iommu->bdf), PCI_FUNC(iommu->bdf),
> +  iommu->msi.msi_attrib.pos + PCI_MSI_FLAGS);
> +
> +iommu->msi.msi.nvec = 1;
> +if ( is_mask_bit_support(control) )
> +{
> +iommu->msi.msi_attrib.maskbit = 1;
> +iommu->msi.msi.mpos = 
> msi_mask_bits_reg(iommu->msi.msi_attrib.pos,
> +
> is_64bit_address(control));
> +handler = &iommu_maskable_msi_type;
> +}
> +else
> +handler = &iommu_msi_type;
> +
> +ret = __setup_msi_irq(irq_to_desc(irq), &iommu->msi, handler);
> +}
> +
>   if ( !ret )
>   ret = request_irq(irq, 0, iommu_interrupt_handler, "amd_iommu", 
> iommu);
>   if ( ret )
> @@ -838,8 +897,19 @@ static void enable_iommu(struct amd_iomm
>   struct irq_desc *desc = irq_to_desc(iommu->msi.irq);
>   
>   spin_lock(&desc->lock);
> -set_msi_affinity(desc, NULL);
> -spin_unlock(&desc->lock);
> +
> +if ( iommu->ctrl.int_cap_xt_en )
> +{
> +
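
set_x2apic_affinity() above leans on MASK_EXTR() to pull a field out of a
value given only the field's mask; the idiom divides by the mask's lowest
set bit instead of shifting (Xen's own definition is along these lines):

    #define MASK_EXTR(v, m) (((v) & (m)) / ((m) & -(m)))

    /* e.g. an 8-bit field at bits 8..15:
     * MASK_EXTR(0x0000ab00u, 0x0000ff00u) == 0xab */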

Re: [Xen-devel] [PATCH v3 10/14] AMD/IOMMU: allow enabling with IRQ not yet set up

2019-07-19 Thread Woods, Brian
On Tue, Jul 16, 2019 at 04:39:34PM +, Jan Beulich wrote:
> Early enabling (to enter x2APIC mode) requires deferring of the IRQ
> setup. Code to actually do that setup in the x2APIC case will get added
> subsequently.
> 
> Signed-off-by: Jan Beulich 
> Acked-by: Andrew Cooper 

Acked-by: Brian Woods 

> ---
> v3: Re-base.
> 
> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -814,7 +814,6 @@ static void amd_iommu_erratum_746_workar
>   static void enable_iommu(struct amd_iommu *iommu)
>   {
>   unsigned long flags;
> -struct irq_desc *desc;
>   
>   spin_lock_irqsave(&iommu->lock, flags);
>   
> @@ -834,19 +833,27 @@ static void enable_iommu(struct amd_iomm
>   if ( iommu->features.flds.ppr_sup )
>   register_iommu_ppr_log_in_mmio_space(iommu);
>   
> -desc = irq_to_desc(iommu->msi.irq);
> -spin_lock(&desc->lock);
> -set_msi_affinity(desc, NULL);
> -spin_unlock(&desc->lock);
> +if ( iommu->msi.irq > 0 )
> +{
> +struct irq_desc *desc = irq_to_desc(iommu->msi.irq);
> +
> +spin_lock(&desc->lock);
> +set_msi_affinity(desc, NULL);
> +spin_unlock(&desc->lock);
> +}
>   
>   amd_iommu_msi_enable(iommu, IOMMU_CONTROL_ENABLED);
>   
>   set_iommu_ht_flags(iommu);
>   set_iommu_command_buffer_control(iommu, IOMMU_CONTROL_ENABLED);
> -set_iommu_event_log_control(iommu, IOMMU_CONTROL_ENABLED);
>   
> -if ( iommu->features.flds.ppr_sup )
> -set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_ENABLED);
> +if ( iommu->msi.irq > 0 )
> +{
> +set_iommu_event_log_control(iommu, IOMMU_CONTROL_ENABLED);
> +
> +if ( iommu->features.flds.ppr_sup )
> +set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_ENABLED);
> +}
>   
>   if ( iommu->features.flds.gt_sup )
>   set_iommu_guest_translation_control(iommu, IOMMU_CONTROL_ENABLED);
> 

-- 
Brian Woods


Re: [Xen-devel] [PATCH v3 09/14] AMD/IOMMU: split amd_iommu_init_one()

2019-07-19 Thread Woods, Brian
On Tue, Jul 16, 2019 at 04:39:10PM +, Jan Beulich wrote:
> Mapping the MMIO space and obtaining feature information needs to happen
> slightly earlier, such that for x2APIC support we can set XTEn prior to
> calling amd_iommu_update_ivrs_mapping_acpi() and
> amd_iommu_setup_ioapic_remapping().
> 
> Signed-off-by: Jan Beulich 
> Reviewed-by: Andrew Cooper 

Acked-by: Brian Woods 

> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -970,14 +970,6 @@ static void * __init allocate_ppr_log(st
>   
>   static int __init amd_iommu_init_one(struct amd_iommu *iommu)
>   {
> -if ( map_iommu_mmio_region(iommu) != 0 )
> -goto error_out;
> -
> -get_iommu_features(iommu);
> -
> -if ( iommu->features.raw )
> -iommuv2_enabled = 1;
> -
>   if ( allocate_cmd_buffer(iommu) == NULL )
>   goto error_out;
>   
> @@ -1202,6 +1194,23 @@ static bool_t __init amd_sp5100_erratum2
>   return 0;
>   }
>   
> +static int __init amd_iommu_prepare_one(struct amd_iommu *iommu)
> +{
> +int rc = alloc_ivrs_mappings(iommu->seg);
> +
> +if ( !rc )
> +rc = map_iommu_mmio_region(iommu);
> +if ( rc )
> +return rc;
> +
> +get_iommu_features(iommu);
> +
> +if ( iommu->features.raw )
> +iommuv2_enabled = true;
> +
> +return 0;
> +}
> +
>   int __init amd_iommu_init(void)
>   {
>   struct amd_iommu *iommu;
> @@ -1232,7 +1241,7 @@ int __init amd_iommu_init(void)
>   radix_tree_init(&ivrs_maps);
>   for_each_amd_iommu ( iommu )
>   {
> -rc = alloc_ivrs_mappings(iommu->seg);
> +rc = amd_iommu_prepare_one(iommu);
>   if ( rc )
>   goto error_out;
>   }
> 

-- 
Brian Woods


Re: [Xen-devel] [PATCH v3 07/14] AMD/IOMMU: pass IOMMU to {get, free, update}_intremap_entry()

2019-07-19 Thread Woods, Brian
On Tue, Jul 16, 2019 at 04:37:51PM +, Jan Beulich wrote:
> The functions will want to know IOMMU properties (specifically the IRTE
> size) subsequently.
> 
> Rather than introducing a second error path bogusly returning -E... from
> amd_iommu_read_ioapic_from_ire(), also change the existing one to follow
> VT-d in returning the raw (untranslated) IO-APIC RTE.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Brian Woods 

> ---
> v3: New.
> 
> --- a/xen/drivers/passthrough/amd/iommu_intr.c
> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> @@ -123,11 +123,11 @@ static unsigned int alloc_intremap_entry
>   return slot;
>   }
>   
> -static union irte_ptr get_intremap_entry(unsigned int seg, unsigned int bdf,
> - unsigned int index)
> +static union irte_ptr get_intremap_entry(const struct amd_iommu *iommu,
> + unsigned int bdf, unsigned int index)
>   {
>   union irte_ptr table = {
> -.ptr = get_ivrs_mappings(seg)[bdf].intremap_table
> +.ptr = get_ivrs_mappings(iommu->seg)[bdf].intremap_table
>   };
>   
>   ASSERT(table.ptr && (index < INTREMAP_ENTRIES));
> @@ -137,18 +137,19 @@ static union irte_ptr get_intremap_entry
>   return table;
>   }
>   
> -static void free_intremap_entry(unsigned int seg, unsigned int bdf,
> -unsigned int index)
> +static void free_intremap_entry(const struct amd_iommu *iommu,
> +unsigned int bdf, unsigned int index)
>   {
> -union irte_ptr entry = get_intremap_entry(seg, bdf, index);
> +union irte_ptr entry = get_intremap_entry(iommu, bdf, index);
>   
>   ACCESS_ONCE(entry.ptr32->raw[0]) = 0;
>   
> -__clear_bit(index, get_ivrs_mappings(seg)[bdf].intremap_inuse);
> +__clear_bit(index, get_ivrs_mappings(iommu->seg)[bdf].intremap_inuse);
>   }
>   
> -static void update_intremap_entry(union irte_ptr entry, unsigned int vector,
> -  unsigned int int_type,
> +static void update_intremap_entry(const struct amd_iommu *iommu,
> +  union irte_ptr entry,
> +  unsigned int vector, unsigned int int_type,
> unsigned int dest_mode, unsigned int dest)
>   {
>   struct irte_basic basic = {
> @@ -212,7 +213,7 @@ static int update_intremap_entry_from_io
>   lo_update = 1;
>   }
>   
> -entry = get_intremap_entry(iommu->seg, req_id, offset);
> +entry = get_intremap_entry(iommu, req_id, offset);
>   if ( !lo_update )
>   {
>   /*
> @@ -223,7 +224,7 @@ static int update_intremap_entry_from_io
>   vector = entry.ptr32->basic.vector;
>   delivery_mode = entry.ptr32->basic.int_type;
>   }
> -update_intremap_entry(entry, vector, delivery_mode, dest_mode, dest);
> +update_intremap_entry(iommu, entry, vector, delivery_mode, dest_mode, dest);
>   
>   spin_unlock_irqrestore(lock, flags);
>   
> @@ -288,8 +289,8 @@ int __init amd_iommu_setup_ioapic_remapp
>   spin_lock_irqsave(lock, flags);
>   offset = alloc_intremap_entry(seg, req_id, 1);
>   BUG_ON(offset >= INTREMAP_ENTRIES);
> -entry = get_intremap_entry(iommu->seg, req_id, offset);
> -update_intremap_entry(entry, vector,
> +entry = get_intremap_entry(iommu, req_id, offset);
> +update_intremap_entry(iommu, entry, vector,
> delivery_mode, dest_mode, dest);
>   spin_unlock_irqrestore(lock, flags);
>   
> @@ -413,7 +414,7 @@ unsigned int amd_iommu_read_ioapic_from_
>   
>   idx = ioapic_id_to_index(IO_APIC_ID(apic));
>   if ( idx == MAX_IO_APICS )
> -return -EINVAL;
> +return val;
>   
>   offset = ioapic_sbdf[idx].pin_2_idx[pin];
>   
> @@ -422,9 +423,13 @@ unsigned int amd_iommu_read_ioapic_from_
>   u16 bdf = ioapic_sbdf[idx].bdf;
>   u16 seg = ioapic_sbdf[idx].seg;
>   u16 req_id = get_intremap_requestor_id(seg, bdf);
> -union irte_ptr entry = get_intremap_entry(seg, req_id, offset);
> +const struct amd_iommu *iommu = find_iommu_for_device(seg, bdf);
> +union irte_ptr entry;
>   
> +if ( !iommu )
> +return val;
>   ASSERT(offset == (val & (INTREMAP_ENTRIES - 1)));
> +entry = get_intremap_entry(iommu, req_id, offset);
>   val &= ~(INTREMAP_ENTRIES - 1);
>   val |= MASK_INSR(entry.ptr32->basic.int_type,
>IO_APIC_REDIR_DELIV_MODE_MASK);
> @@ -454,7 +459,7 @@ static int update_intremap_entry_from_ms
>   lock = get_intremap_lock(iommu->seg, req_id);
>   spin_lock_irqsave(lock, flags);
>   for ( i = 0; i < nr; ++i )
> -free_intremap_entry(iommu->seg, req_id, *remap_index + i);
> +free_intremap_entry(iommu, req_id, *remap_index + i);

Re: [Xen-devel] [PATCH v3 06/14] AMD/IOMMU: pass IOMMU to amd_iommu_alloc_intremap_table()

2019-07-19 Thread Woods, Brian
On Tue, Jul 16, 2019 at 04:37:26PM +, Jan Beulich wrote:
> The function will want to know IOMMU properties (specifically the IRTE
> size) subsequently.
> 
> Correct indentation of one of the call sites at this occasion.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Brian Woods 

> ---
> v3: New.
> 
> --- a/xen/drivers/passthrough/amd/iommu_acpi.c
> +++ b/xen/drivers/passthrough/amd/iommu_acpi.c
> @@ -74,12 +74,14 @@ static void __init add_ivrs_mapping_entr
>/* allocate per-device interrupt remapping table */
>if ( amd_iommu_perdev_intremap )
>ivrs_mappings[alias_id].intremap_table =
> -amd_iommu_alloc_intremap_table(
> -&ivrs_mappings[alias_id].intremap_inuse);
> + amd_iommu_alloc_intremap_table(
> + iommu,
> + &ivrs_mappings[alias_id].intremap_inuse);
>else
>{
>if ( shared_intremap_table == NULL  )
>shared_intremap_table = amd_iommu_alloc_intremap_table(
> + iommu,
> &shared_intremap_inuse);
>ivrs_mappings[alias_id].intremap_table = shared_intremap_table;
>ivrs_mappings[alias_id].intremap_inuse = shared_intremap_inuse;
> --- a/xen/drivers/passthrough/amd/iommu_intr.c
> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> @@ -632,7 +632,8 @@ int __init amd_iommu_free_intremap_table
>   return 0;
>   }
>   
> -void* __init amd_iommu_alloc_intremap_table(unsigned long **inuse_map)
> +void *__init amd_iommu_alloc_intremap_table(
> +const struct amd_iommu *iommu, unsigned long **inuse_map)
>   {
>   void *tb;
>   tb = __alloc_amd_iommu_tables(INTREMAP_TABLE_ORDER);
> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
> @@ -97,7 +97,8 @@ struct amd_iommu *find_iommu_for_device(
>   
>   /* interrupt remapping */
>   int amd_iommu_setup_ioapic_remapping(void);
> -void *amd_iommu_alloc_intremap_table(unsigned long **);
> +void *amd_iommu_alloc_intremap_table(
> +const struct amd_iommu *, unsigned long **);
>   int amd_iommu_free_intremap_table(
>   const struct amd_iommu *, struct ivrs_mappings *);
>   void amd_iommu_ioapic_update_ire(
> 

-- 
Brian Woods


Re: [Xen-devel] [PATCH v3 05/14] AMD/IOMMU: pass IOMMU to iterate_ivrs_entries() callback

2019-07-19 Thread Woods, Brian
On Tue, Jul 16, 2019 at 04:37:04PM +, Jan Beulich wrote:
> Both users will want to know IOMMU properties (specifically the IRTE
> size) subsequently. Leverage this to avoid pointless calls to the
> callback when IVRS mapping table entries are unpopulated. To avoid
> leaking interrupt remapping tables (bogusly) allocated for IOMMUs
> themselves, this requires suppressing their allocation in the first
> place, taking a step further what commit 757122c0cf ('AMD/IOMMU: don't
> "add" IOMMUs') had done.
> 
> Additionally suppress the call for alias entries, as again both users
> don't care about these anyway. In fact this eliminates a fair bit of
> redundancy from dump output.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Brian Woods 

> ---
> v3: New.
> ---
> TBD: Along the lines of avoiding the IRT allocation for the IOMMUs, is
>   there a way to recognize the many CPU-provided devices many of
>   which can't generate interrupts anyway, and avoid allocations for
>   them as well? It's 32k per device, after all. Another option might
>   be on-demand allocation of the tables, but quite possibly we'd get
>   into trouble with error handling there.
> 
> --- a/xen/drivers/passthrough/amd/iommu_acpi.c
> +++ b/xen/drivers/passthrough/amd/iommu_acpi.c
> @@ -65,7 +65,11 @@ static void __init add_ivrs_mapping_entr
>   /* override flags for range of devices */
>   ivrs_mappings[bdf].device_flags = flags;
>   
> -if (ivrs_mappings[alias_id].intremap_table == NULL )
> +/* Don't map an IOMMU by itself. */
> +if ( iommu->bdf == bdf )
> +return;
> +
> +if ( !ivrs_mappings[alias_id].intremap_table )
>   {
>/* allocate per-device interrupt remapping table */
>if ( amd_iommu_perdev_intremap )
> @@ -81,8 +85,9 @@ static void __init add_ivrs_mapping_entr
>ivrs_mappings[alias_id].intremap_inuse = shared_intremap_inuse;
>}
>   }
> -/* Assign IOMMU hardware, but don't map an IOMMU by itself. */
> -ivrs_mappings[bdf].iommu = iommu->bdf != bdf ? iommu : NULL;
> +
> +/* Assign IOMMU hardware. */
> +ivrs_mappings[bdf].iommu = iommu;
>   }
>   
>   static struct amd_iommu * __init find_iommu_from_bdf_cap(
> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -1069,7 +1069,8 @@ int iterate_ivrs_mappings(int (*handler)
>   return rc;
>   }
>   
> -int iterate_ivrs_entries(int (*handler)(u16 seg, struct ivrs_mappings *))
> +int iterate_ivrs_entries(int (*handler)(const struct amd_iommu *,
> +struct ivrs_mappings *))
>   {
>   u16 seg = 0;
>   int rc = 0;
> @@ -1082,7 +1083,12 @@ int iterate_ivrs_entries(int (*handler)(
>   break;
>   seg = IVRS_MAPPINGS_SEG(map);
>   for ( bdf = 0; !rc && bdf < ivrs_bdf_entries; ++bdf )
> -rc = handler(seg, map + bdf);
> +{
> +const struct amd_iommu *iommu = map[bdf].iommu;
> +
> +if ( iommu && map[bdf].dte_requestor_id == bdf )
> +rc = handler(iommu, &map[bdf]);
> +}
>   } while ( !rc && ++seg );
>   
>   return rc;
> --- a/xen/drivers/passthrough/amd/iommu_intr.c
> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> @@ -617,7 +617,7 @@ void amd_iommu_read_msi_from_ire(
>   }
>   
>   int __init amd_iommu_free_intremap_table(
> -u16 seg, struct ivrs_mappings *ivrs_mapping)
> +const struct amd_iommu *iommu, struct ivrs_mappings *ivrs_mapping)
>   {
>   void *tb = ivrs_mapping->intremap_table;
>   
> @@ -693,14 +693,15 @@ static void dump_intremap_table(const u3
>   }
>   }
>   
> -static int dump_intremap_mapping(u16 seg, struct ivrs_mappings *ivrs_mapping)
> +static int dump_intremap_mapping(const struct amd_iommu *iommu,
> + struct ivrs_mappings *ivrs_mapping)
>   {
>   unsigned long flags;
>   
>   if ( !ivrs_mapping )
>   return 0;
>   
> -printk("  %04x:%02x:%02x:%u:\n", seg,
> +printk("  %04x:%02x:%02x:%u:\n", iommu->seg,
>  PCI_BUS(ivrs_mapping->dte_requestor_id),
>  PCI_SLOT(ivrs_mapping->dte_requestor_id),
>  PCI_FUNC(ivrs_mapping->dte_requestor_id));
> --- a/xen/include/asm-x86/amd-iommu.h
> +++ b/xen/include/asm-x86/amd-iommu.h
> @@ -129,7 +129,8 @@ extern u8 ivhd_type;
>   
>   struct ivrs_mappings *get_ivrs_mappings(u16 seg);
>   int iterate_ivrs_mappings(int (*)(u16 seg, struct ivrs_mappings *));
> -int iterate_ivrs_entries(int (*)(u16 seg, struct ivrs_mappings *));
> +int iterate_ivrs_entries(int (*)(const struct amd_iommu *,
> + struct ivrs_mappings *));
>   
>   /* iommu tables in guest space */
>   struct mmio_reg {
> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
> @@ -98,7 +98,8 @@ struct amd_iommu *find_iommu_for_device(
>   /* interrupt remapping */
>   int 

Re: [Xen-devel] [PATCH v3 04/14] AMD/IOMMU: use bit field for IRTE

2019-07-19 Thread Woods, Brian
On Tue, Jul 16, 2019 at 04:36:34PM +, Jan Beulich wrote:
> At the same time restrict its scope to just the single source file
> actually using it, and abstract accesses by introducing a union of
> pointers. (A union of the actual table entries is not used to make it
> impossible to [wrongly, once the 128-bit form gets added] perform
> pointer arithmetic / array accesses on derived types.)
> 
> Also move away from updating the entries piecemeal: Construct a full new
> entry, and write it out.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Brian Woods 

> ---
> v3: Switch boolean bitfields to bool.
> v2: name {get,free}_intremap_entry()'s last parameter "index" instead of
>  "offset". Introduce union irte32.
> 
> --- a/xen/drivers/passthrough/amd/iommu_intr.c
> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> @@ -23,6 +23,28 @@
>   #include 
>   #include 
>   
> +struct irte_basic {
> +bool remap_en:1;
> +bool sup_io_pf:1;
> +unsigned int int_type:3;
> +bool rq_eoi:1;
> +bool dm:1;
> +bool guest_mode:1; /* MBZ */
> +unsigned int dest:8;
> +unsigned int vector:8;
> +unsigned int :8;
> +};
> +
> +union irte32 {
> +uint32_t raw[1];
> +struct irte_basic basic;
> +};
> +
> +union irte_ptr {
> +void *ptr;
> +union irte32 *ptr32;
> +};
> +
>   #define INTREMAP_TABLE_ORDER1
>   #define INTREMAP_LENGTH 0xB
>   #define INTREMAP_ENTRIES (1 << INTREMAP_LENGTH)
> @@ -101,47 +123,44 @@ static unsigned int alloc_intremap_entry
>   return slot;
>   }
>   
> -static u32 *get_intremap_entry(int seg, int bdf, int offset)
> +static union irte_ptr get_intremap_entry(unsigned int seg, unsigned int bdf,
> + unsigned int index)
>   {
> -u32 *table = get_ivrs_mappings(seg)[bdf].intremap_table;
> +union irte_ptr table = {
> +.ptr = get_ivrs_mappings(seg)[bdf].intremap_table
> +};
> +
> +ASSERT(table.ptr && (index < INTREMAP_ENTRIES));
>   
> -ASSERT( (table != NULL) && (offset < INTREMAP_ENTRIES) );
> +table.ptr32 += index;
>   
> -return table + offset;
> +return table;
>   }
>   
> -static void free_intremap_entry(int seg, int bdf, int offset)
> -{
> -u32 *entry = get_intremap_entry(seg, bdf, offset);
> -
> -memset(entry, 0, sizeof(u32));
> -__clear_bit(offset, get_ivrs_mappings(seg)[bdf].intremap_inuse);
> -}
> -
> -static void update_intremap_entry(u32* entry, u8 vector, u8 int_type,
> -u8 dest_mode, u8 dest)
> -{
> -set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, 0,
> -INT_REMAP_ENTRY_REMAPEN_MASK,
> -INT_REMAP_ENTRY_REMAPEN_SHIFT, entry);
> -set_field_in_reg_u32(IOMMU_CONTROL_DISABLED, *entry,
> -INT_REMAP_ENTRY_SUPIOPF_MASK,
> -INT_REMAP_ENTRY_SUPIOPF_SHIFT, entry);
> -set_field_in_reg_u32(int_type, *entry,
> -INT_REMAP_ENTRY_INTTYPE_MASK,
> -INT_REMAP_ENTRY_INTTYPE_SHIFT, entry);
> -set_field_in_reg_u32(IOMMU_CONTROL_DISABLED, *entry,
> -INT_REMAP_ENTRY_REQEOI_MASK,
> -INT_REMAP_ENTRY_REQEOI_SHIFT, entry);
> -set_field_in_reg_u32((u32)dest_mode, *entry,
> -INT_REMAP_ENTRY_DM_MASK,
> -INT_REMAP_ENTRY_DM_SHIFT, entry);
> -set_field_in_reg_u32((u32)dest, *entry,
> -INT_REMAP_ENTRY_DEST_MAST,
> -INT_REMAP_ENTRY_DEST_SHIFT, entry);
> -set_field_in_reg_u32((u32)vector, *entry,
> -INT_REMAP_ENTRY_VECTOR_MASK,
> -INT_REMAP_ENTRY_VECTOR_SHIFT, entry);
> +static void free_intremap_entry(unsigned int seg, unsigned int bdf,
> +unsigned int index)
> +{
> +union irte_ptr entry = get_intremap_entry(seg, bdf, index);
> +
> +ACCESS_ONCE(entry.ptr32->raw[0]) = 0;
> +
> +__clear_bit(index, get_ivrs_mappings(seg)[bdf].intremap_inuse);
> +}
> +
> +static void update_intremap_entry(union irte_ptr entry, unsigned int vector,
> +  unsigned int int_type,
> +  unsigned int dest_mode, unsigned int dest)
> +{
> +struct irte_basic basic = {
> +.remap_en = true,
> +.int_type = int_type,
> +.dm = dest_mode,
> +.dest = dest,
> +.vector = vector,
> +};
> +
> +ACCESS_ONCE(entry.ptr32->raw[0]) =
> +container_of(&basic, union irte32, basic)->raw[0];
>   }
>   
>   static inline int get_rte_index(const struct IO_APIC_route_entry *rte)
> @@ -163,7 +182,7 @@ static int update_intremap_entry_from_io
>   u16 *index)
>   {
>   unsigned long flags;
> -u32* entry;
> +union irte_ptr entry;
>   u8 delivery_mode, dest, vector, dest_mode;
>   int req_id;
>   spinlock_t *lock;
> @@ -201,12 +220,8 @@ static int 

Re: [Xen-devel] [PATCH v3 03/14] AMD/IOMMU: use bit field for control register

2019-07-19 Thread Woods, Brian
On Tue, Jul 16, 2019 at 04:36:06PM +, Jan Beulich wrote:
> Also introduce a field in struct amd_iommu caching the most recently
> written control register. All writes should now happen exclusively from
> that cached value, such that it is guaranteed to be up to date.
> 
> Take the opportunity and add further fields. Also convert a few boolean
> function parameters to bool, such that use of !! can be avoided.
> 
> Because of there now being definitions beyond bit 31, writel() also gets
> replaced by writeq() when updating hardware.
> 
> Signed-off-by: Jan Beulich 
> Acked-by: Andrew Cooper 

Acked-by: Brian Woods 
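
The cached-control pattern this introduces amounts to: update the
cached copy, then write the full 64-bit register back, e.g. (a minimal
sketch rather than a hunk from the patch; the IOMMU_CONTROL_MMIO_OFFSET
name is an assumption based on the surrounding code):

    /* Flip a bit in the cached control value... */
    iommu->ctrl.int_cap_xt_en = true;
    /* ...then publish the whole register with a single 64-bit write. */
    writeq(iommu->ctrl.raw,
           iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);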

> ---
> v3: Switch boolean bitfields to bool.
> v2: Add domain_id_pne field. Mention writel() -> writeq() change.
> 
> --- a/xen/drivers/passthrough/amd/iommu_guest.c
> +++ b/xen/drivers/passthrough/amd/iommu_guest.c
> @@ -317,7 +317,7 @@ static int do_invalidate_iotlb_pages(str
>   
>   static int do_completion_wait(struct domain *d, cmd_entry_t *cmd)
>   {
> -bool_t com_wait_int_en, com_wait_int, i, s;
> +bool com_wait_int, i, s;
>   struct guest_iommu *iommu;
>   unsigned long gfn;
>   p2m_type_t p2mt;
> @@ -354,12 +354,10 @@ static int do_completion_wait(struct dom
>   unmap_domain_page(vaddr);
>   }
>   
> -com_wait_int_en = iommu_get_bit(iommu->reg_ctrl.lo,
> -IOMMU_CONTROL_COMP_WAIT_INT_SHIFT);
>   com_wait_int = iommu_get_bit(iommu->reg_status.lo,
>IOMMU_STATUS_COMP_WAIT_INT_SHIFT);
>   
> -if ( com_wait_int_en && com_wait_int )
> +if ( iommu->reg_ctrl.com_wait_int_en && com_wait_int )
>   guest_iommu_deliver_msi(d);
>   
>   return 0;
> @@ -521,40 +519,17 @@ static void guest_iommu_process_command(
>   return;
>   }
>   
> -static int guest_iommu_write_ctrl(struct guest_iommu *iommu, uint64_t newctrl)
> +static int guest_iommu_write_ctrl(struct guest_iommu *iommu, uint64_t val)
>   {
> -bool_t cmd_en, event_en, iommu_en, ppr_en, ppr_log_en;
> -bool_t cmd_en_old, event_en_old, iommu_en_old;
> -bool_t cmd_run;
> -
> -iommu_en = iommu_get_bit(newctrl,
> - IOMMU_CONTROL_TRANSLATION_ENABLE_SHIFT);
> -iommu_en_old = iommu_get_bit(iommu->reg_ctrl.lo,
> - IOMMU_CONTROL_TRANSLATION_ENABLE_SHIFT);
> -
> -cmd_en = iommu_get_bit(newctrl,
> -   IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_SHIFT);
> -cmd_en_old = iommu_get_bit(iommu->reg_ctrl.lo,
> -   IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_SHIFT);
> -cmd_run = iommu_get_bit(iommu->reg_status.lo,
> -IOMMU_STATUS_CMD_BUFFER_RUN_SHIFT);
> -event_en = iommu_get_bit(newctrl,
> - IOMMU_CONTROL_EVENT_LOG_ENABLE_SHIFT);
> -event_en_old = iommu_get_bit(iommu->reg_ctrl.lo,
> - IOMMU_CONTROL_EVENT_LOG_ENABLE_SHIFT);
> -
> -ppr_en = iommu_get_bit(newctrl,
> -   IOMMU_CONTROL_PPR_ENABLE_SHIFT);
> -ppr_log_en = iommu_get_bit(newctrl,
> -   IOMMU_CONTROL_PPR_LOG_ENABLE_SHIFT);
> +union amd_iommu_control newctrl = { .raw = val };
>   
> -if ( iommu_en )
> +if ( newctrl.iommu_en )
>   {
>   guest_iommu_enable(iommu);
>   guest_iommu_enable_dev_table(iommu);
>   }
>   
> -if ( iommu_en && cmd_en )
> +if ( newctrl.iommu_en && newctrl.cmd_buf_en )
>   {
>   guest_iommu_enable_ring_buffer(iommu, &iommu->cmd_buffer,
>  sizeof(cmd_entry_t));
> @@ -562,7 +537,7 @@ static int guest_iommu_write_ctrl(struct
>   tasklet_schedule(>cmd_buffer_tasklet);
>   }
>   
> -if ( iommu_en && event_en )
> +if ( newctrl.iommu_en && newctrl.event_log_en )
>   {
>   guest_iommu_enable_ring_buffer(iommu, >event_log,
>  sizeof(event_entry_t));
> @@ -570,7 +545,7 @@ static int guest_iommu_write_ctrl(struct
>   guest_iommu_clear_status(iommu, IOMMU_STATUS_EVENT_OVERFLOW_SHIFT);
>   }
>   
> -if ( iommu_en && ppr_en && ppr_log_en )
> +if ( newctrl.iommu_en && newctrl.ppr_en && newctrl.ppr_log_en )
>   {
>   guest_iommu_enable_ring_buffer(iommu, &iommu->ppr_log,
>  sizeof(ppr_entry_t));
> @@ -578,19 +553,21 @@ static int guest_iommu_write_ctrl(struct
>   guest_iommu_clear_status(iommu, IOMMU_STATUS_PPR_LOG_OVERFLOW_SHIFT);
>   }
>   
> -if ( iommu_en && cmd_en_old && !cmd_en )
> +if ( newctrl.iommu_en && iommu->reg_ctrl.cmd_buf_en &&
> + !newctrl.cmd_buf_en )
>   {
>   /* Disable iommu command processing */
>   tasklet_kill(&iommu->cmd_buffer_tasklet);
>   }
>   
> -if ( event_en_old && !event_en )
> +if ( iommu->reg_ctrl.event_log_en && !newctrl.event_log_en )
>   

Re: [Xen-devel] [PATCH v3 01/14] AMD/IOMMU: free more memory when cleaning up after error

2019-07-19 Thread Woods, Brian
On Tue, Jul 16, 2019 at 04:35:08PM +, Jan Beulich wrote:
> The interrupt remapping in-use bitmaps were leaked in all cases. The
> ring buffers and the mapping of the MMIO space were leaked for any IOMMU
> that hadn't been enabled yet.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Brian Woods 

> ---
> v3: New.
> 
> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -1070,13 +1070,12 @@ static void __init amd_iommu_init_cleanu
>   {
>   list_del(&iommu->list);
>   if ( iommu->enabled )
> -{
>   disable_iommu(iommu);
> -deallocate_ring_buffer(&iommu->cmd_buffer);
> -deallocate_ring_buffer(&iommu->event_log);
> -deallocate_ring_buffer(&iommu->ppr_log);
> -unmap_iommu_mmio_region(iommu);
> -}
> +
> +deallocate_ring_buffer(&iommu->cmd_buffer);
> +deallocate_ring_buffer(&iommu->event_log);
> +deallocate_ring_buffer(&iommu->ppr_log);
> +unmap_iommu_mmio_region(iommu);
>   xfree(iommu);
>   }
>   
> --- a/xen/drivers/passthrough/amd/iommu_intr.c
> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> @@ -610,6 +610,8 @@ int __init amd_iommu_free_intremap_table
>   {
>   void *tb = ivrs_mapping->intremap_table;
>   
> +XFREE(ivrs_mapping->intremap_inuse);
> +
>   if ( tb )
>   {
>   __free_amd_iommu_tables(tb, INTREMAP_TABLE_ORDER);
> 

-- 
Brian Woods

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/4] iommu / x86: move call to scan_pci_devices() out of vendor code

2019-07-16 Thread Woods, Brian
On July 15, 2019 7:37:17 AM Paul Durrant  wrote:

> It's not vendor specific so it doesn't really belong there.
>
>
> Scanning the PCI topology also really doesn't have much to do with IOMMU
> initialization. It doesn't depend on there even being an IOMMU. This patch
> moves the call to the beginning of iommu_hardware_setup(), but only
> places it there because the topology information would be otherwise unused.
>
>
> Subsequent patches will actually make use of the PCI topology during
> (x86) IOMMU initialization.
>
>
> Signed-off-by: Paul Durrant 

Acked-by: Brian Woods 

> ---
> Cc: Suravee Suthikulpanit 
> Cc: Brian Woods 
> Cc: Kevin Tian 
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> Cc: Wei Liu 
> Cc: "Roger Pau Monné" 
>
>
> v2:
> - Expanded commit comment.
> - Moved PCI scan to before IOMMU initialization, rather than after it.
> ---
> xen/drivers/passthrough/amd/pci_amd_iommu.c | 3 ++-
> xen/drivers/passthrough/vtd/iommu.c | 4 
> xen/drivers/passthrough/x86/iommu.c | 6 ++
> 3 files changed, 8 insertions(+), 5 deletions(-)
>
>
> diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> index 4afbcd1609..3338a8e0e8 100644
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -180,7 +180,8 @@ static int __init iov_detect(void)
>
> if ( !amd_iommu_perdev_intremap )
> printk(XENLOG_WARNING "AMD-Vi: Using global interrupt remap table is not recommended (see XSA-36)!\n");
> -return scan_pci_devices();
> +
> +return 0;
> }
>
> int amd_iommu_alloc_root(struct domain_iommu *hd)
> diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
> index 8b27d7e775..b0e3bf26b5 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -2372,10 +2372,6 @@ static int __init vtd_setup(void)
> P(iommu_hap_pt_share, "Shared EPT tables");
> #undef P
>
> -ret = scan_pci_devices();
> -if ( ret )
> -goto error;
> -
> ret = init_vtd_hw();
> if ( ret )
> goto error;
> diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
> index 0fa6dcc3fd..a7438c9c25 100644
> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -28,9 +28,15 @@ struct iommu_ops __read_mostly iommu_ops;
>
> int __init iommu_hardware_setup(void)
> {
> +int rc;
> +
> if ( !iommu_init_ops )
> return -ENODEV;
>
> +rc = scan_pci_devices();
> +if ( rc )
> +return rc;
> +
> if ( !iommu_ops.init )
> iommu_ops = *iommu_init_ops->ops;
> else
> --
> 2.11.0



Re: [Xen-devel] [PATCH v2 01/10] AMD/IOMMU: restrict feature logging

2019-07-01 Thread Woods, Brian
On Thu, Jun 27, 2019 at 09:19:06AM -0600, Jan Beulich wrote:
> The common case is all IOMMUs having the same features. Log them only
> for the first IOMMU, or for any that have a differing feature set.
> 
> Requested-by: Andrew Cooper 
> Signed-off-by: Jan Beulich 

Acked-by: Brian Woods 

> ---
> v2: New.
> 
> --- a/xen/drivers/passthrough/amd/iommu_detect.c
> +++ b/xen/drivers/passthrough/amd/iommu_detect.c
> @@ -62,6 +62,7 @@ void __init get_iommu_features(struct am
>  {
>  u32 low, high;
>  int i = 0 ;
> +const struct amd_iommu *first;
>  static const char *__initdata feature_str[] = {
>  "- Prefetch Pages Command", 
>  "- Peripheral Page Service Request", 
> @@ -89,6 +90,11 @@ void __init get_iommu_features(struct am
>  
>  iommu->features = ((u64)high << 32) | low;
>  
> +/* Don't log the same set of features over and over. */
> +first = list_first_entry(&amd_iommu_head, struct amd_iommu, list);
> +if ( iommu != first && iommu->features == first->features )
> +return;
> +
>  printk("AMD-Vi: IOMMU Extended Features:\n");
>  
>  while ( feature_str[i] )
> 
> 
> 

-- 
Brian Woods


Re: [Xen-devel] [PATCH] x86/svm: Drop svm_vm{load,save}() helpers

2019-06-28 Thread Woods, Brian
On Thu, Jun 20, 2019 at 01:06:21PM +0100, Andy Cooper wrote:
> Following on from c/s 7d161f6537 "x86/svm: Fix svm_vmcb_dump() when used in
> current context", there is now only a single user of svm_vmsave() remaining in
> the tree, with all users moved to svm_vm{load,save}_pa().
> 
> nv->nv_n1vmcx has a matching nv->nv_n1vmcx_pa which is always correct, and
> avoids a redundant __pa() translation behind the scenes.
> 
> With this gone, all VM{LOAD,SAVE} operations are using paddr_t's which is more
> efficient, so drop the svm_vm{load,save}() helpers to avoid uses of them
> reappearing in the future.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Brian Woods 

> ---
> CC: Jan Beulich 
> CC: Wei Liu 
> CC: Roger Pau Monné 
> CC: Boris Ostrovsky 
> CC: Suravee Suthikulpanit 
> CC: Brian Woods 
> 
> It turns out I was mistaken about how complicated this was.
> ---
>  xen/arch/x86/hvm/svm/nestedsvm.c  | 2 +-
>  xen/include/asm-x86/hvm/svm/svm.h | 3 ---
>  2 files changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
> index 35c1a04..fef124f 100644
> --- a/xen/arch/x86/hvm/svm/nestedsvm.c
> +++ b/xen/arch/x86/hvm/svm/nestedsvm.c
> @@ -1030,7 +1030,7 @@ nsvm_vmcb_prepare4vmexit(struct vcpu *v, struct cpu_user_regs *regs)
>  struct vmcb_struct *ns_vmcb = nv->nv_vvmcx;
>  struct vmcb_struct *n2vmcb = nv->nv_n2vmcx;
>  
> -svm_vmsave(nv->nv_n1vmcx);
> +svm_vmsave_pa(nv->nv_n1vmcx_pa);
>  
>  /* Cache guest physical address of virtual vmcb
>   * for VMCB Cleanbit emulation.
> diff --git a/xen/include/asm-x86/hvm/svm/svm.h b/xen/include/asm-x86/hvm/svm/svm.h
> index 6e688a8..16a994e 100644
> --- a/xen/include/asm-x86/hvm/svm/svm.h
> +++ b/xen/include/asm-x86/hvm/svm/svm.h
> @@ -22,9 +22,6 @@
>  
>  #include 
>  
> -#define svm_vmload(x) svm_vmload_pa(__pa(x))
> -#define svm_vmsave(x) svm_vmsave_pa(__pa(x))
> -
>  static inline void svm_vmload_pa(paddr_t vmcb)
>  {
>  asm volatile (
> -- 
> 2.1.4
> 

-- 
Brian Woods


Re: [Xen-devel] [PATCH 3/5] x86/AMD: make C-state handling independent of Dom0

2019-06-21 Thread Woods, Brian
On Fri, Jun 21, 2019 at 08:56:22AM -0600, Jan Beulich wrote:
> >>> On 21.06.19 at 16:29,  wrote:
> > On Fri, Jun 21, 2019 at 12:37:47AM -0600, Jan Beulich wrote:
> >> >>> On 19.06.19 at 17:54,  wrote:
> >> > On Wed, Jun 19, 2019 at 12:20:40AM -0600, Jan Beulich wrote:
> >> >> >>> On 18.06.19 at 19:22,  wrote:
> >> >> > On Tue, Jun 11, 2019 at 06:42:33AM -0600, Jan Beulich wrote:
> >> >> >> >>> On 10.06.19 at 18:28,  wrote:
> >> >> >> > On 23/05/2019 13:18, Jan Beulich wrote:
> >> >> >> >> TBD: We may want to verify that HLT indeed is configured to enter 
> >> >> >> >> CC6.
> >> >> >> > 
> >> >> >> > I can't actually spot anything which talks about HLT directly.  The
> >> >> >> > closest I can spot is CFOH (cache flush on halt) which is an
> >> >> >> > auto-transition from CC1 to CC6 after a specific timeout, but the
> >> >> >> > wording suggests that mwait would also take this path.
> >> >> >> 
> >> >> >> Well, I had come across a section describing how HLT can be
> >> >> >> configured to be the same action as the I/O port read from one
> >> >> >> of the three ports involved in C-state management
> >> >> >> (CStateBaseAddr+0...2). But I can't seem to find this again.
> >> >> >> 
> >> >> >> As to MWAIT behaving the same, I don't think I can spot proof
> >> >> >> of your interpretation or proof of Brian's.
> >> >> > 
> >> >> > It's not really documented clearly.  I got my information from the HW
> >> >> > engineers.  I've already posted what information I know so I won't
> >> >> > repeat it.
> >> >> 
> >> >> At least a pointer to where you had stated this would have been
> >> >> nice. Iirc there's no promotion into CC6 in that case, in contrast
> >> >> to Andrew's reading of the doc.
> >> > 
> >> > _v1_patchset
> >> 
> >> Hmm, I've looked through the patch descriptions there again, but I
> >> can't find any explicit statement to the effect of there being no
> >> promotion into deeper states when using MWAIT.
> > 
> > https://lists.xenproject.org/archives/html/xen-devel/2019-02/msg02007.html 
> 
> Thanks. Yes, it may be implied from there, but to me it's still not
> explicit. Also recall that it was Andrew originally asking if any
> promotion from CC1 is possible. I'm fine with you telling me it's
> not, but Andrew may still want you pointing him at where this
> is written down.
> 
> > Since you're under NDA, I can send you the email I received from the HW
> > engineering team, but as a basic recap:
> > 
> > If the HW is configured to use CC6 for HLT (CC6 is enabled and some
> > other NDA bits which get OR'd with firmware, so you can only
> > functionally turn CC6-on-HLT off, but can't make sure it's on), then the
> > flow is:
> > 1) HLT
> > 2) timer
> > 3) flush the caches etc
> > 4) CC6
> > 
> > This can be interrupted though.  The HW engineer said that while they
> > aren't the same (as IO based C-states), they end up at the same place.
> > 
> > The whole reason HLT was selected to be used in my patches is because
> > we can't look in the CST table from Xen and it's always safe to use,
> > even if CC6 is disabled in BIOS (which we can't tell).  At this point,
> > I'm repeating the conversation we had in my v1 patch set.  If you need
> > any further info, let me know.
> 
> Thanks, I recall all of this. I don't see though how it's related to the
> question of whether the CPU would really remain in C1 when using
> MWAIT (i.e. going back to Andrew's original finding of promotion from
> CC1 to CC6). Now I do realize that C1 != CC1, but this doesn't help
> clarifying things in any way.
> 
> Jan
> 
> 

Note: this is for Naples and Rome only.

I was answering the HLT question.  But mwait can ONLY be used for
CC1/C1, since we don't support using mwait for CC6/C2: CC6 shuts down
the cache, and mwait monitors a cache line.  There is no promotion
from C1/CC1 to C2/CC6 with mwait, since the core would lose its method
of waking up.  When you enter C1/CC1 using mwait, it stays in C1/CC1.
I will email a HW engineer to confirm this, but I'd be extremely
surprised if you could be promoted from C1/CC1 to C2/CC6 when using
mwait.
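
For reference, the C1 entry being discussed is just a MONITOR/MWAIT
pair along these lines (a minimal sketch, not Xen's actual idle loop;
the EAX hint of 0 selecting C1 follows the usual MWAIT encoding):

    static void mwait_c1(const void *monitor_addr)
    {
        /* Arm the monitor on a cache line the waker will write to. */
        asm volatile ( "monitor" :: "a" (monitor_addr), "c" (0), "d" (0) );
        /* EAX hint 0 requests C1; ECX 0 means no extensions. */
        asm volatile ( "mwait" :: "a" (0), "c" (0) );
    }

Because the monitored line has to stay observable in the cache, the
core cannot drop into a state that shuts the cache down, which is
exactly the point made above.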

-- 
Brian Woods


Re: [Xen-devel] [PATCH 3/5] x86/AMD: make C-state handling independent of Dom0

2019-06-21 Thread Woods, Brian
On Fri, Jun 21, 2019 at 12:37:47AM -0600, Jan Beulich wrote:
> >>> On 19.06.19 at 17:54,  wrote:
> > On Wed, Jun 19, 2019 at 12:20:40AM -0600, Jan Beulich wrote:
> >> >>> On 18.06.19 at 19:22,  wrote:
> >> > On Tue, Jun 11, 2019 at 06:42:33AM -0600, Jan Beulich wrote:
> >> >> >>> On 10.06.19 at 18:28,  wrote:
> >> >> > On 23/05/2019 13:18, Jan Beulich wrote:
> >> >> >> TBD: Can we set local_apic_timer_c2_ok to true? I can't seem to find 
> >> >> >> any
> >> >> >>  statement in the BKDG / PPR as to whether the LAPIC timer 
> >> >> >> continues
> >> >> >>  running in CC6.
> >> >> > 
> >> >> > This ought to be easy to determine.  Given the description of CC6
> >> >> > flushing the cache and power gating the core, I'd say there is a
> >> >> > reasonable chance that the LAPIC timer stops in CC6.
> >> >> 
> >> >> But "reasonable chance" isn't enough for my taste here. And from
> >> >> what you deduce, the answer to the question would be "no", and
> >> >> hence simply no change to be made anywhere. (I do think though
> >> >> that it's more complicated than this, because iirc much also depends
> >> >> on what the firmware actually does.)
> >> > 
> >> > The LAPIC timer never stops on the current platforms (Naples and
> >> > Rome).  This is from a knowledgeable HW engineer.
> >> 
> >> Thanks - I've taken note to set the variable accordingly then.
> >> 
> >> >> >> TBD: We may want to verify that HLT indeed is configured to enter 
> >> >> >> CC6.
> >> >> > 
> >> >> > I can't actually spot anything which talks about HLT directly.  The
> >> >> > closest I can spot is CFOH (cache flush on halt) which is an
> >> >> > auto-transition from CC1 to CC6 after a specific timeout, but the
> >> >> > wording suggests that mwait would also take this path.
> >> >> 
> >> >> Well, I had come across a section describing how HLT can be
> >> >> configured to be the same action as the I/O port read from one
> >> >> of the three ports involved in C-state management
> >> >> (CStateBaseAddr+0...2). But I can't seem to find this again.
> >> >> 
> >> >> As to MWAIT behaving the same, I don't think I can spot proof
> >> >> of your interpretation or proof of Brian's.
> >> > 
> >> > It's not really documented clearly.  I got my information from the HW
> >> > engineers.  I've already posted what information I know so I won't
> >> > repeat it.
> >> 
> >> At least a pointer to where you had stated this would have been
> >> nice. Iirc there's no promotion into CC6 in that case, in contrast
> >> to Andrew's reading of the doc.
> > 
> > _v1_patchset
> 
> Hmm, I've looked through the patch descriptions there again, but I
> can't find any explicit statement to the effect of there being no
> promotion into deeper states when using MWAIT.
> 
> Jan

https://lists.xenproject.org/archives/html/xen-devel/2019-02/msg02007.html

Since you're under NDA, I can send you the email I received from the HW
engineering team, but as a basic recap:

If the HW is configured to use CC6 for HLT (CC6 is enabled and some
other NDA bits which get OR'd with firmware, so you can only
functionally turn CC6-on-HLT off, but can't make sure it's on), then the
flow is:
1) HLT
2) timer
3) flush the caches etc
4) CC6

This can be interrupted though.  The HW engineer said that while they
aren't the same (as IO based C-states), they end up at the same place.

The whole reason HLT was selected to be used in my patches is because
we can't look in the CST table from Xen and it's always safe to use,
even if CC6 is disabled in BIOS (which we can't tell).  At this point,
I'm repeating the conversation we had in my v1 patch set.  If you need
any further info, let me know.
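
In other words, the idle entry the patches rely on is simply (a
minimal sketch of the mechanism described above, not a hunk from the
series):

    static void hlt_idle(void)
    {
        /*
         * Enable interrupts and halt in one go; the hardware may then
         * walk steps 1-4 above and promote the core from CC1 to CC6.
         */
        asm volatile ( "sti; hlt" ::: "memory" );
    }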

-- 
Brian Woods


Re: [Xen-devel] [PATCH] AMD/IOMMU: revert "amd/iommu: assign iommu devices to Xen"

2019-06-19 Thread Woods, Brian
On Mon, Jun 03, 2019 at 07:00:25AM -0600, Jan Beulich wrote:
> This reverts commit b6bd02b7a877f9fac2de69e64d8245d56f92ab25. The change
> was redundant with amd_iommu_detect_one_acpi() already calling
> pci_ro_device().
> 
> Signed-off-by: Jan Beulich 

Acked-by: Brian Woods 

> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -1021,8 +1021,6 @@ static void * __init allocate_ppr_log(st
> 
>  static int __init amd_iommu_init_one(struct amd_iommu *iommu)
>  {
> -pci_hide_device(iommu->seg, PCI_BUS(iommu->bdf), PCI_DEVFN2(iommu->bdf));
> -
>  if ( map_iommu_mmio_region(iommu) != 0 )
>  goto error_out;
> 
> 
> 

-- 
Brian Woods


Re: [Xen-devel] [PATCH] x86/svm: Fix svm_vmcb_dump() when used in current context

2019-06-19 Thread Woods, Brian
On Mon, Jun 17, 2019 at 01:54:39PM +0100, Andy Cooper wrote:
> VMExit doesn't switch all state.  The FS/GS/TS/LDTR/GSBASE segment
> information, and SYSCALL/SYSENTER MSRs may still be cached in hardware, rather
> than up-to-date in the VMCB.
> 
> Export svm_sync_vmcb() via svmdebug.h so svm_vmcb_dump() can use it, and bring
> the VMCB into sync in current context.
> 
> As a minor optimisation, switch svm_sync_vmcb() to use svm_vm{load,save}_pa(),
> as svm->vmcb_pa is always correct, and this avoids a redundant __pa()
> translation behind the scenes.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Brian Woods 
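
With the guard added below, dumping the VMCB of the vCPU currently in
context becomes safe, e.g. (a sketch):

    svm_vmcb_dump(__func__, current->arch.hvm.svm.vmcb);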

> ---
> CC: Jan Beulich 
> CC: Wei Liu 
> CC: Roger Pau Monné 
> CC: Boris Ostrovsky 
> CC: Suravee Suthikulpanit 
> CC: Brian Woods 
> ---
>  xen/arch/x86/hvm/svm/svm.c | 6 +++---
>  xen/arch/x86/hvm/svm/svmdebug.c| 9 +
>  xen/include/asm-x86/hvm/svm/svmdebug.h | 1 +
>  3 files changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> index cd6a6b3..0eac9ce 100644
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -627,21 +627,21 @@ static void svm_cpuid_policy_changed(struct vcpu *v)
>cp->extd.ibpb ? MSR_INTERCEPT_NONE : MSR_INTERCEPT_RW);
>  }
>  
> -static void svm_sync_vmcb(struct vcpu *v, enum vmcb_sync_state new_state)
> +void svm_sync_vmcb(struct vcpu *v, enum vmcb_sync_state new_state)
>  {
>  struct svm_vcpu *svm = &v->arch.hvm.svm;
>  
>  if ( new_state == vmcb_needs_vmsave )
>  {
>  if ( svm->vmcb_sync_state == vmcb_needs_vmload )
> -svm_vmload(svm->vmcb);
> +svm_vmload_pa(svm->vmcb_pa);
>  
>  svm->vmcb_sync_state = new_state;
>  }
>  else
>  {
>  if ( svm->vmcb_sync_state == vmcb_needs_vmsave )
> -svm_vmsave(svm->vmcb);
> +svm_vmsave_pa(svm->vmcb_pa);
>  
>  if ( svm->vmcb_sync_state != vmcb_needs_vmload )
>  svm->vmcb_sync_state = new_state;
> diff --git a/xen/arch/x86/hvm/svm/svmdebug.c b/xen/arch/x86/hvm/svm/svmdebug.c
> index d35e405..4293d8d 100644
> --- a/xen/arch/x86/hvm/svm/svmdebug.c
> +++ b/xen/arch/x86/hvm/svm/svmdebug.c
> @@ -29,6 +29,15 @@ static void svm_dump_sel(const char *name, const struct segment_register *s)
>  
>  void svm_vmcb_dump(const char *from, const struct vmcb_struct *vmcb)
>  {
> +struct vcpu *curr = current;
> +
> +/*
> + * If we are dumping the VMCB currently in context, some guest state may
> + * still be cached in hardware.  Retrieve it.
> + */
> +if ( vmcb == curr->arch.hvm.svm.vmcb )
> +svm_sync_vmcb(curr, vmcb_in_sync);
> +
>  printk("Dumping guest's current state at %s...\n", from);
>  printk("Size of VMCB = %zu, paddr = %"PRIpaddr", vaddr = %p\n",
> sizeof(struct vmcb_struct), virt_to_maddr(vmcb), vmcb);
> diff --git a/xen/include/asm-x86/hvm/svm/svmdebug.h b/xen/include/asm-x86/hvm/svm/svmdebug.h
> index 658cdd3..330c1d9 100644
> --- a/xen/include/asm-x86/hvm/svm/svmdebug.h
> +++ b/xen/include/asm-x86/hvm/svm/svmdebug.h
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>  
> +void svm_sync_vmcb(struct vcpu *v, enum vmcb_sync_state new_state);
>  void svm_vmcb_dump(const char *from, const struct vmcb_struct *vmcb);
>  bool svm_vmcb_isvalid(const char *from, const struct vmcb_struct *vmcb,
>const struct vcpu *v, bool verbose);
> -- 
> 2.1.4
> 

-- 
Brian Woods


Re: [Xen-devel] [PATCH 1/2] x86: init_hypercall_page() cleanup

2019-06-19 Thread Woods, Brian
On Thu, May 23, 2019 at 11:20:15AM +0100, Andy Cooper wrote:
> The various pieces of the hypercall page infrastructure have grown
> organically over time and ended up in a bit of a mess.
> 
>  * Rename all functions to be of the form *_init_hypercall_page().  This
>makes them somewhat shorter, and means they can actually be grepped
>for in one go.
>  * Move init_hypercall_page() to domain.c.  The 64-bit traps.c isn't a
>terribly appropriate place for it to live.
>  * Drop an obsolete comment from hvm_init_hypercall_page() and drop the
>domain parameter from hvm_funcs.init_hypercall_page() as it isn't
>necessary.
>  * Rearrange the logic in each function to avoid needing extra local
>variables, and to write the page in one single pass.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Brian Woods 
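
For reference, the 32-byte stubs written by the SVM variant below
decode as follows for hypercall i (a sketch in comment form; the VMX
variant uses vmcall instead of vmmcall):

    /*
     *   b8 <imm32>   mov  $i, %eax
     *   0f 01 d9     vmmcall
     *   c3           ret
     *
     * with slot __HYPERVISOR_iret instead getting 0f 0b (ud2).
     */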

> ---
> CC: Jan Beulich 
> CC: Wei Liu 
> CC: Roger Pau Monné 
> CC: Jun Nakajima 
> CC: Kevin Tian 
> CC: Boris Ostrovsky 
> CC: Suravee Suthikulpanit 
> CC: Brian Woods 
> ---
>  xen/arch/x86/domain.c   | 14 +
>  xen/arch/x86/domctl.c   |  2 +-
>  xen/arch/x86/hvm/hvm.c  |  8 ++
>  xen/arch/x86/hvm/svm/svm.c  | 18 ++--
>  xen/arch/x86/hvm/vmx/vmx.c  | 18 ++--
>  xen/arch/x86/pv/dom0_build.c|  3 +-
>  xen/arch/x86/pv/hypercall.c | 63 -
>  xen/arch/x86/traps.c|  2 +-
>  xen/arch/x86/x86_64/traps.c | 13 -
>  xen/include/asm-x86/domain.h|  2 +-
>  xen/include/asm-x86/hvm/hvm.h   |  4 +--
>  xen/include/asm-x86/hypercall.h |  4 +--
>  12 files changed, 73 insertions(+), 78 deletions(-)
> 
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index ac960dd..9485a17 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -175,6 +175,20 @@ static void noreturn continue_idle_domain(struct vcpu *v)
>  reset_stack_and_jump(idle_loop);
>  }
> 
> +void init_hypercall_page(struct domain *d, void *ptr)
> +{
> +memset(ptr, 0xcc, PAGE_SIZE);
> +
> +if ( is_hvm_domain(d) )
> +hvm_init_hypercall_page(d, ptr);
> +else if ( is_pv_64bit_domain(d) )
> +pv_ring3_init_hypercall_page(ptr);
> +else if ( is_pv_32bit_domain(d) )
> +pv_ring1_init_hypercall_page(ptr);
> +else
> +ASSERT_UNREACHABLE();
> +}
> +
>  void dump_pageframe_info(struct domain *d)
>  {
>  struct page_info *page;
> diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
> index 9bf2d08..7c6b809 100644
> --- a/xen/arch/x86/domctl.c
> +++ b/xen/arch/x86/domctl.c
> @@ -517,7 +517,7 @@ long arch_do_domctl(
>  }
> 
>  hypercall_page = __map_domain_page(page);
> -hypercall_page_initialise(d, hypercall_page);
> +init_hypercall_page(d, hypercall_page);
>  unmap_domain_page(hypercall_page);
> 
>  put_page_and_type(page);
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 8993c2a..5666286 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -3801,13 +3801,11 @@ static void hvm_latch_shinfo_size(struct domain *d)
>  }
>  }
> 
> -/* Initialise a hypercall transfer page for a VMX domain using
> -   paravirtualised drivers. */
> -void hvm_hypercall_page_initialise(struct domain *d,
> -   void *hypercall_page)
> +void hvm_init_hypercall_page(struct domain *d, void *ptr)
>  {
>  hvm_latch_shinfo_size(d);
> -alternative_vcall(hvm_funcs.init_hypercall_page, d, hypercall_page);
> +
> +alternative_vcall(hvm_funcs.init_hypercall_page, ptr);
>  }
> 
>  void hvm_vcpu_reset_state(struct vcpu *v, uint16_t cs, uint16_t ip)
> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> index 9f26493..cd6a6b3 100644
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -916,17 +916,20 @@ static unsigned int svm_get_insn_bytes(struct vcpu *v, uint8_t *buf)
>  return len;
>  }
> 
> -static void svm_init_hypercall_page(struct domain *d, void *hypercall_page)
> +static void svm_init_hypercall_page(void *p)
>  {
> -char *p;
> -int i;
> +unsigned int i;
> 
> -for ( i = 0; i < (PAGE_SIZE / 32); i++ )
> +for ( i = 0; i < (PAGE_SIZE / 32); i++, p += 32 )
>  {
> -if ( i == __HYPERVISOR_iret )
> +if ( unlikely(i == __HYPERVISOR_iret) )
> +{
> +/* HYPERVISOR_iret isn't supported */
> +*(u16 *)p = 0x0b0f; /* ud2 */
> +
>  continue;
> +}
> 
> -p = (char *)(hypercall_page + (i * 32));
>  *(u8  *)(p + 0) = 0xb8; /* mov imm32, %eax */
>  *(u32 *)(p + 1) = i;
>  *(u8  *)(p + 5) = 0x0f; /* vmmcall */
> @@ -934,9 +937,6 @@ static void svm_init_hypercall_page(struct domain *d, void *hypercall_page)
>  *(u8  *)(p + 7) = 0xd9;
>  *(u8  *)(p + 8) = 0xc3; /* ret */
>  }
> -
> -/* Don't support 

Re: [Xen-devel] [PATCH v3 13/13] print: introduce a format specifier for pci_sbdf_t

2019-06-19 Thread Woods, Brian
On Fri, Jun 07, 2019 at 11:22:32AM +0200, Roger Pau Monne wrote:
> The new format specifier is '%pp', and prints a pci_sbdf_t using the
> seg:bus:dev.func format. Replace all SBDFs printed using
> '%04x:%02x:%02x.%u' to use the new format specifier.
> 
> No functional change intended.
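
A minimal use of the new specifier looks like this (a sketch; it
assumes a struct pci_dev *pdev in scope, as in the hunks below):

    printk(XENLOG_INFO "probing %pp\n", &pdev->sbdf);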
> 
> Signed-off-by: Roger Pau Monné 

As far as AMD IOMMU goes:

Acked-by: Brian Woods 

> ---
> Cc: Andrew Cooper 
> Cc: George Dunlap 
> Cc: Ian Jackson 
> Cc: Jan Beulich 
> Cc: Julien Grall 
> Cc: Konrad Rzeszutek Wilk 
> Cc: Stefano Stabellini 
> Cc: Tim Deegan 
> Cc: Wei Liu 
> Cc: Suravee Suthikulpanit 
> Cc: Brian Woods 
> Cc: Kevin Tian 
> ---
> Changes since v1:
>  - Use base 8 to print the function number.
>  - Sort the addition in the pointer function alphabetically.
> ---
>  docs/misc/printk-formats.txt|   5 +
>  xen/arch/x86/hvm/vmsi.c |  10 +-
>  xen/arch/x86/msi.c  |  35 +++---
>  xen/common/vsprintf.c   |  18 
>  xen/drivers/passthrough/amd/iommu_acpi.c|  17 ++-
>  xen/drivers/passthrough/amd/iommu_cmd.c |   5 +-
>  xen/drivers/passthrough/amd/iommu_detect.c  |   5 +-
>  xen/drivers/passthrough/amd/iommu_init.c|  12 +--
>  xen/drivers/passthrough/amd/iommu_intr.c|   8 +-
>  xen/drivers/passthrough/amd/pci_amd_iommu.c |  25 ++---
>  xen/drivers/passthrough/pci.c   | 114 
>  xen/drivers/passthrough/vtd/dmar.c  |  25 +++--
>  xen/drivers/passthrough/vtd/intremap.c  |  11 +-
>  xen/drivers/passthrough/vtd/iommu.c |  80 ++
>  xen/drivers/passthrough/vtd/quirks.c|  22 ++--
>  xen/drivers/passthrough/vtd/utils.c |   6 +-
>  xen/drivers/passthrough/x86/ats.c   |  13 +--
>  xen/drivers/vpci/header.c   |  11 +-
>  xen/drivers/vpci/msi.c  |   6 +-
>  xen/drivers/vpci/msix.c |  24 ++---
>  20 files changed, 190 insertions(+), 262 deletions(-)
> 
> diff --git a/docs/misc/printk-formats.txt b/docs/misc/printk-formats.txt
> index 080f498f65..8f666f696a 100644
> --- a/docs/misc/printk-formats.txt
> +++ b/docs/misc/printk-formats.txt
> @@ -48,3 +48,8 @@ Domain and vCPU information:
> The domain part as above, with the vcpu_id printed in decimal.
>   e.g.  d0v1
> d[IDLE]v0
> +
> +PCI:
> +
> +   %pp PCI device address in S:B:D.F format from a pci_sbdf_t.
> + e.g.  0004:02:00.0
> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> index aeb5a70104..7290bd553d 100644
> --- a/xen/arch/x86/hvm/vmsi.c
> +++ b/xen/arch/x86/hvm/vmsi.c
> @@ -686,10 +686,8 @@ static int vpci_msi_update(const struct pci_dev *pdev, uint32_t data,
>  
>  if ( rc )
>  {
> -gdprintk(XENLOG_ERR,
> - "%04x:%02x:%02x.%u: failed to bind PIRQ %u: %d\n",
> - pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> - PCI_FUNC(pdev->devfn), pirq + i, rc);
> +gdprintk(XENLOG_ERR, "%pp: failed to bind PIRQ %u: %d\n",
> + &pdev->sbdf, pirq + i, rc);
>  while ( bind.machine_irq-- > pirq )
>  pt_irq_destroy_bind(pdev->domain, &bind);
>  return rc;
> @@ -743,9 +741,7 @@ static int vpci_msi_enable(const struct pci_dev *pdev, 
> uint32_t data,
> _info);
>  if ( rc )
>  {
> -gdprintk(XENLOG_ERR, "%04x:%02x:%02x.%u: failed to map PIRQ: %d\n",
> - pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> - PCI_FUNC(pdev->devfn), rc);
> +gdprintk(XENLOG_ERR, "%pp: failed to map PIRQ: %d\n", >sbdf, 
> rc);
>  return rc;
>  }
>  
> diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
> index 9a1ce33b42..3726394f02 100644
> --- a/xen/arch/x86/msi.c
> +++ b/xen/arch/x86/msi.c
> @@ -428,8 +428,8 @@ static bool msi_set_mask_bit(struct irq_desc *desc, bool 
> host, bool guest)
>  {
>  pdev->msix->warned = domid;
>  printk(XENLOG_G_WARNING
> -   "cannot mask IRQ %d: masking MSI-X on Dom%d's 
> %04x:%02x:%02x.%u\n",
> -   desc->irq, domid, seg, bus, slot, func);
> +   "cannot mask IRQ %d: masking MSI-X on Dom%d's %pp\n",
> +   desc->irq, domid, >sbdf);
>  }
>  }
>  pdev->msix->host_maskall = maskall;
> @@ -987,11 +987,11 @@ static int msix_capability_init(struct pci_dev *dev,
>  struct domain *d = dev->domain ?: currd;
>  
>  if ( !is_hardware_domain(currd) || d != currd )
> -printk("%s use of MSI-X on %04x:%02x:%02x.%u by Dom%d\n",
> +printk("%s use of MSI-X on %pp by %pd\n",
> is_hardware_domain(currd)
> ? XENLOG_WARNING "Potentially insecure"
> : 

Re: [Xen-devel] [PATCH v3 12/13] pci: switch pci_conf_write32 to use pci_sbdf_t

2019-06-19 Thread Woods, Brian
On Fri, Jun 07, 2019 at 11:22:31AM +0200, Roger Pau Monne wrote:
> This reduces the number of parameters of the function to two, and
> simplifies some of the calling sites.
> 
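
The call-site simplification is of this shape (a sketch distilled from
the hunks below):

    /* before */
    pci_conf_write32(seg, bus, slot, func, reg, val);
    /* after */
    pci_conf_write32(pdev->sbdf, reg, val);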
> Signed-off-by: Roger Pau Monné 

As far as AMD IOMMU goes:

Acked-by: Brian Woods 

> ---
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> Cc: Wei Liu 
> Cc: George Dunlap 
> Cc: Ian Jackson 
> Cc: Julien Grall 
> Cc: Konrad Rzeszutek Wilk 
> Cc: Stefano Stabellini 
> Cc: Tim Deegan 
> Cc: Suravee Suthikulpanit 
> Cc: Brian Woods 
> Cc: Kevin Tian 
> ---
>  xen/arch/x86/cpu/amd.c   |  4 ++--
>  xen/arch/x86/msi.c   | 12 
>  xen/arch/x86/oprofile/op_model_athlon.c  |  4 +++-
>  xen/arch/x86/x86_64/pci.c| 17 -
>  xen/drivers/char/ehci-dbgp.c |  5 +++--
>  xen/drivers/char/ns16550.c   | 22 --
>  xen/drivers/passthrough/amd/iommu_init.c |  8 
>  xen/drivers/passthrough/pci.c|  8 
>  xen/drivers/passthrough/vtd/quirks.c |  8 
>  xen/drivers/vpci/header.c|  7 +++
>  xen/drivers/vpci/vpci.c  |  2 +-
>  xen/include/xen/pci.h|  4 +---
>  12 files changed, 45 insertions(+), 56 deletions(-)
> 
> diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
> index 2e6529fba3..86273b6a07 100644
> --- a/xen/arch/x86/cpu/amd.c
> +++ b/xen/arch/x86/cpu/amd.c
> @@ -707,11 +707,11 @@ static void init_amd(struct cpuinfo_x86 *c)
>  (h & 0x1) ? "clearing D18F3x5C[0]" : "");
>  
>   if (l & 0x1f)
> - pci_conf_write32(0, 0, 0x18, 3, 0x58,
> + pci_conf_write32(PCI_SBDF(0, 0, 0x18, 3), 0x58,
>l & ~0x1f);
>  
>   if (h & 0x1)
> - pci_conf_write32(0, 0, 0x18, 3, 0x5c,
> + pci_conf_write32(PCI_SBDF(0, 0, 0x18, 3), 0x5c,
>h & ~0x1);
>   }
>  
> diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
> index cbc1e3b3f0..9a1ce33b42 100644
> --- a/xen/arch/x86/msi.c
> +++ b/xen/arch/x86/msi.c
> @@ -251,21 +251,17 @@ static int write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
>  {
>  struct pci_dev *dev = entry->dev;
>  int pos = entry->msi_attrib.pos;
> -u16 seg = dev->seg;
> -u8 bus = dev->bus;
> -u8 slot = PCI_SLOT(dev->devfn);
> -u8 func = PCI_FUNC(dev->devfn);
>  int nr = entry->msi_attrib.entry_nr;
>  
>  ASSERT((msg->data & (entry[-nr].msi.nvec - 1)) == nr);
>  if ( nr )
>  return 0;
>  
> -pci_conf_write32(seg, bus, slot, func, msi_lower_address_reg(pos),
> +pci_conf_write32(dev->sbdf, msi_lower_address_reg(pos),
>   msg->address_lo);
>  if ( entry->msi_attrib.is_64 )
>  {
> -pci_conf_write32(seg, bus, slot, func, msi_upper_address_reg(pos),
> +pci_conf_write32(dev->sbdf, msi_upper_address_reg(pos),
>   msg->address_hi);
>  pci_conf_write16(dev->sbdf, msi_data_reg(pos, 1), msg->data);
>  }
> @@ -395,7 +391,7 @@ static bool msi_set_mask_bit(struct irq_desc *desc, bool 
> host, bool guest)
>  mask_bits = pci_conf_read32(pdev->sbdf, entry->msi.mpos);
>  mask_bits &= ~((u32)1 << entry->msi_attrib.entry_nr);
>  mask_bits |= (u32)flag << entry->msi_attrib.entry_nr;
> -pci_conf_write32(seg, bus, slot, func, entry->msi.mpos, mask_bits);
> +pci_conf_write32(pdev->sbdf, entry->msi.mpos, mask_bits);
>  }
>  break;
>  case PCI_CAP_ID_MSIX:
> @@ -716,7 +712,7 @@ static int msi_capability_init(struct pci_dev *dev,
>  /* All MSIs are unmasked by default, Mask them all */
>  maskbits = pci_conf_read32(dev->sbdf, mpos);
>  maskbits |= ~(u32)0 >> (32 - maxvec);
> -pci_conf_write32(seg, bus, slot, func, mpos, maskbits);
> +pci_conf_write32(dev->sbdf, mpos, maskbits);
>  }
>  list_add_tail(&entry->list, &dev->msi_list);
>  
> diff --git a/xen/arch/x86/oprofile/op_model_athlon.c b/xen/arch/x86/oprofile/op_model_athlon.c
> index 3bf0b0214d..5c48f868ae 100644
> --- a/xen/arch/x86/oprofile/op_model_athlon.c
> +++ b/xen/arch/x86/oprofile/op_model_athlon.c
> @@ -472,7 +472,9 @@ static int __init init_ibs_nmi(void)
>   if ((vendor_id == PCI_VENDOR_ID_AMD) &&
>   (dev_id == PCI_DEVICE_ID_AMD_10H_NB_MISC)) {
>  
> - pci_conf_write32(0, bus, dev, func, IBSCTL,
> + pci_conf_write32(
> + PCI_SBDF(0, bus, dev, func),
> + 

Re: [Xen-devel] [PATCH v3 09/13] pci: switch pci_conf_read32 to use pci_sbdf_t

2019-06-19 Thread Woods, Brian
On Fri, Jun 07, 2019 at 11:22:28AM +0200, Roger Pau Monne wrote:
> This reduces the number of parameters of the function to two, and
> simplifies some of the calling sites.
> 
> Signed-off-by: Roger Pau Monné 

As far as AMD IOMMU goes:

Acked-by: Brian Woods 

> ---
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> Cc: Wei Liu 
> Cc: George Dunlap 
> Cc: Ian Jackson 
> Cc: Julien Grall 
> Cc: Konrad Rzeszutek Wilk 
> Cc: Stefano Stabellini 
> Cc: Tim Deegan 
> Cc: Suravee Suthikulpanit 
> Cc: Brian Woods 
> Cc: Kevin Tian 
> ---
>  xen/arch/x86/cpu/amd.c |  7 +++--
>  xen/arch/x86/mm.c  |  2 +-
>  xen/arch/x86/msi.c | 28 ++---
>  xen/arch/x86/oprofile/op_model_athlon.c|  6 ++--
>  xen/arch/x86/x86_64/mmconf-fam10h.c|  8 +++--
>  xen/arch/x86/x86_64/mmconfig-shared.c  | 12 
>  xen/arch/x86/x86_64/pci.c  | 27 +++-
>  xen/drivers/char/ehci-dbgp.c   | 20 +++-
>  xen/drivers/char/ns16550.c | 18 ++-
>  xen/drivers/passthrough/amd/iommu_detect.c |  2 +-
>  xen/drivers/passthrough/amd/iommu_init.c   |  4 +--
>  xen/drivers/passthrough/pci.c  | 15 -
>  xen/drivers/passthrough/vtd/quirks.c   | 36 --
>  xen/drivers/pci/pci.c  |  4 +--
>  xen/drivers/vpci/header.c  |  6 ++--
>  xen/drivers/vpci/msix.c|  6 ++--
>  xen/drivers/vpci/vpci.c|  5 ++-
>  xen/include/xen/pci.h  |  4 +--
>  18 files changed, 101 insertions(+), 109 deletions(-)
> 
> diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
> index 3c069391f4..37f60c0862 100644
> --- a/xen/arch/x86/cpu/amd.c
> +++ b/xen/arch/x86/cpu/amd.c
> @@ -417,7 +417,8 @@ static void disable_c1_ramping(void)
>   int node, nr_nodes;
>  
>   /* Read the number of nodes from the first Northbridge. */
> - nr_nodes = ((pci_conf_read32(0, 0, 0x18, 0x0, 0x60)>>4)&0x07)+1;
> + nr_nodes = ((pci_conf_read32(PCI_SBDF(0, 0, 0x18, 0), 0x60) >> 4) &
> + 0x07) + 1;
>   for (node = 0; node < nr_nodes; node++) {
>   /* PMM7: bus=0, dev=0x18+node, function=0x3, register=0x87. */
>   pmm7 = pci_conf_read8(PCI_SBDF(0, 0, 0x18 + node, 3), 0x87);
> @@ -696,8 +697,8 @@ static void init_amd(struct cpuinfo_x86 *c)
>  
>   if (c->x86 == 0x16 && c->x86_model <= 0xf) {
>   if (c == &boot_cpu_data) {
> - l = pci_conf_read32(0, 0, 0x18, 0x3, 0x58);
> - h = pci_conf_read32(0, 0, 0x18, 0x3, 0x5c);
> + l = pci_conf_read32(PCI_SBDF(0, 0, 0x18, 3), 0x58);
> + h = pci_conf_read32(PCI_SBDF(0, 0, 0x18, 3), 0x5c);
>   if ((l & 0x1f) | (h & 0x1))
>   printk(KERN_WARNING
>  "Applying workaround for erratum 792: 
> %s%s%s\n",
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index df2c0130f1..e67119dbe6 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -5949,7 +5949,7 @@ const struct platform_bad_page *__init 
> get_platform_badpages(unsigned int *array
>  }
>  
>  *array_size = ARRAY_SIZE(snb_bad_pages);
> -igd_id = pci_conf_read32(0, 0, 2, 0, 0);
> +igd_id = pci_conf_read32(PCI_SBDF(0, 0, 2, 0), 0);
>  if ( IS_SNB_GFX(igd_id) )
>  return snb_bad_pages;
>  
> diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
> index ed986261c3..392cbecfe4 100644
> --- a/xen/arch/x86/msi.c
> +++ b/xen/arch/x86/msi.c
> @@ -191,16 +191,13 @@ static bool read_msi_msg(struct msi_desc *entry, struct 
> msi_msg *msg)
>  {
>  struct pci_dev *dev = entry->dev;
>  int pos = entry->msi_attrib.pos;
> -u16 data, seg = dev->seg;
> -u8 bus = dev->bus;
> -u8 slot = PCI_SLOT(dev->devfn);
> -u8 func = PCI_FUNC(dev->devfn);
> +uint16_t data;
>  
> -msg->address_lo = pci_conf_read32(seg, bus, slot, func,
> +msg->address_lo = pci_conf_read32(dev->sbdf,
>msi_lower_address_reg(pos));
>  if ( entry->msi_attrib.is_64 )
>  {
> -msg->address_hi = pci_conf_read32(seg, bus, slot, func,
> +msg->address_hi = pci_conf_read32(dev->sbdf,
>msi_upper_address_reg(pos));
>  data = pci_conf_read16(dev->sbdf, msi_data_reg(pos, 1));
>  }
> @@ -396,7 +393,7 @@ static bool msi_set_mask_bit(struct irq_desc *desc, bool 
> host, bool guest)
>  {
>  u32 mask_bits;
>  
> -mask_bits = pci_conf_read32(seg, bus, slot, func, 
> entry->msi.mpos);
> +mask_bits = pci_conf_read32(pdev->sbdf, entry->msi.mpos);
>  mask_bits &= ~((u32)1 << entry->msi_attrib.entry_nr);
>  mask_bits |= (u32)flag << 

Re: [Xen-devel] [PATCH v3 08/13] pci: switch pci_conf_read16 to use pci_sbdf_t

2019-06-19 Thread Woods, Brian
On Fri, Jun 07, 2019 at 11:22:27AM +0200, Roger Pau Monne wrote:
> This reduces the number of parameters of the function to two, and
> simplifies some of the calling sites.
> 
> Signed-off-by: Roger Pau Monné 

As far as the AMD IOMMU bits go:

Acked-by: Brian Woods 

> ---
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> Cc: Wei Liu 
> Cc: George Dunlap 
> Cc: Ian Jackson 
> Cc: Julien Grall 
> Cc: Konrad Rzeszutek Wilk 
> Cc: Stefano Stabellini 
> Cc: Tim Deegan 
> Cc: Suravee Suthikulpanit 
> Cc: Brian Woods 
> Cc: Kevin Tian 
> ---
>  xen/arch/x86/dmi_scan.c  |  6 +-
>  xen/arch/x86/msi.c   | 73 ++--
>  xen/arch/x86/x86_64/mmconfig-shared.c|  2 +-
>  xen/arch/x86/x86_64/pci.c| 27 -
>  xen/drivers/char/ehci-dbgp.c |  5 +-
>  xen/drivers/char/ns16550.c   | 16 --
>  xen/drivers/passthrough/amd/iommu_init.c |  3 +-
>  xen/drivers/passthrough/ats.h|  4 +-
>  xen/drivers/passthrough/pci.c| 40 +
>  xen/drivers/passthrough/vtd/quirks.c |  9 ++-
>  xen/drivers/passthrough/x86/ats.c|  9 +--
>  xen/drivers/pci/pci.c|  4 +-
>  xen/drivers/video/vga.c  |  8 +--
>  xen/drivers/vpci/header.c| 11 ++--
>  xen/drivers/vpci/msi.c   |  3 +-
>  xen/drivers/vpci/msix.c  |  3 +-
>  xen/drivers/vpci/vpci.c  | 11 ++--
>  xen/include/xen/pci.h|  4 +-
>  18 files changed, 99 insertions(+), 139 deletions(-)
> 
> diff --git a/xen/arch/x86/dmi_scan.c b/xen/arch/x86/dmi_scan.c
> index fcdf2d3952..31caad133e 100644
> --- a/xen/arch/x86/dmi_scan.c
> +++ b/xen/arch/x86/dmi_scan.c
> @@ -469,15 +469,15 @@ static int __init ich10_bios_quirk(struct dmi_system_id 
> *d)
>  {
>  u32 port, smictl;
>  
> -if ( pci_conf_read16(0, 0, 0x1f, 0, PCI_VENDOR_ID) != 0x8086 )
> +if ( pci_conf_read16(PCI_SBDF(0, 0, 0x1f, 0), PCI_VENDOR_ID) != 0x8086 )
>  return 0;
>  
> -switch ( pci_conf_read16(0, 0, 0x1f, 0, PCI_DEVICE_ID) ) {
> +switch ( pci_conf_read16(PCI_SBDF(0, 0, 0x1f, 0), PCI_DEVICE_ID) ) {
>  case 0x3a14:
>  case 0x3a16:
>  case 0x3a18:
>  case 0x3a1a:
> -port = (pci_conf_read16(0, 0, 0x1f, 0, 0x40) & 0xff80) + 0x30;
> +port = (pci_conf_read16(PCI_SBDF(0, 0, 0x1f, 0), 0x40) & 0xff80) + 
> 0x30;
>  smictl = inl(port);
>  /* turn off LEGACY_USB{,2}_EN if enabled */
>  if ( smictl & 0x20008 )
> diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
> index 67339edc68..ed986261c3 100644
> --- a/xen/arch/x86/msi.c
> +++ b/xen/arch/x86/msi.c
> @@ -124,29 +124,20 @@ static void msix_put_fixmap(struct arch_msix *msix, int 
> idx)
>  
>  static bool memory_decoded(const struct pci_dev *dev)
>  {
> -u8 bus, slot, func;
> +pci_sbdf_t sbdf = dev->sbdf;
>  
> -if ( !dev->info.is_virtfn )
> -{
> -bus = dev->bus;
> -slot = PCI_SLOT(dev->devfn);
> -func = PCI_FUNC(dev->devfn);
> -}
> -else
> +if ( dev->info.is_virtfn )
>  {
> -bus = dev->info.physfn.bus;
> -slot = PCI_SLOT(dev->info.physfn.devfn);
> -func = PCI_FUNC(dev->info.physfn.devfn);
> +sbdf.bus = dev->info.physfn.bus;
> +sbdf.devfn = dev->info.physfn.devfn;
>  }
>  
> -return !!(pci_conf_read16(dev->seg, bus, slot, func, PCI_COMMAND) &
> -  PCI_COMMAND_MEMORY);
> +return pci_conf_read16(sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY;
>  }
>  
>  static bool msix_memory_decoded(const struct pci_dev *dev, unsigned int pos)
>  {
> -u16 control = pci_conf_read16(dev->seg, dev->bus, PCI_SLOT(dev->devfn),
> -  PCI_FUNC(dev->devfn), 
> msix_control_reg(pos));
> +uint16_t control = pci_conf_read16(dev->sbdf, msix_control_reg(pos));
>  
>  if ( !(control & PCI_MSIX_FLAGS_ENABLE) )
>  return false;
> @@ -211,14 +202,12 @@ static bool read_msi_msg(struct msi_desc *entry, struct 
> msi_msg *msg)
>  {
>  msg->address_hi = pci_conf_read32(seg, bus, slot, func,
>msi_upper_address_reg(pos));
> -data = pci_conf_read16(seg, bus, slot, func,
> -   msi_data_reg(pos, 1));
> +data = pci_conf_read16(dev->sbdf, msi_data_reg(pos, 1));
>  }
>  else
>  {
>  msg->address_hi = 0;
> -data = pci_conf_read16(seg, bus, slot, func,
> -   msi_data_reg(pos, 0));
> +data = pci_conf_read16(dev->sbdf, msi_data_reg(pos, 0));
>  }
>  msg->data = data;
>  break;
> @@ -337,7 +326,8 @@ void set_msi_affinity(struct irq_desc *desc, const 
> cpumask_t *mask)
>  
>  void __msi_set_enable(u16 seg, u8 bus, u8 slot, u8 func, int pos, int enable)
>  {
> -u16 control = pci_conf_read16(seg, bus, slot, func, pos + 

Re: [Xen-devel] [PATCH v3 07/13] pci: switch pci_conf_read8 to use pci_sbdf_t

2019-06-19 Thread Woods, Brian
On Fri, Jun 07, 2019 at 11:22:26AM +0200, Roger Pau Monne wrote:
> This reduces the number of parameters of the function to two, and
> simplifies some of the calling sites.
> 
> Signed-off-by: Roger Pau Monné 

As far as AMD IOMMU

Acked-by: Brian Woods 
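
For anyone skimming the archive, the shape of the change at a typical
call site (taken from the disable_c1_ramping() hunk below):

    /* before: segment/bus/slot/function passed loose */
    pmm7 = pci_conf_read8(0, 0, 0x18 + node, 0x3, 0x87);
    /* after: a single pci_sbdf_t plus the register offset */
    pmm7 = pci_conf_read8(PCI_SBDF(0, 0, 0x18 + node, 3), 0x87);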

> ---
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> Cc: Wei Liu 
> Cc: George Dunlap 
> Cc: Ian Jackson 
> Cc: Julien Grall 
> Cc: Konrad Rzeszutek Wilk 
> Cc: Stefano Stabellini 
> Cc: Tim Deegan 
> Cc: Suravee Suthikulpanit 
> Cc: Brian Woods 
> Cc: Kevin Tian 
> ---
>  xen/arch/x86/cpu/amd.c   |  4 ++--
>  xen/arch/x86/msi.c   |  2 +-
>  xen/arch/x86/x86_64/pci.c| 25 
>  xen/drivers/char/ehci-dbgp.c |  5 +++--
>  xen/drivers/char/ns16550.c   |  6 --
>  xen/drivers/passthrough/amd/iommu_init.c |  2 +-
>  xen/drivers/passthrough/pci.c| 21 
>  xen/drivers/passthrough/vtd/dmar.c   |  6 +++---
>  xen/drivers/passthrough/vtd/quirks.c |  6 +++---
>  xen/drivers/pci/pci.c|  9 -
>  xen/drivers/video/vga.c  |  3 +--
>  xen/drivers/vpci/header.c|  3 +--
>  xen/drivers/vpci/vpci.c  |  8 +++-
>  xen/include/xen/pci.h|  4 +---
>  14 files changed, 47 insertions(+), 57 deletions(-)
> 
> diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
> index 8404cf290f..3c069391f4 100644
> --- a/xen/arch/x86/cpu/amd.c
> +++ b/xen/arch/x86/cpu/amd.c
> @@ -420,12 +420,12 @@ static void disable_c1_ramping(void)
>   nr_nodes = ((pci_conf_read32(0, 0, 0x18, 0x0, 0x60)>>4)&0x07)+1;
>   for (node = 0; node < nr_nodes; node++) {
>   /* PMM7: bus=0, dev=0x18+node, function=0x3, register=0x87. */
> - pmm7 = pci_conf_read8(0, 0, 0x18+node, 0x3, 0x87);
> + pmm7 = pci_conf_read8(PCI_SBDF(0, 0, 0x18 + node, 3), 0x87);
>   /* Invalid read means we've updated every Northbridge. */
>   if (pmm7 == 0xFF)
>   break;
>   pmm7 &= 0xFC; /* clear pmm7[1:0] */
> - pci_conf_write8(0, 0, 0x18+node, 0x3, 0x87, pmm7);
> + pci_conf_write8(0, 0, 0x18 + node, 0x3, 0x87, pmm7);
>   printk ("AMD: Disabling C1 Clock Ramping Node #%x\n", node);
>   }
>  }
> diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
> index babc4147c4..67339edc68 100644
> --- a/xen/arch/x86/msi.c
> +++ b/xen/arch/x86/msi.c
> @@ -800,7 +800,7 @@ static u64 read_pci_mem_bar(u16 seg, u8 bus, u8 slot, u8 
> func, u8 bir, int vf)
>  disp = vf * pdev->vf_rlen[bir];
>  limit = PCI_SRIOV_NUM_BARS;
>  }
> -else switch ( pci_conf_read8(seg, bus, slot, func,
> +else switch ( pci_conf_read8(PCI_SBDF(seg, bus, slot, func),
>   PCI_HEADER_TYPE) & 0x7f )
>  {
>  case PCI_HEADER_TYPE_NORMAL:
> diff --git a/xen/arch/x86/x86_64/pci.c b/xen/arch/x86/x86_64/pci.c
> index 6e3f5cf203..b70383fb03 100644
> --- a/xen/arch/x86/x86_64/pci.c
> +++ b/xen/arch/x86/x86_64/pci.c
> @@ -8,27 +8,26 @@
>  #include 
>  #include 
>  
> -#define PCI_CONF_ADDRESS(bus, dev, func, reg) \
> -(0x80000000 | (bus << 16) | (dev << 11) | (func << 8) | (reg & ~3))
> +#define PCI_CONF_ADDRESS(sbdf, reg) \
> +(0x80000000 | ((sbdf).bdf << 8) | ((reg) & ~3))
>  
> -uint8_t pci_conf_read8(
> -unsigned int seg, unsigned int bus, unsigned int dev, unsigned int func,
> -unsigned int reg)
> +uint8_t pci_conf_read8(pci_sbdf_t sbdf, unsigned int reg)
>  {
> -u32 value;
> +uint32_t value;
>  
> -if ( seg || reg > 255 )
> +if ( sbdf.seg || reg > 255 )
>  {
> -pci_mmcfg_read(seg, bus, PCI_DEVFN(dev, func), reg, 1, );
> +pci_mmcfg_read(sbdf.seg, sbdf.bus, sbdf.devfn, reg, 1, );
>  return value;
>  }
> -else
> -{
> -BUG_ON((bus > 255) || (dev > 31) || (func > 7));
> -return pci_conf_read(PCI_CONF_ADDRESS(bus, dev, func, reg), reg & 3, 
> 1);
> -}
> +
> +return pci_conf_read(PCI_CONF_ADDRESS(sbdf, reg), reg & 3, 1);
>  }
>  
> +#undef PCI_CONF_ADDRESS
> +#define PCI_CONF_ADDRESS(bus, dev, func, reg) \
> +(0x80000000 | (bus << 16) | (dev << 11) | (func << 8) | (reg & ~3))
> +
>  uint16_t pci_conf_read16(
>  unsigned int seg, unsigned int bus, unsigned int dev, unsigned int func,
>  unsigned int reg)
> diff --git a/xen/drivers/char/ehci-dbgp.c b/xen/drivers/char/ehci-dbgp.c
> index 475dc41767..71f0aaa6ac 100644
> --- a/xen/drivers/char/ehci-dbgp.c
> +++ b/xen/drivers/char/ehci-dbgp.c
> @@ -713,7 +713,7 @@ static unsigned int __init find_dbgp(struct ehci_dbgp 
> *dbgp,
>  cap = __find_dbgp(bus, slot, func);
>  if ( !cap || ehci_num-- )
>  {
> -if ( !func && !(pci_conf_read8(0, bus, slot, func,
> +if ( !func && !(pci_conf_read8(PCI_SBDF(0, bus, slot, 
> func),
>

Re: [Xen-devel] [PATCH 3/5] x86/AMD: make C-state handling independent of Dom0

2019-06-19 Thread Woods, Brian
On Wed, Jun 19, 2019 at 12:20:40AM -0600, Jan Beulich wrote:
> >>> On 18.06.19 at 19:22,  wrote:
> > On Tue, Jun 11, 2019 at 06:42:33AM -0600, Jan Beulich wrote:
> >> >>> On 10.06.19 at 18:28,  wrote:
> >> > On 23/05/2019 13:18, Jan Beulich wrote:
> >> >> TBD: Can we set local_apic_timer_c2_ok to true? I can't seem to find any
> >> >>  statement in the BKDG / PPR as to whether the LAPIC timer continues
> >> >>  running in CC6.
> >> > 
> >> > This ought to be easy to determine.  Given the description of CC6
> >> > flushing the cache and power gating the core, I'd say there is a
> >> > reasonable chance that the LAPIC timer stops in CC6.
> >> 
> >> But "reasonable chance" isn't enough for my taste here. And from
> >> what you deduce, the answer to the question would be "no", and
> >> hence simply no change to be made anywhere. (I do think though
> >> that it's more complicated than this, because iirc much also depends
> >> on what the firmware actually does.)
> > 
> > The LAPIC timer never stops on the current platforms (Naples and
> > Rome).  This is from a knowledgeable HW engineer, so it can be relied
> > on.
> 
> Thanks - I've taken note to set the variable accordingly then.
> 
> >> >> TBD: We may want to verify that HLT indeed is configured to enter CC6.
> >> > 
> >> > I can't actually spot anything which talks about HLT directly.  The
> >> > closest I can post is CFOH (cache flush on halt) which is an
> >> > auto-transition from CC1 to CC6 after a specific timeout, but the
> >> > wording suggests that mwait would also take this path.
> >> 
> >> Well, I had come across a section describing how HLT can be
> >> configured to be the same action as the I/O port read from one
> >> of the three ports involved in C-state management
> >> (CStateBaseAddr+0...2). But I can't seem to find this again.
> >> 
> >> As to MWAIT behaving the same, I don't think I can spot proof
> >> of your interpretation or proof of Brian's.
> > 
> > It's not really documented clearly.  I got my information from the HW
> > engineers.  I've already posted what information I know so I won't
> > repeat it.
> 
> At least a pointer to where you had stated this would have been
> nice. Iirc there's no promotion into CC6 in that case, in contrast
> to Andrew's reading of the doc.
> 
> Jan
> 


-- 
Brian Woods

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] AMD/IOMMU: initialize IRQ tasklet only once

2019-06-18 Thread Woods, Brian
On Fri, May 31, 2019 at 09:52:04AM -0600, Jan Beulich wrote:
> Don't do this once per IOMMU, nor after setting up the IOMMU interrupt
> (which will want to schedule this tasklet). In fact it can be
> initialized at build time.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Brian Woods 

> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -31,7 +31,8 @@
> 
>  static int __initdata nr_amd_iommus;
> 
> -static struct tasklet amd_iommu_irq_tasklet;
> +static void do_amd_iommu_irq(unsigned long data);
> +static DECLARE_SOFTIRQ_TASKLET(amd_iommu_irq_tasklet, do_amd_iommu_irq, 0);
> 
>  unsigned int __read_mostly ivrs_bdf_entries;
>  u8 __read_mostly ivhd_type;
> @@ -1056,8 +1057,6 @@ static int __init amd_iommu_init_one(str
>  printk("AMD-Vi: IOMMU %d Enabled.\n", nr_amd_iommus );
>  nr_amd_iommus++;
> 
> -softirq_tasklet_init(&amd_iommu_irq_tasklet, do_amd_iommu_irq, 0);
> -
>  return 0;
> 
>  error_out:
> 
> 

-- 
Brian Woods

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 3/5] x86/AMD: make C-state handling independent of Dom0

2019-06-18 Thread Woods, Brian
On Tue, Jun 11, 2019 at 06:42:33AM -0600, Jan Beulich wrote:
> >>> On 10.06.19 at 18:28,  wrote:
> > On 23/05/2019 13:18, Jan Beulich wrote:
> >> At least for more recent CPUs, following what BKDG / PPR suggest for the
> >> BIOS to surface via ACPI we can make ourselves independent of Dom0
> >> uploading respective data.
> >>
> >> Signed-off-by: Jan Beulich 
> >> ---
> >> TBD: Can we set local_apic_timer_c2_ok to true? I can't seem to find any
> >>  statement in the BKDG / PPR as to whether the LAPIC timer continues
> >>  running in CC6.
> > 
> > This ought to be easy to determine.  Given the description of CC6
> > flushing the cache and power gating the core, I'd say there is a
> > reasonable chance that the LAPIC timer stops in CC6.
> 
> But "reasonable chance" isn't enough for my taste here. And from
> what you deduce, the answer to the question would be "no", and
> hence simply no change to be made anywhere. (I do think though
> that it's more complicated than this, because iirc much also depends
> on what the firmware actually does.)

The LAPIC timer never stops on the current platforms (Naples and
Rome).  This is from a knowledgeable HW engineer, so it can be relied on.

> >> TBD: We may want to verify that HLT indeed is configured to enter CC6.
> > 
> > I can't actually spot anything which talks about HLT directly.  The
> > closest I can post is CFOH (cache flush on halt) which is an
> > auto-transition from CC1 to CC6 after a specific timeout, but the
> > wording suggests that mwait would also take this path.
> 
> Well, I had come across a section describing how HLT can be
> configured to be the same action as the I/O port read from one
> of the three ports involved in C-state management
> (CStateBaseAddr+0...2). But I can't seem to find this again.
> 
> As to MWAIT behaving the same, I don't think I can spot proof
> of your interpretation or proof of Brian's.

It's not really documented clearly.  I got my information from the HW
engineers.  I've already posted what information I know so I won't
repeat it.

> >> --- a/xen/arch/x86/acpi/cpu_idle.c
> >> +++ b/xen/arch/x86/acpi/cpu_idle.c
> >> @@ -1283,6 +1288,98 @@ long set_cx_pminfo(uint32_t acpi_id, str
> >>  return 0;
> >>  }
> >>  
> >> +static void amd_cpuidle_init(struct acpi_processor_power *power)
> >> +{
> >> +unsigned int i, nr = 0;
> >> +const struct cpuinfo_x86 *c = &boot_cpu_data;
> >> +const unsigned int ecx_req = CPUID5_ECX_EXTENSIONS_SUPPORTED |
> >> + CPUID5_ECX_INTERRUPT_BREAK;
> >> +const struct acpi_processor_cx *cx = NULL;
> >> +static const struct acpi_processor_cx fam17[] = {
> >> +{
> >> +.type = ACPI_STATE_C1,
> >> +.entry_method = ACPI_CSTATE_EM_FFH,
> >> +.address = 0,
> >> +.latency = 1,
> >> +},
> >> +{
> >> +.type = ACPI_STATE_C2,
> >> +.entry_method = ACPI_CSTATE_EM_HALT,
> >> +.latency = 400,
> >> +},
> >> +};
> >> +
> >> +if ( pm_idle_save && pm_idle != acpi_processor_idle )
> >> +return;
> >> +
> >> +if ( vendor_override < 0 )
> >> +return;
> >> +
> >> +switch ( c->x86 )
> >> +{
> >> +case 0x17:
> > 
> > With Hygon in the mix, this should be expanded to Fam18h.
> 
> But only once we get a guarantee from AMD that they won't use
> family 18h. Otherwise we'd have to use vendor checks here.
> Anyway this series predates the merging of the Hygon one. But
> yes, I can easily do this for v2.
> 
> Jan
> 
> 

-- 
Brian Woods

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 0/3] mwait support for AMD processors

2019-06-17 Thread Woods, Brian
On Thu, May 30, 2019 at 10:08:12PM -0400, Rich Persaud wrote:
> On Mar 28, 2019, at 11:04, Woods, Brian wrote:
> 
> This patch series adds support and enablement for mwait on AMD Naples
> and Rome processors.  Newer AMD processors support mwait, but only for
> c1, and for c2 halt is used.  The mwait-idle driver is modified to be
> able to use both mwait and halt for idling.
> 
> Would you mind if I create a Xen Project JIRA ticket, or a wiki page, to 
> track requirements and implementations related to this patch series?

You can, but I doubt this patch series will go anywhere, since Jan was
completely opposed to adding this to the mwait-idle.c file because it
included halt for C2.  Since then, Jan has released some other patches
which have gotten reviews/comments, so that's where things stand.

> From the initial thread [1]:
> On certain AMD families that support mwait, only c1 can be reached by
> + * mwait and to reach c2, halt has to be used.
> + */
> +#define CPUIDLE_FLAG_USE_HALT  0x2
> 
> Could you point us at where in the manuals this behavior is described?
> While PM Vol 2 has a chapter talking about P-states, I can't seem to
> find any mention of C-states there.

Technically I should clarify.  You can reach C1 and C2 via sysio and
acpi as well.  But mwait only reaches C1.  Halt (after a timer and a
transition state), assuming C2 is enabled, does put the CPU in C2.

Sadly this isn't documented well (even in the NDA docs); the
documentation you'd be looking for is the NDA PPR, and the public
PPR doesn't include it.
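
To sketch what that means for the idle path (illustrative only, not the
literal Xen code; the CC1 hint value of 0 is an assumption taken from
the Naples c-state table in my patches):

    static void fam17_idle_enter(const struct acpi_processor_cx *cx)
    {
        if ( cx->type == ACPI_STATE_C1 )
            /* mwait can only reach CC1 on these parts */
            mwait_idle_with_hints(cx->address, MWAIT_ECX_INTERRUPT_BREAK);
        else
            /* hlt lands in CC6 when enabled, otherwise it acts like CC1 */
            safe_halt();
    }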

> IIRC it's in the NDA PPR and internally it's in some other documents.
> We don't have support to use mwait while in CC6 due to caches being
> turned off etc.  If we did have mwait support for CC6, we'd use that here
> (basically mirroring Intel).  Sadly I don't think we have any public
> information directly detailing this information.

None that I know of.

> Can this be documented in the patch comment, or an AMD-specific page on 
> wiki.xenproject.org?  It's a requirement/input to 
> all possible implementations.
> 
> From a comment in the April 2018 Linux patch by Yazen [2]:
> > x86/smpboot: Don't use mwait_play_dead() on AMD systems
> > Recent AMD systems support using MWAIT for C1 state. However, MWAIT will
> > not allow deeper cstates than C1 on current systems.
> >
> > play_dead() expects to use the deepest state available.  The deepest state
> > available on AMD systems is reached through SystemIO or HALT. If MWAIT is
> > available, it is preferred over the other methods, so the CPU never reaches
> > the deepest possible state.
> >
> > Don't try to use MWAIT to play_dead() on AMD systems. Instead, use CPUIDLE
> > to enter the deepest state advertised by firmware. If CPUIDLE is not
> > available then fallback to HALT.
> 
> For the ticket/wiki: what are the expected benefits of the proposed Xen 
> change?  Would it reduce idle power consumption on Ryzen 1000/2000/3000? Epyc 
> 3000/7000? Any sample data available for idle power before/after the v2 patch?

Since Xen uses HALT by default, it would otherwise use HALT/C2 for ALL
idle, so this would be a performance feature.  mwait has a much better
response time when being woken up (at the cost of power).
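
The wake-up difference comes from how the sleep is broken: MWAIT arms a
monitor on a cache line and the core resumes on a write to that line (or
on an interrupt, given the break-on-interrupt hint), while HLT always
needs an interrupt delivered to wake.  Roughly, the generic pattern is
(a sketch; monitor()/mwait() stand in for the raw instructions and
aren't Xen's exact helpers):

    monitor(&softirq_pending(cpu), 0, 0);       /* arm the monitor       */
    if ( !softirq_pending(cpu) )                /* close the wakeup race */
        mwait(hint, MWAIT_ECX_INTERRUPT_BREAK); /* sleep until write/IRQ */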

> From a thread [3] posted by Jan this week, "x86/AMD: make C-state handling 
> independent of Dom0":
> > The 3rd patch is my counterproposal to Brian's intended abuse (as I would 
> > call it) of the mwait-idle driver.
> 
> Do we need a new, patch-independent, thread for convergence of candidate 
> implementations which address the requirements (documented in ticket/wiki)?  
> Should discussion move from the initial thread [1] to the counter-proposal 
> thread [3]?  Or this thread?
> 
> Rich

Yes, that's Jan's patch I was talking about before.  Glad to know the
cleanest solution with the least code duplication and a single path
for Intel and AMD was considered abuse.

> [1] https://lists.gt.net/xen/devel/545688
> 
> [2] 
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86-urgent-for-linus&id=da6fa7ef67f07108a1b0cb9fd9e7fcaabd39c051
> 
> [3] https://lists.xenproject.org/archives/html/xen-devel/2019-05/msg01894.html
> 

-- 
Brian Woods

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/9] AMD/IOMMU: use bit field for extended feature register

2019-06-17 Thread Woods, Brian
On Thu, Jun 13, 2019 at 07:22:31AM -0600, Jan Beulich wrote:
> This also takes care of several of the shift values wrongly having been
> specified as hex rather than dec.
> 
> Take the opportunity and add further fields.
> 
> Signed-off-by: Jan Beulich 
> 
> --- a/xen/drivers/passthrough/amd/iommu_detect.c
> +++ b/xen/drivers/passthrough/amd/iommu_detect.c
> @@ -60,43 +60,72 @@ static int __init get_iommu_capabilities
>  
>  void __init get_iommu_features(struct amd_iommu *iommu)
>  {
> -u32 low, high;
> -int i = 0 ;
> -static const char *__initdata feature_str[] = {
> -"- Prefetch Pages Command", 
> -"- Peripheral Page Service Request", 
> -"- X2APIC Supported", 
> -"- NX bit Supported", 
> -"- Guest Translation", 
> -"- Reserved bit [5]",
> -"- Invalidate All Command", 
> -"- Guest APIC supported", 
> -"- Hardware Error Registers", 
> -"- Performance Counters", 
> -NULL
> -};
> -
>  ASSERT( iommu->mmio_base );
>  
>  if ( !iommu_has_cap(iommu, PCI_CAP_EFRSUP_SHIFT) )
>  {
> -iommu->features = 0;
> +iommu->features.raw = 0;
>  return;
>  }
>  
> -low = readl(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET);
> -high = readl(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET + 4);
> -
> -iommu->features = ((u64)high << 32) | low;
> +iommu->features.raw =
> +readq(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET);
>  
>  printk("AMD-Vi: IOMMU Extended Features:\n");
>  
> -while ( feature_str[i] )
> +#define MASK(fld) ((union amd_iommu_ext_features){ .flds.fld = ~0 }).raw
> +#define FEAT(fld, str) do { \
> +if ( MASK(fld) & (MASK(fld) - 1) ) \
> +printk( "- " str ": %#x\n", iommu->features.flds.fld); \
> +else if ( iommu->features.raw & MASK(fld) ) \
> +printk( "- " str "\n"); \
> +} while ( false )
> +
> +FEAT(pref_sup,   "Prefetch Pages Command");
> +FEAT(ppr_sup,"Peripheral Page Service Request");
> +FEAT(xt_sup, "x2APIC");
> +FEAT(nx_sup, "NX bit");
> +FEAT(gappi_sup,  "Guest APIC Physical Processor Interrupt");
> +FEAT(ia_sup, "Invalidate All Command");
> +FEAT(ga_sup, "Guest APIC");
> +FEAT(he_sup, "Hardware Error Registers");
> +FEAT(pc_sup, "Performance Counters");
> +FEAT(hats,   "Host Address Translation Size");
> +
> +if ( iommu->features.flds.gt_sup )
>  {
> -if ( amd_iommu_has_feature(iommu, i) )
> -printk( " %s\n", feature_str[i]);
> -i++;
> +FEAT(gats,   "Guest Address Translation Size");
> +FEAT(glx_sup,"Guest CR3 Root Table Level");
> +FEAT(pas_max,"Maximum PASID");
>  }
> +
> +FEAT(smif_sup,   "SMI Filter Register");
> +FEAT(smif_rc,"SMI Filter Register Count");
> +FEAT(gam_sup,"Guest Virtual APIC Modes");
> +FEAT(dual_ppr_log_sup,   "Dual PPR Log");
> +FEAT(dual_event_log_sup, "Dual Event Log");
> +FEAT(sat_sup,"Secure ATS");
> +FEAT(us_sup, "User / Supervisor Page Protection");
> +FEAT(dev_tbl_seg_sup,"Device Table Segmentation");
> +FEAT(ppr_early_of_sup,   "PPR Log Overflow Early Warning");
> +FEAT(ppr_auto_rsp_sup,   "PPR Automatic Response");
> +FEAT(marc_sup,   "Memory Access Routing and Control");
> +FEAT(blk_stop_mrk_sup,   "Block StopMark Message");
> +FEAT(perf_opt_sup ,  "Performance Optimization");
> +FEAT(msi_cap_mmio_sup,   "MSI Capability MMIO Access");
> +FEAT(gio_sup,"Guest I/O Protection");
> +FEAT(ha_sup, "Host Access");
> +FEAT(eph_sup,"Enhanced PPR Handling");
> +FEAT(attr_fw_sup,"Attribute Forward");
> +FEAT(hd_sup, "Host Dirty");
> +FEAT(inv_iotlb_type_sup, "Invalidate IOTLB Type");
> +FEAT(viommu_sup, "Virtualized IOMMU");
> +FEAT(vm_guard_io_sup,"VMGuard I/O Support");
> +FEAT(vm_table_size,  "VM Table Size");
> +FEAT(ga_update_dis_sup,  "Guest Access Bit Update Disable");
> +
> +#undef FEAT
> +#undef MASK
>  }
>  
>  int __init amd_iommu_detect_one_acpi(
> --- a/xen/drivers/passthrough/amd/iommu_guest.c
> +++ b/xen/drivers/passthrough/amd/iommu_guest.c
> @@ -638,7 +638,7 @@ static uint64_t iommu_mmio_read64(struct
>  val = reg_to_u64(iommu->reg_status);
>  break;
>  case IOMMU_EXT_FEATURE_MMIO_OFFSET:
> -val = reg_to_u64(iommu->reg_ext_feature);
> +val = iommu->reg_ext_feature.raw;
>  break;
>  
>  default:
> @@ -802,39 +802,26 @@ int guest_iommu_set_base(struct domain *
>  /* Initialize mmio read only bits */
>  static void guest_iommu_reg_init(struct guest_iommu *iommu)
>  {
> -uint32_t lower, upper;
> +union amd_iommu_ext_features ef = {
> 
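
The MASK()/FEAT() pairing above is a neat trick, by the way:
MASK(fld) & (MASK(fld) - 1) is non-zero exactly when the field's mask
has more than one bit set, i.e. when it's a multi-bit value worth
printing, rather than a single present/absent flag.  A standalone
illustration (simplified union, not from the patch):

    union ef {
        uint64_t raw;
        struct { uint64_t xt_sup:1, gt_sup:1, hats:2; } flds;
    };
    #define MASK(fld) ((union ef){ .flds.fld = ~0 }).raw

    /* MASK(xt_sup) == 0x1: 0x1 & 0x0 == 0  -> single-bit flag       */
    /* MASK(hats)   == 0xc: 0xc & 0xb != 0  -> multi-bit field value */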

Re: [Xen-devel] [PATCH] x86/svm: Drop support for AMD's Lightweight Profiling

2019-05-20 Thread Woods, Brian
On Mon, May 20, 2019 at 11:13:36AM +0100, Andy Cooper wrote:
> Lightweight Profiling was introduced in Bulldozer (Fam15h), but was dropped
> from Zen (Fam17h) processors.  Furthermore, LWP was dropped from Fam15/16 CPUs
> when IBPB for Spectre v2 was introduced in microcode, owing to LWP not being
> used in practice.
> 
> As a result, CPUs which are operating within specification (i.e. with up to
> date microcode) no longer have this feature, and therefore are not using it.
> 
> Drop support from Xen.  The main motivation here is to remove unnecessary
> complexity from CPUID handling, but it also tidies up the SVM code nicely.
> 
> Signed-off-by: Andrew Cooper 
Acked-by: Brian Woods 

I've confirmed with HW engineers that it's going away.

-- 
Brian Woods

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 0/3] mwait support for AMD processors

2019-05-09 Thread Woods, Brian
On Thu, Mar 28, 2019 at 03:04:32PM +0000, Brian Woods wrote:
> This patch series adds support and enablement for mwait on AMD Naples
> and Rome processors.  Newer AMD processors support mwait, but only for
> c1, and for c2 halt is used.  The mwait-idle driver is modified to be
> able to use both mwait and halt for idling.
> 
> Brian Woods (3):
>   mwait-idle: add support for using halt
>   mwait-idle: add support for AMD processors
>   mwait-idle: add enablement for AMD Naples and Rome
> 
>  xen/arch/x86/acpi/cpu_idle.c  |  2 +-
>  xen/arch/x86/cpu/mwait-idle.c | 62 
> ++-
>  xen/include/asm-x86/cpuidle.h |  1 +
>  3 files changed, 57 insertions(+), 8 deletions(-)
> 
> -- 
> 2.11.0
> 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel

Ping for Andy.

-- 
Brian Woods

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [ANNOUNCE] Xen Project Community Call May 9th @15:00 UTC Call for agenda items

2019-05-06 Thread Woods, Brian
On Mon, May 06, 2019 at 07:51:17AM -0600, Lars Kurth wrote:
> Hi all,
> 
> Please propose topics by either editing the running agenda document at 
> https://docs.google.com/document/d/1ktN-5u8uScEvhf9N8Um5o6poF12lVEnnySHJw_7Jk8k/edit#
>  or by replying to the mail. Ideally by Wednesday!
> 
> Best Regards
> Lars
> 

I'd like to add the AMD mwait V2 patch set to the list of topics.  I'd
like to come to some sort of conclusion about that set.

-- 
Brian Woods

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] AMD/IOMMU: disable previously enabled IOMMUs upon init failure

2019-04-11 Thread Woods, Brian
On 4/8/19 6:19 AM, Jan Beulich wrote:
> If any IOMMUs were successfully initialized before encountering failure,
> the successfully enabled ones should be disabled again before cleaning
> up their resources.
> 
> Move disable_iommu() next to enable_iommu() to avoid a forward
> declaration, and take the opportunity to remove stray blank lines ahead
> of both functions' final closing braces.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Brian Woods 

> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -909,7 +909,35 @@ static void enable_iommu(struct amd_iomm
>   
>   iommu->enabled = 1;
>   spin_unlock_irqrestore(&iommu->lock, flags);
> +}
> +
> +static void disable_iommu(struct amd_iommu *iommu)
> +{
> +unsigned long flags;
> +
> +spin_lock_irqsave(&iommu->lock, flags);
> +
> +if ( !iommu->enabled )
> +{
> +spin_unlock_irqrestore(&iommu->lock, flags);
> +return;
> +}
> +
> +amd_iommu_msi_enable(iommu, IOMMU_CONTROL_DISABLED);
> +set_iommu_command_buffer_control(iommu, IOMMU_CONTROL_DISABLED);
> +set_iommu_event_log_control(iommu, IOMMU_CONTROL_DISABLED);
> +
> +if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_PPRSUP_SHIFT) )
> +set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_DISABLED);
> +
> +if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_GTSUP_SHIFT) )
> +set_iommu_guest_translation_control(iommu, IOMMU_CONTROL_DISABLED);
> +
> +set_iommu_translation_control(iommu, IOMMU_CONTROL_DISABLED);
>   
> +iommu->enabled = 0;
> +
> +spin_unlock_irqrestore(&iommu->lock, flags);
>   }
>   
>   static void __init deallocate_buffer(void *buf, uint32_t sz)
> @@ -1046,6 +1074,7 @@ static void __init amd_iommu_init_cleanu
>   list_del(&iommu->list);
>   if ( iommu->enabled )
>   {
> +disable_iommu(iommu);
>   deallocate_ring_buffer(&iommu->cmd_buffer);
>   deallocate_ring_buffer(&iommu->event_log);
>   deallocate_ring_buffer(&iommu->ppr_log);
> @@ -1297,36 +1326,6 @@ error_out:
>   return rc;
>   }
>   
> -static void disable_iommu(struct amd_iommu *iommu)
> -{
> -unsigned long flags;
> -
> -spin_lock_irqsave(&iommu->lock, flags);
> -
> -if ( !iommu->enabled )
> -{
> -spin_unlock_irqrestore(&iommu->lock, flags);
> -return;
> -}
> -
> -amd_iommu_msi_enable(iommu, IOMMU_CONTROL_DISABLED);
> -set_iommu_command_buffer_control(iommu, IOMMU_CONTROL_DISABLED);
> -set_iommu_event_log_control(iommu, IOMMU_CONTROL_DISABLED);
> -
> -if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_PPRSUP_SHIFT) )
> -set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_DISABLED);
> -
> -if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_GTSUP_SHIFT) )
> -set_iommu_guest_translation_control(iommu, IOMMU_CONTROL_DISABLED);
> -
> -set_iommu_translation_control(iommu, IOMMU_CONTROL_DISABLED);
> -
> -iommu->enabled = 0;
> -
> -spin_unlock_irqrestore(&iommu->lock, flags);
> -
> -}
> -
>   static void invalidate_all_domain_pages(void)
>   {
>   struct domain *d;
> 
> 
> 
> 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.12] x86/svm: Fix handling of ICEBP intercepts

2019-04-10 Thread Woods, Brian
On 2/1/19 8:49 AM, Andrew Cooper wrote:
> c/s 9338a37d "x86/svm: implement debug events" added support for introspecting
> ICEBP debug exceptions, but didn't account for the fact that
> svm_get_insn_len() (previously __get_instruction_length) can fail and may
> already raise #GP for the guest.
> 
> If svm_get_insn_len() fails, return back to guest context rather than
> continuing and mistaking a trap-style VMExit for a fault-style one.
> 
> Spotted by Coverity.
> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Wei Liu 
> CC: Roger Pau Monné 
> CC: Boris Ostrovsky 
> CC: Suravee Suthikulpanit 
> CC: Brian Woods 
> CC: Juergen Gross 
> CC: Razvan Cojocaru 
> CC: Tamas K Lengyel 
> 
> This wants backporting to Xen 4.11
> ---
>   xen/arch/x86/hvm/svm/svm.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> index 2584b90..e21091c 100644
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -2758,6 +2758,9 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
>   {
>   trap_type = X86_EVENTTYPE_PRI_SW_EXCEPTION;
>   inst_len = svm_get_insn_len(v, INSTR_ICEBP);
> +
> +if ( !inst_len )
> +break;
>   }
>   
>   rc = hvm_monitor_debug(regs->rip,
> 

Acked-by: Brian Woods 
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 0/3] mwait support for AMD processors

2019-04-08 Thread Woods, Brian
On 3/28/19 10:04 AM, Woods, Brian wrote:
> This patch series adds support and enablement for mwait on AMD Naples
> and Rome processors.  Newer AMD processors support mwait, but only for
> c1, and for c2 halt is used.  The mwait-idle driver is modified to be
> able to use both mwait and halt for idling.
> 
> Brian Woods (3):
>mwait-idle: add support for using halt
>mwait-idle: add support for AMD processors
>mwait-idle: add enablement for AMD Naples and Rome
> 
>   xen/arch/x86/acpi/cpu_idle.c  |  2 +-
>   xen/arch/x86/cpu/mwait-idle.c | 62 
> ++-
>   xen/include/asm-x86/cpuidle.h |  1 +
>   3 files changed, 57 insertions(+), 8 deletions(-)
> 

Just a ping to hopefully start the discussion that was mentioned in the
community call.
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 7/7] x86/IOMMU: initialize iommu_ops in vendor-independent code

2019-04-05 Thread Woods, Brian
On 3/28/19 9:54 AM, Jan Beulich wrote:
> Move this into iommu_hardware_setup() and make that function non-
> inline. Move its declaration into common code.
> 
> Signed-off-by: Jan Beulich 
> 

Acked-by: Brian Woods 
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 4/7] x86/IOMMU: introduce init-ops structure

2019-04-05 Thread Woods, Brian
On 3/28/19 9:51 AM, Jan Beulich wrote:
> Do away with the CPU vendor dependency, and set the init ops pointer
> based on what ACPI tables have been found.
> 
> Also take the opportunity and add __read_mostly to iommu_ops.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Brian Woods 
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 3/7] x86/ACPI: also parse AMD IOMMU tables early

2019-04-05 Thread Woods, Brian
On 3/28/19 9:49 AM, Jan Beulich wrote:
> In order to be able to initialize x2APIC mode we need to parse
> respective ACPI tables early. Split amd_iov_detect() into two parts for
> this purpose, and call the initial part earlier on.
> 
> Signed-off-by: Jan Beulich 
> 
Acked-by: Brian Woods 
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/3] mwait-idle: add support for using halt

2019-03-28 Thread Woods, Brian
On 3/28/19 10:50 AM, Jan Beulich wrote:
>>>> On 28.03.19 at 16:02,  wrote:
>> On 3/28/19 3:26 AM, Jan Beulich wrote:
>> On 27.03.19 at 18:28,  wrote:
>>>> This also lacks some of the features mwait-idle has and duplicates
>>>> the limited functionality.
>>>
>>> Would you mind clarifying the lack-of-features aspect? The
>>> only difference to your patches that I can spot is that you set
>>> .target_residency in the static tables. If the value wanted
>>> for CC6 is really 1000 instead of the 800 the default
>>> calculation would produce, then this would be easy enough
>>> to arrange for in my variant of the patch as well.
>>>
>>> The mwait-idle driver would not have been needed at all if all
>>> BIOSes surfaced complete data via ACPI. Therefore, by
>>> suitably populating tables, it ought to be possible in theory to
>>> use just that one driver. It's just that for Intel CPUs we've
>>> decided to follow what Linux does, hence the separate
>>> driver. There's no Linux precedent for AMD (afaict).
>>
>> target_residency and some of the checks IIRC.
> 
> Could you be more specific about what checks you mean?
> 
>> Yes, but that's Linux and this is Xen.  Linux has an AML interpreter and
>> Xen does not.  That's an apples to oranges comparison.  You can't compare
>> Xen to Linux for this because the features they have and how they work
>> are different.
> 
> It's not a direct comparison, sure. But lack of suitable ACPI data
> (known to happen in practice) would put Linux into exactly the
> same position. If Linux accepted changes to the driver to use
> entry methods other than MWAIT, I'd not be as opposed (but I'd
> still question their reasoning then).

Xen doesn't have an AML interpreter though, and you can't reliably get
the ACPI data from Dom0 in my experience.  You can't dictate what
happens in Xen by what happens in Linux when the systems function
completely differently.  It's an apples and oranges situation.

>> Functionally, it should go in mwait-idle.
> 
> That's what I continue to question, seeing the HLT additions you're
> making. Plus older families (which you didn't cover at all so far)
> apparently would want HLT alone spelled out, which is even less
> suitable for a driver with this name.
> 
> Jan

Older families aren't compatible with Intel's mwait and we don't have 
any interest in enabling mwait on older families.  Older families won't 
be using this driver (mwait-idle) at all, but rather the acpi cpu_idle 
driver.  That's why there's only talk of F17h in the commits and code.

Brian
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 1/3] mwait-idle: add support for using halt

2019-03-28 Thread Woods, Brian
From: Brian Woods 

Some AMD processors can use a mixture of mwait and halt for entering
various c-states.  In preparation for adding support for AMD processors,
update the mwait-idle driver to optionally use halt.

Signed-off-by: Brian Woods 
---
 xen/arch/x86/acpi/cpu_idle.c  |  2 +-
 xen/arch/x86/cpu/mwait-idle.c | 19 +--
 xen/include/asm-x86/cpuidle.h |  1 +
 3 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
index 654de24f40..b45824d343 100644
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -439,7 +439,7 @@ static void acpi_processor_ffh_cstate_enter(struct 
acpi_processor_cx *cx)
 mwait_idle_with_hints(cx->address, MWAIT_ECX_INTERRUPT_BREAK);
 }
 
-static void acpi_idle_do_entry(struct acpi_processor_cx *cx)
+void acpi_idle_do_entry(struct acpi_processor_cx *cx)
 {
 struct cpu_info *info = get_cpu_info();
 
diff --git a/xen/arch/x86/cpu/mwait-idle.c b/xen/arch/x86/cpu/mwait-idle.c
index f89c52f256..b9c7f75882 100644
--- a/xen/arch/x86/cpu/mwait-idle.c
+++ b/xen/arch/x86/cpu/mwait-idle.c
@@ -103,6 +103,11 @@ static const struct cpuidle_state {
 
 #define CPUIDLE_FLAG_DISABLED  0x1
 /*
+ * On certain AMD families that support mwait, only c1 can be reached by
+ * mwait and to reach c2, halt has to be used.
+ */
+#define CPUIDLE_FLAG_USE_HALT  0x2
+/*
  * Set this flag for states where the HW flushes the TLB for us
  * and so we don't need cross-calls to keep it consistent.
  * If this flag is set, SW flushes the TLB, so even if the
@@ -784,7 +789,7 @@ static void mwait_idle(void)
update_last_cx_stat(power, cx, before);
 
if (cpu_is_haltable(cpu))
-   mwait_idle_with_hints(eax, MWAIT_ECX_INTERRUPT_BREAK);
+   acpi_idle_do_entry(cx);
 
after = cpuidle_get_tick();
 
@@ -1184,8 +1189,9 @@ static int mwait_idle_cpu_init(struct notifier_block *nfb,
for (cstate = 0; cpuidle_state_table[cstate].target_residency; 
++cstate) {
unsigned int num_substates, hint, state;
struct acpi_processor_cx *cx;
+   const unsigned int cflags = cpuidle_state_table[cstate].flags;
 
-   hint = flg2MWAIT(cpuidle_state_table[cstate].flags);
+   hint = flg2MWAIT(cflags);
state = MWAIT_HINT2CSTATE(hint) + 1;
 
if (state > max_cstate) {
@@ -1196,13 +1202,13 @@ static int mwait_idle_cpu_init(struct notifier_block 
*nfb,
/* Number of sub-states for this state in CPUID.MWAIT. */
num_substates = (mwait_substates >> (state * 4))
& MWAIT_SUBSTATE_MASK;
+
/* If NO sub-states for this state in CPUID, skip it. */
-   if (num_substates == 0)
+   if (num_substates == 0 && !(cflags & CPUIDLE_FLAG_USE_HALT))
continue;
 
/* if state marked as disabled, skip it */
-   if (cpuidle_state_table[cstate].flags &
-   CPUIDLE_FLAG_DISABLED) {
+   if (cflags & CPUIDLE_FLAG_DISABLED) {
printk(XENLOG_DEBUG PREFIX "state %s is disabled",
   cpuidle_state_table[cstate].name);
continue;
@@ -1221,7 +1227,8 @@ static int mwait_idle_cpu_init(struct notifier_block *nfb,
cx = dev->states + dev->count;
cx->type = state;
cx->address = hint;
-   cx->entry_method = ACPI_CSTATE_EM_FFH;
+   cx->entry_method = cflags & CPUIDLE_FLAG_USE_HALT ?
+  ACPI_CSTATE_EM_HALT : ACPI_CSTATE_EM_FFH;
cx->latency = cpuidle_state_table[cstate].exit_latency;
cx->target_residency =
cpuidle_state_table[cstate].target_residency;
diff --git a/xen/include/asm-x86/cpuidle.h b/xen/include/asm-x86/cpuidle.h
index 08da01803f..33c8cf1593 100644
--- a/xen/include/asm-x86/cpuidle.h
+++ b/xen/include/asm-x86/cpuidle.h
@@ -18,6 +18,7 @@ extern uint64_t (*cpuidle_get_tick)(void);
 
 int mwait_idle_init(struct notifier_block *);
 int cpuidle_init_cpu(unsigned int cpu);
+void acpi_idle_do_entry(struct acpi_processor_cx *cx);
 void default_dead_idle(void);
 void acpi_dead_idle(void);
 void trace_exit_reason(u32 *irq_traced);
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 3/3] mwait-idle: add enablement for AMD Naples and Rome

2019-03-28 Thread Woods, Brian
From: Brian Woods 

Add the needed data structures for enabling Naples (F17h M01h).  Since
Rome (F17h M31h) has the same c-state latencies and entry methods, the
c-state information can be used for Rome as well.  For both Naples and
Rome, mwait is used for c1 (cc1) and halt is functionally the same as
c2 (cc6).  If c2 (cc6) is disabled in BIOS, then halt functions similarly
to c1 (cc1).

Signed-off-by: Brian Woods 
---
 xen/arch/x86/cpu/mwait-idle.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/xen/arch/x86/cpu/mwait-idle.c b/xen/arch/x86/cpu/mwait-idle.c
index 58629f1c29..0d5d4caa4d 100644
--- a/xen/arch/x86/cpu/mwait-idle.c
+++ b/xen/arch/x86/cpu/mwait-idle.c
@@ -720,6 +720,22 @@ static const struct cpuidle_state dnv_cstates[] = {
{}
 };
 
+static const struct cpuidle_state naples_cstates[] = {
+   {
+   .name = "CC1",
+   .flags = MWAIT2flg(0x00),
+   .exit_latency = 1,
+   .target_residency = 2,
+   },
+   {
+   .name = "CC6",
+   .flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_USE_HALT,
+   .exit_latency = 400,
+   .target_residency = 1000,
+   },
+   {}
+};
+
 static void mwait_idle(void)
 {
unsigned int cpu = smp_processor_id();
@@ -964,10 +980,16 @@ static const struct x86_cpu_id intel_idle_ids[] 
__initconstrel = {
{}
 };
 
+static const struct idle_cpu idle_cpu_naples = {
+   .state_table = naples_cstates,
+};
+
 #define ACPU(family, model, cpu) \
	{ X86_VENDOR_AMD, family, model, X86_FEATURE_ALWAYS, &idle_cpu_##cpu}
 
 static const struct x86_cpu_id amd_idle_ids[] __initconstrel = {
+   ACPU(0x17, 0x01, naples),
+   ACPU(0x17, 0x31, naples), /* Rome shares the same c-state config */
{}
 };
 
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 2/3] mwait-idle: add support for AMD processors

2019-03-28 Thread Woods, Brian
From: Brian Woods 

Newer AMD processors (F17h) have mwait support which is compatible with
Intel.  Add some checks to make sure vendor specific code is run
correctly and some infrastructure to facilitate adding AMD processors.

This is done so that Xen will not be reliant on dom0 passing the parsed
ACPI tables back, since Xen doesn't have an AML interpreter.  Relying on
dom0 for this can be unreliable or broken in some cases.

Signed-off-by: Brian Woods 
---
 xen/arch/x86/cpu/mwait-idle.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/cpu/mwait-idle.c b/xen/arch/x86/cpu/mwait-idle.c
index b9c7f75882..58629f1c29 100644
--- a/xen/arch/x86/cpu/mwait-idle.c
+++ b/xen/arch/x86/cpu/mwait-idle.c
@@ -964,6 +964,13 @@ static const struct x86_cpu_id intel_idle_ids[] 
__initconstrel = {
{}
 };
 
+#define ACPU(family, model, cpu) \
+   { X86_VENDOR_AMD, family, model, X86_FEATURE_ALWAYS, &idle_cpu_##cpu}
+
+static const struct x86_cpu_id amd_idle_ids[] __initconstrel = {
+   {}
+};
+
 /*
  * ivt_idle_state_table_update(void)
  *
@@ -1100,6 +1107,9 @@ static void __init sklh_idle_state_table_update(void)
  */
 static void __init mwait_idle_state_table_update(void)
 {
+   if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+   return;
+
switch (boot_cpu_data.x86_model) {
case 0x3e: /* IVT */
ivt_idle_state_table_update();
@@ -1117,7 +1127,16 @@ static void __init mwait_idle_state_table_update(void)
 static int __init mwait_idle_probe(void)
 {
unsigned int eax, ebx, ecx;
-   const struct x86_cpu_id *id = x86_match_cpu(intel_idle_ids);
+   const struct x86_cpu_id *id = NULL;
+
+   switch (boot_cpu_data.x86_vendor) {
+   case X86_VENDOR_INTEL:
+   id = x86_match_cpu(intel_idle_ids);
+   break;
+   case X86_VENDOR_AMD:
+   id = x86_match_cpu(amd_idle_ids);
+   break;
+   }
 
if (!id) {
pr_debug(PREFIX "does not run on family %d model %d\n",
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 0/3] mwait support for AMD processors

2019-03-28 Thread Woods, Brian
This patch series adds support and enablement for mwait on AMD Naples
and Rome processors.  Newer AMD processors support mwait, but only for
c1, and for c2 halt is used.  The mwait-idle driver is modified to be
able to use both mwait and halt for idling.

Brian Woods (3):
  mwait-idle: add support for using halt
  mwait-idle: add support for AMD processors
  mwait-idle: add enablement for AMD Naples and Rome

 xen/arch/x86/acpi/cpu_idle.c  |  2 +-
 xen/arch/x86/cpu/mwait-idle.c | 62 ++-
 xen/include/asm-x86/cpuidle.h |  1 +
 3 files changed, 57 insertions(+), 8 deletions(-)

-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/3] mwait-idle: add support for using halt

2019-03-28 Thread Woods, Brian
On 3/28/19 3:26 AM, Jan Beulich wrote:
>>>> On 27.03.19 at 18:28,  wrote:
>> This also lacks some of the features mwait-idle has and duplicates
>> the limited functionality.
> 
> Would you mind clarifying the lack-of-features aspect? The
> only difference to your patches that I can spot is that you set
> .target_residency in the static tables. If the value wanted
> for CC6 is really 1000 instead of the 800 the default
> calculation would produce, then this would be easy enough
> to arrange for in my variant of the patch as well.
> 
> The mwait-idle driver would not have been needed at all if all
> BIOSes surfaced complete data via ACPI. Therefore, by
> suitably populating tables, it ought to be possible in theory to
> use just that one driver. It's just that for Intel CPUs we've
> decided to follow what Linux does, hence the separate
> driver. There's no Linux precedent for AMD (afaict).

target_residency and some of the checks IIRC.

Yes, but that's Linux and this is Xen.  Linux has an AML interpreter and 
Xen does not.  That's an apples to oranges comparison.  You can't compare
Xen to Linux for this because the features they have and how they work 
are different.

>>   There's also a lack of comments which may or
>> may not be needed.  So that would add to the line change count if you
>> care about that.
>>
>> I'm not sure why you're so adverse to the mwait-idle patches.  We're
>> hard coding values in and using mwait (just like Intel is), but the only
>> real change we need is using halt for one c-state.
> 
> But that's precisely what I dislike, as getting us further away
> from the Linux driver. And btw, if we were to go that route,
> then I think we'd better call acpi_idle_do_entry() than to
> duplicate further parts of it. But that would also remove some
> of that other part of the benefits of mwait_idle() over
> acpi_processor_idle(): The former is much more streamlined,
> due to not having to care about anything other than MWAIT.
> 
> As an aside, despite having followed the HLT approach in my
> draft patch, I continue to be unconvinced that this is what we
> actually want. There's a respective TBD remark there.
> 
> Jan
> 
> 

The changes needed are small though... most of the changes are
non-intrusive.  It's just a couple of lines here and there, and then
something where it calls the appropriate entry method.  Although, I
think using acpi_idle_do_entry() is perfectly fine.  With the
acpi_idle_do_entry change, the line change count of the mwait-idle
patches is 65 lines.  A lot of that is the structures (28 lines), which
isn't any _real_ code change.
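
Concretely, the dispatch I mean is roughly the following (paraphrased
from memory, not a verbatim copy of cpu_idle.c):

    void acpi_idle_do_entry(struct acpi_processor_cx *cx)
    {
        switch ( cx->entry_method )
        {
        case ACPI_CSTATE_EM_FFH:
            /* MWAIT, with the hint stashed in cx->address */
            mwait_idle_with_hints(cx->address, MWAIT_ECX_INTERRUPT_BREAK);
            break;
        case ACPI_CSTATE_EM_SYSIO:
            /* reading the trigger port enters the state */
            inb(cx->address);
            break;
        case ACPI_CSTATE_EM_HALT:
            safe_halt();
            break;
        }
    }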

One function call and one switch statement aren't going to change the
performance or how streamlined it is.  The only other change is in how
it's initialized, which happens only at start-up.

Functionally, it should go in mwait-idle.  The changes are small, the
functionality is the same, and there's no duplication of functionality
or code.

Brian
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/3] mwait-idle: add support for using halt

2019-03-27 Thread Woods, Brian
On 3/27/19 9:48 AM, Jan Beulich wrote:
>>>> On 26.03.19 at 22:56,  wrote:
>> On 3/26/19 10:54 AM, Jan Beulich wrote:
>> On 19.03.19 at 17:12,  wrote:
 On 3/15/19 3:37 AM, Jan Beulich wrote:
> Furthermore I'm then once again wondering what the gain is
> over using the ACPI driver: The suggested _CST looks to exactly
> match the data you enter into the table in the later patch. IOW
> my fundamental concern didn't go away yet: As per the name
> of the driver, it shouldn't really need to support HLT (or anything
> other than MWAIT) as an entry method. Hence I think that at
> the very least you need to extend the description of the change
> quite a bit to explain why the ACPI driver is not suitable.
>
> Depending on how this comes out, it may then still be a matter
> of discussing whether, rather than fiddling with mwait-idle, it
> wouldn't be better to have an AMD-specific driver instead. Are
> there any thoughts in similar directions for Linux?

 Because:
 #1 getting the ACPI tables from dom0 is either unreliable (PV dom0) or
 not possible (PVH dom0).
 #2 the changes to the Intel code are minimal.
 #3 worse case, Xen thinks it's using CC6 when it's using CC1.  Not
 perfect but far from fatal or breaking.
>>>
>>> Having thought about this some more, I agree that an AMD-specific
>>> driver would likely go too far. However, that's still no reason to fiddle
>>> with the mwait-idle one - I think you could as well populate the data
>>> as necessary for the ACPI driver to use, removing the dependency
>>> on Dom0. After all that driver already knows of all the entry methods
>>> you may want/need to use (see acpi_idle_do_entry()).
>>>
>> I did a rough example of how that might work, and the lines of code
>> changed for adding it to cpu_idle came to roughly 125.  Seeing as this
>> doesn't compile and doesn't even have comments, I'd say at least 140
>> lines of code/change (most of those are additive too); a lot of it is
>> functionally copied from mwait-idle and how it reads data out of the
>> structures, checks, and populates the cx structures.  The first set of
>> mwait patches is 87 lines changed total.
>>
>> I _could_ try and refactor some of the code and get it down from
>> 125-140, but that would most likely make porting changes even harder for
>> mwait-idle.
> 
> Well, I was rather thinking about something like the change below,
> taking slightly over 100 lines of new code, and not touching
> mwait-idle.c at all. Otoh there are a couple of TBDs in there which
> may cause the patch to further grow once addressed.
> 
> Note that this goes on top of
> https://lists.xenproject.org/archives/html/xen-devel/2018-08/msg00089.html
> which sadly there still wasn't sufficient feedback on to decide where
> to go with the series; all I know is that Andrew (understandably)
> doesn't want to see the last patch go in without vendor confirmation
> (and I'd be fine to drop that last patch if need be, but this shouldn't
> block the earlier patches in the series).
> 
> Jan
> 
> x86/AMD: make C-state handling independent of Dom0
> 
> At least for more recent CPUs, following what BKDG / PPR suggest for the
> BIOS to surface via ACPI we can make ourselves independent of Dom0
> uploading respective data.
> 
> Signed-off-by: Jan Beulich 
> ---
> TBD: Can we set local_apic_timer_c2_ok to true? I can't seem to find any
>   statement in the BKDG / PPR as to whether the LAPIC timer continues
>   running in CC6.
> TBD: We may want to verify that HLT indeed is configured to enter CC6.
> TBD: I guess we could extend this to families older then Fam15 as well.
> 
> --- a/xen/arch/x86/acpi/cpu_idle.c
> +++ b/xen/arch/x86/acpi/cpu_idle.c
> @@ -120,6 +120,8 @@ boolean_param("lapic_timer_c2_ok", local
>   
>   struct acpi_processor_power *__read_mostly processor_powers[NR_CPUS];
>   
> +static int8_t __read_mostly vendor_override;
> +
>   struct hw_residencies
>   {
>   uint64_t mc0;
> @@ -1220,6 +1222,9 @@ long set_cx_pminfo(uint32_t acpi_id, str
>   if ( pm_idle_save && pm_idle != acpi_processor_idle )
>   return 0;
>   
> +if ( vendor_override > 0 )
> +return 0;
> +
>   print_cx_pminfo(acpi_id, power);
>   
>   cpu_id = get_cpu_id(acpi_id);
> @@ -1292,6 +1297,98 @@ long set_cx_pminfo(uint32_t acpi_id, str
>   return 0;
>   }
>   
> +static void amd_cpuidle_init(struct acpi_processor_power *power)
> +{
> +unsigned int i, nr = 0;
> +const struct cpuinfo_x86 *c = &boot_cpu_data;
> +const unsigned int ecx_req = CPUID5_ECX_EXTENSIONS_SUPPORTED |
> + CPUID5_ECX_INTERRUPT_BREAK;
> +const struct acpi_processor_cx *cx = NULL;
> +static const struct acpi_processor_cx fam17[] = {
> +{
> +.type = ACPI_STATE_C1,
> +.entry_method = ACPI_CSTATE_EM_FFH,
> +.address = 0,
> +.latency = 1,
> +},
> +{
> +.type = 

Re: [Xen-devel] [PATCH 1/3] mwait-idle: add support for using halt

2019-03-26 Thread Woods, Brian
On 3/26/19 10:54 AM, Jan Beulich wrote:
>>>> On 19.03.19 at 17:12,  wrote:
>> On 3/15/19 3:37 AM, Jan Beulich wrote:
>>> Furthermore I'm then once again wondering what the gain is
>>> over using the ACPI driver: The suggested _CST looks to exactly
>>> match the data you enter into the table in the later patch. IOW
>>> my fundamental concern didn't go away yet: As per the name
>>> of the driver, it shouldn't really need to support HLT (or anything
>>> other than MWAIT) as an entry method. Hence I think that at
>>> the very least you need to extend the description of the change
>>> quite a bit to explain why the ACPI driver is not suitable.
>>>
>>> Depending on how this comes out, it may then still be a matter
>>> of discussing whether, rather than fiddling with mwait-idle, it
>>> wouldn't be better to have an AMD-specific driver instead. Are
>>> there any thoughts in similar directions for Linux?
>>
>> Because:
>> #1 getting the ACPI tables from dom0 is either unreliable (PV dom0) or
>> not possible (PVH dom0).
>> #2 the changes to the Intel code are minimal.
>> #3 worst case, Xen thinks it's using CC6 when it's using CC1.  Not
>> perfect but far from fatal or breaking.
> 
> Having thought about this some more, I agree that an AMD-specific
> driver would likely go too far. However, that's still no reason to fiddle
> with the mwait-idle one - I think you could as well populate the data
> as necessary for the ACPI driver to use, removing the dependency
> on Dom0. After all that driver already knows of all the entry methods
> you may want/need to use (see acpi_idle_do_entry()).
> 
> Jan
> 
> 
I did a rough example of how that might work and lines of code changed 
for adding it to cpu_idle was roughly 125.  Seeing as this doesn't 
compile and doesn't even have comments, I'd say at least 140 lines of 
code/change (most of those are additive too), a lot of it is functionally 
copied from mwait-idle and how it reads data out of the structures, 
checks, and populates the cx structures.  The first set of mwait patches 
is 87 lines changed total.

I _could_ try and refactor some of the code and get it down from 
125-140, but that would most likely make porting changes even harder for 
mwait-idle.

What are your thoughts?

Brian

Re: [Xen-devel] [PATCH v6 00/12] improve late microcode loading

2019-03-19 Thread Woods, Brian
On 3/19/19 3:22 PM, Brian Woods wrote:
> On 3/11/19 2:57 AM, Chao Gao wrote:
>> Major changes in version 6:
>>   - run wbinvd before updating microcode (patch 10)
>>   - add a userspace tool for late microcode update (patch 1)
>>   - scale time to wait by the number of remaining CPUs to respond
>>   - remove 'cpu' parameters from some related callbacks and functions
>>   - save a ucode patch only if its supported CPU is allowed to mix with
>>     current cpu.
>>
>> Changes in version 5:
>>   - support parallel microcode updates for all cores (see patch 8)
>>   - Address Roger's comments on the last version.
>>
>> The intention of this series is to make the late microcode loading
>> more reliable by rendezvousing all cpus in stop_machine context.
>> This idea comes from Ashok. I am porting his linux patch to Xen
>> (see patch 10 and 11 for more details).
>>
>> This series makes five changes:
>>   1. Patch 1: a userspace tool for late microcode update
>>   2. Patch 2-9: introduce a global microcode cache and some cleanup
>>   3. Patch 10: writeback and invalidate cache before updating microcode
>>   3. Patch 11: synchronize late microcode loading
>>   4. Patch 12: support parallel microcodes update on different cores
>>
>> Currently, late microcode loading does a lot of things including
>> parsing microcode blob, checking the signature/revision and performing
>> update. Putting all of them into stop_machine context is a bad idea
>> because of complexity (One issue I observed is memory allocation
>> triggered one assertion in stop_machine context). In order to simplify
>> the load process, I move parsing microcode out of the load process.
>> The microcode blob is parsed and a global microcode cache is built on
>> a single CPU before rendezvousing all cpus to update microcode. Other
>> CPUs just get and load a suitable microcode from the global cache.
>> With this global cache, it is safe to put simplified load process to
>> stop_machine context.
>>
>> Regarding changes to AMD side, I didn't do any test for them due to
>> lack of hardware. Could you help to test this series on an AMD machine?
>> At least, two basic tests are needed:
>> * do a microcode update after system bootup
>> * don't bring all pCPUs up at bootup by specifying maxcpus option in xen
>>    command line and then do a microcode update and online all offlined
>>    CPUs via 'xen-hptool'.
>>
>> Chao Gao (12):
>>    misc/xenmicrocode: Upload a microcode blob to the hypervisor
>>    microcode/intel: use union to get fields without shifting and masking
>>    microcode/intel: extend microcode_update_match()
>>    microcode: introduce a global cache of ucode patch
>>    microcode: only save compatible ucode patches
>>    microcode: remove struct ucode_cpu_info
>>    microcode: remove pointless 'cpu' parameter
>>    microcode: split out apply_microcode() from cpu_request_microcode()
>>    microcode: remove struct microcode_info
>>    microcode/intel: Writeback and invalidate caches before updating
>>  microcode
>>    x86/microcode: Synchronize late microcode loading
>>    microcode: update microcode on cores in parallel
>>
>>   tools/libxc/include/xenctrl.h   |   1 +
>>   tools/libxc/xc_misc.c   |  20 +++
>>   tools/misc/Makefile |   4 +
>>   tools/misc/xenmicrocode.c   |  89 ++
>>   xen/arch/x86/acpi/power.c   |   2 +-
>>   xen/arch/x86/apic.c |   2 +-
>>   xen/arch/x86/microcode.c    | 380 +++-
>>   xen/arch/x86/microcode_amd.c    | 236 -
>>   xen/arch/x86/microcode_intel.c  | 206 +-
>>   xen/arch/x86/smpboot.c  |   5 +-
>>   xen/arch/x86/spec_ctrl.c    |   2 +-
>>   xen/include/asm-x86/microcode.h |  40 +++--
>>   xen/include/asm-x86/processor.h |   3 +-
>>   13 files changed, 639 insertions(+), 351 deletions(-)
>>   create mode 100644 tools/misc/xenmicrocode.c
>>
> 
> Sorry for the delay.  These patches fail on F17h.  I'm looking into 
> where it fails now.

Bisecting it says it's commit "microcode: introduce a global cache of 
ucode patch."

The failing commit fails with:
(XEN) [0085227df312] microcode: CPU0 update from revision 0x8001207 to 0x8304 failed
(XEN) [0085240578ec] traps.c:1574: GPF (): 82d080426c88 [probe_cpuid_faulting+0xe/0xa2] -> 82d0803818b2

That microcode revision is WAY off.  It should be 0x8001227 and not 
0x8304.  I don't think I'll be able to do much on it before the end 
of today, but let me know what information you need or if there's anything I 
should be looking at in particular.

Thanks,
Brian

Re: [Xen-devel] [PATCH v6 00/12] improve late microcode loading

2019-03-19 Thread Woods, Brian
On 3/11/19 2:57 AM, Chao Gao wrote:
> Major changes in version 6:
>   - run wbinvd before updating microcode (patch 10)
>   - add a userspace tool for late microcode update (patch 1)
>   - scale time to wait by the number of remaining CPUs to respond
>   - remove 'cpu' parameters from some related callbacks and functions
>   - save a ucode patch only if its supported CPU is allowed to mix with
> current cpu.
> 
> Changes in version 5:
>   - support parallel microcode updates for all cores (see patch 8)
>   - Address Roger's comments on the last version.
> 
> The intention of this series is to make the late microcode loading
> more reliable by rendezvousing all cpus in stop_machine context.
> This idea comes from Ashok. I am porting his linux patch to Xen
> (see patch 10 and 11 for more details).
> 
> This series makes five changes:
>   1. Patch 1: a userspace tool for late microcode update
>   2. Patch 2-9: introduce a global microcode cache and some cleanup
>   3. Patch 10: writeback and invalidate cache before updating microcode
>   3. Patch 11: synchronize late microcode loading
>   4. Patch 12: support parallel microcodes update on different cores
> 
> Currently, late microcode loading does a lot of things including
> parsing microcode blob, checking the signature/revision and performing
> update. Putting all of them into stop_machine context is a bad idea
> because of complexity (One issue I observed is memory allocation
> triggered one assertion in stop_machine context). In order to simplify
> the load process, I move parsing microcode out of the load process.
> The microcode blob is parsed and a global microcode cache is built on
> a single CPU before rendezvousing all cpus to update microcode. Other
> CPUs just get and load a suitable microcode from the global cache.
> With this global cache, it is safe to put simplified load process to
> stop_machine context.
> 
> Regarding changes to AMD side, I didn't do any test for them due to
> lack of hardware. Could you help to test this series on an AMD machine?
> At least, two basic tests are needed:
> * do a microcode update after system bootup
> * don't bring all pCPUs up at bootup by specifying maxcpus option in xen
>command line and then do a microcode update and online all offlined
>CPUs via 'xen-hptool'.
> 
> Chao Gao (12):
>misc/xenmicrocode: Upload a microcode blob to the hypervisor
>microcode/intel: use union to get fields without shifting and masking
>microcode/intel: extend microcode_update_match()
>microcode: introduce a global cache of ucode patch
>microcode: only save compatible ucode patches
>microcode: remove struct ucode_cpu_info
>microcode: remove pointless 'cpu' parameter
>microcode: split out apply_microcode() from cpu_request_microcode()
>microcode: remove struct microcode_info
>microcode/intel: Writeback and invalidate caches before updating
>  microcode
>x86/microcode: Synchronize late microcode loading
>microcode: update microcode on cores in parallel
> 
>   tools/libxc/include/xenctrl.h   |   1 +
>   tools/libxc/xc_misc.c   |  20 +++
>   tools/misc/Makefile |   4 +
>   tools/misc/xenmicrocode.c   |  89 ++
>   xen/arch/x86/acpi/power.c   |   2 +-
>   xen/arch/x86/apic.c |   2 +-
>   xen/arch/x86/microcode.c| 380 +++-
>   xen/arch/x86/microcode_amd.c| 236 -
>   xen/arch/x86/microcode_intel.c  | 206 +-
>   xen/arch/x86/smpboot.c  |   5 +-
>   xen/arch/x86/spec_ctrl.c|   2 +-
>   xen/include/asm-x86/microcode.h |  40 +++--
>   xen/include/asm-x86/processor.h |   3 +-
>   13 files changed, 639 insertions(+), 351 deletions(-)
>   create mode 100644 tools/misc/xenmicrocode.c
> 

Sorry for the delay.  These patches fail on F17h.  I'm looking into 
where it fails now.

Re: [Xen-devel] [PATCH 1/3] mwait-idle: add support for using halt

2019-03-19 Thread Woods, Brian
On 3/15/19 3:37 AM, Jan Beulich wrote:
 On 14.03.19 at 20:00,  wrote:
>> On 3/13/19 4:35 AM, Jan Beulich wrote:
>> On 25.02.19 at 21:23,  wrote:
 --- a/xen/arch/x86/cpu/mwait-idle.c
 +++ b/xen/arch/x86/cpu/mwait-idle.c
 @@ -103,6 +103,11 @@ static const struct cpuidle_state {

#define CPUIDLE_FLAG_DISABLED   0x1
/*
 + * On certain AMD families that support mwait, only c1 can be reached by
 + * mwait and to reach c2, halt has to be used.
 + */
 +#define CPUIDLE_FLAG_USE_HALT 0x2
>>>
>>> Could you point us at where in the manuals this behavior is described?
>>> While PM Vol 2 has a chapter talking about P-states, I can't seem to
>>> find any mention of C-states there.
>>
>> IIRC it's in the NDA PPR and internally it's in some other documents.
>> We don't have support to use mwait while in CC6 due to caches being
>> turned off etc.  If we did have mwait suport for CC6, we'd use that here
>> (basically mirroring Intel).  Sadly I don't think we have any public
>> information directly detailing this information.  If you'd like, I can
>> look further into it.
> 
> Ah yes, I found it. But the text suggests to use SystemIO, not
> HLT for entering C2 (CC6). An important difference looks to be
> the state of EFLAGS.IF as to whether the core wakes up again.
> The SystemIO approach would better match the FFixedHW one,
> as we require and use MWAIT_ECX_INTERRUPT_BREAK.
> 
> Furthermore I'm then once again wondering what the gain is
> over using the ACPI driver: The suggested _CST looks to exactly
> match the data you enter into the table in the later patch. IOW
> my fundamental concern didn't go away yet: As per the name
> of the driver, it shouldn't really need to support HLT (or anything
> other than MWAIT) as an entry method. Hence I think that at
> the very least you need to extend the description of the change
> quite a bit to explain why the ACPI driver is not suitable.
> 
> Depending on how this comes out, it may then still be a matter
> of discussing whether, rather than fiddling with mwait-idle, it
> wouldn't be better to have an AMD-specific driver instead. Are
> there any thoughts in similar directions for Linux?

I can make it use sysIO rather than HLT if there's a need or strong 
desire for it.  I used HLT mainly because I thought it would be more 
robust (like in the case of CC6 being disabled).
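For concreteness, a minimal sketch of what the sysIO branch could look like
alongside the HALT one - modeled on Xen's acpi_idle_do_entry(); the dummy
PM-timer read and the use of cx->address as the port are assumptions carried
over from the ACPI idle driver, not code from these patches:

	case ACPI_CSTATE_EM_SYSIO:
		info = get_cpu_info();
		spec_ctrl_enter_idle(info);
		/* IO port based C-state: the inb() itself requests entry */
		inb(cx->address);
		/* dummy op (assumed) - chipsets may need a follow-up read
		   before STPCLK# is guaranteed to be asserted */
		inl(pmtmr_ioport);
		spec_ctrl_exit_idle(info);
		break;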

Because:
#1 getting the ACPI tables from dom0 is either unreliable (PV dom0) or 
not possible (PVH dom0).
#2 the changes to the Intel code are minimal.
#3 worst case, Xen thinks it's using CC6 when it's using CC1.  Not 
perfect but far from fatal or breaking.

In Linux, they have a working AML interpreter so they just read the ACPI 
tables.  If Xen had a working AML interpreter, I'd suggest just reading 
the ACPI tables as well.  As far as a completely different driver for 
AMD, it would mostly just be the Intel driver with the small changes and 
some code removed.  With the minimal changes needed, I don't see a 
reason, but that's just me.

 +  case ACPI_CSTATE_EM_HALT:
 +  info = get_cpu_info();
 +  spec_ctrl_enter_idle(info);
 +  safe_halt();
 +  spec_ctrl_exit_idle(info);
>>>
>>> ... wouldn't it be better to avoid the redundancy with default_idle(),
>>> by introducing a new helper function, e.g. spec_ctrl_safe_halt()?
>>>
>> See my email with Wei about this.
> 
> There you've basically settled on making a helper function, to
> be used in pre-existing places as well as here.
> 
> I've also just noticed that there's another safe_halt() invocation
> a few lines up from here, as a fallback. It doesn't come with any
> of the statistics though, so would probably be unsuitable to
> funnel into.

It does follow the pattern of:
spec_ctrl_enter_idle(info);
safe_halt();
spec_ctrl_exit_idle(info);
though.  I'm pretty sure it would work with what I suggested, or am I 
missing something?

 @@ -1221,7 +1242,12 @@ static int mwait_idle_cpu_init(struct notifier_block *nfb,
cx = dev->states + dev->count;
cx->type = state;
cx->address = hint;
 -  cx->entry_method = ACPI_CSTATE_EM_FFH;
 +
 +  if (flags & CPUIDLE_FLAG_USE_HALT)
 +  cx->entry_method = ACPI_CSTATE_EM_HALT;
 +   else
 +  cx->entry_method = ACPI_CSTATE_EM_FFH;
>>>
>>> I'd prefer if you used a conditional expression here. One of the goals for
>>> any changes to this file should be to limit the delta to its Linux 
>>> original, in
>>> order to increase the chances of patches coming from there to apply
>>> reasonably cleanly here.
>>>
>>> Doing so would also save me from complaining about the stray blank
>>> ahead of "else".
>>
>> By conditional statement you mean ternary?  If so, that'll be easy enough.
> 
> Yes.
> 

Re: [Xen-devel] [PATCH 3/3] mwait-idle: add enablement for AMD Naples and Rome

2019-03-19 Thread Woods, Brian
On 3/15/19 3:54 AM, Jan Beulich wrote:
 On 14.03.19 at 20:29,  wrote:
>> On 3/13/19 4:51 AM, Jan Beulich wrote:
>> On 25.02.19 at 21:24,  wrote:
 Add the needed data structures for enabling Naples (F17h M01h).  Since
 Rome (F17h M31h) has the same c-state latencies and entry methods, the
 c-state information can be used for Rome as well.  For both Naples and
 Rome, mwait is used for c1 (cc1) and halt is functionally the same as
 c2 (cc6).  If c2 (cc6) is disabled in BIOS, then halt functions similar
 to c1 (cc1).
>>>
>>> But your code does not detect this situation, and does hence not update
>>> the table used accordingly. Why is this? Is entering C1 cheaper one way
>>> or the other in this situation (in which case the cheaper approach should
>>> always be used)?
>>
>> Well, if Xen had an AML interpreter, we could use the ACPI tables like
>> we do in Linux, but Xen doesn't (which is why we're hard coding it).
> 
> But the necessary data gets uploaded by the ACPI code in Dom0. Or
> else there wouldn't be a point to have an ACPI idle driver in Xen in the
> first place.
> 
> We should add custom (vendor specific) code to Xen only if there
> are clear advantages over the ACPI based approach, and so far
> the patch descriptions don't make clear what advantages there
> are (besides becoming independent of Dom0, which I'd consider
> marginal).
> 

I'll update the patches to explain why this is needed (i.e. passing the 
ACPI tables back to Xen is unreliable on a PV dom0 and not possible on a 
PVH dom0).

>> mwait has the CPUID_Fn0005_EDX CPUID leaf but since we don't have mwait
>> support for CC6, we can't use that.  There's another register we _might_
>> be able to use, but support for CC6 is AND'd with that and another
>> another register (we don't have access to). The register we'd read is
>> also RW.  So I'm not sure I trust it.
> 
> It's hard to believe that one can't find out whether HLT would enter
> only CC1 or eventually also CC6.
> 
> Jan
> 

There's a register, but whether C6 is enabled is that register AND'd 
with firmware state.  Assuming it isn't touched, it should be possible 
to determine whether C6 is enabled by the BIOS.  That leads to more code, 
and the downside of not checking is the system thinking it's using CC6 
when it's really using CC1.  It's also NDA'd, so I'd have to get approval 
to use it (and then also put it in the public PPR).

Brian

Re: [Xen-devel] [PATCH 3/3] mwait-idle: add enablement for AMD Naples and Rome

2019-03-14 Thread Woods, Brian
On 3/13/19 4:51 AM, Jan Beulich wrote:
 On 25.02.19 at 21:24,  wrote:
>> Add the needed data structures for enabling Naples (F17h M01h).  Since
>> Rome (F17h M31h) has the same c-state latencies and entry methods, the
>> c-state information can be used for Rome as well.  For both Naples and
>> Rome, mwait is used for c1 (cc1) and halt is functionally the same as
>> c2 (cc6).  If c2 (cc6) is disabled in BIOS, then halt functions similar
>> to c1 (cc1).
> 
> But your code does not detect this situation, and does hence not update
> the table used accordingly. Why is this? Is entering C1 cheaper one way
> or the other in this situation (in which case the cheaper approach should
> always be used)?
> 
> Jan
> 
> 

Well, if Xen had an AML interpreter, we could use the ACPI tables like 
we do in Linux, but Xen doesn't (which is why we're hard coding it). 
mwait has the CPUID_Fn0005_EDX CPUID leaf but since we don't have mwait 
support for CC6, we can't use that.  There's another register we _might_ 
be able to use, but support for CC6 is AND'd with that and another 
register (we don't have access to).  The register we'd read is 
also RW.  So I'm not sure I trust it.

The worst case (no CC6 when we think we have it) is that halt uses cc1 
rather than cc6.  I don't see a downside to this other than delay.  Even 
if the idle scheduler is expecting a longer delay than what it actually 
gets, I don't see this as a huge issue.  It can still use mwait for 
shorter idle periods.  And if all c-states are turned off, mwait 
shouldn't even be available, so this code is never enabled (and the 
default halt is used).

I agree it isn't optimal, but it's the best solution I can think of.

Brian

Re: [Xen-devel] [PATCH 2/3] mwait-idle: add support for AMD processors

2019-03-14 Thread Woods, Brian
On 3/13/19 4:42 AM, Jan Beulich wrote:
 On 25.02.19 at 21:23,  wrote:
>> Newer AMD processors (F17h) have mwait support.  Add some checks to make
>> sure vendor specific code is run correctly and some infrastructure to
>> facilitate adding AMD processors.
> 
> Both my Fam15 and my Fam10 system have CPUID[1].ECX[3] set - why
> the reference to Fam17 here?

We added CPUID_Fn0005_EDX to match Intel starting with F17h M01h 
(Naples).  Therefore going forward, we're just enabling with Naples and 
further processors.

Noted about other comments.

Brian

>> @@ -1115,6 +1122,9 @@ static void __init sklh_idle_state_table_update(void)
>>*/
>>   static void __init mwait_idle_state_table_update(void)
>>   {
>> +if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
>> +return;
> 
> Please use != INTEL here.
> 
>> @@ -1126,13 +1136,24 @@ static void __init mwait_idle_state_table_update(void)
>>  case 0x5e: /* SKL-H */
>>  sklh_idle_state_table_update();
>>  break;
>> -}
>> +}
>>   }
>>   
>>   static int __init mwait_idle_probe(void)
>>   {
>>  unsigned int eax, ebx, ecx;
>> -const struct x86_cpu_id *id = x86_match_cpu(intel_idle_ids);
>> +const struct x86_cpu_id *id;
>> +
>> +switch (boot_cpu_data.x86_vendor) {
>> +case X86_VENDOR_INTEL:
>> +id = x86_match_cpu(intel_idle_ids);
>> +break;
>> +case X86_VENDOR_AMD:
>> +id = x86_match_cpu(amd_idle_ids);
>> +break;
>> +default:
>> +id = NULL;
>> +}
> 
> Missing break statement again, but perhaps even better here to drop
> the default: and make NULL the variable's initializer.
> 
> Jan
> 
> 


Re: [Xen-devel] [PATCH 1/3] mwait-idle: add support for using halt

2019-03-14 Thread Woods, Brian
On 3/13/19 4:35 AM, Jan Beulich wrote:
 On 25.02.19 at 21:23,  wrote:
>> --- a/xen/arch/x86/cpu/mwait-idle.c
>> +++ b/xen/arch/x86/cpu/mwait-idle.c
>> @@ -103,6 +103,11 @@ static const struct cpuidle_state {
>>   
>>   #define CPUIDLE_FLAG_DISABLED  0x1
>>   /*
>> + * On certain AMD families that support mwait, only c1 can be reached by
>> + * mwait and to reach c2, halt has to be used.
>> + */
>> +#define CPUIDLE_FLAG_USE_HALT   0x2
> 
> Could you point us at where in the manuals this behavior is described?
> While PM Vol 2 has a chapter talking about P-states, I can't seem to
> find any mention of C-states there.

IIRC it's in the NDA PPR and internally it's in some other documents. 
We don't have support to use mwait while in CC6 due to caches being 
turned off etc.  If we did have mwait suport for CC6, we'd use that here 
(basically mirroring Intel).  Sadly I don't think we have any public 
information directly detailing this information.  If you'd like, I can 
look further into it.

>> @@ -783,8 +788,23 @@ static void mwait_idle(void)
>>   
>>  update_last_cx_stat(power, cx, before);
>>   
>> -if (cpu_is_haltable(cpu))
>> -mwait_idle_with_hints(eax, MWAIT_ECX_INTERRUPT_BREAK);
>> +if (cpu_is_haltable(cpu)) {
>> +struct cpu_info *info;
>> +switch (cx->entry_method) {
> 
> Blank line between declaration(s) and statement(s) please. And
> it would seem better to move the declaration right here (inside
> the switch()) anyway. Then again ...
> 
>> +case ACPI_CSTATE_EM_FFH:
>> +mwait_idle_with_hints(eax, MWAIT_ECX_INTERRUPT_BREAK);
>> +break;
>> +case ACPI_CSTATE_EM_HALT:
>> +info = get_cpu_info();
>> +spec_ctrl_enter_idle(info);
>> +safe_halt();
>> +spec_ctrl_exit_idle(info);
> 
> ... wouldn't it be better to avoid the redundancy with default_idle(),
> by introducing a new helper function, e.g. spec_ctrl_safe_halt()?
> 
See my email with Wei about this.


>> +local_irq_disable();
>> +break;
>> +default:
>> +printk(XENLOG_ERR PREFIX "unknown entry method %d\n", cx->entry_method);
>> +}
> 
> Overly long line and missing break statement.
> 
>> @@ -1184,8 +1204,9 @@ static int mwait_idle_cpu_init(struct notifier_block *nfb,
>>  	for (cstate = 0; cpuidle_state_table[cstate].target_residency; ++cstate) {
>>  unsigned int num_substates, hint, state;
>>  struct acpi_processor_cx *cx;
>> +const unsigned int flags = cpuidle_state_table[cstate].flags;
> 
> May I suggest to name the variable slightly differently, e.g. cflags,
> to avoid any risk of it being mistaken for what we commonly use
> with e.g. spin_lock_irqsave()?
> 
>> @@ -1221,7 +1242,12 @@ static int mwait_idle_cpu_init(struct notifier_block *nfb,
>>  cx = dev->states + dev->count;
>>  cx->type = state;
>>  cx->address = hint;
>> -cx->entry_method = ACPI_CSTATE_EM_FFH;
>> +
>> +if (flags & CPUIDLE_FLAG_USE_HALT)
>> +cx->entry_method = ACPI_CSTATE_EM_HALT;
>> + else
>> +cx->entry_method = ACPI_CSTATE_EM_FFH;
> 
> I'd prefer if you used a conditional expression here. One of the goals for
> any changes to this file should be to limit the delta to its Linux original, 
> in
> order to increase the chances of patches coming from there to apply
> reasonably cleanly here.
> 
> Doing so would also save me from complaining about the stray blank
> ahead of "else".
> 
> Jan
> 

By conditional statement you mean ternary?  If so, that'll be easy enough.

Also, noted for things I didn't directly to.

Brian


Re: [Xen-devel] [PATCH 1/3] mwait-idle: add support for using halt

2019-03-11 Thread Woods, Brian
On 3/5/19 11:12 AM, Wei Liu wrote:
> On Wed, Feb 27, 2019 at 06:23:35PM +0000, Woods, Brian wrote:
>> On 2/27/19 7:47 AM, Wei Liu wrote:
>>> On Mon, Feb 25, 2019 at 08:23:58PM +, Woods, Brian wrote:
>>>> Some AMD processors can use a mixture of mwait and halt for accessing
>>>> various c-states.  In preparation for adding support for AMD processors,
>>>> update the mwait-idle driver to optionally use halt.
>>>>
>>>> Signed-off-by: Brian Woods 
>>>> ---
>>>>    xen/arch/x86/cpu/mwait-idle.c | 40 +---
>>>>1 file changed, 33 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/xen/arch/x86/cpu/mwait-idle.c b/xen/arch/x86/cpu/mwait-idle.c
>>>> index f89c52f256..a063e39d60 100644
>>>> --- a/xen/arch/x86/cpu/mwait-idle.c
>>>> +++ b/xen/arch/x86/cpu/mwait-idle.c
>>>> @@ -103,6 +103,11 @@ static const struct cpuidle_state {
>>>>
>>>>#define CPUIDLE_FLAG_DISABLED   0x1
>>>>/*
>>>> + * On certain AMD families that support mwait, only c1 can be reached by
>>>> + * mwait and to reach c2, halt has to be used.
>>>> + */
>>>> +#define CPUIDLE_FLAG_USE_HALT 0x2
>>>> +/*
>>>> * Set this flag for states where the HW flushes the TLB for us
>>>> * and so we don't need cross-calls to keep it consistent.
>>>> * If this flag is set, SW flushes the TLB, so even if the
>>>> @@ -783,8 +788,23 @@ static void mwait_idle(void)
>>>>
>>>>update_last_cx_stat(power, cx, before);
>>>>
>>>> -  if (cpu_is_haltable(cpu))
>>>> -  mwait_idle_with_hints(eax, MWAIT_ECX_INTERRUPT_BREAK);
>>>> +  if (cpu_is_haltable(cpu)) {
>>>> +  struct cpu_info *info;
>>>> +  switch (cx->entry_method) {
>>>> +  case ACPI_CSTATE_EM_FFH:
>>>> +  mwait_idle_with_hints(eax, MWAIT_ECX_INTERRUPT_BREAK);
>>>> +  break;
>>>> +  case ACPI_CSTATE_EM_HALT:
>>>
>>>> +  info = get_cpu_info();
>>>> +  spec_ctrl_enter_idle(info);
>>>> +  safe_halt();
>>>> +  spec_ctrl_exit_idle(info);
>>>
>>> May I suggest you make this snippet a function? The same code snippet
>>> appears a few lines above.
>>>
>>> Wei.
>>>
>> It's used in various other places as well (cpu_idle.c, x86/domain.c),
>> would a function like:
>>
>> void safe_halt_with_spec(struct cpu_info *info)
>> {
>>   if (!info)
>>   info = get_cpu_info();
>>
>>   spec_ctrl_enter_idle(info);
>>   safe_halt();
>>   spec_ctrl_exit_idle(info);
>> }
>>
>> work since that way it could be used in other places where info is
>> already defined?
> 
> Looks reasonable. But I will leave that to Andrew and Jan to decide what
> suits them best.
> 
> Wei.
> 

Ping for Andy and Jan for this and the patches in general?

Brian

Re: [Xen-devel] [PATCH 1/3] mwait-idle: add support for using halt

2019-02-27 Thread Woods, Brian
On 2/27/19 7:47 AM, Wei Liu wrote:
> On Mon, Feb 25, 2019 at 08:23:58PM +0000, Woods, Brian wrote:
>> Some AMD processors can use a mixture of mwait and halt for accessing
>> various c-states.  In preparation for adding support for AMD processors,
>> update the mwait-idle driver to optionally use halt.
>>
>> Signed-off-by: Brian Woods 
>> ---
>>   xen/arch/x86/cpu/mwait-idle.c | 40 +---
>>   1 file changed, 33 insertions(+), 7 deletions(-)
>>
>> diff --git a/xen/arch/x86/cpu/mwait-idle.c b/xen/arch/x86/cpu/mwait-idle.c
>> index f89c52f256..a063e39d60 100644
>> --- a/xen/arch/x86/cpu/mwait-idle.c
>> +++ b/xen/arch/x86/cpu/mwait-idle.c
>> @@ -103,6 +103,11 @@ static const struct cpuidle_state {
>>   
>>   #define CPUIDLE_FLAG_DISABLED  0x1
>>   /*
>> + * On certain AMD families that support mwait, only c1 can be reached by
>> + * mwait and to reach c2, halt has to be used.
>> + */
>> +#define CPUIDLE_FLAG_USE_HALT   0x2
>> +/*
>>* Set this flag for states where the HW flushes the TLB for us
>>* and so we don't need cross-calls to keep it consistent.
>>* If this flag is set, SW flushes the TLB, so even if the
>> @@ -783,8 +788,23 @@ static void mwait_idle(void)
>>   
>>  update_last_cx_stat(power, cx, before);
>>   
>> -if (cpu_is_haltable(cpu))
>> -mwait_idle_with_hints(eax, MWAIT_ECX_INTERRUPT_BREAK);
>> +if (cpu_is_haltable(cpu)) {
>> +struct cpu_info *info;
>> +switch (cx->entry_method) {
>> +case ACPI_CSTATE_EM_FFH:
>> +mwait_idle_with_hints(eax, MWAIT_ECX_INTERRUPT_BREAK);
>> +break;
>> +case ACPI_CSTATE_EM_HALT:
> 
>> +info = get_cpu_info();
>> +spec_ctrl_enter_idle(info);
>> +safe_halt();
>> +spec_ctrl_exit_idle(info);
> 
> May I suggest you make this snippet a function? The same code snippet
> appears a few lines above.
> 
> Wei.
> 
It's used in various other places as well (cpu_idle.c, x86/domain.c), 
would a function like:

void safe_halt_with_spec(struct cpu_info *info)
{
 if (!info)
 info = get_cpu_info();

 spec_ctrl_enter_idle(info);
 safe_halt();
 spec_ctrl_exit_idle(info);
}

work since that way it could be used in other places where info is 
already defined?
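
(A hypothetical usage sketch, assuming the helper above - the HALT branch
in mwait_idle() would then collapse to:)

	case ACPI_CSTATE_EM_HALT:
		/* NULL -> the helper fetches the per-CPU info itself */
		safe_halt_with_spec(NULL);
		local_irq_disable();
		break;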

Re: [Xen-devel] [PATCH 0/3] mwait support for AMD processors

2019-02-27 Thread Woods, Brian
On 2/27/19 2:51 AM, Jan Beulich wrote:
 On 26.02.19 at 17:54,  wrote:
>> On 2/26/19 10:37 AM, Jan Beulich wrote:
>> On 26.02.19 at 17:25,  wrote:
>> Correct me if I'm wrong, but Xen's acpi-idle implementation is
>> dependent on dom0 using an AML interpreter and then giving that data back
 to Xen.  I've heard that this doesn't always work correctly on PV dom0s
 and doesn't work at all on PVH dom0s.
>>>
>>> For C2 and deeper (using entering methods other than HLT) - yes.
>>> The use of HLT is the default with the assumption that this will put
>>> the system in C1 (i.e. with a pretty low wakeup latency); see
>>> default_idle(), cpuidle_init_cpu(), and acpi_idle_do_entry().
>>
>> Well, assuming C2 is enabled (which I assume is the default case),
>> HLT roughly puts the processor in C2 rather than C1.  On my test system,
>> the debug console output for the cx tables only outputs HLT for C1 (which
>> is wrong).
>>
>> Rather than depending on dom0, which is shaky, and not having an AML
>> interpreter, it seems the best solution is to hardcode the values in
>> like Intel does.  If Xen had an AML interpreter, I'd agree doing things
>> differently (reading in the ACPI tables) would be best.  But given the
>> resources Xen has at the moment, this seems like the safest solution and
>> is better than using HLT (which is C2 assuming it's enabled) as the
>> default idle method like Xen is using now.
>>
>> It comes down to sometimes (when C2 is disabled in BIOS) using C1
>> thinking it's C2, or without the patches in the common case using C2
>> thinking it's C1.
> 
> So in one of our idle routines, how would one go about entering
> C1 or C2 depending on wakeup latency requirements? I'm having a
> hard time seeing how HLT can be used for both (without a reboot
> cycle and a BIOS option change in between). Yet if there's only
> one state that can be entered, then it's merely cosmetic whether
> it gets called C1 or C2 in the debug output.
> 
> Anyway - I guess we need to continue this discussion (if necessary)
> once I got around to actually look at the patches.
> 
> Jan
> 
Does this answer the questions?

C2/CC6 enabled (default on our internal MBs [and in general I'd assume])
HLT -> C2/CC6*
mwait -> C1/CC1

C2/CC6 disabled in BIOS
HLT -> C1/CC1
mwait -> C1/CC1

* HLT doesn't directly call C2/CC6 but it has a small timer then flushes 
the caches and puts it in C2/CC6.  Effectively it's the same but not 
exactly.

Brian

Re: [Xen-devel] [PATCH 0/3] mwait support for AMD processors

2019-02-26 Thread Woods, Brian
On 2/26/19 10:37 AM, Jan Beulich wrote:
 On 26.02.19 at 17:25,  wrote:
>> Correct me if I'm wrong, but Xen's acpi-idle implementation is
>> dependent on dom0 using an AML interpreter and then giving that data back
>> to Xen.  I've heard that this doesn't always work correctly on PV dom0s
>> and doesn't work at all on PVH dom0s.
> 
> For C2 and deeper (using entering methods other than HLT) - yes.
> The use of HLT is the default with the assumption that this will put
> the system in C1 (i.e. with a pretty low wakeup latency); see
> default_idle(), cpuidle_init_cpu(), and acpi_idle_do_entry().
> 
> Jan
> 

Well, assuming C2 is enabled (which I assume is the default case), 
HLT roughly puts the processor in C2 rather than C1.  On my test system, 
the debug console output for the cx tables only outputs HLT for C1 (which 
is wrong).

Rather than depending on dom0, which is shaky, and not having an AML 
interpreter, it seems the best solution is to hardcode the values in 
like Intel does.  If Xen had an AML interpreter, I'd agree doing things 
differently (reading in the ACPI tables) would be best.  But given the 
resources Xen has at the moment, this seems like the safest solution and 
is better than using HLT (which is C2 assuming it's enabled) as the 
default idle method like Xen is using now.

It comes down to sometimes (when C2 is disabled in BIOS) using C1 
thinking it's C2, or without the patches in the common case using C2 
thinking it's C1.

Brian

Re: [Xen-devel] [PATCH 0/3] mwait support for AMD processors

2019-02-26 Thread Woods, Brian
On 2/26/19 4:49 AM, Jan Beulich wrote:
 On 25.02.19 at 21:23,  wrote:
>> This patch series adds support and enablement for mwait on AMD Naples
>> and Rome processors.  Newer AMD processors support mwait, but only for
>> c1, and for c2 halt is used.  The mwait-idle driver is modified to be
>> able to use both mwait and halt for idling.
> 
> I recall you saying so elsewhere, but I continue to be confused. Afaik
> HLT is specified to mean C1. Without having looked at the patches,
> I'm also not happy to see you say you make the driver capable of using
> HLT. That's not its purpose, and I think the ACPI driver should instead
> be used for that.
> 
> It is my understanding that the driver is there solely to overcome
> recurring issues with BIOSes not providing optimal (or even correct)
> ACPI tables. Since for C1 we don't even need any ACPI tables (we
> enter C1 [through HLT] whenever no other C states are defined),
> I'm having trouble seeing what problem would be addressed here.
> Are there really no deeper C states than C2 supported by your CPUs?
> 
> Jan
> 

On Naples/Rome HLT can mean different things depending on how the 
HW/BIOS is set up.  If C2/CC6 is enabled, HLT (after going through some 
checks etc), will put the system in a C2/CC6 state.  If C2/CC6 is 
disabled in BIOS, then HLT will act as C1/CC1. It may not be completely 
ideal but the system will function just fine.  It's the best that can be 
done without reading the tables (that I can think of).

Correct me if I'm wrong, but Xen's acpi-idle implementation is 
dependent on dom0 using an AML interpreter and then giving that data back 
to Xen.  I've heard that this doesn't always work correctly on PV dom0s 
and doesn't work at all on PVH dom0s.

Brian

[Xen-devel] [PATCH 2/3] mwait-idle: add support for AMD processors

2019-02-25 Thread Woods, Brian
Newer AMD processors (F17h) have mwait support.  Add some checks to make
sure vendor specific code is run correctly and some infrastructure to
facilitate adding AMD processors.

Signed-off-by: Brian Woods 
---
 xen/arch/x86/cpu/mwait-idle.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/cpu/mwait-idle.c b/xen/arch/x86/cpu/mwait-idle.c
index a063e39d60..1036c8b101 100644
--- a/xen/arch/x86/cpu/mwait-idle.c
+++ b/xen/arch/x86/cpu/mwait-idle.c
@@ -979,6 +979,13 @@ static const struct x86_cpu_id intel_idle_ids[] __initconstrel = {
{}
 };
 
+#define ACPU(family, model, cpu) \
+   { X86_VENDOR_AMD, family, model, X86_FEATURE_ALWAYS, &idle_cpu_##cpu}
+
+static const struct x86_cpu_id amd_idle_ids[] __initconstrel = {
+   {}
+};
+
 /*
  * ivt_idle_state_table_update(void)
  *
@@ -1115,6 +1122,9 @@ static void __init sklh_idle_state_table_update(void)
  */
 static void __init mwait_idle_state_table_update(void)
 {
+   if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
+   return;
+
switch (boot_cpu_data.x86_model) {
case 0x3e: /* IVT */
ivt_idle_state_table_update();
@@ -1126,13 +1136,24 @@ static void __init mwait_idle_state_table_update(void)
case 0x5e: /* SKL-H */
sklh_idle_state_table_update();
break;
-   }
+   }
 }
 
 static int __init mwait_idle_probe(void)
 {
unsigned int eax, ebx, ecx;
-   const struct x86_cpu_id *id = x86_match_cpu(intel_idle_ids);
+   const struct x86_cpu_id *id;
+
+   switch (boot_cpu_data.x86_vendor) {
+   case X86_VENDOR_INTEL:
+   id = x86_match_cpu(intel_idle_ids);
+   break;
+   case X86_VENDOR_AMD:
+   id = x86_match_cpu(amd_idle_ids);
+   break;
+   default:
+   id = NULL;
+   }
 
if (!id) {
pr_debug(PREFIX "does not run on family %d model %d\n",
-- 
2.11.0



[Xen-devel] [PATCH 3/3] mwait-idle: add enablement for AMD Naples and Rome

2019-02-25 Thread Woods, Brian
Add the needed data structures for enabling Naples (F17h M01h).  Since
Rome (F17h M31h) has the same c-state latencies and entry methods, the
c-state information can be used for Rome as well.  For both Naples and
Rome, mwait is used for c1 (cc1) and halt is functionally the same as
c2 (cc6).  If c2 (cc6) is disabled in BIOS, then halt functions similar
to c1 (cc1).

Signed-off-by: Brian Woods 
---
 xen/arch/x86/cpu/mwait-idle.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/xen/arch/x86/cpu/mwait-idle.c b/xen/arch/x86/cpu/mwait-idle.c
index 1036c8b101..d63ec650e0 100644
--- a/xen/arch/x86/cpu/mwait-idle.c
+++ b/xen/arch/x86/cpu/mwait-idle.c
@@ -720,6 +720,22 @@ static const struct cpuidle_state dnv_cstates[] = {
{}
 };
 
+static const struct cpuidle_state naples_cstates[] = {
+   {
+   .name = "CC1",
+   .flags = MWAIT2flg(0x00),
+   .exit_latency = 1,
+   .target_residency = 2,
+   },
+   {
+   .name = "CC6",
+   .flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_USE_HALT,
+   .exit_latency = 400,
+   .target_residency = 1000,
+   },
+   {}
+};
+
 static void mwait_idle(void)
 {
unsigned int cpu = smp_processor_id();
@@ -979,10 +995,16 @@ static const struct x86_cpu_id intel_idle_ids[] __initconstrel = {
{}
 };
 
+static const struct idle_cpu idle_cpu_naples = {
+   .state_table = naples_cstates,
+};
+
 #define ACPU(family, model, cpu) \
{ X86_VENDOR_AMD, family, model, X86_FEATURE_ALWAYS, &idle_cpu_##cpu}
 
 static const struct x86_cpu_id amd_idle_ids[] __initconstrel = {
+   ACPU(0x17, 0x01, naples),
+   ACPU(0x17, 0x31, naples), /* Rome shares the same c-state config */
{}
 };
 
-- 
2.11.0



[Xen-devel] [PATCH 1/3] mwait-idle: add support for using halt

2019-02-25 Thread Woods, Brian
Some AMD processors can use a mixture of mwait and halt for accessing
various c-states.  In preparation for adding support for AMD processors,
update the mwait-idle driver to optionally use halt.

Signed-off-by: Brian Woods 
---
 xen/arch/x86/cpu/mwait-idle.c | 40 +---
 1 file changed, 33 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/cpu/mwait-idle.c b/xen/arch/x86/cpu/mwait-idle.c
index f89c52f256..a063e39d60 100644
--- a/xen/arch/x86/cpu/mwait-idle.c
+++ b/xen/arch/x86/cpu/mwait-idle.c
@@ -103,6 +103,11 @@ static const struct cpuidle_state {
 
 #define CPUIDLE_FLAG_DISABLED  0x1
 /*
+ * On certain AMD families that support mwait, only c1 can be reached by
+ * mwait and to reach c2, halt has to be used.
+ */
+#define CPUIDLE_FLAG_USE_HALT  0x2
+/*
  * Set this flag for states where the HW flushes the TLB for us
  * and so we don't need cross-calls to keep it consistent.
  * If this flag is set, SW flushes the TLB, so even if the
@@ -783,8 +788,23 @@ static void mwait_idle(void)
 
update_last_cx_stat(power, cx, before);
 
-   if (cpu_is_haltable(cpu))
-   mwait_idle_with_hints(eax, MWAIT_ECX_INTERRUPT_BREAK);
+   if (cpu_is_haltable(cpu)) {
+   struct cpu_info *info;
+   switch (cx->entry_method) {
+   case ACPI_CSTATE_EM_FFH:
+   mwait_idle_with_hints(eax, MWAIT_ECX_INTERRUPT_BREAK);
+   break;
+   case ACPI_CSTATE_EM_HALT:
+   info = get_cpu_info();
+   spec_ctrl_enter_idle(info);
+   safe_halt();
+   spec_ctrl_exit_idle(info);
+   local_irq_disable();
+   break;
+   default:
+   printk(XENLOG_ERR PREFIX "unknown entry method %d\n", cx->entry_method);
+   }
+   }
 
after = cpuidle_get_tick();
 
@@ -1184,8 +1204,9 @@ static int mwait_idle_cpu_init(struct notifier_block *nfb,
	for (cstate = 0; cpuidle_state_table[cstate].target_residency; ++cstate) {
unsigned int num_substates, hint, state;
struct acpi_processor_cx *cx;
+   const unsigned int flags = cpuidle_state_table[cstate].flags;
 
-   hint = flg2MWAIT(cpuidle_state_table[cstate].flags);
+   hint = flg2MWAIT(flags);
state = MWAIT_HINT2CSTATE(hint) + 1;
 
if (state > max_cstate) {
@@ -1196,13 +1217,13 @@ static int mwait_idle_cpu_init(struct notifier_block *nfb,
/* Number of sub-states for this state in CPUID.MWAIT. */
num_substates = (mwait_substates >> (state * 4))
& MWAIT_SUBSTATE_MASK;
+
/* If NO sub-states for this state in CPUID, skip it. */
-   if (num_substates == 0)
+   if (num_substates == 0 && !(flags & CPUIDLE_FLAG_USE_HALT))
continue;
 
/* if state marked as disabled, skip it */
-   if (cpuidle_state_table[cstate].flags &
-   CPUIDLE_FLAG_DISABLED) {
+   if (flags & CPUIDLE_FLAG_DISABLED) {
printk(XENLOG_DEBUG PREFIX "state %s is disabled",
   cpuidle_state_table[cstate].name);
continue;
@@ -1221,7 +1242,12 @@ static int mwait_idle_cpu_init(struct notifier_block *nfb,
cx = dev->states + dev->count;
cx->type = state;
cx->address = hint;
-   cx->entry_method = ACPI_CSTATE_EM_FFH;
+
+   if (flags & CPUIDLE_FLAG_USE_HALT)
+   cx->entry_method = ACPI_CSTATE_EM_HALT;
+else
+   cx->entry_method = ACPI_CSTATE_EM_FFH;
+
cx->latency = cpuidle_state_table[cstate].exit_latency;
cx->target_residency =
cpuidle_state_table[cstate].target_residency;
-- 
2.11.0



[Xen-devel] [PATCH 0/3] mwait support for AMD processors

2019-02-25 Thread Woods, Brian
This patch series adds support and enablement for mwait on AMD Naples
and Rome processors.  Newer AMD processors support mwait, but only for
c1, and for c2 halt is used.  The mwait-idle driver is modified to be
able to use both mwait and halt for idling.

Brian Woods (3):
  mwait-idle: add support for using halt
  mwait-idle: add support for AMD processors
  mwait-idle: add enablement for AMD Naples and Rome

 xen/arch/x86/cpu/mwait-idle.c | 87 ++-
 1 file changed, 78 insertions(+), 9 deletions(-)

-- 
2.11.0



Re: [Xen-devel] [PATCH 1/2] amd-iommu: use a bitfield for PTE/PDE

2019-02-13 Thread Woods, Brian
On 2/13/19 3:45 AM, Paul Durrant wrote:
>> -Original Message-
>> From: Woods, Brian [mailto:brian.wo...@amd.com]
>> Sent: 12 February 2019 20:14
>> To: Paul Durrant ; xen-devel@lists.xenproject.org
>> Cc: Suthikulpanit, Suravee ; Jan Beulich
>> ; Andrew Cooper ; Wei Liu
>> ; Roger Pau Monne 
>> Subject: Re: [PATCH 1/2] amd-iommu: use a bitfield for PTE/PDE
>>
>> On 2/4/19 5:19 AM, Paul Durrant wrote:
>>> The current use of get/set_field_from/in_reg_u32() is both inefficient
>> and
>>> requires some ugly casting.
>>>
>>> This patch defines a new bitfield structure (amd_iommu_pte) and uses
>> this
>>> structure in all PTE/PDE manipulation, resulting in much more readable
>>> and compact code.
>>>
>>> NOTE: This commit also fixes one malformed comment in
>>> set_iommu_pte_present().
>>>
>>> Signed-off-by: Paul Durrant 
>>
>> Sorry about the delay.
>>
>> Nitpick here, but I'd rather have !!IOMMUF_{writable,readable} than
>> true.
> 
> That's pretty ugly. How about I pass an OR of the flags through to lower 
> level functions rather than a pair of bools? If you're ok with that I'll send 
> a v2.
> 
>Paul
> 

There's no need for a v2 based on that, that's just me nitpicking. 
There's no really nice way to do it without either turning 
IOMMUF_{writable,readable} into bools or using your suggested approach, 
which takes more code to decode a flag.  Assuming everyone else is ok with 
the patches as they are, it's fine.  If there's going to be a v2 for other 
reasons, I'll just leave it up to your discretion (other people may have 
stronger opinions about it anyway).
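
For illustration, a hedged sketch of that flag-based alternative (the helper
name is hypothetical); the !! matters because iw/ir are one-bit bitfields
while the IOMMUF_* values are distinct bits, so a plain assignment could
truncate a set flag to zero:

/* hypothetical: accept an OR of IOMMUF_* flags instead of a pair of bools */
static void set_iommu_pte_perms(struct amd_iommu_pte *pte, unsigned int flags)
{
    pte->iw = !!(flags & IOMMUF_writable);
    pte->ir = !!(flags & IOMMUF_readable);
}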

Brian

>>   Not worth a revision if there isn't anything else though (and is
>> debatable).
>>
>> Acked-by: Brian Woods 
>>
>>> ---
>>> Cc: Suravee Suthikulpanit 
>>> Cc: Brian Woods 
>>> Cc: Jan Beulich 
>>> Cc: Andrew Cooper 
>>> Cc: Wei Liu 
>>> Cc: "Roger Pau Monné" 
>>> ---
>>>xen/drivers/passthrough/amd/iommu_map.c   | 143 --
>>>xen/drivers/passthrough/amd/pci_amd_iommu.c   |  50 +++---
>>>xen/include/asm-x86/hvm/svm/amd-iommu-defs.h  |  47 ++
>>>xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |  15 --
>>>4 files changed, 64 insertions(+), 191 deletions(-)
>>>
>>> diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
>>> index 67329b0c95..5fda6063df 100644
>>> --- a/xen/drivers/passthrough/amd/iommu_map.c
>>> +++ b/xen/drivers/passthrough/amd/iommu_map.c
>>> @@ -38,100 +38,45 @@ static unsigned int pfn_to_pde_idx(unsigned long
>> pfn, unsigned int level)
>>>static unsigned int clear_iommu_pte_present(unsigned long l1_mfn,
>>>unsigned long dfn)
>>>{
>>> -uint64_t *table, *pte;
>>> +struct amd_iommu_pte *table, *pte;
>>>unsigned int flush_flags;
>>>
>>>table = map_domain_page(_mfn(l1_mfn));
>>> +pte = &table[pfn_to_pde_idx(dfn, 1)];
>>>
>>> -pte = (table + pfn_to_pde_idx(dfn, 1));
>>> +flush_flags = pte->pr ? IOMMU_FLUSHF_modified : 0;
>>> +memset(pte, 0, sizeof(*pte));
>>>
>>> -flush_flags = get_field_from_reg_u32(*pte, IOMMU_PTE_PRESENT_MASK,
>>> - IOMMU_PTE_PRESENT_SHIFT) ?
>>> - IOMMU_FLUSHF_modified : 0;
>>> -
>>> -*pte = 0;
>>>unmap_domain_page(table);
>>>
>>>return flush_flags;
>>>}
>>>
>>> -static unsigned int set_iommu_pde_present(uint32_t *pde,
>>> +static unsigned int set_iommu_pde_present(struct amd_iommu_pte *pte,
>>>  unsigned long next_mfn,
>>>  unsigned int next_level, bool iw,
>>>  bool ir)
>>>{
>>> -uint64_t maddr_next;
>>> -uint32_t addr_lo, addr_hi, entry;
>>> -bool old_present;
>>>unsigned int flush_flags = IOMMU_FLUSHF_added;
>>>
>>> -maddr_next = __pfn_to_paddr(next_mfn);
>>> -
>>> -old_present = get_field_from_reg_u32(pde[0], IOMMU_PTE_PRESENT_MASK,
>>> - IOMMU_PTE_PRESENT_SHIFT);
>>> -if ( old_present )

Re: [Xen-devel] [PATCH 2/2] amd-iommu: use a bitfield for DTE

2019-02-12 Thread Woods, Brian
On 2/4/19 5:19 AM, Paul Durrant wrote:
> The current use of get/set_field_from/in_reg_u32() is both inefficient and
> requires some ugly casting.
> 
> This patch defines a new bitfield structure (amd_iommu_dte) and uses this
> structure in all DTE manipulation, resulting in much more readable and
> compact code.
> 
> NOTE: This patch also includes some clean-up of get_dma_requestor_id() to
>change the types of the arguments from u16 to uint16_t.
> 
> Signed-off-by: Paul Durrant 

Acked-by: Brian Woods 

> ---
> Cc: Suravee Suthikulpanit 
> Cc: Brian Woods 
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> Cc: Wei Liu 
> Cc: "Roger Pau Monné" 
> ---
>   xen/drivers/passthrough/amd/iommu_guest.c |  55 ++---
>   xen/drivers/passthrough/amd/iommu_map.c   | 199 +-
>   xen/drivers/passthrough/amd/pci_amd_iommu.c   |  51 ++---
>   xen/include/asm-x86/amd-iommu.h   |   5 -
>   xen/include/asm-x86/hvm/svm/amd-iommu-defs.h  | 120 +--
>   xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |  20 +-
>   6 files changed, 139 insertions(+), 311 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/amd/iommu_guest.c b/xen/drivers/passthrough/amd/iommu_guest.c
> index 96175bb9ac..328e7509d5 100644
> --- a/xen/drivers/passthrough/amd/iommu_guest.c
> +++ b/xen/drivers/passthrough/amd/iommu_guest.c
> @@ -76,39 +76,10 @@ static void guest_iommu_disable(struct guest_iommu *iommu)
>   iommu->enabled = 0;
>   }
>   
> -static uint64_t get_guest_cr3_from_dte(dev_entry_t *dte)
> +static uint64_t get_guest_cr3_from_dte(struct amd_iommu_dte *dte)
>   {
> -uint64_t gcr3_1, gcr3_2, gcr3_3;
> -
> -gcr3_1 = get_field_from_reg_u32(dte->data[1],
> -IOMMU_DEV_TABLE_GCR3_1_MASK,
> -IOMMU_DEV_TABLE_GCR3_1_SHIFT);
> -gcr3_2 = get_field_from_reg_u32(dte->data[2],
> -IOMMU_DEV_TABLE_GCR3_2_MASK,
> -IOMMU_DEV_TABLE_GCR3_2_SHIFT);
> -gcr3_3 = get_field_from_reg_u32(dte->data[3],
> -IOMMU_DEV_TABLE_GCR3_3_MASK,
> -IOMMU_DEV_TABLE_GCR3_3_SHIFT);
> -
> -return ((gcr3_3 << 31) | (gcr3_2 << 15 ) | (gcr3_1 << 12)) >> PAGE_SHIFT;
> -}
> -
> -static uint16_t get_domid_from_dte(dev_entry_t *dte)
> -{
> -return get_field_from_reg_u32(dte->data[2], IOMMU_DEV_TABLE_DOMAIN_ID_MASK,
> -  IOMMU_DEV_TABLE_DOMAIN_ID_SHIFT);
> -}
> -
> -static uint16_t get_glx_from_dte(dev_entry_t *dte)
> -{
> -return get_field_from_reg_u32(dte->data[1], IOMMU_DEV_TABLE_GLX_MASK,
> -  IOMMU_DEV_TABLE_GLX_SHIFT);
> -}
> -
> -static uint16_t get_gv_from_dte(dev_entry_t *dte)
> -{
> -return get_field_from_reg_u32(dte->data[1],IOMMU_DEV_TABLE_GV_MASK,
> -  IOMMU_DEV_TABLE_GV_SHIFT);
> +return ((dte->gcr3_trp_51_31 << 31) | (dte->gcr3_trp_30_15 << 15) |
> +(dte->gcr3_trp_14_12 << 12)) >> PAGE_SHIFT;
>   }
>   
>   static unsigned int host_domid(struct domain *d, uint64_t g_domid)
> @@ -397,7 +368,7 @@ static int do_completion_wait(struct domain *d, cmd_entry_t *cmd)
>   static int do_invalidate_dte(struct domain *d, cmd_entry_t *cmd)
>   {
>   uint16_t gbdf, mbdf, req_id, gdom_id, hdom_id;
> -dev_entry_t *gdte, *mdte, *dte_base;
> +struct amd_iommu_dte *gdte, *mdte, *dte_base;
>   struct amd_iommu *iommu = NULL;
>   struct guest_iommu *g_iommu;
>   uint64_t gcr3_gfn, gcr3_mfn;
> @@ -414,23 +385,23 @@ static int do_invalidate_dte(struct domain *d, cmd_entry_t *cmd)
>   return 0;
>   
>   /* Sometimes guest invalidates devices from non-exists dtes */
> -if ( (gbdf * sizeof(dev_entry_t)) > g_iommu->dev_table.size )
> +if ( (gbdf * sizeof(struct amd_iommu_dte)) > g_iommu->dev_table.size )
>   return 0;
>   
>   dte_mfn = guest_iommu_get_table_mfn(d,
>   reg_to_u64(g_iommu->dev_table.reg_base),
> -sizeof(dev_entry_t), gbdf);
> +sizeof(struct amd_iommu_dte), gbdf);
>   ASSERT(mfn_valid(_mfn(dte_mfn)));
>   
>   /* Read guest dte information */
>   dte_base = map_domain_page(_mfn(dte_mfn));
>   
> -gdte = dte_base + gbdf % (PAGE_SIZE / sizeof(dev_entry_t));
> +gdte = &dte_base[gbdf % (PAGE_SIZE / sizeof(struct amd_iommu_dte))];
>   
> -gdom_id  = get_domid_from_dte(gdte);
> +gdom_id = gdte->domain_id;
>   gcr3_gfn = get_guest_cr3_from_dte(gdte);
> -glx  = get_glx_from_dte(gdte);
> -gv   = get_gv_from_dte(gdte);
> +glx = gdte->glx;
> +gv = gdte->gv;
>   
>   unmap_domain_page(dte_base);
>   
> @@ -454,11 +425,11 @@ static int do_invalidate_dte(struct domain *d, cmd_entry_t *cmd)
>   /* Setup host device entry */
>   hdom_id = host_domid(d, 

Re: [Xen-devel] [PATCH 1/2] amd-iommu: use a bitfield for PTE/PDE

2019-02-12 Thread Woods, Brian
On 2/4/19 5:19 AM, Paul Durrant wrote:
> The current use of get/set_field_from/in_reg_u32() is both inefficient and
> requires some ugly casting.
> 
> This patch defines a new bitfield structure (amd_iommu_pte) and uses this
> structure in all PTE/PDE manipulation, resulting in much more readable
> and compact code.
> 
> NOTE: This commit also fixes one malformed comment in
>set_iommu_pte_present().
> 
> Signed-off-by: Paul Durrant 

Sorry about the delay.

Nitpick here, but I'd rather have !!IOMMUF_{writable,readable} than 
true.  Not worth a revision if there isn't anything else though (and is 
debatable).

Acked-by: Brian Woods 

> ---
> Cc: Suravee Suthikulpanit 
> Cc: Brian Woods 
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> Cc: Wei Liu 
> Cc: "Roger Pau Monné" 
> ---
>   xen/drivers/passthrough/amd/iommu_map.c   | 143 --
>   xen/drivers/passthrough/amd/pci_amd_iommu.c   |  50 +++---
>   xen/include/asm-x86/hvm/svm/amd-iommu-defs.h  |  47 ++
>   xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |  15 --
>   4 files changed, 64 insertions(+), 191 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
> index 67329b0c95..5fda6063df 100644
> --- a/xen/drivers/passthrough/amd/iommu_map.c
> +++ b/xen/drivers/passthrough/amd/iommu_map.c
> @@ -38,100 +38,45 @@ static unsigned int pfn_to_pde_idx(unsigned long pfn, unsigned int level)
>   static unsigned int clear_iommu_pte_present(unsigned long l1_mfn,
>   unsigned long dfn)
>   {
> -uint64_t *table, *pte;
> +struct amd_iommu_pte *table, *pte;
>   unsigned int flush_flags;
>   
>   table = map_domain_page(_mfn(l1_mfn));
> +pte = &table[pfn_to_pde_idx(dfn, 1)];
>   
> -pte = (table + pfn_to_pde_idx(dfn, 1));
> +flush_flags = pte->pr ? IOMMU_FLUSHF_modified : 0;
> +memset(pte, 0, sizeof(*pte));
>   
> -flush_flags = get_field_from_reg_u32(*pte, IOMMU_PTE_PRESENT_MASK,
> - IOMMU_PTE_PRESENT_SHIFT) ?
> - IOMMU_FLUSHF_modified : 0;
> -
> -*pte = 0;
>   unmap_domain_page(table);
>   
>   return flush_flags;
>   }
>   
> -static unsigned int set_iommu_pde_present(uint32_t *pde,
> +static unsigned int set_iommu_pde_present(struct amd_iommu_pte *pte,
> unsigned long next_mfn,
> unsigned int next_level, bool iw,
> bool ir)
>   {
> -uint64_t maddr_next;
> -uint32_t addr_lo, addr_hi, entry;
> -bool old_present;
>   unsigned int flush_flags = IOMMU_FLUSHF_added;
>   
> -maddr_next = __pfn_to_paddr(next_mfn);
> -
> -old_present = get_field_from_reg_u32(pde[0], IOMMU_PTE_PRESENT_MASK,
> - IOMMU_PTE_PRESENT_SHIFT);
> -if ( old_present )
> -{
> -bool old_r, old_w;
> -unsigned int old_level;
> -uint64_t maddr_old;
> -
> -addr_hi = get_field_from_reg_u32(pde[1],
> - IOMMU_PTE_ADDR_HIGH_MASK,
> - IOMMU_PTE_ADDR_HIGH_SHIFT);
> -addr_lo = get_field_from_reg_u32(pde[0],
> - IOMMU_PTE_ADDR_LOW_MASK,
> - IOMMU_PTE_ADDR_LOW_SHIFT);
> -old_level = get_field_from_reg_u32(pde[0],
> -   IOMMU_PDE_NEXT_LEVEL_MASK,
> -   IOMMU_PDE_NEXT_LEVEL_SHIFT);
> -old_w = get_field_from_reg_u32(pde[1],
> -   IOMMU_PTE_IO_WRITE_PERMISSION_MASK,
> -   IOMMU_PTE_IO_WRITE_PERMISSION_SHIFT);
> -old_r = get_field_from_reg_u32(pde[1],
> -   IOMMU_PTE_IO_READ_PERMISSION_MASK,
> -   IOMMU_PTE_IO_READ_PERMISSION_SHIFT);
> -
> -maddr_old = ((uint64_t)addr_hi << 32) |
> -((uint64_t)addr_lo << PAGE_SHIFT);
> -
> -if ( maddr_old != maddr_next || iw != old_w || ir != old_r ||
> - old_level != next_level )
> +if ( pte->pr &&
> + (pte->mfn != next_mfn ||
> +  pte->iw != iw ||
> +  pte->ir != ir ||
> +  pte->next_level != next_level) )
>   flush_flags |= IOMMU_FLUSHF_modified;
> -}
>   
> -addr_lo = maddr_next & DMA_32BIT_MASK;
> -addr_hi = maddr_next >> 32;
> -
> -/* enable read/write permissions,which will be enforced at the PTE */
> -set_field_in_reg_u32(addr_hi, 0,
> - IOMMU_PDE_ADDR_HIGH_MASK,
> - IOMMU_PDE_ADDR_HIGH_SHIFT, &entry);
> -set_field_in_reg_u32(iw, entry,
> - IOMMU_PDE_IO_WRITE_PERMISSION_MASK,
> - 

Re: [Xen-devel] [PATCH for-4.12 v4 2/3] x86/svm: Drop enum instruction_index and simplify svm_get_insn_len()

2019-01-31 Thread Woods, Brian
On 1/31/19 12:24 PM, Andrew Cooper wrote:
> Passing a 32-bit integer index into an array with entries containing less than
> 32 bits of data is wasteful, and creates an unnecessary error condition of
> passing an out-of-range index.
> 
> The width of the X86EMUL_OPC() encoding is currently 20 bits for the
> instructions used, which leaves room for a modrm byte.  Drop opc_tab[]
> entirely, and encode the expected opcode/modrm information directly.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Brian Woods 
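
As a reader's note, a minimal sketch of the encoding the commit message
describes: the opcode goes in the upper bits and the expected ModRM
byte in the low 8 bits. INSTR_ENC() is a hypothetical helper name, not
necessarily what the patch defines:

    #define INSTR_ENC(opc, modrm) (((opc) << 8) | (modrm))

    /* VMMCALL is 0f 01 d9: ModRM 0xd9 is mod 3, reg 3, rm 1, i.e. 0331
     * in octal, matching the MASK_EXTR() checks in the hunk below. */
    #define INSTR_VMCALL_ENC INSTR_ENC(X86EMUL_OPC(0x0f, 0x01), 0331)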

> ---
> CC: Jan Beulich 
> CC: Wei Liu 
> CC: Roger Pau Monné 
> CC: Boris Ostrovsky 
> CC: Suravee Suthikulpanit 
> CC: Brian Woods 
> CC: Juergen Gross 
> 
> The internals of X86EMUL_OPC() mean that we can't actually check for overflows
> with BUILD_BUG_ON(), but if the opcode encoding does change and overflow,
> then the resulting fallout will be very obvious in debug builds of Xen.
> 
> v3:
>   * New
> v4:
>   * Drop MODRM(), use Octal instead.
> ---
>   xen/arch/x86/hvm/svm/emulate.c| 51 +++--
>   xen/include/asm-x86/hvm/svm/emulate.h | 53 
> +++
>   2 files changed, 39 insertions(+), 65 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/svm/emulate.c b/xen/arch/x86/hvm/svm/emulate.c
> index 7799908..fb0d823 100644
> --- a/xen/arch/x86/hvm/svm/emulate.c
> +++ b/xen/arch/x86/hvm/svm/emulate.c
> @@ -54,36 +54,6 @@ static unsigned long svm_nextrip_insn_length(struct vcpu 
> *v)
>   return vmcb->nextrip - vmcb->rip;
>   }
>   
> -static const struct {
> -unsigned int opcode;
> -struct {
> -unsigned int rm:3;
> -unsigned int reg:3;
> -unsigned int mod:2;
> -#define MODRM(mod, reg, rm) { rm, reg, mod }
> -} modrm;
> -} opc_tab[INSTR_MAX_COUNT] = {
> -[INSTR_PAUSE]   = { X86EMUL_OPC_F3(0, 0x90) },
> -[INSTR_INT3]= { X86EMUL_OPC(   0, 0xcc) },
> -[INSTR_ICEBP]   = { X86EMUL_OPC(   0, 0xf1) },
> -[INSTR_HLT] = { X86EMUL_OPC(   0, 0xf4) },
> -[INSTR_XSETBV]  = { X86EMUL_OPC(0x0f, 0x01), MODRM(3, 2, 1) },
> -[INSTR_VMRUN]   = { X86EMUL_OPC(0x0f, 0x01), MODRM(3, 3, 0) },
> -[INSTR_VMCALL]  = { X86EMUL_OPC(0x0f, 0x01), MODRM(3, 3, 1) },
> -[INSTR_VMLOAD]  = { X86EMUL_OPC(0x0f, 0x01), MODRM(3, 3, 2) },
> -[INSTR_VMSAVE]  = { X86EMUL_OPC(0x0f, 0x01), MODRM(3, 3, 3) },
> -[INSTR_STGI]= { X86EMUL_OPC(0x0f, 0x01), MODRM(3, 3, 4) },
> -[INSTR_CLGI]= { X86EMUL_OPC(0x0f, 0x01), MODRM(3, 3, 5) },
> -[INSTR_INVLPGA] = { X86EMUL_OPC(0x0f, 0x01), MODRM(3, 3, 7) },
> -[INSTR_RDTSCP]  = { X86EMUL_OPC(0x0f, 0x01), MODRM(3, 7, 1) },
> -[INSTR_INVD]= { X86EMUL_OPC(0x0f, 0x08) },
> -[INSTR_WBINVD]  = { X86EMUL_OPC(0x0f, 0x09) },
> -[INSTR_WRMSR]   = { X86EMUL_OPC(0x0f, 0x30) },
> -[INSTR_RDTSC]   = { X86EMUL_OPC(0x0f, 0x31) },
> -[INSTR_RDMSR]   = { X86EMUL_OPC(0x0f, 0x32) },
> -[INSTR_CPUID]   = { X86EMUL_OPC(0x0f, 0xa2) },
> -};
> -
>   /*
>* First-gen SVM didn't have the NextRIP feature, meaning that when we take 
> a
>* fault-style vmexit, we have to decode the instruction stream to calculate
> @@ -93,12 +63,13 @@ static const struct {
>* hardware reported instruction length (if available) with the result from
>* x86_decode_insn().
>*/
> -unsigned int svm_get_insn_len(struct vcpu *v, enum instruction_index insn)
> +unsigned int svm_get_insn_len(struct vcpu *v, unsigned int instr_enc)
>   {
>   struct vmcb_struct *vmcb = v->arch.hvm.svm.vmcb;
>   struct hvm_emulate_ctxt ctxt;
>   struct x86_emulate_state *state;
>   unsigned long nrip_len, emul_len;
> +unsigned int instr_opcode, instr_modrm;
>   unsigned int modrm_rm, modrm_reg;
>   int modrm_mod;
>   
> @@ -131,20 +102,18 @@ unsigned int svm_get_insn_len(struct vcpu *v, enum 
> instruction_index insn)
>   }
>   #endif
>   
> -if ( insn >= ARRAY_SIZE(opc_tab) )
> -{
> -ASSERT_UNREACHABLE();
> -return 0;
> -}
> +/* Extract components from instr_enc. */
> +instr_modrm  = instr_enc & 0xff;
> +instr_opcode = instr_enc >> 8;
>   
> -if ( opc_tab[insn].opcode == ctxt.ctxt.opcode )
> +if ( instr_opcode == ctxt.ctxt.opcode )
>   {
> -if ( !opc_tab[insn].modrm.mod )
> +if ( !instr_modrm )
>   return emul_len;
>   
> -if ( modrm_mod == opc_tab[insn].modrm.mod &&
> - (modrm_rm & 7) == opc_tab[insn].modrm.rm &&
> - (modrm_reg & 7) == opc_tab[insn].modrm.reg )
> +if ( modrm_mod   == MASK_EXTR(instr_modrm, 0300) &&
> + (modrm_reg & 7) == MASK_EXTR(instr_modrm, 0070) &&
> + (modrm_rm  & 7) == MASK_EXTR(instr_modrm, 0007) )
>   return emul_len;
>   }
>   
> diff --git a/xen/include/asm-x86/hvm/svm/emulate.h 
> b/xen/include/asm-x86/hvm/svm/emulate.h
> index 82359ec..9af1006 100644
> --- a/xen/include/asm-x86/hvm/svm/emulate.h
> +++ b/xen/include/asm-x86/hvm/svm/emulate.h
> @@ -19,33 

Re: [Xen-devel] [PATCH v3 1/3] x86/svm: Remove list functionality from __get_instruction_length_* infrastructure

2019-01-31 Thread Woods, Brian
On 12/31/18 5:37 AM, Andrew Cooper wrote:
> The existing __get_instruction_length_from_list() has a single user
> which uses the list functionality.  That user however should be looking
> specifically for INVD or WBINVD, as reported by the vmexit exit reason.
> 
> Modify svm_vmexit_do_invalidate_cache() to ask for the correct
> instruction, and drop all list functionality from the helper.
> 
> Take the opportunity to rename it to svm_get_insn_len(), and drop the
> IOIO length handling which has never been used.
> 
> Signed-off-by: Andrew Cooper 
> Reviewed-by: Jan Beulich 

Acked-by: Brian Woods 
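
A before/after sketch of the interface change for reference (the
exit-reason test is an illustration of "as reported by the vmexit exit
reason", not a quote of the patch):

    /* Before: hand the helper a list of acceptable instructions. */
    static const enum instruction_index list[] = { INSTR_INVD, INSTR_WBINVD };
    inst_len = __get_instruction_length_from_list(v, list, ARRAY_SIZE(list));

    /* After: ask for the one instruction the exit reason implies. */
    inst_len = svm_get_insn_len(v, vmcb->exitcode == VMEXIT_INVD
                                   ? INSTR_INVD : INSTR_WBINVD);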

> ---
> CC: Wei Liu 
> CC: Roger Pau Monné 
> CC: Boris Ostrovsky 
> CC: Suravee Suthikulpanit 
> CC: Brian Woods 
> 
> v2:
>   * New
> v3:
>   * Deduplicate the calls to svm_nextrip_insn_length()
> ---
>   xen/arch/x86/hvm/svm/emulate.c| 76 
> +--
>   xen/arch/x86/hvm/svm/nestedsvm.c  |  9 +++--
>   xen/arch/x86/hvm/svm/svm.c| 39 +-
>   xen/include/asm-x86/hvm/svm/emulate.h |  9 +
>   4 files changed, 61 insertions(+), 72 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/svm/emulate.c b/xen/arch/x86/hvm/svm/emulate.c
> index 4abeab8..7799908 100644
> --- a/xen/arch/x86/hvm/svm/emulate.c
> +++ b/xen/arch/x86/hvm/svm/emulate.c
> @@ -84,28 +84,31 @@ static const struct {
>   [INSTR_CPUID]   = { X86EMUL_OPC(0x0f, 0xa2) },
>   };
>   
> -int __get_instruction_length_from_list(struct vcpu *v,
> -const enum instruction_index *list, unsigned int list_count)
> +/*
> + * First-gen SVM didn't have the NextRIP feature, meaning that when we take a
> + * fault-style vmexit, we have to decode the instruction stream to calculate
> + * how many bytes to move %rip forwards by.
> + *
> + * To double check the implementation, in debug builds, always compare the
> + * hardware reported instruction length (if available) with the result from
> + * x86_decode_insn().
> + */
> +unsigned int svm_get_insn_len(struct vcpu *v, enum instruction_index insn)
>   {
>   struct vmcb_struct *vmcb = v->arch.hvm.svm.vmcb;
>   struct hvm_emulate_ctxt ctxt;
>   struct x86_emulate_state *state;
> -unsigned long inst_len, j;
> +unsigned long nrip_len, emul_len;
>   unsigned int modrm_rm, modrm_reg;
>   int modrm_mod;
>   
> -/*
> - * In debug builds, always use x86_decode_insn() and compare with
> - * hardware.
> - */
> -#ifdef NDEBUG
> -if ( (inst_len = svm_nextrip_insn_length(v)) > MAX_INST_LEN )
> -gprintk(XENLOG_WARNING, "NRip reported inst_len %lu\n", inst_len);
> -else if ( inst_len != 0 )
> -return inst_len;
> +nrip_len = svm_nextrip_insn_length(v);
>   
> -if ( vmcb->exitcode == VMEXIT_IOIO )
> -return vmcb->exitinfo2 - vmcb->rip;
> +#ifdef NDEBUG
> +if ( nrip_len > MAX_INST_LEN )
> +gprintk(XENLOG_WARNING, "NRip reported inst_len %lu\n", nrip_len);
> +else if ( nrip_len != 0 )
> +return nrip_len;
>   #endif
>   
>   ASSERT(v == current);
> @@ -115,41 +118,34 @@ int __get_instruction_length_from_list(struct vcpu *v,
>   if ( IS_ERR_OR_NULL(state) )
>   return 0;
>   
> -inst_len = x86_insn_length(state, &ctxt.ctxt);
> +emul_len = x86_insn_length(state, &ctxt.ctxt);
>   modrm_mod = x86_insn_modrm(state, &modrm_rm, &modrm_reg);
>   x86_emulate_free_state(state);
> +
>   #ifndef NDEBUG
> -if ( vmcb->exitcode == VMEXIT_IOIO )
> -j = vmcb->exitinfo2 - vmcb->rip;
> -else
> -j = svm_nextrip_insn_length(v);
> -if ( j && j != inst_len )
> +if ( nrip_len && nrip_len != emul_len )
>   {
>   gprintk(XENLOG_WARNING, "insn-len[%02x]=%lu (exp %lu)\n",
> -ctxt.ctxt.opcode, inst_len, j);
> -return j;
> +ctxt.ctxt.opcode, nrip_len, emul_len);
> +return nrip_len;
>   }
>   #endif
>   
> -for ( j = 0; j < list_count; j++ )
> +if ( insn >= ARRAY_SIZE(opc_tab) )
>   {
> -unsigned int instr = list[j];
> -
> -if ( instr >= ARRAY_SIZE(opc_tab) )
> -{
> -ASSERT_UNREACHABLE();
> -break;
> -}
> -if ( opc_tab[instr].opcode == ctxt.ctxt.opcode )
> -{
> -if ( !opc_tab[instr].modrm.mod )
> -return inst_len;
> -
> -if ( modrm_mod == opc_tab[instr].modrm.mod &&
> - (modrm_rm & 7) == opc_tab[instr].modrm.rm &&
> - (modrm_reg & 7) == opc_tab[instr].modrm.reg )
> -return inst_len;
> -}
> +ASSERT_UNREACHABLE();
> +return 0;
> +}
> +
> +if ( opc_tab[insn].opcode == ctxt.ctxt.opcode )
> +{
> +if ( !opc_tab[insn].modrm.mod )
> +return emul_len;
> +
> +if ( modrm_mod == opc_tab[insn].modrm.mod &&
> + (modrm_rm & 7) == opc_tab[insn].modrm.rm &&
> + (modrm_reg & 7) == opc_tab[insn].modrm.reg )
> +return emul_len;
>   }
>   
>   

Re: [Xen-devel] [PATCH 03/14] AMD/IOMMU: Fix multiple reference counting errors

2019-01-31 Thread Woods, Brian
On 11/21/18 7:21 AM, Andrew Cooper wrote:
> Most of these issues would be XSAs if these paths were accessible to guests.
> 
> First, override the {get,put}_gfn() helpers to use gfn_t, which was the
> original purpose of this patch.
> 
> guest_iommu_get_table_mfn() has two bugs.  First, it gets a ref on one gfn,
> and puts a ref for a different gfn.  This is only a latent bug for now, as we
> don't do per-gfn locking yet.  Next, the mfn return value is unsafe to use
> after put_gfn() is called, as the guest could have freed the page in the
> meantime.
> 
> In addition, get_gfn_from_base_reg() erroneously asserts that base_raw can't
> be 0, but it may legitimately be.  On top of that, the return value from
> guest_iommu_get_table_mfn() is passed into map_domain_page() before checking
> that it is a real mfn.
> 
> Most of the complexity here is inlining guest_iommu_get_table_mfn() and
> holding the gfn reference until the operation is complete.
> 
> Furthermore, guest_iommu_process_command() is altered to take a local copy of
> cmd_entry_t, rather than passing a pointer to guest controlled memory into
> each of the handling functions.  It is also modified to break on error rather
> than continue.  These changes are in line with the spec which states that the
> IOMMU will strictly read a command entry once, and will cease processing if an
> error is encountered.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Brian Woods 
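
For the "read each command entry once and cease processing on error"
behaviour the commit message cites from the spec, the loop shape looks
roughly like the sketch below (guest_iommu_process_one() is a
hypothetical name; the real handlers are in the diff):

    while ( head != tail )
    {
        cmd_entry_t cmd;

        /* Strictly read the guest-controlled entry once... */
        memcpy(&cmd, &cmd_base[head], sizeof(cmd));

        /* ...and cease processing if handling it fails. */
        if ( guest_iommu_process_one(d, &cmd) )
            break;

        head = (head + 1) % entries;
    }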

> ---
> CC: Jan Beulich 
> CC: Wei Liu 
> CC: Roger Pau Monné 
> CC: Suravee Suthikulpanit 
> CC: Brian Woods 
> 
> This patch by no means indicates that the code is ready for production use.
> ---
>   xen/drivers/passthrough/amd/iommu_guest.c | 224 
> +++---
>   1 file changed, 146 insertions(+), 78 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/amd/iommu_guest.c 
> b/xen/drivers/passthrough/amd/iommu_guest.c
> index 96175bb..03ca0cf 100644
> --- a/xen/drivers/passthrough/amd/iommu_guest.c
> +++ b/xen/drivers/passthrough/amd/iommu_guest.c
> @@ -21,6 +21,13 @@
>   #include 
>   #include 
>   
> +/* Override {get,put}_gfn to work with gfn_t */
> +#undef get_gfn
> +#define get_gfn(d, g, t) get_gfn_type(d, gfn_x(g), t, P2M_ALLOC)
> +#undef get_gfn_query
> +#define get_gfn_query(d, g, t) get_gfn_type(d, gfn_x(g), t, 0)
> +#undef put_gfn
> +#define put_gfn(d, g) __put_gfn(p2m_get_hostp2m(d), gfn_x(g))
>   
>   #define IOMMU_MMIO_SIZE 0x8000
>   #define IOMMU_MMIO_PAGE_NR  0x8
> @@ -117,13 +124,6 @@ static unsigned int host_domid(struct domain *d, 
> uint64_t g_domid)
>   return d->domain_id;
>   }
>   
> -static unsigned long get_gfn_from_base_reg(uint64_t base_raw)
> -{
> -base_raw &= PADDR_MASK;
> -ASSERT ( base_raw != 0 );
> -return base_raw >> PAGE_SHIFT;
> -}
> -
>   static void guest_iommu_deliver_msi(struct domain *d)
>   {
>   uint8_t vector, dest, dest_mode, delivery_mode, trig_mode;
> @@ -138,23 +138,6 @@ static void guest_iommu_deliver_msi(struct domain *d)
>   vmsi_deliver(d, vector, dest, dest_mode, delivery_mode, trig_mode);
>   }
>   
> -static unsigned long guest_iommu_get_table_mfn(struct domain *d,
> -   uint64_t base_raw,
> -   unsigned int entry_size,
> -   unsigned int pos)
> -{
> -unsigned long idx, gfn, mfn;
> -p2m_type_t p2mt;
> -
> -gfn = get_gfn_from_base_reg(base_raw);
> -idx = (pos * entry_size) >> PAGE_SHIFT;
> -
> -mfn = mfn_x(get_gfn(d, gfn + idx, &p2mt));
> -put_gfn(d, gfn);
> -
> -return mfn;
> -}
> -
>   static void guest_iommu_enable_dev_table(struct guest_iommu *iommu)
>   {
>   uint32_t length_raw = 
> get_field_from_reg_u32(iommu->dev_table.reg_base.lo,
> @@ -176,7 +159,10 @@ static void guest_iommu_enable_ring_buffer(struct 
> guest_iommu *iommu,
>   void guest_iommu_add_ppr_log(struct domain *d, u32 entry[])
>   {
>   uint16_t gdev_id;
> -unsigned long mfn, tail, head;
> +unsigned long tail, head;
> +mfn_t mfn;
> +gfn_t gfn;
> +p2m_type_t p2mt;
>   ppr_entry_t *log, *log_base;
>   struct guest_iommu *iommu;
>   
> @@ -197,11 +183,24 @@ void guest_iommu_add_ppr_log(struct domain *d, u32 
> entry[])
>   return;
>   }
>   
> -mfn = guest_iommu_get_table_mfn(d, reg_to_u64(iommu->ppr_log.reg_base),
> -sizeof(ppr_entry_t), tail);
> -ASSERT(mfn_valid(_mfn(mfn)));
> +gfn = _gfn(PFN_DOWN(reg_to_u64(iommu->ppr_log.reg_base)) +
> +   PFN_DOWN(tail * sizeof(*log)));
>   
> -log_base = map_domain_page(_mfn(mfn));
> +mfn = get_gfn(d, gfn, &p2mt);
> +if ( mfn_eq(mfn, INVALID_MFN) || !p2m_is_ram(p2mt) )
> +{
> +AMD_IOMMU_DEBUG(
> +"Error: guest iommu ppr log bad gfn %"PRI_gfn", type %u, mfn %"
> +PRI_mfn", reg_base %#"PRIx64", tail %#lx\n",
> +gfn_x(gfn), p2mt, 

Re: [Xen-devel] [PATCH v3] x86/AMD: flush TLB after ucode update

2019-01-28 Thread Woods, Brian
On 1/28/19 8:25 AM, Andrew Cooper wrote:
> On 28/01/2019 14:19, Jan Beulich wrote:
 --- a/xen/arch/x86/microcode_amd.c
 +++ b/xen/arch/x86/microcode_amd.c
 @@ -218,6 +218,12 @@ static int apply_microcode(unsigned int
   
   spin_unlock_irqrestore(_update_lock, flags);
   
 +/*
 + * Experimentally this helps with performance issues on at least 
 certain
 + * Fam15 models.
>>> This is no longer experimental, now that we understand why.  How about:
>>>
>>> "Some processors leave the ucode blob mapping as UC after the update.
>>> Flush the mapping to regain normal cacheability" ?
>>>
>>> That way, its also slightly less cryptic in the code.
>> I did consider re-wording the comment, but decided to leave it unchanged,
>> for the way you word it not having public proof anywhere (for now at least).
> I'm fine to change the comment, if I can get an explicit go-ahead from AMD. Brian,
>> Suravee?
> 
> Preferably with the amended wording, (but ultimately, as agreed upon
> with AMD), Reviewed-by: Andrew Cooper 
> 
Likewise,
Reviewed-by: Brian Woods 

Re: [Xen-devel] [PATCH for-4.12] amd/iommu: fix present bit checking when clearing PTE

2019-01-24 Thread Woods, Brian
On 1/23/19 3:47 AM, Roger Pau Monne wrote:
> The current check for the present bit is wrong, since the present bit
> is located in the low part of the entry.
> 
> Fixes: e8afe1124cc1 ("iommu: elide flushing for higher order map/unmap 
> operations")
> Signed-off-by: Roger Pau Monné 

Reviewed-by: Brian Woods 

> ---
> Cc: Suravee Suthikulpanit 
> Cc: Brian Woods 
> Cc: Juergen Gross 
> Cc: Paul Durrant 
> ---
>   xen/drivers/passthrough/amd/iommu_map.c | 4 +---
>   1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/amd/iommu_map.c 
> b/xen/drivers/passthrough/amd/iommu_map.c
> index 99ac0a6862..67329b0c95 100644
> --- a/xen/drivers/passthrough/amd/iommu_map.c
> +++ b/xen/drivers/passthrough/amd/iommu_map.c
> @@ -39,15 +39,13 @@ static unsigned int clear_iommu_pte_present(unsigned long 
> l1_mfn,
>   unsigned long dfn)
>   {
>   uint64_t *table, *pte;
> -uint32_t entry;
>   unsigned int flush_flags;
>   
>   table = map_domain_page(_mfn(l1_mfn));
>   
>   pte = (table + pfn_to_pde_idx(dfn, 1));
> -entry = *pte >> 32;
>   
> -flush_flags = get_field_from_reg_u32(entry, IOMMU_PTE_PRESENT_MASK,
> +flush_flags = get_field_from_reg_u32(*pte, IOMMU_PTE_PRESENT_MASK,
>IOMMU_PTE_PRESENT_SHIFT) ?
>IOMMU_FLUSHF_modified : 0;
>   
> 


Re: [Xen-devel] [PATCH v5 1/4] amd-iommu: add flush iommu_ops

2018-12-20 Thread Woods, Brian
From: Paul Durrant 
Sent: Monday, December 17, 2018 3:22 AM
To: xen-devel@lists.xenproject.org
Cc: Paul Durrant; Suthikulpanit, Suravee; Woods, Brian; Andrew Cooper; Wei Liu; 
Roger Pau Monné
Subject: [PATCH v5 1/4] amd-iommu: add flush iommu_ops

The iommu_ops structure contains two methods for flushing: 'iotlb_flush' and
'iotlb_flush_all'. This patch adds implementations of these for AMD IOMMUs.

The iotlb_flush method takes a base DFN and a (4k) page count, but the
flush needs to be done by page order (i.e. 0, 9 or 18). Because a flush
operation is fairly expensive to perform, the code calculates the minimum
order single flush that will cover the specified page range rather than
performing multiple flushes.

Signed-off-by: Paul Durrant 
Reviewed-by: Jan Beulich 

Acked-by: Brian Woods 
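
The "minimum order single flush" calculation described above amounts to
something like the following sketch (illustrative only; the patch's
actual helper may differ in detail):

    /* Smallest supported order whose naturally aligned block containing
     * dfn also covers the last page of the range; ~0u means no single
     * flush covers it and the caller must fall back to a wider flush. */
    static unsigned int covering_order(unsigned long dfn_x,
                                       unsigned long page_count)
    {
        static const unsigned int orders[] = { 0, 9, 18 };
        unsigned long last = dfn_x + page_count - 1;
        unsigned int i;

        for ( i = 0; i < ARRAY_SIZE(orders); i++ )
            if ( (dfn_x >> orders[i]) == (last >> orders[i]) )
                return orders[i];

        return ~0u;
    }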

Re: [Xen-devel] [PATCH v5 3/4] iommu: elide flushing for higher order map/unmap operations

2018-12-20 Thread Woods, Brian
From: Paul Durrant 
Sent: Monday, December 17, 2018 3:22 AM
To: xen-devel@lists.xenproject.org
Cc: Paul Durrant; Stefano Stabellini; Julien Grall; Andrew Cooper; George 
Dunlap; Ian Jackson; Konrad Rzeszutek Wilk; Tim Deegan; Wei Liu; Suthikulpanit, 
Suravee; Woods, Brian; Roger Pau Monné
Subject: [PATCH v5 3/4] iommu: elide flushing for higher order map/unmap 
operations

This patch removes any implicit flushing that occurs in the implementation
of map and unmap operations and adds new iommu_map/unmap() wrapper
functions. To maintain semantics of the iommu_legacy_map/unmap() wrapper
functions, these are modified to call the new wrapper functions and then
perform an explicit flush operation.

Because VT-d currently performs two different types of flush dependent upon
whether a PTE is being modified versus merely added (i.e. replacing a non-
present PTE) 'iommu flush flags' are defined by this patch and the
iommu_ops map_page() and unmap_page() methods are modified to OR the type
of flush necessary for the PTE that has been populated or depopulated into
an accumulated flags value. The accumulated value can then be passed into
the explicit flush operation.

The ARM SMMU implementations of map_page() and unmap_page() currently
perform no implicit flushing and therefore the modified methods do not
adjust the flush flags.

NOTE: The per-cpu 'iommu_dont_flush_iotlb' is respected by the
  iommu_legacy_map/unmap() wrapper functions and therefore this now
  applies to all IOMMU implementations rather than just VT-d.

Signed-off-by: Paul Durrant 
Reviewed-by: Jan Beulich 
Reviewed-by: Kevin Tian 

Acked-by: Brian Woods 
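
The resulting caller pattern is roughly the following (the wrapper
names are from the series; the exact signatures here are an
assumption):

    unsigned int flush_flags = 0;
    int rc = iommu_map(d, dfn, mfn, page_order, flags, &flush_flags);

    /* Each (un)map ORs IOMMU_FLUSHF_* into flush_flags; one explicit
     * flush then covers the whole operation. */
    if ( !rc && !this_cpu(iommu_dont_flush_iotlb) )
        rc = iommu_iotlb_flush(d, dfn, 1u << page_order, flush_flags);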


Re: [Xen-devel] [PATCH 8/9] x86/amd: Virtualise MSR_VIRT_SPEC_CTRL for guests

2018-12-06 Thread Woods, Brian
On Wed, Dec 05, 2018 at 01:41:30AM -0700, Jan Beulich wrote:
> >>> On 04.12.18 at 22:35,  wrote:
> > The other thing I don't get is why advertise virtualized SSBD when the
> > guest setting it does nothing?  If ssbd_opt=true is set, as the code is
> > now, why even advertise it to the guest?  I'd suggest either allowing
> > the guest to turn it off or not advertise it at all (when ssbd_opt =
> > true).
> 
> I think it's better to advertise the feature nevertheless: Otherwise
> the guest might either try some other way of mitigating the
> (believed) vulnerability, or it may report in its logs that it's vulnerable
> (without mitigation) when it really isn't.
> 
> Jan
> 

I can understand that reasoning, but I'd still argue it would be more
correct to have an additional option that forces guests to use SSBD
(like setting ssbd=yes does in these patches), while the default
behaviour of ssbd=yes allows the guest to turn it off.  I'm not going
to be adamant about it though.

-- 
Brian Woods


Re: [Xen-devel] [PATCH 6/9] x86/amd: Allocate resources to cope with LS_CFG being per-core on Fam17h

2018-12-06 Thread Woods, Brian
On Thu, Dec 06, 2018 at 06:46:51PM +, Andy Cooper wrote:
> On 06/12/2018 08:54, Jan Beulich wrote:
>  On 05.12.18 at 18:05,  wrote:
> >> On 05/12/2018 16:57, Jan Beulich wrote:
> >> On 03.12.18 at 17:18,  wrote:
>  --- a/xen/arch/x86/cpu/amd.c
>  +++ b/xen/arch/x86/cpu/amd.c
>  @@ -419,6 +419,97 @@ static void __init noinline 
>  amd_probe_legacy_ssbd(void)
>   }
>   
>   /*
>  + * This is all a gross hack, but Xen really doesn't have flexible-enough
>  + * per-cpu infrastructure to do it properly.  For Zen(v1) with SMT 
>  active,
>  + * MSR_AMD64_LS_CFG is per-core rather than per-thread, so we need a 
>  per-core
>  + * spinlock to synchronise updates of the MSR.
>  + *
>  + * We can't use per-cpu state because taking one CPU offline would free 
>  state
>  + * under the feet of another.  Ideally, we'd allocate memory on the AP 
>  boot
>  + * path, but by the time the sibling information is calculated 
>  sufficiently
>  + * for us to locate the per-core state, it's too late to fail the AP 
>  boot.
>  + *
>  + * We also can't afford to end up in a heterogeneous scenario with some 
>  CPUs
>  + * unable to safely use LS_CFG.
>  + *
>  + * Therefore, we have to allocate for the worst-case scenario, which is
>  + * believed to be 4 sockets.  Any allocation failure causes us to turn 
>  LS_CFG
>  + * off, as this is fractionally better than failing to boot.
>  + */
>  +static struct ssbd_ls_cfg {
>  +spinlock_t lock;
>  +unsigned int disable_count;
>  +} *ssbd_ls_cfg[4];
> >>> Same question as to Brian for his original code: Instead of the
> >>> hard-coding of 4, can't you use nr_sockets here?
> >>> smp_prepare_cpus() runs before pre-SMP initcalls after all.
> >> nr_sockets has zero connection with reality as far as I can tell.
> >>
> >> On this particular box it reports 6 when the correct answer is 2.  I've
> >> got some Intel boxes where nr_sockets reports 15 and the correct answer
> >> is 4.
> > If you look back at when it was introduced, the main goal was
> > for it to never be too low. Any improvements to its calculation
> > are welcome, provided they maintain that guarantee. To high
> > a socket count is imo still better than a hard-coded one.
> 
> Even for the extra 2k of memory it will waste?
> 
> ~Andrew

Just as a side note, for processors that use MSR LS_CFG and have SMT
enabled (Fam17h), there should only be 2 physical sockets.  The 4 was a
worst case (chosen before some other information was available).
Realistically, there should be a max of 2 physical sockets whenever
this is needed, although having 4 could be nice as a safety buffer and
only costs 16 bytes.
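
For comparison, the nr_sockets-sized allocation Jan is asking for would
look roughly like this (boot-time shape only; error handling and the
per-core indexing are elided):

    /* Instead of: static struct ssbd_ls_cfg { ... } *ssbd_ls_cfg[4]; */
    static struct ssbd_ls_cfg **ssbd_ls_cfg;

    static int __init ssbd_ls_cfg_alloc(void)
    {
        /* nr_sockets is calculated by smp_prepare_cpus(), which runs
         * before pre-SMP initcalls. */
        ssbd_ls_cfg = xzalloc_array(struct ssbd_ls_cfg *, nr_sockets);

        return ssbd_ls_cfg ? 0 : -ENOMEM;
    }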

-- 
Brian Woods


Re: [Xen-devel] [PATCH] AMD IOMMU: fix debug console IOMMU intremap output

2018-12-05 Thread Woods, Brian
On Wed, Dec 05, 2018 at 02:00:43AM -0700, Jan Beulich wrote:
> >>> On 04.12.18 at 22:47,  wrote:
> > --- a/xen/drivers/passthrough/amd/iommu_intr.c
> > +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> > @@ -665,6 +665,24 @@ int __init amd_setup_hpet_msi(struct msi_desc 
> > *msi_desc)
> >  return rc;
> >  }
> >  
> > +
> > +static bool intremap_table_empty(const u32 *table)
> 
> uint32_t here please and ...
> 
> > +{
> > +u32 count;
> 
> ... since a fixed width type isn't needed here in the first place,
> unsigned int here. (This is notwithstanding the fact that I
> assume you've merely cloned dump_intremap_table().)

Gah, I did copy/clone dump_intremap_table; if I keep the same code
structure I'll use your and Paul's suggestions.

> > +if ( !table )
> > +return true;
> > +
> > +for ( count = 0; count < INTREMAP_ENTRIES; count++ )
> > +{
> > +if ( table[count] )
> > +return false;
> > +}
> > +return true;
> 
> Blank line above here please.
> 
> > +}
> > +
> > +
> > +
> >  static void dump_intremap_table(const u32 *table)
> 
> No multiple consecutive blank lines in general please (there may
> be extremely limited cases where exceptions are possible).
> 
> > @@ -687,13 +705,17 @@ static int dump_intremap_mapping(u16 seg, struct 
> > ivrs_mappings *ivrs_mapping)
> >  if ( !ivrs_mapping )
> >  return 0;
> >  
> > -printk("  %04x:%02x:%02x:%u:\n", seg,
> > -   PCI_BUS(ivrs_mapping->dte_requestor_id),
> > -   PCI_SLOT(ivrs_mapping->dte_requestor_id),
> > -   PCI_FUNC(ivrs_mapping->dte_requestor_id));
> > -
> >  spin_lock_irqsave(&(ivrs_mapping->intremap_lock), flags);
> > -dump_intremap_table(ivrs_mapping->intremap_table);
> > +
> > +if ( !intremap_table_empty(ivrs_mapping->intremap_table) ) {
> 
> Brace on its own line please.
> 
> > +printk("  %04x:%02x:%02x:%u:\n", seg,
> > +   PCI_BUS(ivrs_mapping->dte_requestor_id),
> > +   PCI_SLOT(ivrs_mapping->dte_requestor_id),
> > +   PCI_FUNC(ivrs_mapping->dte_requestor_id));
> > +
> > +dump_intremap_table(ivrs_mapping->intremap_table);
> > +}
> 
> dump_intremap_table() already skips empty entries, so aiui it
> is just the headline above you omit. How much of a savings is
> this really?
> 
> Furthermore, instead of adding a second function with a second
> loop, did you consider moving the logging of the headline into
> dump_intremap_table(), issuing the line the first time you hit a
> non-empty entry?
> 
> Jan
> 

I did think about doing that also, but ended up going with this route.
What I'll do is move printing the headline into dump_intremap_table and
see what you guys think (since it's such a small patch, it'll be easier
to do that than talk about it).
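
The shape being discussed, for reference, is roughly the following (a
sketch only: the headline is printed lazily on the first non-empty
entry, and the seg/bdf parameters are an assumed extension of the
current interface):

    static void dump_intremap_table(uint16_t seg, uint16_t bdf,
                                    const uint32_t *table)
    {
        unsigned int count;
        bool printed = false;

        if ( !table )
            return;

        for ( count = 0; count < INTREMAP_ENTRIES; count++ )
        {
            if ( !table[count] )
                continue;

            if ( !printed )
            {
                printk("  %04x:%02x:%02x:%u:\n", seg, PCI_BUS(bdf),
                       PCI_SLOT(bdf), PCI_FUNC(bdf));
                printed = true;
            }

            printk("    IRTE[%03x] %08x\n", count, table[count]);
        }
    }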


-- 
Brian Woods


Re: [Xen-devel] [PATCH v3] amd-iommu: remove page merging code

2018-12-04 Thread Woods, Brian
On Wed, Nov 28, 2018 at 09:55:59AM +, Paul Durrant wrote:
> The page merging logic makes use of bits 1-8 and bit 63 of a PTE, which
> used to be specified as 'ignored'. However, bits 5 and 6 are now specified
> as 'accessed' and 'dirty' bits and their use only remains safe as long as
> the DTE 'Host Access Dirty' bits remain unused by Xen, or by hardware
> before the domain starts running. (XSA-275 disabled the operation of the
> code after domain creation completes).
> 
> With the page merging logic present in its current form there are no spare
> ignored bits in the PTE at all, but PV-IOMMU support will require at least
> one spare bit to track which PTEs are added by hypercall.
> 
> This patch removes the code, freeing up the remaining PTE ignored bits
> for other use, including PV-IOMMU support, as well as significantly
> simplifying and shortening the source by ~170 lines. There may be some
> marginal performance cost (but none has been observed in manual testing
> with a passed-through NVIDIA GPU) since higher order mappings will now be
> ruled out until a mapping order parameter is passed to iommu_ops. That will
> be dealt with by a subsequent patch though.
> 
> Signed-off-by: Paul Durrant 

Acked-by: Brian Woods 

-- 
Brian Woods


Re: [Xen-devel] [PATCH v4 2/6] microcode: save all microcodes which pass sanity check

2018-12-04 Thread Woods, Brian
On Thu, Nov 29, 2018 at 10:22:10AM +0100, Roger Pau Monné wrote:
> On Thu, Nov 29, 2018 at 10:40:32AM +0800, Chao Gao wrote:
> > On Wed, Nov 28, 2018 at 01:00:14PM +0100, Roger Pau Monné wrote:
> > >On Wed, Nov 28, 2018 at 01:34:12PM +0800, Chao Gao wrote:
> > >> ... and search caches to find a suitable one when loading.
> > >
> > >Why do you need to save all of them? You are only going to load a
> > >single microcode, so I don't understand the need to cache them all.
> 
> I think the above question needs an answer.
> 
> > >IMO making such modifications to the AMD code without testing it is
> > >very dangerous. Could you get an AMD system or ask an AMD dev to test
> > >it? I would try with the AMD SVM maintainers.
> > 
> > It is improbable for me to find an AMD machine in my team. I will copy AMD
> > SVM maintainers in the coming versions and ask them to help to test this
> > series.
> 
> I'm Cc'ing them now in case they want to provide some feedback.
> 
> > >> +static int save_patch(struct ucode_patch *new_patch)
> > >> +{
> > >> +struct ucode_patch *ucode_patch;
> > >> +struct microcode_amd *new_mc = new_patch->data;
> > >> +struct microcode_header_amd *new_header = new_mc->mpb;
> > >> +
> > >> +list_for_each_entry(ucode_patch, &microcode_cache, list)
> > >> +{
> > >> +struct microcode_amd *old_mc = ucode_patch->data;
> > >> +struct microcode_header_amd *old_header = old_mc->mpb;
> > >> +
> > >> +if ( new_header->processor_rev_id == 
> > >> old_header->processor_rev_id )
> > >> +{
> > >> +if ( new_header->patch_id <= old_header->patch_id )
> > >> +return -1;
> > >> +list_replace(&ucode_patch->list, &new_patch->list);
> > >> +free_ucode_patch(ucode_patch);
> > >> +return 0;
> > >> +}
> > >> +}
> > >
> > >This could be made common code with a specific hook for AMD and Intel
> > >in order to do the comparison, so that at least the loop over the
> > >list of ucode entries could be shared.
> > 
> > Something like pt_pirq_iterate()? Will give it a try.
> 
> Yes, that might also be helpful. I was thinking of adding such a
> comparison hook in microcode_ops, also having something like
> pt_pirq_iterate will be helpful if you need to iterate over the cache
> in other functions.
> 
> > >> @@ -491,6 +559,21 @@ static int cpu_request_microcode(unsigned int cpu, 
> > >> const void *buf,
> > >>  while ( (error = get_ucode_from_buffer_amd(mc_amd, buf, bufsize,
> > >> )) == 0 )
> > >>  {
> > >> +struct ucode_patch *ucode_patch;
> > >> +
> > >> +/*
> > >> + * Save this microcode before checking the signature. It is to
> > >> + * optimize microcode update on a mixed family system. Parsing
> > >
> > >Er, is it possible to have a system with CPUs of different family?
> > >What's going to happen with CPUs having different features?
> > 
> > I have no idea. That each cpu has a per-cpu variable to store the
> > microcode rather than a global one gives me a feeling that the current
> > implementation wants to make it work on a system with CPUs of different
> > family.
> 
> I think we need AMD maintainers input on this one. TBH I very much
> doubt there are (working) systems out there with mixed family CPUs.
> 
> Thanks, Roger.

Sorry about the delay.  From the PPR for F17 M00-0FH
"AMD Family 17h processors with different OPNs or different revisions
cannot be mixed in a multiprocessor system. If an unsupported
configuration is detected, BIOS should configure the BSP as a single
processor system and signal an error."

Even mixing OPNs within a model is a no-go for us.  Mixing different
families is something I highly doubt will ever be supported.

-- 
Brian Woods


[Xen-devel] [PATCH] AMD IOMMU: fix debug console IOMMU intremap output

2018-12-04 Thread Woods, Brian
When using the Xen debug console to print the IOMMU intremap tables,
it prints everything in the IVRS range regardless of whether there is
an interrupt remap table or not.  Add some logic so that an entry is
only printed if its interrupt remap table isn't empty.

Signed-off-by: Brian Woods 
---
CC: Paul Durrant 
CC: Roger Pau Monne 

 xen/drivers/passthrough/amd/iommu_intr.c | 34 ++--
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/xen/drivers/passthrough/amd/iommu_intr.c 
b/xen/drivers/passthrough/amd/iommu_intr.c
index dad2d1e5ab..e86300b57f 100644
--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -665,6 +665,24 @@ int __init amd_setup_hpet_msi(struct msi_desc *msi_desc)
 return rc;
 }
 
+
+static bool intremap_table_empty(const u32 *table)
+{
+u32 count;
+
+if ( !table )
+return true;
+
+for ( count = 0; count < INTREMAP_ENTRIES; count++ )
+{
+if ( table[count] )
+return false;
+}
+return true;
+}
+
+
+
 static void dump_intremap_table(const u32 *table)
 {
 u32 count;
@@ -687,13 +705,17 @@ static int dump_intremap_mapping(u16 seg, struct 
ivrs_mappings *ivrs_mapping)
 if ( !ivrs_mapping )
 return 0;
 
-printk("  %04x:%02x:%02x:%u:\n", seg,
-   PCI_BUS(ivrs_mapping->dte_requestor_id),
-   PCI_SLOT(ivrs_mapping->dte_requestor_id),
-   PCI_FUNC(ivrs_mapping->dte_requestor_id));
-
 spin_lock_irqsave(&(ivrs_mapping->intremap_lock), flags);
-dump_intremap_table(ivrs_mapping->intremap_table);
+
+if ( !intremap_table_empty(ivrs_mapping->intremap_table) ) {
+printk("  %04x:%02x:%02x:%u:\n", seg,
+   PCI_BUS(ivrs_mapping->dte_requestor_id),
+   PCI_SLOT(ivrs_mapping->dte_requestor_id),
+   PCI_FUNC(ivrs_mapping->dte_requestor_id));
+
+dump_intremap_table(ivrs_mapping->intremap_table);
+}
+
 spin_unlock_irqrestore(&(ivrs_mapping->intremap_lock), flags);
 
 return 0;
-- 
2.11.0



Re: [Xen-devel] [PATCH 8/9] x86/amd: Virtualise MSR_VIRT_SPEC_CTRL for guests

2018-12-04 Thread Woods, Brian
On Mon, Dec 03, 2018 at 04:18:21PM +, Andy Cooper wrote:
> The semantics of MSR_VIRT_SPEC_CTRL are that unknown bits are write-discard
> and read as zero.  Only VIRT_SPEC_CTRL.SSBD is defined at the moment.
> 
> To facilitate making this per-guest, the legacy SSBD state needs context
> switching between vcpus.  amd_ctxt_switch_legacy_ssbd() is updated to take the
> vcpu's setting into account.  Furthermore, the guest's chosen value needs
> preserving across migrate.
> 
> This marks a subtle change in how `ssbd=` behaves.  If Xen wishes SSBD to be
> asserted, it remains set in hardware all the time.  In the default case of Xen
> wishing SSBD not to be asserted, the value set in hardware is the guest's
> choice.

Ok, we talked about this some over IRC, but I thought it would be
better to get some more eyes on this on the mailing list.

From what some engineers have said over here, it takes roughly 400
clock cycles to enable SSBD via LS_CFG.  It isn't cheap, no, but if
the average vCPU time is 30ms, that's roughly .66%* overhead worst case
(non-turbo'd on the slowest-frequency processor at max speed [2GHz]).

The other thing I don't get is why advertise virtualized SSBD when the
guest setting it does nothing.  If ssbd_opt=true is set, as the code is
now, why even advertise it to the guest?  I'd suggest either allowing
the guest to turn it off or not advertising it at all (when ssbd_opt =
true).

* Twrmsr (in ms) = (400/200)*1000,
  percent = (Twrmsr/30ms) * 100  = .66(repeating)%
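
For concreteness, the alternative suggested above would look something
like this (hypothetical policy logic and feature names, not code from
the series):

    /* Only advertise VIRT_SSBD when the guest's choice can take effect;
     * if Xen forces SSBD on (opt_ssbd), a guest write is a no-op. */
    if ( opt_ssbd )
        p->extd.virt_ssbd = false;
    else
        p->extd.virt_ssbd = cpu_has_virt_ssbd || cpu_has_legacy_ssbd;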

-- 
Brian Woods


Re: [Xen-devel] [PATCH 7/9] x86/amd: Support context switching legacy SSBD interface

2018-12-04 Thread Woods, Brian
On Mon, Dec 03, 2018 at 04:18:20PM +, Andy Cooper wrote:
> It is critical that MSR_AMD64_LS_CFG is never modified outside of this
> function, to avoid trampling on sibling settings.
> 
> For now, pass in NULL from the boot paths and just set Xen's default.  Later
> patches will plumb in guest choices.  This now supercedes the older code which
> wrote to MSR_AMD64_LS_CFG once during boot.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Brian Woods 
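
The per-core refcounted update this centralises looks roughly like the
sketch below (the field and mask names are assumptions; the point is
that the MSR is only written on 0 <-> nonzero transitions of the
core-wide disable count):

    static void core_set_legacy_ssbd(struct ssbd_ls_cfg *cfg, bool disable)
    {
        uint64_t val;

        spin_lock(&cfg->lock);

        cfg->disable_count += disable ? 1 : -1;

        /* Only the 0 <-> 1 transitions touch the (per-core) MSR. */
        if ( cfg->disable_count == disable )
        {
            rdmsrl(MSR_AMD64_LS_CFG, val);
            wrmsrl(MSR_AMD64_LS_CFG, disable ? val | ls_cfg_ssbd_mask
                                             : val & ~ls_cfg_ssbd_mask);
        }

        spin_unlock(&cfg->lock);
    }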

-- 
Brian Woods


Re: [Xen-devel] [PATCH 6/9] x86/amd: Allocate resources to cope with LS_CFG being per-core on Fam17h

2018-12-04 Thread Woods, Brian
On Mon, Dec 03, 2018 at 04:18:19PM +, Andy Cooper wrote:
> The need for per-core resources is a property of Fam17h hardware.  The
> mechanism for calculating / allocating space is all a bit horrible, but is the
> best which can be done at this point.  See the code comments for details.
> 
> Signed-off-by: Andrew Cooper 
> Parts based on an earlier patch by Brian
> Signed-off-by: Signed-off-by: Brian Woods 

Needs to be:
Signed-off-by: Brian Woods  

Otherwise,
Reviewed-by: Brian Woods 

-- 
Brian Woods


Re: [Xen-devel] [PATCH 5/9] x86/amd: Probe for legacy SSBD interfaces on boot

2018-12-04 Thread Woods, Brian
On Mon, Dec 03, 2018 at 04:18:18PM +, Andy Cooper wrote:
> Introduce a new synthetic LEGACY_SSBD feature and set it if we find
> VIRT_SPEC_CTRL offered by our hypervisor, or if we find a working bit in an
> LS_CFG register.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Brian Woods 

-- 
Brian Woods


Re: [Xen-devel] [PATCH 4/9] x86/amd: Introduce CPUID/MSR definitions for per-vcpu SSBD support

2018-12-04 Thread Woods, Brian
On Mon, Dec 03, 2018 at 04:18:17PM +, Andy Cooper wrote:
> At the time of writing, the spec is available from:
> 
>   
> https://developer.amd.com/wp-content/resources/124441_AMD64_SpeculativeStoreBypassDisable_Whitepaper_final.pdf
> 
> Future hardware (Zen v2) is expected to have hardware MSR_SPEC_CTRL support,
> including SPEC_CTRL.SSBD, and with the expectation that this will be directly
> passed through to guests for performance.
> 
> On currently released hardware, the only mechanism available is the legacy
> LS_CFG option, and this is very expensive to use.  Furthermore, emulating
> MSR_SPEC_CTRL via interception is prohibitively expensive, as certain OSes use
> the write-discard flexibility to simplify their entry/exit logic.
> 
> As an alternative, MSR_VIRT_SPEC_CTRL is specified as an architectural control
> (with semantics equivalent to MSR_SPEC_CTRL) which is provided by the
> hypervisor.  This abstracts away the model-specific details of the LS_CFG
> mechanism, which allows migration safety to be retained.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Brian Woods 

-- 
Brian Woods


Re: [Xen-devel] [PATCH 2/2] x86/hvm: Corrections to RDTSCP intercept handling

2018-11-30 Thread Woods, Brian
On Fri, Nov 30, 2018 at 05:07:20PM +, Andy Cooper wrote:
> For both VT-x and SVM, the RDTSCP intercept will trigger if the pipeline
> supports the instruction, but the guest may not have rdtscp in its
> featureset.  Bring the vmexit handlers in line with the main emulator
> behaviour by optionally handing back #UD.
> 
> Next on the AMD side, if RDTSCP actually ends up being intercepted on a debug
> build, we first update regs->rcx, then call __get_instruction_length() asking
> for RDTSC.  As the two instructions are different (and indeed, different
> lengths!), __get_instruction_length_from_list() fails and hands back a #GP
> fault.
> 
> This can be demonstrated by putting a guest into tsc_mode="always emulate" and
> executing an rdtscp instruction:
> 
>   (d1) --- Xen Test Framework ---
>   (d1) Environment: HVM 64bit (Long mode 4 levels)
>   (d1) Test rdtscp
>   (d1) TSC mode 1
>   (XEN) emulate.c:159:d1v0 __get_instruction_length_from_list: Mismatch 
> between expected and actual instruction:
>   (XEN) emulate.c:163:d1v0   list[0] val 8, { opc 0xf0031, modrm 0 }, list 
> entries: 1
>   (XEN) emulate.c:165:d1v0   rip 0x10475f, nextrip 0x104762, len 3
>   (XEN) Insn_len emulation failed (1): d1v0 64bit @ 0008:0010475f -> 0f 01 f9 
> 5b 31 ff 31 c0 e9 c4 db ff ff 00 00 00
>   (d1) **
>   (d1) PANIC: Unhandled exception at 0008:0010475f
>   (d1) Vec 13 #GP[]
>   (d1) **
> 
> First, teach __get_instruction_length() to cope with RDTSCP, and improve
> svm_vmexit_do_rdtsc() to ask for the correct instruction.  Move the regs->rcx
> adjustment into this function to ensure it gets done after we are done
> potentially raising faults.
> 
> Reported-by: Paul Durrant 
> Signed-off-by: Andrew Cooper 

As far as the SVM part:
Reviewed-by: Brian Woods 
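
The corrected vmexit flow described above has roughly this shape (an
illustration of the commit message, not the patch itself; note that
%rcx is only touched once no fault can be raised any more):

    case VMEXIT_RDTSCP:
        if ( !currd->arch.cpuid->extd.rdtscp )
            hvm_inject_hw_exception(TRAP_invalid_op, X86_EVENT_NO_EC);
        else if ( (insn_len = __get_instruction_length(v, INSTR_RDTSCP)) != 0 )
        {
            regs->rcx = hvm_msr_tsc_aux(v);
            __update_guest_eip(regs, insn_len);
            hvm_rdtsc_intercept(regs);
        }
        break;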

-- 
Brian Woods


Re: [Xen-devel] [PATCH 1/2] x86/svm: Improve diagnostics when __get_instruction_length_from_list() fails

2018-11-30 Thread Woods, Brian
On Fri, Nov 30, 2018 at 05:07:19PM +, Andy Cooper wrote:
> Sadly, a lone:
> 
>   (XEN) emulate.c:156:d2v0 __get_instruction_length_from_list: Mismatch 
> between expected and actual instruction: eip = f804564139c0
> 
> on the console is of no use trying to identify what went wrong.  Dump as much
> state as we can to help identify what went wrong.
> 
> Reported-by: Paul Durrant 
> Signed-off-by: Andrew Cooper 

Acked-by: Brian Woods 

-- 
Brian Woods


Re: [Xen-devel] [PATCH v6 1/2] amd/iommu: assign iommu devices to Xen

2018-11-29 Thread Woods, Brian
On Tue, Nov 27, 2018 at 04:24:40PM +0100, Roger Pau Monne wrote:
> AMD IOMMU devices are exposed on the PCI bus, and thus are assigned by
> default to the hardware domain. This can cause issues because the
> IOMMU devices themselves are not behind an IOMMU, so update_paging_mode will
> return an error if Xen tries to expand the page tables of a domain
> that has assigned devices not behind an IOMMU. update_paging_mode
> failing will cause the domain to be destroyed.
> 
> Fix this by hiding PCI IOMMU devices, so they are not assigned to the
> hardware domain.
> 
> Signed-off-by: Roger Pau Monné 
> Reviewed-by: Jan Beulich 

Acked-by: Brian Woods 

-- 
Brian Woods

