Re: [PATCH 2/6] x86/iommu: add common page-table allocator

2020-07-27 Thread Jan Beulich

On 27.07.2020 11:37, Durrant, Paul wrote:

From: Andrew Cooper 
Sent: 24 July 2020 19:24

On 24/07/2020 17:46, Paul Durrant wrote:

--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -140,11 +140,19 @@ int arch_iommu_domain_init(struct domain *d)

  spin_lock_init(&hd->arch.mapping_lock);

+INIT_PAGE_LIST_HEAD(&hd->arch.pgtables.list);
+spin_lock_init(&hd->arch.pgtables.lock);
+
  return 0;
  }

  void arch_iommu_domain_destroy(struct domain *d)
  {
+struct domain_iommu *hd = dom_iommu(d);
+struct page_info *pg;
+
+while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
+free_domheap_page(pg);


Some of those 90 lines saved were the logic to not suffer a watchdog
timeout here.

To do it like this, it needs plumbing into the relinquish resources path.



Ok. It does look like there could be other potentially lengthy destruction done 
off the back of the RCU call. Ought we to have the ability to make 
domain_destroy() restartable?


I don't see how this would be (easily) feasible. Instead - why do
page tables not get cleaned up already at relinquish_resources time?

Jan
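
(For illustration: the "plumbing into the relinquish resources path" that
Andrew suggests might look roughly like the sketch below. The function name,
the batch size and the general_preempt_check()/-ERESTART convention are
assumptions modelled on other Xen relinquish handlers, not the actual patch.)

    /* Sketch: free a domain's IOMMU page tables preemptibly, so that a
     * large set of page tables cannot trip the watchdog. */
    int iommu_free_pgtables(struct domain *d)
    {
        struct domain_iommu *hd = dom_iommu(d);
        struct page_info *pg;
        unsigned int done = 0;

        while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
        {
            free_domheap_page(pg);

            /* Bail out periodically; the caller sees -ERESTART and
             * reschedules the teardown rather than spinning here. */
            if ( !(++done & 0xff) && general_preempt_check() )
                return -ERESTART;
        }

        return 0;
    }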



RE: [PATCH 2/6] x86/iommu: add common page-table allocator

2020-07-27 Thread Durrant, Paul
> -----Original Message-----
> From: Andrew Cooper 
> Sent: 24 July 2020 19:24
> To: Paul Durrant ; xen-devel@lists.xenproject.org
> Cc: Durrant, Paul ; Jan Beulich ; 
> Kevin Tian
> ; Wei Liu ; Roger Pau Monné 
> 
> Subject: RE: [EXTERNAL] [PATCH 2/6] x86/iommu: add common page-table allocator
> 
> On 24/07/2020 17:46, Paul Durrant wrote:
> > From: Paul Durrant 
> >
> > Instead of having separate page table allocation functions in VT-d and AMD
> > IOMMU code, use a common allocation function in the general x86 code.
> > Also, rather than walking the page tables and using a tasklet to free them
> > during iommu_domain_destroy(), add allocated page table pages to a list and
> > then simply walk the list to free them. This saves ~90 lines of code 
> > overall.
> >
> > NOTE: There is no need to clear and sync PTEs during teardown since the per-
> >   device root entries will have already been cleared (when devices were
> >   de-assigned) so the page tables can no longer be accessed by the 
> > IOMMU.
> >
> > Signed-off-by: Paul Durrant 
> 
> Oh wow - I don't have any polite words for how that code was organised
> before.
> 
> Instead of discussing the ~90 lines saved, what about the removal of a
> global bottleneck (the tasklet) or expand on the removal of redundant
> TLB/cache maintenance?
> 

Ok.

> > diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> > index c386dc4387..fd9b1e7bd5 100644
> > --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> > +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> > @@ -378,64 +380,9 @@ static int amd_iommu_assign_device(struct domain *d, 
> > u8 devfn,
> >  return reassign_device(pdev->domain, d, devfn, pdev);
> >  }
> >
> > -static void deallocate_next_page_table(struct page_info *pg, int level)
> > -{
> > -PFN_ORDER(pg) = level;
> > -spin_lock(&iommu_pt_cleanup_lock);
> > -page_list_add_tail(pg, &iommu_pt_cleanup_list);
> > -spin_unlock(_pt_cleanup_lock);
> > -}
> > -
> > -static void deallocate_page_table(struct page_info *pg)
> > -{
> > -struct amd_iommu_pte *table_vaddr;
> > -unsigned int index, level = PFN_ORDER(pg);
> > -
> > -PFN_ORDER(pg) = 0;
> > -
> > -if ( level <= 1 )
> > -{
> > -free_amd_iommu_pgtable(pg);
> > -return;
> > -}
> > -
> > -table_vaddr = __map_domain_page(pg);
> > -
> > -for ( index = 0; index < PTE_PER_TABLE_SIZE; index++ )
> > -{
> > -struct amd_iommu_pte *pde = &table_vaddr[index];
> > -
> > -if ( pde->mfn && pde->next_level && pde->pr )
> > -{
> > -/* We do not support skip levels yet */
> > -ASSERT(pde->next_level == level - 1);
> > -deallocate_next_page_table(mfn_to_page(_mfn(pde->mfn)),
> > -   pde->next_level);
> > -}
> > -}
> > -
> > -unmap_domain_page(table_vaddr);
> > -free_amd_iommu_pgtable(pg);
> > -}
> > -
> > -static void deallocate_iommu_page_tables(struct domain *d)
> > -{
> > -struct domain_iommu *hd = dom_iommu(d);
> > -
> > -spin_lock(&hd->arch.mapping_lock);
> > -if ( hd->arch.amd_iommu.root_table )
> > -{
> > -deallocate_next_page_table(hd->arch.amd_iommu.root_table,
> > -   hd->arch.amd_iommu.paging_mode);
> 
> I really need to dust off my patch fixing up several bits of dubious
> logic, including the name "paging_mode" which is actually simply the
> number of levels.
> 
> At this point, it will probably be best to get this series in first, and
> for me to rebase.
> 

Ok.

> > -hd->arch.amd_iommu.root_table = NULL;
> > -}
> > -spin_unlock(&hd->arch.mapping_lock);
> > -}
> > -
> > -
> >  static void amd_iommu_domain_destroy(struct domain *d)
> >  {
> > -deallocate_iommu_page_tables(d);
> > +dom_iommu(d)->arch.amd_iommu.root_table = NULL;
> >  amd_iommu_flush_all_pages(d);
> 
> Per your NOTE:, shouldn't this flush call be dropped?
> 

Indeed it should.
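
(If the flush is dropped as agreed, the AMD destroy hook would presumably
shrink to just clearing the root table pointer - a sketch of the likely
result, not the committed code:)

    static void amd_iommu_domain_destroy(struct domain *d)
    {
        /* Page tables are freed via the common per-domain list, and the
         * per-device root entries were already cleared on de-assignment,
         * so neither a table walk nor a flush is needed here. */
        dom_iommu(d)->arch.amd_iommu.root_table = NULL;
    }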

> > diff --git a/xen/drivers/passthrough/x86/iommu.c 
> > b/xen/drivers/passthrough/x86/iommu.c
> > index a12109a1de..b3c7da0fe2 100644
> > --- a/xen/drivers/passthrough/x86/iommu.c
> > +++ b/xen/drivers/passthrough/x86/iommu.c
> > @@ -140,11 +140,19 @@ int arch_iommu_domain_init(struct domain *d)
> >
> >  spin_lock_init(&hd->arch.mapping_lock);
> >
> > +INIT_PAGE_LIST_HEAD(&hd->arch.pgtables.list);
> > +spin_lock_init(&hd->arch.pgtables.lock);
> > +
> >  return 0;
> >  }
> >
> >  void arch_iommu_domain_destroy(struct domain *d)
> >  {
> > +struct domain_iommu *hd = dom_iommu(d);
> > +struct page_info *pg;
> > +
> > +while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
> > +free_domheap_page(pg);
> 
> Some of those 90 lines saved were the logic to not suffer a watchdog
> timeout here.
> 
> To do it like this, it needs plumbing into the relinquish resources path.

Ok. It does look like there could be other potentially lengthy destruction
done off the back of the RCU call. Ought we to have the ability to make
domain_destroy() restartable?

Re: [PATCH 2/6] x86/iommu: add common page-table allocator

2020-07-26 Thread Jan Beulich
On 24.07.2020 20:24, Andrew Cooper wrote:
> On 24/07/2020 17:46, Paul Durrant wrote:
>> --- a/xen/drivers/passthrough/x86/iommu.c
>> +++ b/xen/drivers/passthrough/x86/iommu.c
>> @@ -140,11 +140,19 @@ int arch_iommu_domain_init(struct domain *d)
>>  
>>  spin_lock_init(&hd->arch.mapping_lock);
>>  
>> +INIT_PAGE_LIST_HEAD(&hd->arch.pgtables.list);
>> +spin_lock_init(&hd->arch.pgtables.lock);
>> +
>>  return 0;
>>  }
>>  
>>  void arch_iommu_domain_destroy(struct domain *d)
>>  {
>> +struct domain_iommu *hd = dom_iommu(d);
>> +struct page_info *pg;
>> +
>> +while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
>> +free_domheap_page(pg);
> 
> Some of those 90 lines saved were the logic to not suffer a watchdog
> timeout here.
> 
> To do it like this, it needs plumbing into the relinquish resources path.

And indeed this is possible now only because we don't destroy page
tables for still running domains anymore. Maybe the description
should also make this connection.

Jan



Re: [PATCH 2/6] x86/iommu: add common page-table allocator

2020-07-24 Thread Andrew Cooper
On 24/07/2020 17:46, Paul Durrant wrote:
> From: Paul Durrant 
>
> Instead of having separate page table allocation functions in VT-d and AMD
> IOMMU code, use a common allocation function in the general x86 code.
> Also, rather than walking the page tables and using a tasklet to free them
> during iommu_domain_destroy(), add allocated page table pages to a list and
> then simply walk the list to free them. This saves ~90 lines of code overall.
>
> NOTE: There is no need to clear and sync PTEs during teardown since the per-
>   device root entries will have already been cleared (when devices were
>   de-assigned) so the page tables can no longer be accessed by the IOMMU.
>
> Signed-off-by: Paul Durrant 

Oh wow - I don't have any polite words for how that code was organised
before.

Instead of discussing the ~90 lines saved, what about the removal of a
global bottleneck (the tasklet) or expand on the removal of redundant
TLB/cache maintenance?

> diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c 
> b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> index c386dc4387..fd9b1e7bd5 100644
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -378,64 +380,9 @@ static int amd_iommu_assign_device(struct domain *d, u8 
> devfn,
>  return reassign_device(pdev->domain, d, devfn, pdev);
>  }
>  
> -static void deallocate_next_page_table(struct page_info *pg, int level)
> -{
> -PFN_ORDER(pg) = level;
> -spin_lock(&iommu_pt_cleanup_lock);
> -page_list_add_tail(pg, &iommu_pt_cleanup_list);
> -spin_unlock(_pt_cleanup_lock);
> -}
> -
> -static void deallocate_page_table(struct page_info *pg)
> -{
> -struct amd_iommu_pte *table_vaddr;
> -unsigned int index, level = PFN_ORDER(pg);
> -
> -PFN_ORDER(pg) = 0;
> -
> -if ( level <= 1 )
> -{
> -free_amd_iommu_pgtable(pg);
> -return;
> -}
> -
> -table_vaddr = __map_domain_page(pg);
> -
> -for ( index = 0; index < PTE_PER_TABLE_SIZE; index++ )
> -{
> -struct amd_iommu_pte *pde = &table_vaddr[index];
> -
> -if ( pde->mfn && pde->next_level && pde->pr )
> -{
> -/* We do not support skip levels yet */
> -ASSERT(pde->next_level == level - 1);
> -deallocate_next_page_table(mfn_to_page(_mfn(pde->mfn)),
> -   pde->next_level);
> -}
> -}
> -
> -unmap_domain_page(table_vaddr);
> -free_amd_iommu_pgtable(pg);
> -}
> -
> -static void deallocate_iommu_page_tables(struct domain *d)
> -{
> -struct domain_iommu *hd = dom_iommu(d);
> -
> -spin_lock(&hd->arch.mapping_lock);
> -if ( hd->arch.amd_iommu.root_table )
> -{
> -deallocate_next_page_table(hd->arch.amd_iommu.root_table,
> -   hd->arch.amd_iommu.paging_mode);

I really need to dust off my patch fixing up several bits of dubious
logic, including the name "paging_mode" which is actually simply the
number of levels.

At this point, it will probably be best to get this series in first, and
for me to rebase.

> -hd->arch.amd_iommu.root_table = NULL;
> -}
> -spin_unlock(&hd->arch.mapping_lock);
> -}
> -
> -
>  static void amd_iommu_domain_destroy(struct domain *d)
>  {
> -deallocate_iommu_page_tables(d);
> +dom_iommu(d)->arch.amd_iommu.root_table = NULL;
>  amd_iommu_flush_all_pages(d);

Per your NOTE:, shouldn't this flush call be dropped?

> diff --git a/xen/drivers/passthrough/x86/iommu.c 
> b/xen/drivers/passthrough/x86/iommu.c
> index a12109a1de..b3c7da0fe2 100644
> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -140,11 +140,19 @@ int arch_iommu_domain_init(struct domain *d)
>  
>  spin_lock_init(&hd->arch.mapping_lock);
>  
> +INIT_PAGE_LIST_HEAD(&hd->arch.pgtables.list);
> +spin_lock_init(&hd->arch.pgtables.lock);
> +
>  return 0;
>  }
>  
>  void arch_iommu_domain_destroy(struct domain *d)
>  {
> +struct domain_iommu *hd = dom_iommu(d);
> +struct page_info *pg;
> +
> +while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
> +free_domheap_page(pg);

Some of those 90 lines saved were the logic to not suffer a watchdog
timeout here.

To do it like this, it needs plumbing into the relinquish resources path.

>  }
>  
>  static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
> @@ -257,6 +265,39 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
>  return;
>  }
>  
> +struct page_info *iommu_alloc_pgtable(struct domain *d)
> +{
> +struct domain_iommu *hd = dom_iommu(d);
> +#ifdef CONFIG_NUMA
> +unsigned int memflags = (hd->node == NUMA_NO_NODE) ?
> +0 : MEMF_node(hd->node);
> +#else
> +unsigned int memflags = 0;
> +#endif
> +struct page_info *pg;
> +void *p;

The memflags code is very awkward.  How about initialising it to 0, and
having:

#ifdef CONFIG_NUMA
    if ( hd->node != NUMA_NO_NODE )
        memflags = MEMF_node(hd->node);
#endif
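
(For context, the remainder of iommu_alloc_pgtable() is not quoted in this
archive. A plausible sketch of what follows, reusing the pg/p/hd/memflags
declarations from the hunk above - allocate a page, zero it, and queue it
on the per-domain tracking list:)

        pg = alloc_domheap_page(NULL, memflags);
        if ( !pg )
            return NULL;

        /* Page tables must start out empty. */
        p = __map_domain_page(pg);
        clear_page(p);
        unmap_domain_page(p);

        /* Track the page so arch_iommu_domain_destroy() can free it. */
        spin_lock(&hd->arch.pgtables.lock);
        page_list_add(pg, &hd->arch.pgtables.list);
        spin_unlock(&hd->arch.pgtables.lock);

        return pg;
    }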