Re: [PATCH 2/6] x86/iommu: add common page-table allocator

On 27.07.2020 11:37, Durrant, Paul wrote:
>> From: Andrew Cooper
>> Sent: 24 July 2020 19:24
>>
>> On 24/07/2020 17:46, Paul Durrant wrote:
>>> --- a/xen/drivers/passthrough/x86/iommu.c
>>> +++ b/xen/drivers/passthrough/x86/iommu.c
>>> @@ -140,11 +140,19 @@ int arch_iommu_domain_init(struct domain *d)
>>>
>>>      spin_lock_init(&hd->arch.mapping_lock);
>>>
>>> +    INIT_PAGE_LIST_HEAD(&hd->arch.pgtables.list);
>>> +    spin_lock_init(&hd->arch.pgtables.lock);
>>> +
>>>      return 0;
>>>  }
>>>
>>>  void arch_iommu_domain_destroy(struct domain *d)
>>>  {
>>> +    struct domain_iommu *hd = dom_iommu(d);
>>> +    struct page_info *pg;
>>> +
>>> +    while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
>>> +        free_domheap_page(pg);
>>
>> Some of those 90 lines saved were the logic to not suffer a watchdog
>> timeout here.
>>
>> To do it like this, it needs plumbing into the relinquish resources
>> path.
>
> Ok. It does look like there could be other potentially lengthy
> destruction done off the back of the RCU call. Ought we have the
> ability to have a restartable domain_destroy()?

I don't see how this would be (easily) feasible. Instead - why do page
tables not get cleaned up already at relinquish_resources time?

Jan
RE: [PATCH 2/6] x86/iommu: add common page-table allocator

> -----Original Message-----
> From: Andrew Cooper
> Sent: 24 July 2020 19:24
> To: Paul Durrant; xen-devel@lists.xenproject.org
> Cc: Durrant, Paul; Jan Beulich; Kevin Tian; Wei Liu; Roger Pau Monné
> Subject: Re: [EXTERNAL] [PATCH 2/6] x86/iommu: add common page-table
> allocator
>
> On 24/07/2020 17:46, Paul Durrant wrote:
>> From: Paul Durrant
>>
>> Instead of having separate page table allocation functions in VT-d and
>> AMD IOMMU code, use a common allocation function in the general x86
>> code. Also, rather than walking the page tables and using a tasklet to
>> free them during iommu_domain_destroy(), add allocated page table pages
>> to a list and then simply walk the list to free them. This saves ~90
>> lines of code overall.
>>
>> NOTE: There is no need to clear and sync PTEs during teardown since the
>>       per-device root entries will have already been cleared (when
>>       devices were de-assigned) so the page tables can no longer be
>>       accessed by the IOMMU.
>>
>> Signed-off-by: Paul Durrant
>
> Oh wow - I don't have any polite words for how that code was organised
> before.
>
> Instead of discussing the ~90 lines saved, what about the removal of a
> global bottleneck (the tasklet) or expand on the removal of redundant
> TLB/cache maintenance?

Ok.

>> diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> b/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> index c386dc4387..fd9b1e7bd5 100644
>> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> @@ -378,64 +380,9 @@ static int amd_iommu_assign_device(struct domain *d,
>> u8 devfn,
>>      return reassign_device(pdev->domain, d, devfn, pdev);
>>  }
>>
>> -static void deallocate_next_page_table(struct page_info *pg, int level)
>> -{
>> -    PFN_ORDER(pg) = level;
>> -    spin_lock(&iommu_pt_cleanup_lock);
>> -    page_list_add_tail(pg, &iommu_pt_cleanup_list);
>> -    spin_unlock(&iommu_pt_cleanup_lock);
>> -}
>> -
>> -static void deallocate_page_table(struct page_info *pg)
>> -{
>> -    struct amd_iommu_pte *table_vaddr;
>> -    unsigned int index, level = PFN_ORDER(pg);
>> -
>> -    PFN_ORDER(pg) = 0;
>> -
>> -    if ( level <= 1 )
>> -    {
>> -        free_amd_iommu_pgtable(pg);
>> -        return;
>> -    }
>> -
>> -    table_vaddr = __map_domain_page(pg);
>> -
>> -    for ( index = 0; index < PTE_PER_TABLE_SIZE; index++ )
>> -    {
>> -        struct amd_iommu_pte *pde = &table_vaddr[index];
>> -
>> -        if ( pde->mfn && pde->next_level && pde->pr )
>> -        {
>> -            /* We do not support skip levels yet */
>> -            ASSERT(pde->next_level == level - 1);
>> -            deallocate_next_page_table(mfn_to_page(_mfn(pde->mfn)),
>> -                                       pde->next_level);
>> -        }
>> -    }
>> -
>> -    unmap_domain_page(table_vaddr);
>> -    free_amd_iommu_pgtable(pg);
>> -}
>> -
>> -static void deallocate_iommu_page_tables(struct domain *d)
>> -{
>> -    struct domain_iommu *hd = dom_iommu(d);
>> -
>> -    spin_lock(&hd->arch.mapping_lock);
>> -    if ( hd->arch.amd_iommu.root_table )
>> -    {
>> -        deallocate_next_page_table(hd->arch.amd_iommu.root_table,
>> -                                   hd->arch.amd_iommu.paging_mode);
>
> I really need to dust off my patch fixing up several bits of dubious
> logic, including the name "paging_mode" which is actually simply the
> number of levels.
>
> At this point, it will probably be best to get this series in first, and
> for me to rebase.

Ok.

>> -        hd->arch.amd_iommu.root_table = NULL;
>> -    }
>> -    spin_unlock(&hd->arch.mapping_lock);
>> -}
>> -
>> -
>>  static void amd_iommu_domain_destroy(struct domain *d)
>>  {
>> -    deallocate_iommu_page_tables(d);
>> +    dom_iommu(d)->arch.amd_iommu.root_table = NULL;
>>      amd_iommu_flush_all_pages(d);
>
> Per your NOTE:, shouldn't this flush call be dropped?

Indeed it should.

>> diff --git a/xen/drivers/passthrough/x86/iommu.c
>> b/xen/drivers/passthrough/x86/iommu.c
>> index a12109a1de..b3c7da0fe2 100644
>> --- a/xen/drivers/passthrough/x86/iommu.c
>> +++ b/xen/drivers/passthrough/x86/iommu.c
>> @@ -140,11 +140,19 @@ int arch_iommu_domain_init(struct domain *d)
>>
>>      spin_lock_init(&hd->arch.mapping_lock);
>>
>> +    INIT_PAGE_LIST_HEAD(&hd->arch.pgtables.list);
>> +    spin_lock_init(&hd->arch.pgtables.lock);
>> +
>>      return 0;
>>  }
>>
>>  void arch_iommu_domain_destroy(struct domain *d)
>>  {
>> +    struct domain_iommu *hd = dom_iommu(d);
>> +    struct page_info *pg;
>> +
>> +    while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
>> +        free_domheap_page(pg);
>
> Some of those 90 lines saved were the logic to not suffer a watchdog
> timeout here.
>
> To do it like this, it needs plumbing into the relinquish resources
> path.

Ok. It does look like there could be other potentially lengthy
destruction done off the back of the RCU call. Ought we have the
ability to have a restartable domain_destroy()?
Re: [PATCH 2/6] x86/iommu: add common page-table allocator

On 24.07.2020 20:24, Andrew Cooper wrote:
> On 24/07/2020 17:46, Paul Durrant wrote:
>> --- a/xen/drivers/passthrough/x86/iommu.c
>> +++ b/xen/drivers/passthrough/x86/iommu.c
>> @@ -140,11 +140,19 @@ int arch_iommu_domain_init(struct domain *d)
>>
>>      spin_lock_init(&hd->arch.mapping_lock);
>>
>> +    INIT_PAGE_LIST_HEAD(&hd->arch.pgtables.list);
>> +    spin_lock_init(&hd->arch.pgtables.lock);
>> +
>>      return 0;
>>  }
>>
>>  void arch_iommu_domain_destroy(struct domain *d)
>>  {
>> +    struct domain_iommu *hd = dom_iommu(d);
>> +    struct page_info *pg;
>> +
>> +    while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
>> +        free_domheap_page(pg);
>
> Some of those 90 lines saved were the logic to not suffer a watchdog
> timeout here.
>
> To do it like this, it needs plumbing into the relinquish resources path.

And indeed this is possible now only because we don't destroy page
tables for still running domains anymore. Maybe the description should
also make this connection.

Jan
Re: [PATCH 2/6] x86/iommu: add common page-table allocator

On 24/07/2020 17:46, Paul Durrant wrote:
> From: Paul Durrant
>
> Instead of having separate page table allocation functions in VT-d and
> AMD IOMMU code, use a common allocation function in the general x86
> code. Also, rather than walking the page tables and using a tasklet to
> free them during iommu_domain_destroy(), add allocated page table pages
> to a list and then simply walk the list to free them. This saves ~90
> lines of code overall.
>
> NOTE: There is no need to clear and sync PTEs during teardown since the
>       per-device root entries will have already been cleared (when
>       devices were de-assigned) so the page tables can no longer be
>       accessed by the IOMMU.
>
> Signed-off-by: Paul Durrant

Oh wow - I don't have any polite words for how that code was organised
before.

Instead of discussing the ~90 lines saved, what about the removal of a
global bottleneck (the tasklet) or expand on the removal of redundant
TLB/cache maintenance?

> diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> index c386dc4387..fd9b1e7bd5 100644
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -378,64 +380,9 @@ static int amd_iommu_assign_device(struct domain *d,
> u8 devfn,
>      return reassign_device(pdev->domain, d, devfn, pdev);
>  }
>
> -static void deallocate_next_page_table(struct page_info *pg, int level)
> -{
> -    PFN_ORDER(pg) = level;
> -    spin_lock(&iommu_pt_cleanup_lock);
> -    page_list_add_tail(pg, &iommu_pt_cleanup_list);
> -    spin_unlock(&iommu_pt_cleanup_lock);
> -}
> -
> -static void deallocate_page_table(struct page_info *pg)
> -{
> -    struct amd_iommu_pte *table_vaddr;
> -    unsigned int index, level = PFN_ORDER(pg);
> -
> -    PFN_ORDER(pg) = 0;
> -
> -    if ( level <= 1 )
> -    {
> -        free_amd_iommu_pgtable(pg);
> -        return;
> -    }
> -
> -    table_vaddr = __map_domain_page(pg);
> -
> -    for ( index = 0; index < PTE_PER_TABLE_SIZE; index++ )
> -    {
> -        struct amd_iommu_pte *pde = &table_vaddr[index];
> -
> -        if ( pde->mfn && pde->next_level && pde->pr )
> -        {
> -            /* We do not support skip levels yet */
> -            ASSERT(pde->next_level == level - 1);
> -            deallocate_next_page_table(mfn_to_page(_mfn(pde->mfn)),
> -                                       pde->next_level);
> -        }
> -    }
> -
> -    unmap_domain_page(table_vaddr);
> -    free_amd_iommu_pgtable(pg);
> -}
> -
> -static void deallocate_iommu_page_tables(struct domain *d)
> -{
> -    struct domain_iommu *hd = dom_iommu(d);
> -
> -    spin_lock(&hd->arch.mapping_lock);
> -    if ( hd->arch.amd_iommu.root_table )
> -    {
> -        deallocate_next_page_table(hd->arch.amd_iommu.root_table,
> -                                   hd->arch.amd_iommu.paging_mode);

I really need to dust off my patch fixing up several bits of dubious
logic, including the name "paging_mode" which is actually simply the
number of levels.

At this point, it will probably be best to get this series in first, and
for me to rebase.

> -        hd->arch.amd_iommu.root_table = NULL;
> -    }
> -    spin_unlock(&hd->arch.mapping_lock);
> -}
> -
> -
>  static void amd_iommu_domain_destroy(struct domain *d)
>  {
> -    deallocate_iommu_page_tables(d);
> +    dom_iommu(d)->arch.amd_iommu.root_table = NULL;
>      amd_iommu_flush_all_pages(d);

Per your NOTE:, shouldn't this flush call be dropped?

> diff --git a/xen/drivers/passthrough/x86/iommu.c
> b/xen/drivers/passthrough/x86/iommu.c
> index a12109a1de..b3c7da0fe2 100644
> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -140,11 +140,19 @@ int arch_iommu_domain_init(struct domain *d)
>
>      spin_lock_init(&hd->arch.mapping_lock);
>
> +    INIT_PAGE_LIST_HEAD(&hd->arch.pgtables.list);
> +    spin_lock_init(&hd->arch.pgtables.lock);
> +
>      return 0;
>  }
>
>  void arch_iommu_domain_destroy(struct domain *d)
>  {
> +    struct domain_iommu *hd = dom_iommu(d);
> +    struct page_info *pg;
> +
> +    while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
> +        free_domheap_page(pg);

Some of those 90 lines saved were the logic to not suffer a watchdog
timeout here.

To do it like this, it needs plumbing into the relinquish resources path.

>  }
>
>  static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
> @@ -257,6 +265,39 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
>          return;
>  }
>
> +struct page_info *iommu_alloc_pgtable(struct domain *d)
> +{
> +    struct domain_iommu *hd = dom_iommu(d);
> +#ifdef CONFIG_NUMA
> +    unsigned int memflags = (hd->node == NUMA_NO_NODE) ?
> +        0 : MEMF_node(hd->node);
> +#else
> +    unsigned int memflags = 0;
> +#endif
> +    struct page_info *pg;
> +    void *p;

The memflags code is very awkward. How about initialising it to 0, and
having:

#ifdef CONFIG_NUMA
    if ( hd->node != NUMA_NO_NODE )
        memflags =