When a page table ends up with all contiguous entries (including all identical attributes), it can be replaced by a superpage entry at the next higher level. The page table itself can then be scheduled for freeing.
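(Illustration only, not part of the change: the condition being detected is the one a full rescan of a table would establish. The structure and helper below are hypothetical and merely spell that condition out; the actual code instead maintains this state incrementally via pt_update_contig_markers() as individual PTEs are written.)

#include <stdbool.h>

/* Hypothetical, simplified PTE layout used only for this illustration. */
struct ex_pte {
    unsigned long mfn;   /* machine frame mapped by the entry */
    bool pr, iw, ir;     /* present / writable / readable */
};

#define EX_PTES_PER_TABLE 512

/*
 * True if the table maps a naturally aligned, physically contiguous
 * range with identical attributes throughout, i.e. it could be replaced
 * by a single superpage entry at the next higher level.
 */
static bool ex_table_coalescible(const struct ex_pte t[EX_PTES_PER_TABLE])
{
    unsigned int i;

    /* The base MFN must be aligned to the superpage boundary. */
    if ( !t[0].pr || (t[0].mfn & (EX_PTES_PER_TABLE - 1)) )
        return false;

    for ( i = 1; i < EX_PTES_PER_TABLE; ++i )
        if ( !t[i].pr || t[i].mfn != t[0].mfn + i ||
             t[i].iw != t[0].iw || t[i].ir != t[0].ir )
            return false;

    return true;
}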
Signed-off-by: Jan Beulich <jbeul...@suse.com>
---
Unlike the freeing of all-empty page tables, this causes quite a bit of
back and forth for PV domains, due to their mapping/unmapping of pages
when they get converted to/from being page tables. It may therefore be
worth considering delaying re-coalescing a little, to avoid doing so
when the superpage would otherwise get split again pretty soon. But I
think this would better be the subject of a separate change anyway.

Of course this could also be helped by more "aware" kernel-side
behavior: they could avoid immediately mapping freed page tables
writable again, in anticipation of re-using that same page for another
page table elsewhere.
---
v3: New.

--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -81,7 +81,8 @@ static union amd_iommu_pte set_iommu_pte
                                                  unsigned long dfn,
                                                  unsigned long next_mfn,
                                                  unsigned int level,
-                                                 bool iw, bool ir)
+                                                 bool iw, bool ir,
+                                                 bool *contig)
 {
     union amd_iommu_pte *table, *pde, old;
 
@@ -94,11 +95,15 @@ static union amd_iommu_pte set_iommu_pte
          old.iw != iw || old.ir != ir )
     {
         set_iommu_pde_present(pde, next_mfn, 0, iw, ir);
-        pt_update_contig_markers(&table->raw, pfn_to_pde_idx(dfn, level),
-                                 level, PTE_kind_leaf);
+        *contig = pt_update_contig_markers(&table->raw,
+                                           pfn_to_pde_idx(dfn, level),
+                                           level, PTE_kind_leaf);
     }
     else
+    {
         old.pr = false; /* signal "no change" to the caller */
+        *contig = false;
+    }
 
     unmap_domain_page(table);
 
@@ -346,6 +351,7 @@ int amd_iommu_map_page(struct domain *d,
 {
     struct domain_iommu *hd = dom_iommu(d);
     unsigned int level = (IOMMUF_order(flags) / PTE_PER_TABLE_SHIFT) + 1;
+    bool contig;
     int rc;
     unsigned long pt_mfn = 0;
     union amd_iommu_pte old;
@@ -386,8 +392,26 @@ int amd_iommu_map_page(struct domain *d,
 
     /* Install mapping */
     old = set_iommu_pte_present(pt_mfn, dfn_x(dfn), mfn_x(mfn), level,
-                                (flags & IOMMUF_writable),
-                                (flags & IOMMUF_readable));
+                                flags & IOMMUF_writable,
+                                flags & IOMMUF_readable, &contig);
+
+    while ( unlikely(contig) && ++level < hd->arch.amd.paging_mode )
+    {
+        struct page_info *pg = mfn_to_page(_mfn(pt_mfn));
+        unsigned long next_mfn;
+
+        if ( iommu_pde_from_dfn(d, dfn_x(dfn), level, &pt_mfn, flush_flags,
+                                false) )
+            BUG();
+        BUG_ON(!pt_mfn);
+
+        next_mfn = mfn_x(mfn) & (~0UL << (PTE_PER_TABLE_SHIFT * (level - 1)));
+        set_iommu_pte_present(pt_mfn, dfn_x(dfn), next_mfn, level,
+                              flags & IOMMUF_writable,
+                              flags & IOMMUF_readable, &contig);
+        *flush_flags |= IOMMU_FLUSHF_modified | IOMMU_FLUSHF_all;
+        iommu_queue_free_pgtable(d, pg);
+    }
 
     spin_unlock(&hd->arch.mapping_lock);
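(For reference, and with hypothetical names, so not to be read as part of the patch: the superpage base installed by the new loop is simply the MFN with the bits covered by the levels below cleared. A standalone sketch of that arithmetic, assuming the AMD-Vi value of PTE_PER_TABLE_SHIFT:)

#include <stdio.h>

#define PTE_PER_TABLE_SHIFT 9    /* 512 entries per table on AMD-Vi */

/* Base MFN of the superpage covering @mfn for a leaf entry at @level. */
static unsigned long superpage_base_mfn(unsigned long mfn, unsigned int level)
{
    return mfn & (~0UL << (PTE_PER_TABLE_SHIFT * (level - 1)));
}

int main(void)
{
    /* Level 2 leaf: low 9 bits cleared. */
    printf("%#lx\n", superpage_base_mfn(0x12345UL, 2)); /* 0x12200 */
    /* Level 3 leaf: low 18 bits cleared. */
    printf("%#lx\n", superpage_base_mfn(0x12345UL, 3)); /* 0 */
    return 0;
}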