On Fri, Sep 24, 2021 at 11:52:14AM +0200, Jan Beulich wrote: > No separate feature flags exist which would control availability of > these; the only restriction is HATS (establishing the maximum number of > page table levels in general), and even that has a lower bound of 4. > Thus we can unconditionally announce 2M, 1G, and 512G mappings. (Via > non-default page sizes the implementation in principle permits arbitrary > size mappings, but these require multiple identical leaf PTEs to be > written, which isn't all that different from having to write multiple > consecutive PTEs with increasing frame numbers. IMO that's therefore > beneficial only on hardware where suitable TLBs exist; I'm unaware of > such hardware.)
Also replacing/shattering such non-standard page sizes will require more logic, so unless there's a performance benefit I would just skip using them. > > Signed-off-by: Jan Beulich <jbeul...@suse.com> > --- > I'm not fully sure about allowing 512G mappings: The scheduling-for- > freeing of intermediate page tables can take quite a while when > replacing a tree of 4k mappings by a single 512G one. Plus (or otoh) > there's no present code path via which 512G chunks of memory could be > allocated (and hence mapped) anyway. I would limit to 1G, which is what we support for CPU page tables also. > --- a/xen/drivers/passthrough/amd/iommu_map.c > +++ b/xen/drivers/passthrough/amd/iommu_map.c > @@ -32,12 +32,13 @@ static unsigned int pfn_to_pde_idx(unsig > } > > static union amd_iommu_pte clear_iommu_pte_present(unsigned long l1_mfn, > - unsigned long dfn) > + unsigned long dfn, > + unsigned int level) > { > union amd_iommu_pte *table, *pte, old; > > table = map_domain_page(_mfn(l1_mfn)); > - pte = &table[pfn_to_pde_idx(dfn, 1)]; > + pte = &table[pfn_to_pde_idx(dfn, level)]; > old = *pte; > > write_atomic(&pte->raw, 0); > @@ -288,10 +289,31 @@ static int iommu_pde_from_dfn(struct dom > return 0; > } > > +static void queue_free_pt(struct domain *d, mfn_t mfn, unsigned int > next_level) Nit: should the last parameter be named level rather than next_level? AFAICT it's the level of the mfn parameter. Should we also assert that level (or next_level) is always != 0 for extra safety? Thanks, Roger.