On 01/12/2021 09:40, Jan Beulich wrote: > The actual function should always have lived in core x86 code; move it > there, replacing get_cache_line_size() by readily available (except very > early during boot; see the code comment) data. > > Drop the respective IOMMU hook, (re)introducing a respective boolean > instead. Replace a true and an almost open-coding instance of > iommu_sync_cache().
Coherency (or not) is a per-IOMMU property, not a global platform property. iGPU IOMMUs are very different to those the uncore, and there's no reason to presume that IOMMUs in the PCH would have the same properties as those in the uncore. Given how expensive sync_cache() is, we cannot afford to be using it for coherent IOMMUs in a mixed system. This wants to be a boolean in arch_iommu. > Signed-off-by: Jan Beulich <jbeul...@suse.com> > --- > Placing the function next to flush_area_local() exposes a curious > asymmetry between the SFENCE placements: sync_cache() has it after the > flush, while flush_area_local() has it before it. I think the latter one > is misplaced. Wow this is a mess. First and foremost, AMD state that on pre-CLFLUSHOPT parts, CLFLUSH is unordered with ~anything (including SFENCE), and need bounding with MFENCE on both sides. We definitely aren't doing this correctly right now. AMD explicitly states that SFENCE drains the store and WC buffers (i.e. make stuff instantaneously globally visible). Intel does not, and merely guarantees ordering. A leading SFENCE would only make sense if there were WC concerns, but both manuals say that the memory type doesn't matter, so I can't see a justification for it. What does matter, from the IOMMU's point of view, is that the line has been written back (or evicted on pre-CLWB parts) before the IOTLB invalidation occurs. The invalidation will be a write to a different address, which is why the trailing SFENCE is necessary, as CLFLUSHOPT isn't ordered with respect to unaliasing writes. On a more minor note, both Intel and AMD say that CLFLUSH* are permitted to target an execute-only code segment, so we need a fix in hvmemul_cache_op()'s use of hvmemul_virtual_to_linear(..., hvm_access_read, ...) which will currently #GP in this case. > > --- a/xen/arch/x86/flushtlb.c > +++ b/xen/arch/x86/flushtlb.c > @@ -11,6 +11,7 @@ > #include <xen/sched.h> > #include <xen/smp.h> > #include <xen/softirq.h> > +#include <asm/cache.h> > #include <asm/flushtlb.h> > #include <asm/invpcid.h> > #include <asm/nops.h> > @@ -265,6 +266,57 @@ unsigned int flush_area_local(const void > return flags; > } > > +void sync_cache(const void *addr, unsigned int size) Can we name this cache_writeback()? sync is very generic, and it really doesn't want confusing cache_flush(). ~Andrew