On 01/12/2021 09:40, Jan Beulich wrote:
> The actual function should always have lived in core x86 code; move it
> there, replacing get_cache_line_size() by readily available (except very
> early during boot; see the code comment) data.
>
> Drop the respective IOMMU hook, (re)introducing a respective boolean
> instead. Replace a true and an almost open-coding instance of
> iommu_sync_cache().

Coherency (or not) is a per-IOMMU property, not a global platform
property.  iGPU IOMMUs are very different to those the uncore, and
there's no reason to presume that IOMMUs in the PCH would have the same
properties as those in the uncore.

Given how expensive sync_cache() is, we cannot afford to be using it for
coherent IOMMUs in a mixed system.

This wants to be a boolean in arch_iommu.


> Signed-off-by: Jan Beulich <jbeul...@suse.com>
> ---
> Placing the function next to flush_area_local() exposes a curious
> asymmetry between the SFENCE placements: sync_cache() has it after the
> flush, while flush_area_local() has it before it. I think the latter one
> is misplaced.

Wow this is a mess.

First and foremost, AMD state that on pre-CLFLUSHOPT parts, CLFLUSH is
unordered with ~anything (including SFENCE), and need bounding with
MFENCE on both sides.  We definitely aren't doing this correctly right now.


AMD explicitly states that SFENCE drains the store and WC buffers (i.e.
make stuff instantaneously globally visible).  Intel does not, and
merely guarantees ordering.

A leading SFENCE would only make sense if there were WC concerns, but
both manuals say that the memory type doesn't matter, so I can't see a
justification for it.

What does matter, from the IOMMU's point of view, is that the line has
been written back (or evicted on pre-CLWB parts) before the IOTLB
invalidation occurs.  The invalidation will be a write to a different
address, which is why the trailing SFENCE is necessary, as CLFLUSHOPT
isn't ordered with respect to unaliasing writes.


On a more minor note, both Intel and AMD say that CLFLUSH* are permitted
to target an execute-only code segment, so we need a fix in
hvmemul_cache_op()'s use of hvmemul_virtual_to_linear(...,
hvm_access_read, ...) which will currently #GP in this case.

>
> --- a/xen/arch/x86/flushtlb.c
> +++ b/xen/arch/x86/flushtlb.c
> @@ -11,6 +11,7 @@
>  #include <xen/sched.h>
>  #include <xen/smp.h>
>  #include <xen/softirq.h>
> +#include <asm/cache.h>
>  #include <asm/flushtlb.h>
>  #include <asm/invpcid.h>
>  #include <asm/nops.h>
> @@ -265,6 +266,57 @@ unsigned int flush_area_local(const void
>      return flags;
>  }
>  
> +void sync_cache(const void *addr, unsigned int size)

Can we name this cache_writeback()?  sync is very generic, and it really
doesn't want confusing cache_flush().

~Andrew

Reply via email to