On 01.12.2021 15:39, Andrew Cooper wrote:
> On 01/12/2021 09:40, Jan Beulich wrote:
>> The actual function should always have lived in core x86 code; move it
>> there, replacing get_cache_line_size() by readily available (except very
>> early during boot; see the code comment) data.
>>
>> Drop the respective IOMMU hook, (re)introducing a respective boolean
>> instead. Replace one true and one almost open-coded instance of
>> iommu_sync_cache().
> 
> Coherency (or not) is a per-IOMMU property, not a global platform
> property.  iGPU IOMMUs are very different to those in the uncore, and
> there's no reason to presume that IOMMUs in the PCH would have the same
> properties as those in the uncore.
> 
> Given how expensive sync_cache() is, we cannot afford to be using it for
> coherent IOMMUs in a mixed system.
> 
> This wants to be a boolean in arch_iommu.

That's a valid consideration, but it may not be as easy as it seems on
the surface. It's certainly not something I could promise to find time
for soon, and it's definitely separate from the specific change here.

>> ---
>> Placing the function next to flush_area_local() exposes a curious
>> asymmetry between the SFENCE placements: sync_cache() has it after the
>> flush, while flush_area_local() has it before it. I think the latter one
>> is misplaced.
> 
> Wow this is a mess.
> 
> First and foremost, AMD state that on pre-CLFLUSHOPT parts, CLFLUSH is
> unordered with ~anything (including SFENCE), and needs bounding with
> MFENCE on both sides.  We definitely aren't doing this correctly right now.
> 
> 
> AMD explicitly states that SFENCE drains the store and WC buffers (i.e.
> makes stores instantaneously globally visible).  Intel does not, and
> merely guarantees ordering.
> 
> A leading SFENCE would only make sense if there were WC concerns, but
> both manuals say that the memory type doesn't matter, so I can't see a
> justification for it.
> 
> What does matter, from the IOMMU's point of view, is that the line has
> been written back (or evicted on pre-CLWB parts) before the IOTLB
> invalidation occurs.  The invalidation will be a write to a different
> address, which is why the trailing SFENCE is necessary, as CLFLUSHOPT
> isn't ordered with respect to writes to other (unaliased) addresses.

IOW, for the purposes of this change all is correct, and everything
else will require taking care of separately.
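FWIW, to make the intended ordering concrete, a minimal sketch of the
pattern (assuming CLFLUSHOPT is available and a 64-byte line size; the
real function reads the line size from CPUID and falls back to plain
CLFLUSH, without the fence, via alternatives):

    #include <stdint.h>

    static void cache_writeback_sketch(const void *addr, unsigned int size)
    {
        unsigned int line = 64;                      /* assumed line size */
        const char *p = (const char *)((uintptr_t)addr &
                                       ~(uintptr_t)(line - 1));
        const char *end = (const char *)addr + size;

        /* Write back every line covering [addr, addr + size). */
        for ( ; p < end; p += line )
            asm volatile ( "clflushopt %0" :: "m" (*p) : "memory" );

        /*
         * Trailing fence only: CLFLUSHOPT isn't ordered with respect to
         * writes to other lines, and the IOTLB invalidation which follows
         * is exactly such a write.  No leading fence is needed here.
         */
        asm volatile ( "sfence" ::: "memory" );
    }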

> On a more minor note, both Intel and AMD say that CLFLUSH* are permitted
> to target an execute-only code segment, so we need a fix in
> hvmemul_cache_op()'s use of hvmemul_virtual_to_linear(...,
> hvm_access_read, ...) which will currently #GP in this case.

Hmm, this specific case would seem to simply require using
hvm_access_none (like hvmemul_tlb_op() already does), except for CLWB.

But then hvm_vcpu_virtual_to_linear() also doesn't look to handle
hvm_access_insn_fetch correctly for data segments. Changing that would
in turn require first auditing all use sites, to make sure we don't
break anything.

I'll see about doing both.
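For the first half, a purely illustrative sketch of the access-type
selection being described (the enum and helper names below are made up
for the example; the real change would adjust hvmemul_cache_op()'s call
to hvmemul_virtual_to_linear()):

    /* Hypothetical helper; names mirror the discussion, not Xen code. */
    enum access_type { access_none, access_read };
    enum cache_insn { insn_clflush, insn_clflushopt, insn_clwb };

    static enum access_type cache_insn_access(enum cache_insn insn)
    {
        /*
         * CLFLUSH/CLFLUSHOPT may legitimately target an execute-only
         * code segment, so don't demand read permission for them; per
         * the above, CLWB is the exception and keeps the read check.
         */
        return insn == insn_clwb ? access_read : access_none;
    }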

>> --- a/xen/arch/x86/flushtlb.c
>> +++ b/xen/arch/x86/flushtlb.c
>> @@ -11,6 +11,7 @@
>>  #include <xen/sched.h>
>>  #include <xen/smp.h>
>>  #include <xen/softirq.h>
>> +#include <asm/cache.h>
>>  #include <asm/flushtlb.h>
>>  #include <asm/invpcid.h>
>>  #include <asm/nops.h>
>> @@ -265,6 +266,57 @@ unsigned int flush_area_local(const void
>>      return flags;
>>  }
>>  
>> +void sync_cache(const void *addr, unsigned int size)
> 
> Can we name this cache_writeback()?  sync is very generic, and it really
> doesn't want confusing with cache_flush().

Oh, sure, can do. As long as you don't insist on also changing
iommu_sync_cache(): while purely mechanical, that would bloat the
patch by quite a bit.

Bottom line: this last item is the only change to the actual patch;
everything else will require further work, yielding separate patches.

Jan

