On 19.04.2023 23:55, Andrew Cooper wrote:
> On 19/04/2023 11:46 am, Jan Beulich wrote:
>> There's no need to write back caches on all CPUs upon seeing a WBINVD
>> exit; ones that a vCPU hasn't run on since the last writeback (or since
>> it was started) can't hold data which may need writing back.
>>
>> Signed-off-by: Jan Beulich <jbeul...@suse.com>
> 
> I find it unlikely that this is an improvement in any way at all.
> 
> You're adding a memory allocation, and making the fastpath slower, for
> all HVM domains, even the ~100% of them which in practice never get
> given a device in the first place.
> 
> Just so you can skip the WBINVD side effect on the L1/L2 caches of the
> CPUs this domain happens not to have run on since the last time they
> flushed (which is already an underestimate).  Note how this does not
> change the behaviour for the L3 caches, which form the overwhelming
> majority of the WBINVD overhead in the first place.

You're thinking of single-node systems only here, it seems? I view this
change as particularly relevant for node-constrained domains on NUMA
systems, where a domain will only ever have run on a small subset of
all CPUs.

As to "making the fastpath slower": That can only be the
__cpumask_set_cpu() added to hvm_do_resume(). What do you suggest? I
can certainly add a conditional there (and then I could also make the
memory allocation conditional), but I thought a LOCK-less RMW memory
operation might better be done in straight line code.
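
For illustration, the two variants (just a sketch - the "dirty_cache"
field name is a placeholder for this example, not what the patch
actually introduces):

    unsigned int cpu = smp_processor_id();

    /* As posted: LOCK-less RMW, in straight line with the rest of
     * hvm_do_resume(). */
    __cpumask_set_cpu(cpu, v->arch.dirty_cache);

    /* Alternative: test first, avoiding the RMW (and hence the
     * re-dirtying of the cache line) when the bit is already set,
     * at the price of a branch. */
    if ( !cpumask_test_cpu(cpu, v->arch.dirty_cache) )
        __cpumask_set_cpu(cpu, v->arch.dirty_cache);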

As an aside: after having thought of making this change, I did go and
check what KVM does. And (surprise or not) they do exactly this.
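
Roughly what they have (paraphrased from memory, so don't take this as
a verbatim quote of the KVM sources):

    /* In kvm_arch_vcpu_load(), i.e. whenever a vCPU moves to a CPU: */
    cpumask_set_cpu(cpu, vcpu->arch.wbinvd_dirty_mask);

    /* On a WBINVD intercept: flush just the accumulated set, then
     * reset it.  wbinvd_ipi() is their trivial helper executing
     * WBINVD on the receiving CPU. */
    cpu = get_cpu();
    cpumask_set_cpu(cpu, vcpu->arch.wbinvd_dirty_mask);
    on_each_cpu_mask(vcpu->arch.wbinvd_dirty_mask, wbinvd_ipi, NULL, 1);
    put_cpu();
    cpumask_clear(vcpu->arch.wbinvd_dirty_mask);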

> So my response was going to be "definitely not without the per-domain
> 'reduced cacheability permitted' setting we've discussed".  And even
> then, not without numbers suggesting it's a problem in the first place,
> or at least a better explanation of why it might plausibly be an issue.
> 
> 
> But, in writing this, I've realised a real bug.
> 
> Cache snoops can occur and pull lines sideways for microarchitectural
> reasons.  And even if we want to hand-wave that away as being unlikely
> (it is), you can't hand-wave away rogue speculation in the directmap.
> 
> By not issuing WBINVD on all cores, you've got a real chance of letting
> some lines escape the attempt to evict them.
> 
> But it's worse than that - even IPIing all cores, there's a speculation
> pattern which can cause some lines to survive.  Rare, sure, but not
> impossible.
> 
> Right now, I'm not sure that WBINVD can even be used safely without the
> extra careful use of CR0.{CD,NW}, which provides a workaround for
> native, but nothing helpful for hypervisors...

Wait: See the title and the earlier patches. We're no longer talking of
"evict" here, but of "write-back". The few "evict" cases are left alone
by this change. If any of those are affected by what you describe (and
it looks like some might be), then fixing that is definitely unrelated
work. (You may have meant the latter part of your reply this way, but
you haven't said so in any way I would recognize.)
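
To spell out the insn-level difference (helper names made up for this
sketch, they're not Xen's actual ones; WBNOINVD is encoded as raw bytes
in case the assembler lacks the mnemonic):

    /* "Evict": write back dirty lines and invalidate them - unaffected
     * by this change. */
    static inline void cache_evict(void)
    {
        asm volatile ( "wbinvd" ::: "memory" );
    }

    /* "Write-back": write back dirty lines without invalidating them
     * (WBNOINVD, F3 0F 09).  On hardware lacking the insn the F3
     * prefix is ignored, degrading this to plain WBINVD. */
    static inline void cache_writeback(void)
    {
        asm volatile ( ".byte 0xf3, 0x0f, 0x09" ::: "memory" );
    }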

Jan
