On 03/12/2019 09:03, Durrant, Paul wrote:
>> -----Original Message-----
>> From: Xen-devel <xen-devel-boun...@lists.xenproject.org> On Behalf Of
>> Andrew Cooper
>> Sent: 02 December 2019 19:52
>> To: Xen-devel List <xen-de...@lists.xen.org>
>> Subject: [Xen-devel] Xen 4.14 and future work
>>
>> Hello,
>>
>> Now that 4.13 is on its way out of the door, it is time to look to
>> ongoing work.
>>
>> We have a large backlog of speculation-related work.  For one, we still
>> don't virtualise MSR_ARCH_CAPS for guests, or use eIBRS ourselves in
>> Xen.  Therefore, while Xen does function on Cascade Lake, support is
>> distinctly suboptimal.
>>
>> Similarly, AMD systems frequently fill /var/log with:
>>
>> (XEN) emul-priv-op.c:1113:d0v13 Domain attempted WRMSR c0011020 from
>> 0x0006404000000000 to 0x0006404000000400
>>
>> which is an interaction between Linux's prctl() to disable memory disambiguation
>> on a per-process basis, Xen's write/discard behaviour for MSRs, and the
>> long-overdue series to properly virtualise SSBD support on AMD
>> hardware.  AMD Rome hardware, like Cascade Lake, has certain hardware
>> speculative mitigation features which need virtualising for guests to
>> make use of.
>>
> I assume this would be addressed by the proposed cpuid/msr policy work?

Yes.  The next task there is to plumb the CPUID policy through the libxc
migrate stream, coping with its absence from older sources.  This
(purposefully) breaks the dual purpose of the CPUID code in libxc for
both domain start and domain restore, and allows us to rewrite the
domain start logic without impacting migrating-in VMs.

Then, and only then, is it safe to add MSR_ARCH_CAPS into the guest
policies and start setting it up.

> I think it is quite vital for Xen that we are able to migrate guests across 
> pools of heterogeneous h/w and therefore I'd like to see this done in 4.14 if 
> possible.

Why do you think it was at the top of my list? :)

>
>> Similarly, there is plenty more work to do with core-aware scheduling,
>> and from my side of things, sane guest topology.  This will eventually
>> unblock one of the factors on the hard 128 vcpu limit for HVM guests.
>>
>>
>> Another big area is the stability of toolstack hypercalls.  This is a
>> crippling pain point for distros and upgradeability of systems, and
>> there is frankly no justifiable reason for the way we currently do
>> things.  The real reason is inertia from back in the days when Xen.git
>> (bitkeeper as it was back then) contained a fork of every relevant
>> piece of software; that model is long-since obsolete, but it is still
>> causing us pain.  I will follow up with a proposal in due course, but
>> as a one-liner, it will build on the dm_op() API model.
> This is also fairly vital for the work on live update of Xen (as discussed at 
> the last dev summit). Any instability in the tools ABI will compromise 
> hypervisor update and fixing such issues on an ad-hoc basis as they arise is 
> not really a desirable prospect.
>
>> Likely included within this is making the domain/vcpu destroy paths
>> idempotent so we can fix a load of NULL pointer dereferences in Xen
>> caused by XEN_DOMCTL_max_vcpus not being part of XEN_DOMCTL_createdomain.
>>
>> Other work in this area involves adding X86_EMUL_{VIRIDIAN,NESTED_VIRT}
>> to replace their existing problematic enablement interfaces.
>>
> I think this should include deprecation of HVMOP_get/set_param as far as is 
> possible (i.e. tools use)...
>
>> A start needs to be made on a total rethink of the HVM ABI.  This has
>> come up repeatedly at previous dev summits, and is in desperate need of
>> having some work started on it.
>>
> ...and completely in any new ABI.

Both already in the plan(s).

> I wonder to what extent we can provide a guest-side compat layer here, 
> otherwise it would be hard to get traction I think.

Step 1 of the design (deliberately) won't be concerned with guest
compatibility.  The single most important aspect is to come up with a
clean design which is not crippled by retaining compatibility for PV
guests, and without x86-isms leaking into other architectures.

Once a sensible design exists, we can go about figuring out how best to
enact it.  Most areas will be able to fit compatibility into existing
HVM guests, but some are going to have a very hard time.

> There was an interesting talk at KVM Forum (https://sched.co/Tmuy) on dealing 
> with emulation inside guest context by essentially re-injecting the VMEXITs 
> back into the guest for pseudo-SMM code (loaded as part of the firmware blob) 
> to deal with. I could imagine potentially using such a mechanism to have a 
> 'legacy' hypercall translated to the new ABI, which would allow older guests 
> to be supported unmodified (albeit with a performance penalty). Such a 
> mechanism may also be useful as an alternative way of dealing with some of 
> the emulation dealt with directly in Xen at the moment, to reduce the 
> hypervisor attack surface e.g. stdvga caching, hpet, rtc... perhaps.

I don't think this is relevant to the ABI discussion - it's not changing
anything in guest view.  I'm sure people will want it for other reasons,
and I don't see any issue with implementing it for existing HVM guests.

~Andrew
