On Thu, Nov 10, 2022 at 10:47:07PM +0000, Andrew Cooper wrote:
> On 04/11/2022 16:18, Roger Pau Monne wrote:
> > The current reporting of the hardware assisted APIC options is done by
> > checking "virtualize APIC accesses" which is not very helpful, as that
> > feature doesn't avoid a vmexit, instead it does provide some help in
> > order to detect APIC MMIO accesses in vmexit processing.
> >
> > Repurpose the current reporting of xAPIC assistance to instead report
> > such feature as present when there's support for "TPR shadow" and
> > "APIC register virtualization" because in that case some xAPIC MMIO
> > register accesses are handled directly by the hardware, without
> > requiring a vmexit.
> >
> > For symetry also change assisted x2APIC reporting to require
> > "virtualize x2APIC mode" and "APIC register virtualization", dropping
> > the option to also be reported when "virtual interrupt delivery" is
> > available.  Presence of the "virtual interrupt delivery" feature will
> > be reported using a different option.
> >
> > Fixes: 2ce11ce249 ('x86/HVM: allow per-domain usage of hardware virtualized 
> > APIC')
> > Signed-off-by: Roger Pau Monné <roger....@citrix.com>
> > ---
> > I find the logic in vmx_vlapic_msr_changed() hard to follow, but I
> > don't want to rewrite the function logic at this point.
> > ---
> > Changes since v1:
> >  - Fix Viridian MSR tip conditions.
> > ---
> >  xen/arch/x86/hvm/viridian/viridian.c |  2 +-
> >  xen/arch/x86/hvm/vmx/vmcs.c          |  8 ++++----
> >  xen/arch/x86/hvm/vmx/vmx.c           | 25 ++++++++++++++++++-------
> >  xen/arch/x86/traps.c                 |  4 +---
> >  4 files changed, 24 insertions(+), 15 deletions(-)
> >
> > diff --git a/xen/arch/x86/hvm/viridian/viridian.c 
> > b/xen/arch/x86/hvm/viridian/viridian.c
> > index 25dca93e8b..44eb3d0519 100644
> > --- a/xen/arch/x86/hvm/viridian/viridian.c
> > +++ b/xen/arch/x86/hvm/viridian/viridian.c
> > @@ -197,7 +197,7 @@ void cpuid_viridian_leaves(const struct vcpu *v, 
> > uint32_t leaf,
> >          res->a = CPUID4A_RELAX_TIMER_INT;
> >          if ( viridian_feature_mask(d) & HVMPV_hcall_remote_tlb_flush )
> >              res->a |= CPUID4A_HCALL_REMOTE_TLB_FLUSH;
> > -        if ( !cpu_has_vmx_apic_reg_virt )
> > +        if ( !has_assisted_xapic(d) )
> >              res->a |= CPUID4A_MSR_BASED_APIC;
> 
> This check is broken before and after.  It needs to be keyed on
> virtualised interrupt delivery, not register acceleration.
> 
> But doing this correctly needs a per-domain vintr setting, which we
> don't have yet.
> 
> It is marginally less broken with this change, than without, but that's
> not saying much.

I took a look at the HyperV TLFS spec but the table that lists the
CPUID bits don't explicitly name which registers are accessed using
MSRs rather than MMIO when the 'MSR_BASED_APIC' suggestion is set on
CPUID.

It's my understanding that the hint is not useful anymore, as Xen
exposes x2APIC by default, and that's what any new-ish version of
Windows should use, in which case all APIC registers are already
accessed using MSRs and the hint is moot.

> >          if ( viridian_feature_mask(d) & HVMPV_hcall_ipi )
> >              res->a |= CPUID4A_SYNTHETIC_CLUSTER_IPI;
> > diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> > index a1aca1ec04..7bb96e1a8e 100644
> > --- a/xen/arch/x86/hvm/vmx/vmcs.c
> > +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> > @@ -1136,7 +1136,7 @@ static int construct_vmcs(struct vcpu *v)
> >  
> >      if ( !has_assisted_xapic(d) )
> >          v->arch.hvm.vmx.secondary_exec_control &=
> > -            ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
> > +            ~SECONDARY_EXEC_APIC_REGISTER_VIRT;
> >  
> >      if ( cpu_has_vmx_secondary_exec_control )
> >          __vmwrite(SECONDARY_VM_EXEC_CONTROL,
> > @@ -2156,10 +2156,10 @@ int __init vmx_vmcs_init(void)
> >      if ( !ret )
> >      {
> >          /* Check whether hardware supports accelerated xapic and x2apic. */
> > -        assisted_xapic_available = cpu_has_vmx_virtualize_apic_accesses;
> > +        assisted_xapic_available = cpu_has_vmx_tpr_shadow &&
> > +                                   cpu_has_vmx_apic_reg_virt;
> >          assisted_x2apic_available = cpu_has_vmx_virtualize_x2apic_mode &&
> > -                                    (cpu_has_vmx_apic_reg_virt ||
> > -                                     cpu_has_vmx_virtual_intr_delivery);
> > +                                    cpu_has_vmx_apic_reg_virt;
> 
> apic reg virt already depends on tpr shadow, so that part of the
> condition is redundant.
> 
> virtualise x2apic mode and apic reg virt aren't dependent, but they do
> only ever appear together in hardware.

I would keep those as-is for sanity, it's easier to spot the
dependencies between them.  And then it's also possible that we want
to introduce a control for tpr_shadow, and which point having the
conditional here is a good reminder that virtualize_apic_accesses
depends on that feature being available and enabled.

> Keeping the conditionals might be ok to combat a bad outer hypervisor,
> but ...
> 
> >          register_keyhandler('v', vmcs_dump, "dump VT-x VMCSs", 1);
> >      }
> >  
> > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > index e624b415c9..bf0fe3355c 100644
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -3405,25 +3405,29 @@ static void vmx_install_vlapic_mapping(struct vcpu 
> > *v)
> >  
> >  void vmx_vlapic_msr_changed(struct vcpu *v)
> >  {
> > +    bool virtualize_x2apic_mode = has_assisted_x2apic(v->domain) ||
> > +                                  (cpu_has_vmx_virtualize_x2apic_mode &&
> > +                                   cpu_has_vmx_virtual_intr_delivery);
> 
> ... this is still wrong, and ...
> 
> >      struct vlapic *vlapic = vcpu_vlapic(v);
> >      unsigned int msr;
> >  
> > -    if ( !has_assisted_xapic(v->domain) &&
> > -         !has_assisted_x2apic(v->domain) )
> > +    if ( !cpu_has_vmx_virtualize_apic_accesses &&
> > +         !virtualize_x2apic_mode )
> >          return;
> 
> ... this surely cannot be right.

It's indeed missing a has_assisted_xapic(v->domain), so it should be:

if ( !cpu_has_vmx_virtualize_apic_accesses &&
     !has_assisted_xapic(v->domain) &&
     !virtualize_x2apic_mode )
    return;

Logic in this function is not the best one IMO, as I've already
mentioned in a post-commit remark.

> 
> While attempting to figure ^ out, I've found yet another regression vs
> 4.16.  Because virt intr delivery is set in the execution controls
> system-wide and not controlled per domain, we'll take a vmentry failure
> on SKX/CLX/ICX when trying to build an HVM domain without xAPIC
> acceleration.

I've tried creating the following guest on one of our Ice Lake boxes
(with and without this patch applied):

type="pvh"
name = "test"
memory = 1024
vcpus = 1

kernel = "/root/vmlinuz-6.1.0-rc4+"
extra = "console=hvc0"

assisted_xapic=0

And it seem s to boot just fine, no vmentry failure.

> This, combined with the ABI errors (/misfeatures) that we really don't
> want to escape into the world but I haven't finished fixing yet, means
> that the only appropriate course of action is to revert.

Hm, I'm confused by this, as it is still not clear to me what's
wrong with the ABI.  Is it the way in which we expose the flags?  Or
is it the naming?

AFAIK we want to have a flag to toggle apic_reg_virt, which is the
proposal here.  Should it be named differently?

> I'd really hoped to avoid a full revert, but we've run out of time.

I've also hoped so.

Thanks, Roger.

Reply via email to