On Wed, Oct 07, 2020 at 01:06:08PM +0100, Andrew Cooper wrote:
> On 06/10/2020 17:23, Roger Pau Monne wrote:
> > Currently a PV hardware domain can also be given control over the CPU
> > frequency, and such guest is allowed to write to MSR_IA32_PERF_CTL.
> 
> This might be how the current logic "works", but its straight up broken.
> 
> PERF_CTL is thread scope, so unless dom0 is identity pinned and has one
> vcpu for every pcpu, it cannot use the interface correctly.

Selecting cpufreq=dom0-kernel will force vCPU pinning. I'm not able
however to see anywhere that would force dom0 vCPUs == pCPUs.

> > However since commit 322ec7c89f6 the default behavior has been changed
> > to reject accesses to not explicitly handled MSRs, preventing PV
> > guests that manage CPU frequency from reading
> > MSR_IA32_PERF_{STATUS/CTL}.
> >
> > Additionally some HVM guests (Windows at least) will attempt to read
> > MSR_IA32_PERF_CTL and will panic if given back a #GP fault:
> >
> > vmx.c:3035:d8v0 RDMSR 0x00000199 unimplemented
> > d8v0 VIRIDIAN CRASH: 3b c0000096 fffff806871c1651 ffffda0253683720 0
> >
> > Move the handling of MSR_IA32_PERF_{STATUS/CTL} to the common MSR
> > handling shared between HVM and PV guests, and add an explicit case
> > for reads to MSR_IA32_PERF_{STATUS/CTL}.
> 
> OTOH, PERF_CTL does have a seemingly architectural "please disable turbo
> for me" bit, which is supposed to be for calibration loops.  I wonder if
> anyone uses this, and whether we ought to honour it (probably not).

If we let guests play with this we would have to save/restore the
guest value on context switch. Unless there's a strong case for this,
I would say no.

> > diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> > index d8ed83f869..41baa3b7a1 100644
> > --- a/xen/include/xen/sched.h
> > +++ b/xen/include/xen/sched.h
> > @@ -1069,6 +1069,12 @@ extern enum cpufreq_controller {
> >      FREQCTL_none, FREQCTL_dom0_kernel, FREQCTL_xen
> >  } cpufreq_controller;
> >  
> > +static inline bool is_cpufreq_controller(const struct domain *d)
> > +{
> > +    return ((cpufreq_controller == FREQCTL_dom0_kernel) &&
> > +            is_hardware_domain(d));
> 
> This won't compile on !CONFIG_X86, due to CONFIG_HAS_CPUFREQ

It does seem to build on Arm, because this is only used in x86 code:

https://gitlab.com/xen-project/people/royger/xen/-/jobs/778207412

The extern declaration of cpufreq_controller is just above, so if you
tried to use is_cpufreq_controller on Arm you would get a link time
error, otherwise it builds fine. The compiler removes the function on
Arm as it has the inline attribute and it's not used.

Alternatively I could look into moving cpufreq_controller (and
is_cpufreq_controller) out of sched.h into somewhere else, I haven't
looked at why it needs to live there.

> Honestly - I don't see any point to this code.  Its opt-in via the
> command line only, and doesn't provide adequate checks for enablement. 
> (It's not as if we're lacking complexity or moving parts when it comes
> to power/frequency management).

Right, I could do a pre-patch to remove this, but I also don't think
we should block this fix on removing FREQCTL_dom0_kernel, so I would
rather fix the regression and then remove the feature if we agree it
can be removed.

Thanks, Roger.

Reply via email to