> On 25 Oct 2023, at 21:59, Andrew Cooper <andrew.coop...@citrix.com> wrote:
>
> On 25/10/2023 8:29 pm, Edwin Török wrote:
>> diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
>> index 483b5e4f70..b3cd851d9d 100644
>> --- a/xen/arch/x86/msr.c
>> +++ b/xen/arch/x86/msr.c
>> @@ -584,6 +584,13 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t
>> val)
>> }
>> break;
>>
>> + case MSR_K8_HWCR:
>> + if ( !(cp->x86_vendor & X86_VENDOR_AMD) ||
>> + (val & ~K8_HWCR_IRPERF_EN) ||
>> + wrmsr_safe(msr, val) != 0 )
>> + goto gp_fault;
>> + break;
>> +
>> case MSR_AMD64_DE_CFG:
>> /*
>> * OpenBSD 6.7 will panic if writing to DE_CFG triggers a #GP:
>> diff --git a/xen/include/public/arch-x86/cpufeatureset.h
>> b/xen/include/public/arch-x86/cpufeatureset.h
>> index 5faca0bf7a..40f74cd5e8 100644
>> --- a/xen/include/public/arch-x86/cpufeatureset.h
>> +++ b/xen/include/public/arch-x86/cpufeatureset.h
>> @@ -241,7 +241,7 @@ XEN_CPUFEATURE(EFRO, 7*32+10) /* APERF/MPERF
>> Read Only interface */
>>
>> /* AMD-defined CPU features, CPUID level 0x80000008.ebx, word 8 */
>> XEN_CPUFEATURE(CLZERO, 8*32+ 0) /*A CLZERO instruction */
>> -XEN_CPUFEATURE(IRPERF, 8*32+ 1) /* Instruction Retired Performance
>> Counter */
>> +XEN_CPUFEATURE(IRPERF, 8*32+ 1) /*A! Instruction Retired Performance
>> Counter */
>> XEN_CPUFEATURE(RSTR_FP_ERR_PTRS, 8*32+ 2) /*A (F)X{SAVE,RSTOR} always
>> saves/restores FPU Error pointers */
>> XEN_CPUFEATURE(WBNOINVD, 8*32+ 9) /* WBNOINVD instruction */
>> XEN_CPUFEATURE(IBPB, 8*32+12) /*A IBPB support only (no IBRS, used
>> by AMD) */
>
> Last things first. You can advertise this bit to guests, but I'm
> looking at the AMD manuals and woefully little information on what this
> is an how it works.
>
> It does have an architectural enumeration, but there isn't even a a
> description of what HWCR.IPERF_EN does.
>
> The HWCR documentation says "enables Core::X86::Msr::IRPerfCount", but
> the IRPERF MSR says no such think,
> noting only that EFRO_LOCK makes the
> MSR non-writeable (which in turn means we can't always offer it to
> guests anyway. See below).
> At a guess, the HWCR bit activates the counter, but nothing states
> this.
My version
(https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf)
says:
"This is a dedicated counter that is always counting instructions retired. It
exists at MSR address C000_00E9. It is enabled by setting a 1 to HWCR[30] and
its support is indicated by CPUID Fn8000_0008_EBX[1]."
Although digging a bit more there are some erratas around deep C states on some
chips and the HWCR register itself is not global but per CCD (which is neatly
buried in the _ccd[7:0]_.... smallprint at the top:
https://github.com/cyring/CoreFreq/issues/206#issuecomment-722713147
> At a guess, it's a free-running, reset-able counter, but again
> nothing states this. The manual just says "is a dedicated counter" and
> doesn't even state whether it is (or is not) tied into the other core
> fixed counters and whether it is integrated with PMI or not.
There is also a LWP 'Instructions Retired' described in 13.4.2.2, which is
user-mode only and per-core, but I assume that is completely different from
this MSR
and part of the features that got dropped in newer cores.
>
> Which brings us on to the middle hunk. Putting it there short circuits
> the PV-specific handling, but you can't do that in general without
> arranging for HWCR to be context switched properly for vCPUs, nor in
> this case IPERF_COUNT itself.
>
> Unless you context switch HWCR and IPERF_COUNT, a guest will see it not
> counting, or the count going backwards, 50% of the time as the vCPU is
> scheduled around.
>
> So while in principle we could let guests use this facility (it would
> have to be off by default, because there doesn't appear to be any
> filtering capability in it at all, so we can't restrict it to
> instructions retired in guest context), it will need a far larger patch
> than this to do it safely.
Thanks, I'll move this patch into the 'needs more work' category.
Thanks,
--Edwin