Re: [PATCH v3 07/14 RESEND] cpufreq: Export HWP parameters to userspace

Jason Andryuk Thu, 11 May 2023 06:50:15 -0700

On Thu, May 11, 2023 at 2:21 AM Jan Beulich <jbeul...@suse.com> wrote:
>
> On 10.05.2023 19:49, Jason Andryuk wrote:
> > On Mon, May 8, 2023 at 6:26 AM Jan Beulich <jbeul...@suse.com> wrote:
> >>
> >> On 01.05.2023 21:30, Jason Andryuk wrote:
> >>> Extend xen_get_cpufreq_para to return hwp parameters.  These match the
> >>> hardware rather closely.
> >>>
> >>> We need the features bitmask to indicated fields supported by the actual
> >>> hardware.
> >>>
> >>> The use of uint8_t parameters matches the hardware size.  uint32_t
> >>> entries grows the sysctl_t past the build assertion in setup.c.  The
> >>> uint8_t ranges are supported across multiple generations, so hopefully
> >>> they won't change.
> >>
> >> Still it feels a little odd for values to be this narrow. Aiui the
> >> scaling_governor[] and scaling_{max,min}_freq fields aren't (really)
> >> used by HWP. So you could widen the union in struct
> >> xen_get_cpufreq_para (in a binary but not necessarily source compatible
> >> manner), gaining you 6 more uint32_t slots. Possibly the somewhat oddly
> >> placed scaling_cur_freq could be included as well ...
> >
> > The values are narrow, but they match the hardware.  It works for HWP,
> > so there is no need to change at this time AFAICT.
> >
> > Do you want me to make this change?
>
> Well, much depends on what these 8-bit values actually express (I did
> raise this question in one of the replies to your patches, as I wasn't
> able to find anything in the SDM). That'll then hopefully allow to
> make some educated prediction on on how likely it is that a future
> variant of hwp would want to widen them.

Sorry for not providing a reference earlier. In the SDM,
HARDWARE-CONTROLLED PERFORMANCE STATES (HWP) section, there is this
second paragraph:
"""
In contrast, HWP is an implementation of the ACPI-defined
Collaborative Processor Performance Control (CPPC), which specifies
that the platform enumerates a continuous, abstract unit-less,
performance value scale that is not tied to a specific performance
state / frequency by definition. While the enumerated scale is roughly
linear in terms of a delivered integer workload performance result,
the OS is required to characterize the performance value range to
comprehend the delivered performance for an applied workload.
"""

The numbers are "continuous, abstract unit-less, performance value."
So there isn't much to go on there, but generally, smaller numbers
mean slower and bigger numbers mean faster.

Cross referencing the ACPI spec here:
https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#collaborative-processor-performance-control

Scrolling down you can find the register entries such as

Highest Performance
Register or DWORD Attribute: Read
Size: 8-32 bits

AMD has its own pstate implementation that is similar to HWP. Looking
at the Linux support, the AMD hardware also use 8 bit values for the
comparable fields:
https://elixir.bootlin.com/linux/latest/source/arch/x86/include/asm/msr-index.h#L612

So Intel and AMD are 8bit for now at least. Something could do 32bits
according to the ACPI spec.

8 bits of granularity for slow to fast seems like plenty to me. I'm
not sure what one would gain from 16 or 32 bits, but I'm not designing
the hardware. From the earlier xenpm output, "highest" was 49, so
still a decent amount of room in an 8 bit range.

> (Was it energy_perf that went
> from 4 to 8 bits at some point, which you even comment upon in the
> public header?)

energy_perf (Energy_Performanc_Preference) had a fallback: "If
CPUID.06H:EAX[bit 10] indicates that this field is not supported, HWP
uses the value of the IA32_ENERGY_PERF_BIAS MSR to determine the
energy efficiency / performance preference." So it had a different
range, but that was because it was being put into an older register.

However, I've removed that fallback code in v4.

Regards,
Jason

Re: [PATCH v3 07/14 RESEND] cpufreq: Export HWP parameters to userspace

Reply via email to