On 15.11.2021 15:33, Andrew Cooper wrote: > On 15/11/2021 10:53, Jan Beulich wrote: >> On 12.11.2021 19:51, Andrew Cooper wrote: >>> On 10/11/2021 09:19, Jane Malalane wrote: >>>> Before, user would change turbo status but this had no effect: boolean >>>> was set but policy wasn't reevaluated. Policy must be reevaluated so >>>> that CPU frequency is chosen according to the turbo status and the >>>> current governor. >>>> >>>> Therefore, add __cpufreq_governor() in cpufreq_update_turbo(). >>>> >>>> Reported-by: <edvin.to...@citrix.com> >>>> Signed-off-by: <jane.malal...@citrix.com> >>>> --- >>>> CC: Jan Beulich <jbeul...@suse.com> >>>> CC: Ian Jackson <i...@xenproject.org> >>>> --- >>>> >>>> Release rationale: >>>> Not taking this patch means that turbo status is misleading. >>>> >>>> Taking this patch is low-risk as it only uses a function that already >>>> exists and is already used for setting the chosen scaling governor. >>>> Essentially, this change is equivalent to running 'xenpm >>>> en/disable-turbo-mode' and, subsequently, running 'xenpm >>>> set-scaling-governor <current governor>'. >>>> --- >>>> xen/drivers/cpufreq/utility.c | 6 +++++- >>>> 1 file changed, 5 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/xen/drivers/cpufreq/utility.c b/xen/drivers/cpufreq/utility.c >>>> index b93895d4dd..5f200ff3ee 100644 >>>> --- a/xen/drivers/cpufreq/utility.c >>>> +++ b/xen/drivers/cpufreq/utility.c >>>> @@ -417,10 +417,14 @@ int cpufreq_update_turbo(int cpuid, int new_state) >>>> { >>>> ret = cpufreq_driver.update(cpuid, policy); >>>> if (ret) >>>> + { >>>> policy->turbo = curr_state; >>>> + return ret; >>>> + } >>>> } >>>> >>>> - return ret; >>>> + /* Reevaluate current CPU policy. */ >>>> + return __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS); >>>> } >>> So, having looked through the manual, what the cpufreq_driver does for >>> turbo on Intel is bogus according to the SDM. >>> >>> There is a non-architectrual dance with IA32_MISC_ENABLE bit 38 (per >>> package) for firmware to configure turbo, but it manifests as another >>> dynamic CPUID bit (which I think we handle correctly). It has the same >>> semantics as XD_DISABLE and CPUID_LIMIT so we might want to consider >>> adding it to the set of bits we clear during boot. >> This is quite confusing in the docs - at least one of the tables calls >> the bit "IDA Disable", while other entries at least also refer to the >> effect of disabling IDA. I'm afraid I have trouble connecting turbo >> mode and IDA disabling, unless both are two different names of the >> same thing. Maybe they really are, except that SDM vol 2 uses yet >> another slightly different term for CPUID[6].EAX[1]: "Intel Turbo Boost >> Technology". > > SDM Vol3 14.3 uses IDA and turbo interchangeably in some cases. It > reads as if IDA is the general technology name, and turbo is a sub-mode > for "doing it automatically without software involvement". > > On CPUs which support IDA, the CPUID bit is !MISC_ENABLE[38], with > further adds to the confusion of which is which. > >>> However, the correct way to turn off turbo is to set >>> IA32_PERF_CTL.TURBO_DISENGAGE bit, which is per logical processor. As >>> such, it *should* behave far more like the AMD CPB path. >> I'm afraid public documentation has no mention of a bit of this name. >> Considering the above I wonder whether you mean "IDA engage" (bit 32), >> albeit this doesn't seem very likely when you're taking about a >> "disengage" bit. > > It is that bit. Despite the name given in Vol4 Table 2-12, it is "set > to disable". > > Vol3 Figure 14-2 explicitly calls it the "IDA/Turbo disengage" bit and > the surrounding text makes it clear that it is disable bit, not an > enable bit. Also, that's how the Linux intel_pstate driver uses it.
Okay, then I agree with the proposal you've made. Jan >> If it is, we'd still need to cope with it being >> unavailable: While as per the doc it exists from Merom onwards, i.e. >> just far enough back for all 64-bit capable processors to be covered, >> at least there it is attributed "Mobile only". > > Honestly, the number of errors in those tables are alarming. The MSR is > has been around longer than turbo, and I'm told that the interface has > never changed since its introduction. > > Looking across Linux, there's a mess too. > > acpi-cpufreq and energy_perf_policy use the MISC_ENABLE bit (which is > package wide) > intel_ips driver uses PERF_CTL (which is logical processor) > intel_pstate uses MISC_ENABLE || max_pstate == turbo_pstate > > I'm certain I've seen "one pstate being special" WRT turbo before, but I > can't locate anything about this in the SDM, which possibly means it is > model specific. > > ~Andrew >