Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-19 Thread Borislav Petkov
On Fri, Oct 18, 2019 at 01:38:32PM -0700, Luck, Tony wrote: > Sorry to have caused confusion. Ditto. But us causing confusion is fine - this way we can talk about what we really wanna do! :-))) > The thoughts behind that statement are that we currently have an issue > with too many noisy high se

Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-18 Thread Luck, Tony
On Fri, Oct 18, 2019 at 09:45:03PM +0200, Borislav Petkov wrote: > On Fri, Oct 18, 2019 at 11:02:57AM -0700, Luck, Tony wrote: > > So what should we do next? > > I was simply keying off this statement of yours: > > "Depending on what we end up with from Srinivas ... we may want to > reconsider th

Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-18 Thread Borislav Petkov
On Fri, Oct 18, 2019 at 11:02:57AM -0700, Luck, Tony wrote: > So what should we do next? I was simply keying off this statement of yours: "Depending on what we end up with from Srinivas ... we may want to reconsider the severity." and I don't think that having KERN_CRIT severity for those messag

Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-18 Thread Borislav Petkov
On Fri, Oct 18, 2019 at 08:55:17AM -0700, Srinivas Pandruvada wrote: > I assume that someone is having performance issues or occasion reboots, > look at the logs. Is it a fair assumption? Yes, that is a valid use case IMO. > But if a system is running at up to 87.5% of duty cycle on top of > lowe

Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-18 Thread Luck, Tony
On Fri, Oct 18, 2019 at 03:23:09PM +0200, Borislav Petkov wrote: > On Fri, Oct 18, 2019 at 05:26:36AM -0700, Srinivas Pandruvada wrote: > > Server/desktops generally rely on the embedded controller for FAN > > control, which kernel have no control. For them this warning helps to > > either bring i

Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-18 Thread Srinivas Pandruvada
On Fri, 2019-10-18 at 15:23 +0200, Borislav Petkov wrote: > On Fri, Oct 18, 2019 at 05:26:36AM -0700, Srinivas Pandruvada wrote: > > Server/desktops generally rely on the embedded controller for FAN > > control, which kernel have no control. For them this warning helps > > to > > either bring in a

Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-18 Thread Borislav Petkov
On Fri, Oct 18, 2019 at 05:26:36AM -0700, Srinivas Pandruvada wrote: > Server/desktops generally rely on the embedded controller for FAN > control, which kernel have no control. For them this warning helps to > either bring in additional cooling or fix existing cooling. How exactly does this warn

Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-18 Thread Srinivas Pandruvada
On Thu, 2019-10-17 at 23:44 +0200, Borislav Petkov wrote: > On Thu, Oct 17, 2019 at 09:31:30PM +0000, Luck, Tony wrote: > > That sounds like the right short term action. > > > > Depending on what we end up with from Srinivas ... we may want > > to reconsider the severity. The basic premise of Sri

Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-18 Thread Peter Zijlstra
On Thu, Oct 17, 2019 at 11:44:45PM +0200, Borislav Petkov wrote: > On Thu, Oct 17, 2019 at 09:31:30PM +0000, Luck, Tony wrote: > > That sounds like the right short term action. > > > > Depending on what we end up with from Srinivas ... we may want > > to reconsider the severity. The basic premise

Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-17 Thread Borislav Petkov
On Thu, Oct 17, 2019 at 11:53:18PM +0000, Luck, Tony wrote: > > * we throttle the machine from within the kernel - whatever that may mean > > * if that doesn't help, we stop scheduling !root tasks > > * if that doesn't help, we halt > > The silicon will do that "halt" step all by itself if the tem

RE: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-17 Thread Luck, Tony
> * we throttle the machine from within the kernel - whatever that may mean > * if that doesn't help, we stop scheduling !root tasks > * if that doesn't help, we halt The silicon will do that "halt" step all by itself if the temperature continues to rise and hits the highest of the temperature thr

Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-17 Thread Borislav Petkov
On Thu, Oct 17, 2019 at 09:31:30PM +0000, Luck, Tony wrote: > That sounds like the right short term action. > > Depending on what we end up with from Srinivas ... we may want > to reconsider the severity. The basic premise of Srinivas' patch > is to avoid printing anything for short excursions ab

RE: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-17 Thread Luck, Tony
>> That all sounds like the printk should be downgraded too, it is not a >> KERN_CRIT warning. It is more a notification that we're getting warm. > > Right, and I think we should take Benjamin's patch after all - perhaps > even tag it for stable if that message is annoying people too much - and > S

Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-16 Thread Borislav Petkov
On Wed, Oct 16, 2019 at 10:14:05AM +0200, Peter Zijlstra wrote: > That all sounds like the printk should be downgraded too, it is not a > KERN_CRIT warning. It is more a notification that we're getting warm. Right, and I think we should take Benjamin's patch after all - perhaps even tag it for sta

Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-16 Thread Peter Zijlstra
On Tue, Oct 15, 2019 at 06:31:46AM -0700, Srinivas Pandruvada wrote: > On Tue, 2019-10-15 at 10:48 +0200, Peter Zijlstra wrote: > > On Mon, Oct 14, 2019 at 02:21:00PM -0700, Srinivas Pandruvada wrote: > > > Some modern systems have very tight thermal tolerances. Because of > > > this > > > they may

Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-15 Thread Srinivas Pandruvada
On Tue, 2019-10-15 at 10:46 +0200, Borislav Petkov wrote: > On Mon, Oct 14, 2019 at 03:41:38PM -0700, Srinivas Pandruvada wrote: > > So some users who had issues in their systems can try with this > > patch. > > We can get rid of this, till it becomes real issue. > > We don't add command line para

Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-15 Thread Srinivas Pandruvada
On Tue, 2019-10-15 at 10:52 +0200, Peter Zijlstra wrote: > On Mon, Oct 14, 2019 at 03:27:35PM -0700, Luck, Tony wrote: > > On Mon, Oct 14, 2019 at 11:36:18PM +0200, Borislav Petkov wrote: > > > This description is already *begging* for this delay value to be > > > automatically set by the kernel. P

Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-15 Thread Srinivas Pandruvada
On Tue, 2019-10-15 at 10:48 +0200, Peter Zijlstra wrote: > On Mon, Oct 14, 2019 at 02:21:00PM -0700, Srinivas Pandruvada wrote: > > Some modern systems have very tight thermal tolerances. Because of > > this > > they may cross thermal thresholds when running normal workloads > > (even > > during bo

Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-15 Thread Peter Zijlstra
On Mon, Oct 14, 2019 at 03:27:35PM -0700, Luck, Tony wrote: > On Mon, Oct 14, 2019 at 11:36:18PM +0200, Borislav Petkov wrote: > > This description is already *begging* for this delay value to be > > automatically set by the kernel. Putting yet another knob in front of > > the user who doesn't have

Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-15 Thread Peter Zijlstra
On Mon, Oct 14, 2019 at 02:21:00PM -0700, Srinivas Pandruvada wrote: > Some modern systems have very tight thermal tolerances. Because of this > they may cross thermal thresholds when running normal workloads (even > during boot). The CPU hardware will react by limiting power/frequency > and using

Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-15 Thread Borislav Petkov
On Mon, Oct 14, 2019 at 03:41:38PM -0700, Srinivas Pandruvada wrote: > So some users who had issues in their systems can try with this patch. > We can get rid of this, till it becomes real issue. We don't add command line parameters which we maybe can get rid of later. > The temperature is functi

Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-15 Thread Borislav Petkov
On Mon, Oct 14, 2019 at 03:27:35PM -0700, Luck, Tony wrote: > You need a plausible start point for the "when to worry the user" > message. Maybe that is your "max value"? Yes, that would be a good start. You need that anyway because the experimentations you guys did to get your numbers have been

Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-14 Thread Srinivas Pandruvada
On Mon, 2019-10-14 at 23:36 +0200, Borislav Petkov wrote: > On Mon, Oct 14, 2019 at 02:21:00PM -0700, Srinivas Pandruvada wrote: > > Some modern systems have very tight thermal tolerances. Because of > > this > > they may cross thermal thresholds when running normal workloads > > (even > > during b

Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-14 Thread Luck, Tony
On Mon, Oct 14, 2019 at 11:36:18PM +0200, Borislav Petkov wrote: > This description is already *begging* for this delay value to be > automatically set by the kernel. Putting yet another knob in front of > the user who doesn't have a clue most of the time shows one more time > that we haven't done

Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-14 Thread Borislav Petkov
On Mon, Oct 14, 2019 at 02:21:00PM -0700, Srinivas Pandruvada wrote: > Some modern systems have very tight thermal tolerances. Because of this > they may cross thermal thresholds when running normal workloads (even > during boot). The CPU hardware will react by limiting power/frequency > and using

[PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages

2019-10-14 Thread Srinivas Pandruvada
Some modern systems have very tight thermal tolerances. Because of this they may cross thermal thresholds when running normal workloads (even during boot). The CPU hardware will react by limiting power/frequency and using duty cycles to bring the temperature back into normal range. Thus users may