Re: [PATCH RFC x86/mce] Make mce_timed_out() identify holdout CPUs

2021-01-08 Thread Borislav Petkov
On Fri, Jan 08, 2021 at 06:55:14AM -0800, Paul E. McKenney wrote: > Looks good to me! I agree that your change to the pr_emerg() string is > much better than my original. Well, the rest of the MCE code uses pr_emerg on that path so... > And good point on your added comment, plus it was fun to

Re: [PATCH RFC x86/mce] Make mce_timed_out() identify holdout CPUs

2021-01-08 Thread Paul E. McKenney
On Fri, Jan 08, 2021 at 01:31:56PM +0100, Borislav Petkov wrote: > On Thu, Jan 07, 2021 at 09:08:44AM -0800, Paul E. McKenney wrote: > > Some information is usually better than none. And I bet that failing > > hardware is capable of all sorts of tricks at all sorts of levels. ;-) > > Tell me

Re: [PATCH RFC x86/mce] Make mce_timed_out() identify holdout CPUs

2021-01-08 Thread Borislav Petkov
On Thu, Jan 07, 2021 at 09:08:44AM -0800, Paul E. McKenney wrote: > Some information is usually better than none. And I bet that failing > hardware is capable of all sorts of tricks at all sorts of levels. ;-) Tell me about it. > Updated patch below. Is this what you had in mind? Ok, so I've

Re: [PATCH RFC x86/mce] Make mce_timed_out() identify holdout CPUs

2021-01-07 Thread Paul E. McKenney
On Thu, Jan 07, 2021 at 08:07:24AM +0100, Borislav Petkov wrote: > On Wed, Jan 06, 2021 at 11:13:53AM -0800, Paul E. McKenney wrote: > > Not yet, it isn't! Well, except in -rcu. ;-) > > Of course it is - saying "This commit" in this commit's commit message > is very much a tautology. :-)

Re: [PATCH RFC x86/mce] Make mce_timed_out() identify holdout CPUs

2021-01-06 Thread Borislav Petkov
On Wed, Jan 06, 2021 at 11:13:53AM -0800, Paul E. McKenney wrote: > Not yet, it isn't! Well, except in -rcu. ;-) Of course it is - saying "This commit" in this commit's commit message is very much a tautology. :-) > You are suggesting dropping mce_missing_cpus and just doing this? > > if

Re: [PATCH RFC x86/mce] Make mce_timed_out() identify holdout CPUs

2021-01-06 Thread Paul E. McKenney
On Thu, Jan 07, 2021 at 12:26:19AM +, Luck, Tony wrote:
> > Please see below for an updated patch.
>
> Yes. That worked:
>
> [ 78.946069] mce: mce_timed_out: MCE holdout CPUs (may include false positives): 24-47,120-143
> [ 78.946151] mce: mce_timed_out: MCE holdout CPUs (may include

RE: [PATCH RFC x86/mce] Make mce_timed_out() identify holdout CPUs

2021-01-06 Thread Luck, Tony
> Please see below for an updated patch.
Yes. That worked:
[ 78.946069] mce: mce_timed_out: MCE holdout CPUs (may include false positives): 24-47,120-143
[ 78.946151] mce: mce_timed_out: MCE holdout CPUs (may include false positives): 24-47,120-143
[ 78.946153] Kernel panic - not

Re: [PATCH RFC x86/mce] Make mce_timed_out() identify holdout CPUs

2021-01-06 Thread Paul E. McKenney
On Wed, Jan 06, 2021 at 02:49:18PM -0800, Luck, Tony wrote: > On Wed, Jan 06, 2021 at 11:17:08AM -0800, Paul E. McKenney wrote: > > On Wed, Jan 06, 2021 at 06:39:30PM +, Luck, Tony wrote: > > > > The "Timeout: Not all CPUs entered broadcast exception handler" message > > > > will appear from

Re: [PATCH RFC x86/mce] Make mce_timed_out() identify holdout CPUs

2021-01-06 Thread Luck, Tony
On Wed, Jan 06, 2021 at 11:17:08AM -0800, Paul E. McKenney wrote: > On Wed, Jan 06, 2021 at 06:39:30PM +, Luck, Tony wrote: > > > The "Timeout: Not all CPUs entered broadcast exception handler" message > > > will appear from time to time given enough systems, but this message does > > > not

Re: [PATCH RFC x86/mce] Make mce_timed_out() identify holdout CPUs

2021-01-06 Thread Paul E. McKenney
On Wed, Jan 06, 2021 at 06:39:30PM +, Luck, Tony wrote: > > The "Timeout: Not all CPUs entered broadcast exception handler" message > > will appear from time to time given enough systems, but this message does > > not identify which CPUs failed to enter the broadcast exception handler. > >

Re: [PATCH RFC x86/mce] Make mce_timed_out() identify holdout CPUs

2021-01-06 Thread Paul E. McKenney
On Wed, Jan 06, 2021 at 07:32:44PM +0100, Borislav Petkov wrote: > On Wed, Jan 06, 2021 at 09:41:02AM -0800, Paul E. McKenney wrote: > > The "Timeout: Not all CPUs entered broadcast exception handler" message > > will appear from time to time given enough systems, but this message does > > not

RE: [PATCH RFC x86/mce] Make mce_timed_out() identify holdout CPUs

2021-01-06 Thread Luck, Tony
> The "Timeout: Not all CPUs entered broadcast exception handler" message > will appear from time to time given enough systems, but this message does > not identify which CPUs failed to enter the broadcast exception handler. > This information would be valuable if available, for example, in order

Re: [PATCH RFC x86/mce] Make mce_timed_out() identify holdout CPUs

2021-01-06 Thread Borislav Petkov
On Wed, Jan 06, 2021 at 09:41:02AM -0800, Paul E. McKenney wrote: > The "Timeout: Not all CPUs entered broadcast exception handler" message > will appear from time to time given enough systems, but this message does > not identify which CPUs failed to enter the broadcast exception handler. > This

[PATCH RFC x86/mce] Make mce_timed_out() identify holdout CPUs

2021-01-06 Thread Paul E. McKenney
The "Timeout: Not all CPUs entered broadcast exception handler" message will appear from time to time given enough systems, but this message does not identify which CPUs failed to enter the broadcast exception handler. This information would be valuable if available, for example, in order to
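For readers skimming the thread, the idea of "identifying holdout CPUs" can be illustrated with a small user-space sketch. This is not the patch itself and not kernel code: NCPUS, cpu_checked_in() and format_cpu_list() are hypothetical names chosen for illustration, and a plain bool array stands in for the kernel's cpumask. It assumes every CPU starts out marked as missing, each CPU clears its own entry on reaching the handler, and whatever remains at timeout is printed as a range list like the "24-47,120-143" shown in Tony Luck's test output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define NCPUS 144 /* assumption: matches the highest CPU in the quoted log */

/* Hypothetical helper: a CPU that reached the handler is no longer missing. */
static void cpu_checked_in(bool *missing, int cpu)
{
    missing[cpu] = false;
}

/*
 * Hypothetical helper: write the set entries of 'missing' into 'buf' as a
 * comma-separated range list, e.g. "24-47,120-143". Returns the number of
 * characters written; stops early if the buffer is too small.
 */
static int format_cpu_list(const bool *missing, int ncpus, char *buf, size_t len)
{
    int pos = 0;

    assert(len > 0);
    buf[0] = '\0';

    for (int i = 0; i < ncpus; i++) {
        if (!missing[i])
            continue;

        /* Extend the run while the next CPU is also missing. */
        int start = i;
        while (i + 1 < ncpus && missing[i + 1])
            i++;

        size_t room = len - (size_t)pos;
        int n;
        if (start == i)
            n = snprintf(buf + pos, room, "%s%d", pos ? "," : "", start);
        else
            n = snprintf(buf + pos, room, "%s%d-%d", pos ? "," : "", start, i);

        if (n < 0 || (size_t)n >= room) {
            buf[pos] = '\0'; /* drop the partially written range */
            break;
        }
        pos += n;
    }
    return pos;
}
```

A caller would set every entry to true before the rendezvous, call cpu_checked_in() from each CPU that arrives, and on timeout pass the array to format_cpu_list() and print the result. As the message text in the thread notes, such a list may include false positives, since a CPU can arrive just after the timeout fires.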