Re: [RFC PATCH 0/8] EDAC, mce_amd: Add a tracepoint for the decoded error

2017-07-28 Thread Borislav Petkov
On Fri, Jul 28, 2017 at 05:08:50PM +0200, Borislav Petkov wrote:
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 6dde0497efc7..9486a2ca6556 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -162,6 +162,9 @@

Re: [RFC PATCH 0/8] EDAC, mce_amd: Add a tracepoint for the decoded error

2017-07-28 Thread Borislav Petkov
Ok, here's a working version. It looks pretty straight-forward (to me, at
least) and it does what it is supposed to when I inject an MCE:

# tracer: nop
#
# _-=> irqs-off
# / _=> need-resched
#| / _---=>

Re: [RFC PATCH 0/8] EDAC, mce_amd: Add a tracepoint for the decoded error

2017-07-28 Thread Borislav Petkov
On Thu, Jul 27, 2017 at 04:42:19PM +, Luck, Tony wrote:
> s/common errors/architectural errors/
>
> That means we don't need to keep updating for every Xeon that documents
> some MCi_STATUS.MSCOD bits. Decoding the MCACOD bits will explain
> which component is involved (cache, bus, memory)
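
To make the MCACOD point concrete, here is a minimal userspace sketch of how the architectural code in MCi_STATUS bits 15:0 could be mapped to a component. It assumes the compound error code layouts documented in the Intel SDM (bus, cache hierarchy, memory controller, TLB, generic cache). The enum and function names are hypothetical and are not taken from the patch set or from the kernel.

```c
#include <stdint.h>

/* Hypothetical names for illustration; none of them come from the patch set. */
enum mca_component { MCA_OTHER, MCA_CACHE, MCA_TLB, MCA_MEMORY, MCA_BUS };

/*
 * Classify the architectural MCACOD field (MCi_STATUS bits 15:0).
 * The model-specific MSCOD field (bits 31:16) is deliberately ignored.
 */
static enum mca_component mcacod_component(uint64_t status)
{
	/* Keep bits 15:0 and drop bit 12, a flag that is not part of the code type. */
	uint16_t code = (uint16_t)(status & 0xefff);

	if ((code & 0xf800) == 0x0800)	/* 0000 1PPT RRRR IILL: bus/interconnect */
		return MCA_BUS;
	if ((code & 0xff00) == 0x0100)	/* 0000 0001 RRRR TTLL: cache hierarchy */
		return MCA_CACHE;
	if ((code & 0xff80) == 0x0080)	/* 0000 0000 1MMM CCCC: memory controller */
		return MCA_MEMORY;
	if ((code & 0xfff0) == 0x0010)	/* 0000 0000 0001 TTLL: TLB */
		return MCA_TLB;
	if ((code & 0xfffc) == 0x000c)	/* 0000 0000 0000 11LL: generic cache */
		return MCA_CACHE;
	return MCA_OTHER;		/* simple or unrecognized code */
}

/* Short human-readable label for a component. */
static const char *mca_component_name(enum mca_component c)
{
	switch (c) {
	case MCA_CACHE:  return "cache";
	case MCA_TLB:    return "TLB";
	case MCA_MEMORY: return "memory";
	case MCA_BUS:    return "bus";
	default:         return "other";
	}
}
```

Calling mca_component_name(mcacod_component(status)) on a raw MCi_STATUS value yields a short label without consulting any per-model MSCOD table, which is the maintenance advantage argued for above.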

Re: [RFC PATCH 0/8] EDAC, mce_amd: Add a tracepoint for the decoded error

2017-07-28 Thread Borislav Petkov
On Fri, Jul 28, 2017 at 08:37:53AM +0200, Ingo Molnar wrote:
> Yeah, structured, append-only ABIs are elegant - that's what perf uses too.

Yeah.

> Had to vent my (non kernel tree integrated) Linux tooling frustration!! ;-)

I know *exactly* what you mean! :-)

--
Regards/Gruss,
    Boris.

ECO

Re: [RFC PATCH 0/8] EDAC, mce_amd: Add a tracepoint for the decoded error

2017-07-28 Thread Ingo Molnar
* Borislav Petkov wrote:
> BUT(!), I just realized, I *think* I can address this much more elegantly:
> extend trace_mce_record() by adding the decoded string as its last argument.
> And that's fine, I'm being told, because adding arguments to the tracepoints
> is not a big deal,

RE: [RFC PATCH 0/8] EDAC, mce_amd: Add a tracepoint for the decoded error

2017-07-27 Thread Luck, Tony
> Later, we could extend that same behavior to Intel for the common
> errors, at least, so that we can dump at least *some* string explaining
> what the error is.

s/common errors/architectural errors/

That means we don't need to keep updating for every Xeon that documents
some MCi_STATUS.MSCOD

Re: [RFC PATCH 0/8] EDAC, mce_amd: Add a tracepoint for the decoded error

2017-07-27 Thread Borislav Petkov
On Thu, Jul 27, 2017 at 10:39:27AM +0200, Ingo Molnar wrote:
> I don't think so: we routinely log several KB per event worth of call chains
> via perf tracing just fine.
>
> So I'd suggest logging more than less, and making it more verbose is
> definitely the way to go.

No no, you're

Re: [RFC PATCH 0/8] EDAC, mce_amd: Add a tracepoint for the decoded error

2017-07-27 Thread Ingo Molnar
* Borislav Petkov wrote:
> On Thu, Jul 27, 2017 at 09:10:34AM +0200, Ingo Molnar wrote:
> > Looks pretty nice to me conceptually. Do you have a couple of examples of
> > real-life events that get logged? It's hard to decode it from the new
> > tracepoint alone.
>
> Here's what comes out

Re: [RFC PATCH 0/8] EDAC, mce_amd: Add a tracepoint for the decoded error

2017-07-27 Thread Borislav Petkov
On Thu, Jul 27, 2017 at 09:10:34AM +0200, Ingo Molnar wrote:
> Looks pretty nice to me conceptually. Do you have a couple of examples of
> real-life events that get logged? It's hard to decode it from the new
> tracepoint alone.

Here's what comes out in dmesg:

[ 932.370319] mce: [Hardware

Re: [RFC PATCH 0/8] EDAC, mce_amd: Add a tracepoint for the decoded error

2017-07-27 Thread Ingo Molnar
* Borislav Petkov wrote:
> From: Borislav Petkov
>
> Hi,
>
> here's a first stab at adding a tracepoint which dumps the decoded MCE
> string to userspace. The main idea is to have the decoding functionality
> in the kernel and depending on whether you have userspace consumers
> listening or

[RFC PATCH 0/8] EDAC, mce_amd: Add a tracepoint for the decoded error

2017-07-25 Thread Borislav Petkov
From: Borislav Petkov

Hi,

here's a first stab at adding a tracepoint which dumps the decoded MCE
string to userspace. The main idea is to have the decoding functionality
in the kernel and depending on whether you have userspace consumers
listening or not, to dump the error to the tracepoint or