Re: [RFC PATCH] EDAC, ghes: Enable per-layer error reporting for ARM

2018-08-30 Thread Borislav Petkov
On Wed, Aug 29, 2018 at 11:20:48AM +0100, James Morse wrote:
> Right. I'd like ghes-edac to work in the same way for both architectures.
>
> I think this is best done by stuffing the dmi-handle in struct dimm_info during
> ghes_edac_dmidecode(), then populating the struct edac_raw_error_desc la
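The idea quoted above (record the DMI handle per DIMM while enumerating, then use it again later) can be illustrated with a toy model. This is only a sketch in Python, not kernel code: `DimmInfo`, `build_handle_map` and `find_dimm` are hypothetical names, and since the snippet is cut off, the lookup at error time is an assumption about how the stored handle would be used.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class DimmInfo:
    """Hypothetical stand-in for the per-DIMM record discussed in the thread."""
    index: int          # position of the DIMM in the enumeration order
    label: str          # human-readable locator, e.g. taken from SMBIOS
    smbios_handle: int  # handle recorded while walking the DMI table


def build_handle_map(dimms: Iterable[DimmInfo]) -> Dict[int, DimmInfo]:
    """Index the enumerated DIMMs by the handle stored on each of them."""
    return {d.smbios_handle: d for d in dimms}


def find_dimm(handle_map: Dict[int, DimmInfo], handle: int) -> Optional[DimmInfo]:
    """Return the DIMM whose stored handle matches the one in an error record.

    None means the reported handle matches no enumerated DIMM, so the
    error cannot be attributed to a specific module.
    """
    return handle_map.get(handle)


# Enumeration time: remember each DIMM's handle.
dimms = [
    DimmInfo(index=0, label="DIMM_A1", smbios_handle=0x0010),
    DimmInfo(index=1, label="DIMM_B1", smbios_handle=0x0012),
]
handle_map = build_handle_map(dimms)

# Error time: the record carries a handle, which selects the DIMM.
hit = find_dimm(handle_map, 0x0012)
miss = find_dimm(handle_map, 0x00FF)
```

The appeal of such a scheme is that the same lookup would work on any architecture that provides the handle, with no dependence on how card and module numbers are assigned.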

Re: [RFC PATCH] EDAC, ghes: Enable per-layer error reporting for ARM

2018-08-29 Thread James Morse
Hi Boris,
On 29/08/18 08:38, Borislav Petkov wrote:
> On Tue, Aug 28, 2018 at 06:09:24PM +0100, James Morse wrote:
>> Does x86 have another source of memory-topology information it needs to
>> correlate smbios with?
>
> Bah, pinpointing the DIMM on x86 is a mess. There's no reliable way to
> say

Re: [RFC PATCH] EDAC, ghes: Enable per-layer error reporting for ARM

2018-08-29 Thread Borislav Petkov
On Tue, Aug 28, 2018 at 06:09:24PM +0100, James Morse wrote:
> Does x86 have another source of memory-topology information it needs to
> correlate smbios with?
Bah, pinpointing the DIMM on x86 is a mess. There's no reliable way to say which DIMM it is in certain cases (interleaving, mirroring, ..

Re: [RFC PATCH] EDAC, ghes: Enable per-layer error reporting for ARM

2018-08-28 Thread Tyler Baicar
On Tue, Aug 28, 2018 at 1:11 PM, James Morse wrote:
> On 24/08/18 16:14, Tyler Baicar wrote:
>> On Fri, Aug 24, 2018 at 5:48 AM, James Morse wrote:
>>> On 23/08/18 16:46, Tyler Baicar wrote:
>>> so edac_raw_mc_handle_error() has no clue where the error happened. (I haven't
>>> read what it d

Re: [RFC PATCH] EDAC, ghes: Enable per-layer error reporting for ARM

2018-08-28 Thread James Morse
Hi Tyler, On 24/08/18 16:14, Tyler Baicar wrote: > On Fri, Aug 24, 2018 at 5:48 AM, James Morse wrote: >> On 23/08/18 16:46, Tyler Baicar wrote: >>> On Thu, Aug 23, 2018 at 5:29 AM James Morse wrote: On 19/07/18 19:36, Tyler Baicar wrote: > This seems pretty hacky to me, so if anyone ha

Re: [RFC PATCH] EDAC, ghes: Enable per-layer error reporting for ARM

2018-08-28 Thread James Morse
Hi Fan,
On 24/08/18 15:30, wufan wrote:
>> Why get avoid the layer stuff? Isn't counting DIMM/memory-devices what
>> EDAC_MC_LAYER_SLOT is for?
>
> Borislav has explained it in his response. Here let me elaborate a little more.
> To use the layer information you need an accurate way to pinpoin

Re: [RFC PATCH] EDAC, ghes: Enable per-layer error reporting for ARM

2018-08-28 Thread James Morse
Hi Boris,
On 24/08/18 13:01, Borislav Petkov wrote:
> On Fri, Aug 24, 2018 at 10:48:24AM +0100, James Morse wrote:
>> so edac_raw_mc_handle_error() has no clue where the error happened. (I haven't
>> read what it does with this information yet).
>
> See edac_inc_ce_error(), for example - it u

Re: [RFC PATCH] EDAC, ghes: Enable per-layer error reporting for ARM

2018-08-24 Thread Tyler Baicar
On Fri, Aug 24, 2018 at 5:48 AM, James Morse wrote: > On 23/08/18 16:46, Tyler Baicar wrote: >> On Thu, Aug 23, 2018 at 5:29 AM James Morse wrote: >>> On 19/07/18 19:36, Tyler Baicar wrote: This seems pretty hacky to me, so if anyone has other suggestions please share them. >>> >>

RE: [RFC PATCH] EDAC, ghes: Enable per-layer error reporting for ARM

2018-08-24 Thread wufan
Hi James,
> Why get avoid the layer stuff? Isn't counting DIMM/memory-devices what
> EDAC_MC_LAYER_SLOT is for?
Borislav has explained it in his response. Here let me elaborate a little more. To use the layer information you need an accurate way to pinpoint each component in the layer and the

Re: [RFC PATCH] EDAC, ghes: Enable per-layer error reporting for ARM

2018-08-24 Thread Borislav Petkov
On Fri, Aug 24, 2018 at 10:48:24AM +0100, James Morse wrote:
> Why get avoid the layer stuff? Isn't counting DIMM/memory-devices what
> EDAC_MC_LAYER_SLOT is for?
Yap.
> so edac_raw_mc_handle_error() has no clue where the error happened. (I haven't
> read what it does with this information yet).

Re: [RFC PATCH] EDAC, ghes: Enable per-layer error reporting for ARM

2018-08-24 Thread James Morse
Hi Tyler, On 23/08/18 16:46, Tyler Baicar wrote: > On Thu, Aug 23, 2018 at 5:29 AM James Morse wrote: >> On 19/07/18 19:36, Tyler Baicar wrote: >>> On 7/19/2018 10:46 AM, James Morse wrote: On 19/07/18 15:01, Borislav Petkov wrote: > On Mon, Jul 16, 2018 at 01:26:49PM -0400, Tyler Baicar

Re: [RFC PATCH] EDAC, ghes: Enable per-layer error reporting for ARM

2018-08-23 Thread Tyler Baicar
Hello James, On Thu, Aug 23, 2018 at 5:29 AM James Morse wrote: > On 19/07/18 19:36, Tyler Baicar wrote: > > On 7/19/2018 10:46 AM, James Morse wrote: > >> On 19/07/18 15:01, Borislav Petkov wrote: > >>> On Mon, Jul 16, 2018 at 01:26:49PM -0400, Tyler Baicar wrote: > Enable per-layer error r

Re: [RFC PATCH] EDAC, ghes: Enable per-layer error reporting for ARM

2018-08-23 Thread James Morse
Hi guys, (CC: +Fan Wu) On 19/07/18 19:36, Tyler Baicar wrote: > On 7/19/2018 10:46 AM, James Morse wrote: >> On 19/07/18 15:01, Borislav Petkov wrote: >>> On Mon, Jul 16, 2018 at 01:26:49PM -0400, Tyler Baicar wrote: Enable per-layer error reporting for ARM systems so that the error cou

Re: [RFC PATCH] EDAC, ghes: Enable per-layer error reporting for ARM

2018-07-19 Thread Borislav Petkov
On Thu, Jul 19, 2018 at 02:36:21PM -0400, Tyler Baicar wrote:
> With the current ghes_edac setup, it seems the only way this could
> work would be to have the firmware always report the module value to
My experience with firmware so far is that it is a lost cause, considering all the bugs, snafus

Re: [RFC PATCH] EDAC, ghes: Enable per-layer error reporting for ARM

2018-07-19 Thread Tyler Baicar
On 7/19/2018 10:46 AM, James Morse wrote: On 19/07/18 15:01, Borislav Petkov wrote: On Mon, Jul 16, 2018 at 01:26:49PM -0400, Tyler Baicar wrote: Enable per-layer error reporting for ARM systems so that the error counters are incremented per-DIMM. On ARM systems that use firmware first error h

Re: [RFC PATCH] EDAC, ghes: Enable per-layer error reporting for ARM

2018-07-19 Thread James Morse
Hi guys,
On 19/07/18 15:01, Borislav Petkov wrote:
> On Mon, Jul 16, 2018 at 01:26:49PM -0400, Tyler Baicar wrote:
>> Enable per-layer error reporting for ARM systems so that the error
>> counters are incremented per-DIMM.
>>
>> On ARM systems that use firmware first error handling it is understoo

Re: [RFC PATCH] EDAC, ghes: Enable per-layer error reporting for ARM

2018-07-19 Thread Borislav Petkov
On Mon, Jul 16, 2018 at 01:26:49PM -0400, Tyler Baicar wrote:
> Enable per-layer error reporting for ARM systems so that the error
> counters are incremented per-DIMM.
>
> On ARM systems that use firmware first error handling it is understood
> that card=channel and module=DIMM on that channel. Po

[RFC PATCH] EDAC, ghes: Enable per-layer error reporting for ARM

2018-07-16 Thread Tyler Baicar
Enable per-layer error reporting for ARM systems so that the error counters are incremented per-DIMM. On ARM systems that use firmware first error handling it is understood that card=channel and module=DIMM on that channel. Populate that information and enable per layer error reporting for ARM sys
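To make the card=channel, module=DIMM mapping from the patch description concrete, here is a toy model of per-DIMM error counting. It is only a sketch in Python, not the patch itself and not kernel code: the class `PerDimmCounters`, its fields, and the fallback bucket for errors without usable location data are all assumptions made for illustration.

```python
from collections import Counter
from typing import Optional


class PerDimmCounters:
    """Toy model: correctable-error counters indexed by (channel, DIMM)."""

    def __init__(self, channels: int, dimms_per_channel: int) -> None:
        self.channels = channels
        self.dimms_per_channel = dimms_per_channel
        self.ce = Counter()   # (channel, dimm) -> correctable error count
        self.ce_noinfo = 0    # errors that could not be tied to a DIMM

    def report_ce(self, card: Optional[int], module: Optional[int]) -> None:
        """Count one correctable error.

        Assumes firmware reports card == channel and module == DIMM on
        that channel; a missing or out-of-range value goes to the
        'no info' bucket instead of a per-DIMM counter.
        """
        if (card is None or module is None
                or not 0 <= card < self.channels
                or not 0 <= module < self.dimms_per_channel):
            self.ce_noinfo += 1
            return
        self.ce[(card, module)] += 1


counters = PerDimmCounters(channels=2, dimms_per_channel=2)
counters.report_ce(card=0, module=1)     # attributed to channel 0, DIMM 1
counters.report_ce(card=0, module=1)
counters.report_ce(card=None, module=1)  # firmware gave no card value
counters.report_ce(card=5, module=0)     # out of range for this layout
```

The fallback bucket matters because, as the replies in this thread point out, the reported location values cannot always be trusted.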