On Thu, Aug 17, 2017 at 09:08:40PM +, Kani, Toshimitsu wrote:
> 1. It creates mc0 and mc1.
> I think this is because you called edac_mc_alloc() with mc_num 1.
Fixed, see below.
>
> 2. 'ras-mc-ctl --layout' does not show all DIMMs.
Yap, that's strange.
$ grep . /sys/devices/system/edac/mc
On Wed, 2017-08-16 at 11:51 -0600, Toshi Kani wrote:
> On Wed, 2017-08-16 at 19:40 +0200, Borislav Petkov wrote:
> > On Wed, Aug 16, 2017 at 05:28:50PM +, Kani, Toshimitsu wrote:
:
> > > I will test the patch with an SCI when I got a chance. I won't
> > > be able to test other notification ty
On Wed, 2017-08-16 at 19:40 +0200, Borislav Petkov wrote:
> On Wed, Aug 16, 2017 at 05:28:50PM +, Kani, Toshimitsu wrote:
> > Assuming this big spinlock works, yes, this addresses my concern
> > that
>
> You mean, lengthy locked section. We can always switch to on-stack
> buffers if there are
On Wed, Aug 16, 2017 at 05:28:50PM +, Kani, Toshimitsu wrote:
> Assuming this big spinlock works, yes, this addresses my concern that
You mean, lengthy locked section. We can always switch to on-stack
buffers if there are issues or even to something more fancy like
genpool.
> I will test the
On Wed, Aug 16, 2017 at 10:22:49AM -0400, Steven Rostedt wrote:
> Maybe keep that original mutex just in case.
Let's do the elegant thing:
---
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index d4089c2980ef..386e04a7bda0 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac
On Wed, 2017-08-16 at 18:42 +0200, Borislav Petkov wrote:
> On Wed, Aug 16, 2017 at 03:26:04PM +, Kani, Toshimitsu wrote:
> > I believe you now need to protect from a race condition that a
> > single mci and pvt can be initialized / consumed from multiple
> > threads. This protection is missin
On Wed, Aug 16, 2017 at 03:26:04PM +, Kani, Toshimitsu wrote:
> I believe you now need to protect from a race condition that a single
> mci and pvt can be initialized / consumed from multiple threads. This
> protection is missing in your patch.
Easy. Done.
---
diff --git a/drivers/edac/ghes_
On Wed, 2017-08-16 at 10:29 +0200, Borislav Petkov wrote:
> On Tue, Aug 15, 2017 at 08:48:16AM -0700, Luck, Tony wrote:
> > Won't the user see all their DIMMs reported for each memory
> > controller
> > under /sys/devices/system/edac/mc/mc*/dimm* ?
> >
> > That sounds confusing.
>
> Right, and ad
On Wed, 16 Aug 2017 16:03:50 +0200
Borislav Petkov wrote:
> > What's the likelihood of two calls to ghes_edac_register being done
> > simultaneously? Because two calls at the same time will get past this.
>
> Well, that thing gets called per GHES platform device and last time I
> checked they
On Wed, Aug 16, 2017 at 09:59:01AM -0400, Steven Rostedt wrote:
> Should the above be:
>
> if (WARN_ON_ONCE(in_nmi()))
> return;
>
> To prevent a deadlock? Or do we not care?
Yeah, better this way.
> What's the likelihood of two calls to ghes_edac_register being done
> simul
On Wed, 16 Aug 2017 10:29:31 +0200
Borislav Petkov wrote:
> ---
> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index 6f80eb65c26c..a22fabef4791 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -28,10 +28,14 @@ struct ghes_edac_pvt {
> char
On Wed, Aug 16, 2017 at 10:29:31AM +0200, Borislav Petkov wrote:
> Anyway, I think I have a box to run it to, lemme go find it.
Seems to boot.
It's a whole another story whether it actually works. :-)
Need some EINJ capabilities urgently but finding a box where it works
reliably is like finding
On Tue, Aug 15, 2017 at 08:48:16AM -0700, Luck, Tony wrote:
> Won't the user see all their DIMMs reported for each memory controller
> under /sys/devices/system/edac/mc/mc*/dimm* ?
>
> That sounds confusing.
Right, and adding the locking was really easy. If only people would
debate less and actua
On Tue, 2017-08-15 at 17:50 +0200, Borislav Petkov wrote:
> On Tue, Aug 15, 2017 at 03:35:51PM +, Kani, Toshimitsu wrote:
> > ghes_edac instantiates an mci as a pseudo device representing a
> > GHES error source. Each error source associates with all DIMMs,
> > and may report errors independen
On Tue, 2017-08-15 at 08:48 -0700, Luck, Tony wrote:
> On Tue, Aug 15, 2017 at 08:35:51AM -0700, Kani, Toshimitsu wrote:
> > User apps like ras-mc-ctl works as expected for a given (not-so-
> > great) DIMM info from SMBIOS as well. I do not see a probelm from
> > user perspective, either.
>
> Won
On Tue, Aug 15, 2017 at 03:35:51PM +, Kani, Toshimitsu wrote:
> ghes_edac instantiates an mci as a pseudo device representing a GHES
> error source. Each error source associates with all DIMMs, and may
> report errors independently. As ghes_edac is an GHES error-reporting
> wrapper to edac, t
On Tue, Aug 15, 2017 at 08:35:51AM -0700, Kani, Toshimitsu wrote:
> User apps like ras-mc-ctl works as expected for a given (not-so-great)
> DIMM info from SMBIOS as well. I do not see a probelm from user
> perspective, either.
Won't the user see all their DIMMs reported for each memory controlle
On Mon, 2017-08-14 at 22:39 +0200, Borislav Petkov wrote:
> On Mon, Aug 14, 2017 at 08:17:54PM +, Kani, Toshimitsu wrote:
> > I think the current code design of allocating mci & ghes_edac_pvt
> > for each GHES source entry makes sense.
>
> And I don't.
>
> > edac_raw_mc_handle_error() also ha
On Mon, Aug 14, 2017 at 08:17:54PM +, Kani, Toshimitsu wrote:
> I think the current code design of allocating mci & ghes_edac_pvt for
> each GHES source entry makes sense.
And I don't.
> edac_raw_mc_handle_error() also has the same expectation that the call
> is serialized per mci.
There's n
On Mon, 2017-08-14 at 21:34 +0200, Borislav Petkov wrote:
> On Mon, Aug 14, 2017 at 07:02:15PM +, Kani, Toshimitsu wrote:
> > I do not know how likely we see such case, but the code should be
> > written according to the spec.
>
> Well, then you'll have to make ghes_edac_report_mem_error()
> r
On Mon, Aug 14, 2017 at 07:02:15PM +, Kani, Toshimitsu wrote:
> I do not know how likely we see such case, but the code should be
> written according to the spec.
Well, then you'll have to make ghes_edac_report_mem_error() reentrant.
Which doesn't look that hard as the only thing it really nee
On Mon, 2017-08-14 at 20:35 +0200, Borislav Petkov wrote:
> On Mon, Aug 14, 2017 at 06:17:47PM +, Kani, Toshimitsu wrote:
> > Right, ghes_edac_report_mem_error() gets serialized per a GHES
> > entry, but not globally.
>
> Globally what?
GHES v2's ACK is not a global lock. So, it does not gua
On Mon, Aug 14, 2017 at 06:17:47PM +, Kani, Toshimitsu wrote:
> Right, ghes_edac_report_mem_error() gets serialized per a GHES entry,
> but not globally.
Globally what?
What is the actual potential scenario for concurrency issues you see?
Example pls.
--
Regards/Gruss,
Boris.
Good mail
On Mon, 2017-08-14 at 20:05 +0200, Borislav Petkov wrote:
> On Mon, Aug 14, 2017 at 05:52:25PM +, Kani, Toshimitsu wrote:
> > Yes, but this ACK is done per a GHES entry as well.
>
> So is the ghes_edac_report_mem_error() call.
Right, ghes_edac_report_mem_error() gets serialized per a GHES ent
On Mon, Aug 14, 2017 at 05:52:25PM +, Kani, Toshimitsu wrote:
> Yes, but this ACK is done per a GHES entry as well.
So is the ghes_edac_report_mem_error() call.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Mon, 2017-08-14 at 19:05 +0200, Borislav Petkov wrote:
> On Mon, Aug 14, 2017 at 04:48:57PM +, Kani, Toshimitsu wrote:
> > Right, but the issue is how [ghes_edac_]report_mem_error() protects
> > from possible concurrent calls from multiple GHES sources when
> > there is only a single mci.
>
On Mon, Aug 14, 2017 at 04:48:57PM +, Kani, Toshimitsu wrote:
> Right, but the issue is how [ghes_edac_]report_mem_error() protects
> from possible concurrent calls from multiple GHES sources when there is
> only a single mci.
Do you know of an actual firmware reporting multiple errors concurr
On Mon, 2017-08-14 at 18:24 +0200, Borislav Petkov wrote:
> On Mon, Aug 14, 2017 at 03:57:35PM +, Kani, Toshimitsu wrote:
> > Hmm... Sorry, I failed to see how your patchset solved it. Would
> > you mind to explain how it is done?
>
> +static int __init ghes_edac_register(void)
> {
> +
On Mon, Aug 14, 2017 at 03:57:35PM +, Kani, Toshimitsu wrote:
> Hmm... Sorry, I failed to see how your patchset solved it. Would you
> mind to explain how it is done?
+static int __init ghes_edac_register(void)
{
+ struct ghes_edac_pvt *pvt = ghes_pvt;
Only one local ghes_pvt structur
On Fri, 2017-08-11 at 11:04 +0200, Borislav Petkov wrote:
> On Mon, Aug 07, 2017 at 05:59:15PM +, Kani, Toshimitsu wrote:
> > I think we should keep the current scheme, which registers an mci
> > for
>
> No we shouldn't.
>
> > each GHES entry. ghes_edac_report_mem_error() expects that error-
On Mon, Aug 07, 2017 at 05:59:15PM +, Kani, Toshimitsu wrote:
> I think we should keep the current scheme, which registers an mci for
No we shouldn't.
> each GHES entry. ghes_edac_report_mem_error() expects that error-
> reporting is serialized per a GHES entry. Sharing a single mci among
>
On Sat, 2017-08-05 at 07:16 +0200, Borislav Petkov wrote:
> On Fri, Aug 04, 2017 at 09:02:17PM +, Kani, Toshimitsu wrote:
> > GHES platform devices correspond to GHES entries, which define
> > firmware interfaces to report generic memory errors to the OS, such
> > as NMI and SCI. These devices
On Fri, Aug 04, 2017 at 09:02:17PM +, Kani, Toshimitsu wrote:
> GHES platform devices correspond to GHES entries, which define firmware
> interfaces to report generic memory errors to the OS, such as NMI and
> SCI. These devices are associated with all DIMMs, not a particular
> DIMM.
And? Sta
On Fri, 2017-08-04 at 06:05 +0200, Borislav Petkov wrote:
> On Thu, Aug 03, 2017 at 03:57:50PM -0600, Toshi Kani wrote:
> > ghes_edac_register() is called for each GHES platform device
> > instantiated per a GHES entry in ACPI HEST table. dmi_walk()
> > counts the number of DIMMs on the system, an
On Thu, Aug 03, 2017 at 03:57:50PM -0600, Toshi Kani wrote:
> ghes_edac_register() is called for each GHES platform device
> instantiated per a GHES entry in ACPI HEST table. dmi_walk()
> counts the number of DIMMs on the system, and there is no need
> to call it multiple times.
>
> Change ghes_e
ghes_edac_register() is called for each GHES platform device
instantiated per a GHES entry in ACPI HEST table. dmi_walk()
counts the number of DIMMs on the system, and there is no need
to call it multiple times.
Change ghes_edac_register() to call dmi_walk() only when
'num_dimm' is uninitialized.
36 matches
Mail list logo