On 11/27/2023 6:46 AM, Borislav Petkov wrote:
On Sat, Nov 18, 2023 at 01:32:34PM -0600, Yazen Ghannam wrote:
+/* GPU UMCs have MCATYPE=0x1. */
+bool smca_gpu_umc_bank_type(u64 ipid)
+{
+ if (!smca_umc_bank_type(ipid))
+ return false;
+
+ return FIELD_GET
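The quoted helper first checks for a UMC bank and then compares the MCATYPE field of MCA_IPID using FIELD_GET. Below is a minimal user-space sketch of that pattern. The field positions and the UMC HWID value are assumptions for illustration, so check them against the real kernel definitions. The FIELD_GET stand-in also relies on the GCC/Clang builtin __builtin_ctzll.

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-ins for the kernel's GENMASK_ULL()/FIELD_GET() helpers. */
#define GENMASK_ULL(h, l) \
	((~0ULL >> (63 - (h))) & (~0ULL << (l)))
#define FIELD_GET(mask, reg) \
	(((reg) & (mask)) >> __builtin_ctzll(mask))

/* ASSUMED field layout and values, for illustration only. */
#define IPID_HWID        GENMASK_ULL(43, 32)
#define IPID_MCATYPE     GENMASK_ULL(63, 48)
#define HWID_UMC         0x96
#define MCATYPE_GPU_UMC  0x1

static bool umc_bank_type(uint64_t ipid)
{
	return FIELD_GET(IPID_HWID, ipid) == HWID_UMC;
}

/* GPU UMCs are UMC banks whose MCATYPE field is 0x1. */
static bool gpu_umc_bank_type(uint64_t ipid)
{
	if (!umc_bank_type(ipid))
		return false;

	return FIELD_GET(IPID_MCATYPE, ipid) == MCATYPE_GPU_UMC;
}
```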
On 11/27/2023 6:43 AM, Borislav Petkov wrote:
On Sat, Nov 18, 2023 at 01:32:33PM -0600, Yazen Ghannam wrote:
@@ -714,14 +721,10 @@ static bool legacy_mce_is_memory_error(struct mce *m)
*/
static bool smca_mce_is_memory_error(struct mce *m)
{
- enum smca_bank_types bank_type
On 11/22/2023 1:28 PM, Borislav Petkov wrote:
On Sat, Nov 18, 2023 at 01:32:31PM -0600, Yazen Ghannam wrote:
Current AMD systems may report MCA errors using the ACPI Boot Error
Record Table (BERT). The BERT entries for MCA errors will be an x86
Common Platform Error Record (CPER) with an MSR
On 11/22/2023 1:24 PM, Borislav Petkov wrote:
On Sat, Nov 18, 2023 at 01:32:30PM -0600, Yazen Ghannam wrote:
+void mce_setup_global(struct mce *m)
We usually call those things "common":
mce_setup_common().
+{
+ memset(m, 0, sizeof(struct mce));
+
+ m->cpuid
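With the split the review suggests, one helper zeroes the record and fills the system-wide fields, and a second helper fills the fields tied to a specific CPU. Here is a sketch with a hypothetical, trimmed-down record type, which is not the real struct mce layout:

```c
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL trimmed-down record; the real struct mce has more fields. */
struct mce_record {
	uint32_t cpuid;      /* common: same on every CPU in the system */
	uint8_t  cpuvendor;  /* common */
	unsigned int cpu;    /* per-CPU */
	uint32_t apicid;     /* per-CPU */
};

/* Clear the record and fill fields that do not depend on the CPU. */
static void mce_setup_common(struct mce_record *m, uint32_t cpuid,
			     uint8_t vendor)
{
	memset(m, 0, sizeof(*m));
	m->cpuid = cpuid;
	m->cpuvendor = vendor;
}

/* Fill fields tied to a specific logical CPU. */
static void mce_setup_per_cpu(struct mce_record *m, unsigned int cpu,
			      uint32_t apicid)
{
	m->cpu = cpu;
	m->apicid = apicid;
}
```

The common helper must run first because it clears the whole record. A firmware-reported error can then call the per-CPU helper only after the CPU is known.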
Signed-off-by: Avadhut Naik
Signed-off-by: Yazen Ghannam
---
arch/x86/kernel/cpu/mce/apei.c | 73 +++---
1 file changed, 59 insertions(+), 14 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
index 4820f8677460..d01c9b272e2f 100644
---
Some Checkpatch checks have been ignored to maintain coding style.
[Yazen: Add last commit message paragraph. Rebase on other MCA updates.]
Suggested-by: Borislav Petkov (AMD)
Signed-off-by: Avadhut Naik
Signed-off-by: Yazen Ghannam
---
arch/x86/include/asm/mce.h | 6 +-
arch/x86/kerne
value from the hardware.
Simplify the enable flow by using the hardware-provided value. Any
conflicts will be caught by setup_APIC_eilvt(). Conflicts on production
systems can be handled as quirks, if needed.
Also, rename the function using a "verb-first" style.
Signed-off-by: Yaz
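The idea of trusting the hardware-provided offset and relying on a later conflict check can be sketched as follows. `reserve_eilvt()` and its table are hypothetical stand-ins, not the real setup_APIC_eilvt() internals, and the entry count is an assumption.

```c
#include <errno.h>
#include <stdint.h>

#define NR_EILVT 4  /* ASSUMED number of extended LVT entries */

/* 0 means "unused"; otherwise the vector that claimed the entry. */
static uint32_t eilvt_owner[NR_EILVT];

/*
 * HYPOTHETICAL conflict check: accept the hardware-provided offset as-is
 * and fail only if another user already claimed that entry with a
 * different vector.
 */
static int reserve_eilvt(unsigned int offset, uint32_t vector)
{
	if (offset >= NR_EILVT)
		return -EINVAL;
	if (eilvt_owner[offset] && eilvt_owner[offset] != vector)
		return -EBUSY;
	eilvt_owner[offset] = vector;
	return 0;
}
```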
o maintain coding style.
[Yazen: Add Avadhut as co-developer for wrapper changes. ]
Co-developed-by: Avadhut Naik
Signed-off-by: Avadhut Naik
Signed-off-by: Yazen Ghannam
---
arch/x86/include/asm/mce.h | 2 ++
arch/x86/kernel/cpu/mce/apei.c | 2 ++
arch/x86/kernel/cpu/mce/core.c | 3 ++
to set
up the MCA Thresholding interrupt handler. If successful, set the feature
enable bit in the MCA_CONFIG register to indicate to the Platform that
the OS is ready for the interrupt.
Signed-off-by: Yazen Ghannam
---
arch/x86/kernel/cpu/mce/amd.c | 21 +++--
1 file c
zen's Co-developed-by tag and moved SoB tag.]
[Yazen: Change %Lx to %llx in TP_printk().]
Signed-off-by: Avadhut Naik
Signed-off-by: Yazen Ghannam
---
arch/x86/include/asm/mce.h | 12
arch/x86/kernel/cpu/mce/core.c | 26 ++
drivers/edac/mce_amd.c
the common MCA
code.
Signed-off-by: Yazen Ghannam
---
arch/x86/kernel/cpu/mce/amd.c | 122 +
arch/x86/kernel/cpu/mce/core.c | 16 -
2 files changed, 48 insertions(+), 90 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.
configuration. This ensures the kernel is ready to receive the
interrupts before the hardware is configured to send them.
Signed-off-by: Yazen Ghannam
---
arch/x86/kernel/cpu/mce/amd.c | 7 ---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel
The "long names" for SMCA banks are only used by the MCE decoder module.
Move them out of the arch code and into the decoder module.
Signed-off-by: Yazen Ghannam
---
arch/x86/include/asm/mce.h | 1 -
arch/x86/kernel/cpu/mce/amd.c | 74 ++-
dr
read it once during init and pass to
functions that need it. Start with the Deferred error interrupt case.
The MCA Thresholding interrupt case will be handled during refactoring.
Signed-off-by: Yazen Ghannam
---
arch/x86/kernel/cpu/mce/amd.c | 46 +++
1 file
the MCA_CONFIG MSR, so include updated register field
definitions using bitops.
Leave old code until init path is cleaned up.
Signed-off-by: Yazen Ghannam
---
arch/x86/kernel/cpu/mce/amd.c | 84 ---
1 file changed, 49 insertions(+), 35 deletions(-)
diff --git a
esholding data structures are not initialized.
Check "bank_map" to determine if the thresholding structures should be
allocated and initialized. Also, remove "mce_flags.amd_threshold" which
is redundant when checking "bank_map".
Signed-off-by: Yazen Ghannam
---
arch/x
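Using bank_map as the single source of truth for thresholding allocation could look like this sketch. `struct threshold_bank` and `alloc_threshold_banks()` are hypothetical, and bank_map is assumed to fit in an unsigned long.

```c
#include <stdbool.h>
#include <stdlib.h>

/* HYPOTHETICAL per-bank thresholding state. */
struct threshold_bank {
	unsigned int bank;
};

/*
 * Allocate thresholding state only for banks whose bit is set in
 * bank_map. An empty map means thresholding is unused, so no separate
 * "thresholding supported" flag is needed. A failed per-bank allocation
 * simply leaves that slot NULL.
 */
static struct threshold_bank **alloc_threshold_banks(unsigned long bank_map,
						     unsigned int nr_banks)
{
	struct threshold_bank **banks;

	if (!bank_map)
		return NULL;

	banks = calloc(nr_banks, sizeof(*banks));
	if (!banks)
		return NULL;

	for (unsigned int i = 0; i < nr_banks; i++) {
		if (!(bank_map & (1UL << i)))
			continue;
		banks[i] = calloc(1, sizeof(**banks));
		if (banks[i])
			banks[i]->bank = i;
	}
	return banks;
}
```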
reset"
flow.
Signed-off-by: Yazen Ghannam
---
arch/x86/kernel/cpu/mce/amd.c | 57 ++
arch/x86/kernel/cpu/mce/core.c | 8 +
arch/x86/kernel/cpu/mce/internal.h | 2 ++
3 files changed, 37 insertions(+), 30 deletions(-)
diff --git a/arch/x86/kernel/cpu/m
code until init path is cleaned up.
Signed-off-by: Yazen Ghannam
---
arch/x86/include/asm/mce.h | 2 +-
arch/x86/kernel/cpu/mce/amd.c | 88 ---
drivers/edac/mce_amd.c | 2 +-
3 files changed, 84 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/a
removed.
Signed-off-by: Yazen Ghannam
---
arch/x86/include/asm/mce.h | 4 +++-
arch/x86/kernel/cpu/mce/amd.c | 12 +++-
drivers/edac/amd64_edac.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 9 -
4 files changed, 19 insertions(+), 8
ror is for
memory, and where the exact bank type is not needed.
Use bitops and rename old mask until removed.
Signed-off-by: Yazen Ghannam
---
arch/x86/include/asm/mce.h | 3 ++-
arch/x86/kernel/cpu/mce/amd.c | 15 +--
2 files changed, 11 insertions(+), 7 deletions(-)
diff --git
odels. Related to this quirk is
code to disable MCA Thresholding for the IF bank.
Check the bank number for the quirks instead of determining the bank
type.
Signed-off-by: Yazen Ghannam
---
arch/x86/kernel/cpu/mce/amd.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/arc
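Keying the quirk on the bank number rather than the bank type might look like this. The bank number and the "affected model" input are hypothetical placeholders, not the real quirk conditions.

```c
#include <stdbool.h>

/* HYPOTHETICAL: bank number assumed to hold the IF unit on affected models. */
#define QUIRK_IF_BANK 4

/*
 * Decide from the bank number alone. No bank-type lookup (and so no
 * read of the IPID register) is needed.
 */
static bool skip_threshold_for_bank(bool affected_model, unsigned int bank)
{
	return affected_model && bank == QUIRK_IF_BANK;
}
```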
CPU number. If no possible
CPU was found, then return early.
Gather the global MCA information first, save the found CPU number, then
gather the per-CPU information.
Signed-off-by: Yazen Ghannam
---
arch/x86/kernel/cpu/mce/apei.c | 18 --
arch/x86/kernel/cpu/mce/internal.h
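The reordered flow (global information first, then a CPU lookup with an early return, then per-CPU information) is sketched here. The APIC ID table, the record type, and `fill_record()` are hypothetical.

```c
#include <stdint.h>

/* HYPOTHETICAL mapping from logical CPU number to APIC ID. */
static const uint32_t cpu_to_apicid[] = { 0, 2, 4, 6 };
#define NR_CPUS_SKETCH (sizeof(cpu_to_apicid) / sizeof(cpu_to_apicid[0]))

struct err_record {
	uint32_t apicid;
	int vendor;
	int cpu;
};

/* Returns 0 on success, -1 if no possible CPU matches the APIC ID. */
static int fill_record(struct err_record *r, uint32_t lapic_id, int vendor)
{
	int cpu = -1;

	/* 1. Global information first: no CPU number needed. */
	r->vendor = vendor;
	r->apicid = lapic_id;

	/* 2. Find and save the CPU number for the reported APIC ID. */
	for (unsigned int i = 0; i < NR_CPUS_SKETCH; i++) {
		if (cpu_to_apicid[i] == lapic_id) {
			cpu = (int)i;
			break;
		}
	}
	if (cpu < 0)
		return -1;  /* no possible CPU was found: return early */

	/* 3. Per-CPU information, using the saved CPU number. */
	r->cpu = cpu;
	return 0;
}
```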
t with any per-CPU decoding or handling.
Signed-off-by: Yazen Ghannam
---
arch/x86/kernel/cpu/mce/core.c | 31 +--
1 file changed, 21 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 1642018dd6c9..7e
6/mce: Check whether writes to MCA_STATUS are getting
ignored")
Signed-off-by: Yazen Ghannam
---
arch/x86/kernel/cpu/mce/inject.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 4d8d4bcf915d..72f0695c3dc1 100644
---
ik (2):
x86/mce: Add wrapper for struct mce to export vendor specific info
x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers
Yazen Ghannam (18):
x86/mce/inject: Clear test status value
x86/mce: Define mce_setup() helpers for global and per-CPU fields
x86/mce: Use mce_
On Thu, Aug 25, 2022 at 08:54:46AM -0400, Russell, Kent wrote:
> [AMD Official Use Only - General]
>
> It does indeed short-circuit on || (If the left side of an || statement is
> not 0, it doesn't evaluate the right side and returns true). So we can ignore
> this patch, since checking for each
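Here is a small demonstration of the short-circuit rule mentioned above: in `a || b`, `b` is evaluated only when `a` is false (zero). The helper names are illustrative only.

```c
#include <stdbool.h>

static int calls;

/* Counts how often it runs, so a caller can see whether it was evaluated. */
static bool check(bool result)
{
	calls++;
	return result;
}

static bool either(bool a, bool b)
{
	/* check(b) is skipped whenever check(a) returns true. */
	return check(a) || check(b);
}
```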
On Sat, Sep 25, 2021 at 01:20:57PM +0200, Borislav Petkov wrote:
> On Fri, Sep 24, 2021 at 07:46:10PM +0000, Yazen Ghannam wrote:
> > I agree with you in general. But this device isn't really a GPU. And
> > users of this device seem to want to count *every* error, at least f
>
> Signed-off-by: Mukul Joshi
> ---
This patch looks good to me overall.
Reviewed-by: Yazen Ghannam
Thanks,
Yazen
On Thu, Sep 23, 2021 at 08:14:15PM +0200, Borislav Petkov wrote:
> On Thu, Sep 23, 2021 at 05:23:21PM +0000, Yazen Ghannam wrote:
> > Shouldn't the error still be reported to EDAC for decoding and counting? I
> > think users want this.
>
> You know what happens with u
On Thu, Sep 23, 2021 at 11:30:55AM -0400, Joshi, Mukul wrote:
...
> > > + return NOTIFY_DONE;
> > > +
> > > + /*
> > > + * If it is correctable error, return.
> > > + */
> > > + if (mce_is_correctable(m))
> > > + return NOTIFY_OK;
> >
> > Shouldn't this be "NOTIFY_DONE" if "don't
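For reference, the notifier return values below match include/linux/notifier.h. NOTIFY_DONE means "don't care", NOTIFY_OK means "suits me", and only values with NOTIFY_STOP_MASK set end the chain walk. The chain walker and handler are a simplified user-space model, not the kernel's notifier_call_chain().

```c
#include <stdbool.h>
#include <stddef.h>

/* Values as defined in include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000  /* Don't care */
#define NOTIFY_OK        0x0001  /* Suits me */
#define NOTIFY_STOP_MASK 0x8000  /* Don't call further */
#define NOTIFY_STOP      (NOTIFY_OK | NOTIFY_STOP_MASK)

struct err {
	bool correctable;
};

typedef int (*notifier_fn)(struct err *e);

/* Simplified model of walking a notifier chain. */
static int call_chain(notifier_fn *chain, size_t n, struct err *e)
{
	int ret = NOTIFY_DONE;

	for (size_t i = 0; i < n; i++) {
		ret = chain[i](e);
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* A handler that is not interested in correctable errors says "don't care". */
static int bad_page_handler(struct err *e)
{
	if (e->correctable)
		return NOTIFY_DONE;
	/* ... retire the bad page here ... */
	return NOTIFY_OK;
}
```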
On Wed, Sep 22, 2021 at 03:36:20PM -0400, Mukul Joshi wrote:
> On Aldebaran, GPU driver will handle bad page retirement
> even though UMC is host managed. As a result, register a
> bad page retirement handler on the mce notifier chain to
> retire bad pages on Aldebaran.
>
I think this should stat
On Thu, May 27, 2021 at 03:54:27PM -0400, Joshi, Mukul wrote:
...
> > Is that the same deferred interrupt which calls
> > amd_deferred_error_interrupt() ?
>
> Sorry for picking this up after some time. I thought I had replied to this email.
> Yes it is the same deferred interrupt which calls
> amd_def