Re: [PATCH RFC 2/4] arm64: mm: Add RAS extension system register check to SEA handling

2019-07-10 Thread Tyler Baicar OS
Hi James, Mark, On Tue, Jul 9, 2019 at 8:52 PM Tyler Baicar OS wrote: > On Mon, Jul 8, 2019 at 10:10 AM James Morse wrote: > > On 02/07/2019 17:51, Tyler Baicar OS wrote: > > > @@ -632,6 +633,8 @@ static int do_sea(unsigned long addr, unsigned int > > >

Re: [PATCH RFC 2/4] arm64: mm: Add RAS extension system register check to SEA handling

2019-07-09 Thread Tyler Baicar OS
On Mon, Jul 8, 2019 at 10:10 AM James Morse wrote: > On 02/07/2019 17:51, Tyler Baicar OS wrote: > > On systems that support the ARM RAS extension, synchronous external > > abort syndrome information could be captured in the core's RAS extension > > system registers. So

Re: [PATCH RFC 1/4] ACPI/AEST: Initial AEST driver

2019-07-09 Thread Tyler Baicar OS
Hello Shiju, Thank you for the feedback! On Thu, Jul 4, 2019 at 12:03 PM Shiju Jose wrote: > >+struct ras_ext_regs { > >+ u64 err_fr; > >+ u64 err_ctlr; > >+ u64 err_status; > >+ u64 err_addr; > >+ u64 err_misc0; > >+ u64 err_misc1; > >+ u64 err_misc2; > >+

Re: [PATCH RFC 1/4] ACPI/AEST: Initial AEST driver

2019-07-03 Thread Tyler Baicar OS
Hello Andrew, Thank you for the feedback! On Wed, Jul 3, 2019 at 5:26 AM Andrew Murray wrote: > > On Tue, Jul 02, 2019 at 04:51:38PM +, Tyler Baicar OS wrote: > > Add support for parsing the ARM Error Source Table and basic handling of > > errors reported through both

[PATCH RFC 0/4] ARM Error Source Table Support

2019-07-02 Thread Tyler Baicar OS
ork: - UER handling to avoid panic - Looping through all external abort capable (ERRFR.UE != 0) error nodes in SEA/SEI handling - ARMv8.4 extension support [0] https://static.docs.arm.com/den0085/a/DEN0085_RAS_ACPI_1.0_BETA_1.pdf Tyler Baicar (4): ACPI/AEST: Initial AEST driver arm64: m

[PATCH RFC 4/4] trace, ras: add ARM RAS extension trace event

2019-07-02 Thread Tyler Baicar OS
Add a trace event for hardware errors reported by the ARMv8.2 RAS extension registers. Signed-off-by: Tyler Baicar --- arch/arm64/kernel/ras.c | 3 +++ drivers/acpi/arm64/aest.c | 4 include/ras/ras_event.h | 46 ++ 3 files changed, 53

[PATCH RFC 3/4] arm64: traps: Add RAS extension system register check to serror handling

2019-07-02 Thread Tyler Baicar OS
On systems that support the ARM RAS extension, serror interrupt syndrome information could be captured in the core's RAS extension system registers. When handling serrors, check the RAS system registers for error syndrome information. Signed-off-by: Tyler Baicar --- arch/arm64/kernel/tr

[PATCH RFC 1/4] ACPI/AEST: Initial AEST driver

2019-07-02 Thread Tyler Baicar OS
Add support for parsing the ARM Error Source Table and basic handling of errors reported through both memory mapped and system register interfaces. Signed-off-by: Tyler Baicar --- arch/arm64/include/asm/ras.h | 41 + arch/arm64/kernel/Makefile | 2 +- arch/arm64/kernel/ras.c | 67

[PATCH RFC 2/4] arm64: mm: Add RAS extension system register check to SEA handling

2019-07-02 Thread Tyler Baicar OS
On systems that support the ARM RAS extension, synchronous external abort syndrome information could be captured in the core's RAS extension system registers. So, when handling SEAs check the RAS system registers for error syndrome information. Signed-off-by: Tyler Baicar --- arch/arm

Re: [PATCH v2 0/2] Fix crash in cper_estatus_check()

2019-01-30 Thread Tyler Baicar
t the patches since the hardware error record on that machine has > been cleared. > > Ross Lagerwall (2): > acpi/apei: Fix possible out-of-bounds access to BERT region > efi/cper: Fix possible out-of-bounds access For both patches: Tested-by: Tyler Baicar

Re: [PATCH] ACPI/APEI: Clear GHES block_status before panic()

2018-12-20 Thread Tyler Baicar
> for exactly the same fatal error. > > Otherwise ghes_probe(), running in the crash kernel, would see > an unhandled error in the APEI generic error status block and > panic again, thereby precluding any crash dump. > > Signed-off-by: Lenny Szubowicz > Signed-off-by: Da

Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER

2018-11-27 Thread Tyler Baicar
On Tue, Nov 27, 2018 at 1:32 PM Sinan Kaya wrote: > > On 11/27/2018 1:22 PM, alex_gagn...@dellteam.com wrote: > > On 11/20/2018 04:08 PM, Sinan Kaya wrote: > >> I followed the ASWG thread yesterday. There will be a meeting next week to > >> discuss this. > > > > Any updates on the meeting? > > > >

Re: [PATCH v2] EDAC, ghes: use CPER module handles to locate DIMMs

2018-08-30 Thread Tyler Baicar
d *arg) > (*num_dimm)++; > } > > +static int get_dimm_smbios_index(u16 handle) > +{ > + struct mem_ctl_info *mci; > + int i; > + > + mci = ghes_pvt->mci; > + Minor nit: you could define and set mci in the same line to save some space here. Otherwise this patch looks good to me. Reviewed-by: Tyler Baicar

Re: [PATCH] EDAC, ghes: use CPER module handles to locate DIMMs

2018-08-30 Thread Tyler Baicar
On Thu, Aug 30, 2018 at 12:32 PM, James Morse wrote: > Hi Fan, > > On 30/08/18 15:40, wufan wrote: @@ -327,12 +349,20 @@ void ghes_edac_report_mem_error(int sev, >>> struct cper_sec_mem_err *mem_err) p += sprintf(p, "bit_pos:%d ", mem_err->bit_pos); if (mem_err->vali

Re: [RFC PATCH] EDAC, ghes: Enable per-layer error reporting for ARM

2018-08-28 Thread Tyler Baicar
On Tue, Aug 28, 2018 at 1:11 PM, James Morse wrote: > On 24/08/18 16:14, Tyler Baicar wrote: >> On Fri, Aug 24, 2018 at 5:48 AM, James Morse wrote: >>> On 23/08/18 16:46, Tyler Baicar wrote: >>> so edac_raw_mc_handle_error() has no clue where the error happened. (I &

Re: [RFC PATCH] EDAC, ghes: Enable per-layer error reporting for ARM

2018-08-24 Thread Tyler Baicar
On Fri, Aug 24, 2018 at 5:48 AM, James Morse wrote: > On 23/08/18 16:46, Tyler Baicar wrote: >> On Thu, Aug 23, 2018 at 5:29 AM James Morse wrote: >>> On 19/07/18 19:36, Tyler Baicar wrote: >>>> This seems pretty hacky to me, so if anyone has other suggestion

Re: [RFC PATCH] EDAC, ghes: Enable per-layer error reporting for ARM

2018-08-23 Thread Tyler Baicar
Hello James, On Thu, Aug 23, 2018 at 5:29 AM James Morse wrote: > On 19/07/18 19:36, Tyler Baicar wrote: > > On 7/19/2018 10:46 AM, James Morse wrote: > >> On 19/07/18 15:01, Borislav Petkov wrote: > >>> On Mon, Jul 16, 2018 at 01:26:49PM -0400, Tyler Baicar wrote:

Re: [PATCH RESEND v2] arm64: clean the additional checks before calling ghes_notify_sea()

2018-08-09 Thread Tyler Baicar
On Thu, Aug 9, 2018 at 6:16 PM, gengdongjiu wrote: > 2018-08-10 5:05 GMT+08:00 Tyler Baicar : >> On Thu, Aug 9, 2018 at 8:32 AM, gengdongjiu wrote: >>> >>> 2018-08-08 0:26 GMT+08:00 Dongjiu Geng : >>> > In order to remove the additional check before calling

Re: [RFC PATCH] EDAC, ghes: Enable per-layer error reporting for ARM

2018-07-19 Thread Tyler Baicar
On 7/19/2018 10:46 AM, James Morse wrote: On 19/07/18 15:01, Borislav Petkov wrote: On Mon, Jul 16, 2018 at 01:26:49PM -0400, Tyler Baicar wrote: Enable per-layer error reporting for ARM systems so that the error counters are incremented per-DIMM. On ARM systems that use firmware first error

[RFC PATCH] EDAC, ghes: Enable per-layer error reporting for ARM

2018-07-16 Thread Tyler Baicar
systems so that the EDAC error counters are incremented based on DIMM number as per the SMBIOS table rather than just incrementing the noinfo counters on the memory controller. Signed-off-by: Tyler Baicar --- drivers/edac/ghes_edac.c | 15 --- 1 file changed, 12 insertions(+), 3

[PATCH] PCI/AER: Adopt lspci naming convention for AER prints

2018-06-26 Thread Tyler Baicar
lspci uses abbreviated naming for AER error strings. Adopt the same naming convention for the AER printing so they match. Signed-off-by: Tyler Baicar --- drivers/pci/pcie/aer.c | 46 +++--- 1 file changed, 23 insertions(+), 23 deletions(-) diff --git a

Re: [PATCH v5 3/5] PCI/AER: Add sysfs attributes to provide breakdown of AERs

2018-06-22 Thread Tyler Baicar
On 6/21/2018 5:25 PM, Rajat Jain wrote: On Thu, Jun 21, 2018 at 11:48 AM, Bjorn Helgaas wrote: [+cc Tyler for AER dmesg decoding] - Tyler posted a patch [1] to update those dmesg strings so they match the way lspci decodes them. I really liked that update, but we never quite finished it

Re: [PATCH v6 2/2] acpi: apei: Do not panic() on PCIe errors reported through GHES

2018-05-22 Thread Tyler Baicar
On 5/22/2018 10:32 AM, Alex G. wrote: I think the biggest problem is having a policy to panic on "fatal" errors, instead of letting the error handler make that decision. I'd much rather kill that stupid policy, but people seem to like it for some reason. You can get around that panic and still

Re: [PATCH v6 2/2] acpi: apei: Do not panic() on PCIe errors reported through GHES

2018-05-21 Thread Tyler Baicar
On 5/21/2018 9:49 AM, Alexandru Gagniuc wrote: +/* PCIe errors should not cause a panic. */ +static int ghes_sec_pcie_severity(struct acpi_hest_generic_data *gdata) +{ + struct cper_sec_pcie *pcie_err = acpi_hest_get_payload(gdata); + + if (pcie_err->validation_bits & CPER_PCIE_VALID_

Re: [PATCH RESEND] PCI/AER: Use a common function to print AER error bits

2018-04-26 Thread Tyler Baicar
. Signed-off-by: Alexandru Gagniuc Tested-by: Tyler Baicar Thanks! --- drivers/pci/pcie/aer/aerdrv_errprint.c | 16 +--- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c index cfc89dd57831

Re: [PATCH 1/2] efi/esrt: fix unsupported version initialization failure

2018-03-08 Thread Tyler Baicar
On 2/24/2018 2:20 AM, Dave Young wrote: On 02/23/18 at 12:42pm, Tyler Baicar wrote: If ESRT initialization fails due to an unsupported version, the early_memremap allocation is never unmapped. This will cause an early ioremap leak. So, make sure to unmap the memory allocation before returning

Re: [PATCH 0/2] ESRT fixes for relocatable kexec'd kernel

2018-03-07 Thread Tyler Baicar
Hello Akashi, On 3/6/2018 4:00 AM, AKASHI Takahiro wrote: Tyler, Jeffrey, On Fri, Mar 02, 2018 at 08:27:11AM -0500, Tyler Baicar wrote: On 3/2/2018 12:53 AM, AKASHI Takahiro wrote: Tyler, Jeffrey, [Note: This issue takes place in kexec, not kdump. So to be precise, it is not the same

Re: [PATCH 0/2] ESRT fixes for relocatable kexec'd kernel

2018-03-02 Thread Tyler Baicar
://lists.infradead.org/pipermail/linux-arm-kernel/2018-January/553098.html ] On Thu, Mar 01, 2018 at 12:56:38PM -0500, Tyler Baicar wrote: Hello, On 2/28/2018 9:50 PM, AKASHI Takahiro wrote: Hi, On Wed, Feb 28, 2018 at 08:39:42AM -0700, Jeffrey Hugo wrote: On 2/27/2018 11:19 PM, AKASHI Takahiro

Re: [PATCH v2] PCI/AER: update AER status string print to match other AER logs

2018-02-27 Thread Tyler Baicar
Hello Bjorn, On 2/7/2018 3:11 PM, Tyler Baicar wrote: Currently the AER driver uses cper_print_bits() to print the AER status string. This causes the status string to not include the proper PCI device name prefix that the other AER prints include. Also, it has a different print level than all

Re: [PATCH 2/2] efi/esrt: mark ESRT memory region as nomap

2018-02-26 Thread Tyler Baicar
Hello Ard, On 2/24/2018 3:03 AM, Ard Biesheuvel wrote: Hi Tyler, On 23 February 2018 at 19:42, Tyler Baicar wrote: The ESRT memory region is being exposed as System RAM in /proc/iomem which is wrong because it cannot be overwritten. This memory is needed for kexec kernels in order to

[PATCH 1/2] efi/esrt: fix unsupported version initialization failure

2018-02-23 Thread Tyler Baicar
If ESRT initialization fails due to an unsupported version, the early_memremap allocation is never unmapped. This will cause an early ioremap leak. So, make sure to unmap the memory allocation before returning from efi_esrt_init(). Signed-off-by: Tyler Baicar --- drivers/firmware/efi/esrt.c | 2

[PATCH 2/2] efi/esrt: mark ESRT memory region as nomap

2018-02-23 Thread Tyler Baicar
that it is not overwritten. Signed-off-by: Tyler Baicar Tested-by: Jeffrey Hugo --- drivers/firmware/efi/esrt.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/firmware/efi/esrt.c b/drivers/firmware/efi/esrt.c index 504f3c3..f5f79c7 100644 --- a/drivers/firmware/efi/esrt.c

[PATCH 0/2] ESRT fixes for relocatable kexec'd kernel

2018-02-23 Thread Tyler Baicar
returning. This still leaves ESRT unable to initialize in the kexec'd kernel, so now mark the ESRT memory block as nomap so that this memory is not treated as System RAM. With this change I'm able to see that the ESRT data is not overwritten when running a kexec'd kernel. Tyler Ba

[PATCH v2] PCI/AER: update AER status string print to match other AER logs

2018-02-07 Thread Tyler Baicar
:00.0: aer_layer=Data Link Layer, aer_agent=Transmitter ID Signed-off-by: Tyler Baicar --- drivers/pci/pcie/aer/aerdrv_errprint.c | 71 ++ 1 file changed, 47 insertions(+), 24 deletions(-) diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer

Re: [PATCH v4 0/4] arm64/ras: support sea error recovery

2018-01-25 Thread Tyler Baicar
safe, because we are in process context. On some platforms, when an SEA is triggered, the physical address could be reported by the memory section or by the processor section, so we save the address in both places. For this series - Tested-by: Tyler Baicar Note that this will probably need to be rebased on top of

[tip:efi/core] efi: Parse ARM error information value

2018-01-03 Thread tip-bot for Tyler Baicar
Commit-ID: 301f55b1a9177132d2b9ce8a90bf0ae4b37bb850 Gitweb: https://git.kernel.org/tip/301f55b1a9177132d2b9ce8a90bf0ae4b37bb850 Author: Tyler Baicar AuthorDate: Tue, 2 Jan 2018 18:10:42 + Committer: Ingo Molnar CommitDate: Wed, 3 Jan 2018 14:03:48 +0100 efi: Parse ARM error

[tip:efi/core] efi: Move ARM CPER code to new file

2018-01-03 Thread tip-bot for Tyler Baicar
Commit-ID: c6d8c8ef1d0d94fdae9f5d72982963db89f9cdad Gitweb: https://git.kernel.org/tip/c6d8c8ef1d0d94fdae9f5d72982963db89f9cdad Author: Tyler Baicar AuthorDate: Tue, 2 Jan 2018 18:10:41 + Committer: Ingo Molnar CommitDate: Wed, 3 Jan 2018 14:03:48 +0100 efi: Move ARM CPER code to

Re: [PATCH] PCI/AER: update AER status string print to match other AER logs

2017-12-13 Thread Tyler Baicar
On 11/15/2017 12:56 PM, Bjorn Helgaas wrote: Hi Tyler, On Wed, Nov 15, 2017 at 09:47:41AM -0500, Tyler Baicar wrote: On 10/17/2017 11:42 AM, Tyler Baicar wrote: Currently the AER driver uses cper_print_bits() to print the AER status string. This causes the status string to not include the

[PATCH V4 0/2] Restructure and fix GHES PCIe AER handling

2017-11-28 Thread Tyler Baicar
First, break the PCIe AER handling out into its own function to separate it from the standard GHES processing. Then fix the AER handling to process all errors in the AER driver rather than only handling recoverable errors. V4: Rebase to 4.15-rc1 and add reviewed-by Tyler Baicar (2): acpi: apei

[PATCH V4 2/2] acpi: apei: call into AER handling regardless of severity

2017-11-28 Thread Tyler Baicar
severity Signed-off-by: Tyler Baicar Reviewed-by: Borislav Petkov --- drivers/acpi/apei/ghes.c | 22 +- 1 file changed, 17 insertions(+), 5 deletions(-) diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index f67eb76..cc65d19 100644 --- a/drivers/acpi/apei

[PATCH V4 1/2] acpi: apei: handle PCIe AER errors in separate function

2017-11-28 Thread Tyler Baicar
Move PCIe AER error handling code into a separate function. Signed-off-by: Tyler Baicar Reviewed-by: Borislav Petkov --- drivers/acpi/apei/ghes.c | 64 +--- 1 file changed, 34 insertions(+), 30 deletions(-) diff --git a/drivers/acpi/apei/ghes.c b

[PATCH V3 2/2] acpi: apei: call into AER handling regardless of severity

2017-11-15 Thread Tyler Baicar
severity Signed-off-by: Tyler Baicar --- drivers/acpi/apei/ghes.c | 22 +- 1 file changed, 17 insertions(+), 5 deletions(-) diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index 839c3d5..15dbf65 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei

Re: [PATCH] PCI/AER: update AER status string print to match other AER logs

2017-11-15 Thread Tyler Baicar
On 10/17/2017 11:42 AM, Tyler Baicar wrote: Currently the AER driver uses cper_print_bits() to print the AER status string. This causes the status string to not include the proper PCI device name prefix that the other AER prints include. Also, it has a different print level than all the other

Re: [PATCH] PCI/AER: don't call recovery process for correctable errors

2017-11-15 Thread Tyler Baicar
On 10/2/2017 7:19 PM, Bjorn Helgaas wrote: On Mon, Aug 28, 2017 at 11:09:44AM -0600, Tyler Baicar wrote: Correctable errors do not need any software intervention, so avoid calling into the software recovery process for correctable errors. Signed-off-by: Tyler Baicar --- drivers/pci/pcie/aer

Re: [PATCH V3 2/2] acpi: apei: call into AER handling regardless of severity

2017-11-13 Thread Tyler Baicar
On 11/13/2017 7:36 AM, Dongdong Liu wrote: 在 2017/11/9 3:13, Tyler Baicar 写道: Currently the GHES code only calls into the AER driver for recoverable type errors. This is incorrect because errors of other severities do not get logged by the AER driver and do not get exposed to user space via

Re: [PATCH V3 2/2] acpi: apei: call into AER handling regardless of severity

2017-11-09 Thread Tyler Baicar
On 11/9/2017 4:46 AM, Borislav Petkov wrote: On Wed, Nov 08, 2017 at 12:13:12PM -0700, Tyler Baicar wrote: Currently the GHES code only calls into the AER driver for recoverable type errors. This is incorrect because errors of other severities do not get logged by the AER driver and do not get

[PATCH V3 2/2] acpi: apei: call into AER handling regardless of severity

2017-11-08 Thread Tyler Baicar
severity Signed-off-by: Tyler Baicar --- drivers/acpi/apei/ghes.c | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index 839c3d5..bb65fa6 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -458,14

[PATCH 1/2] acpi: apei: handle PCIe AER errors in separate function

2017-11-08 Thread Tyler Baicar
Move PCIe AER error handling code into a separate function. Signed-off-by: Tyler Baicar --- drivers/acpi/apei/ghes.c | 64 +--- 1 file changed, 34 insertions(+), 30 deletions(-) diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index

[PATCH 0/2] Restructure and fix GHES PCIe AER handling

2017-11-08 Thread Tyler Baicar
First, break the PCIe AER handling out into its own function to separate it from the standard GHES processing. Then fix the AER handling to process all errors in the AER driver rather than only handling recoverable errors. Tyler Baicar (2): acpi: apei: handle PCIe AER errors in separate

Re: [PATCH] PCI/AER: don't call recovery process for correctable errors

2017-11-07 Thread Tyler Baicar
On 10/11/2017 1:09 PM, Bjorn Helgaas wrote: On Wed, Oct 11, 2017 at 10:37:47AM -0400, Tyler Baicar wrote: On 10/2/2017 7:19 PM, Bjorn Helgaas wrote: On Mon, Aug 28, 2017 at 11:09:44AM -0600, Tyler Baicar wrote: Correctable errors do not need any software intervention, so avoid calling into

Re: [PATCH V2] acpi: apei: call into AER handling regardless of severity

2017-11-07 Thread Tyler Baicar
On 10/17/2017 11:28 AM, Tyler Baicar wrote: Currently the GHES code only calls into the AER driver for recoverable type errors. This is incorrect because errors of other severities do not get logged by the AER driver and do not get exposed to user space via the AER trace event. So, call into the

Re: [PATCH] PCI/AER: update AER status string print to match other AER logs

2017-11-07 Thread Tyler Baicar
On 10/20/2017 7:55 PM, Bjorn Helgaas wrote: On Tue, Oct 17, 2017 at 09:42:02AM -0600, Tyler Baicar wrote: Currently the AER driver uses cper_print_bits() to print the AER status string. This causes the status string to not include the proper PCI device name prefix that the other AER prints

Re: [RFC/RFT PATCH 0/6] Switch GHES ioremap_page_range() to use fixmap

2017-10-31 Thread Tyler Baicar
ghes_ioremap_area or arch_apei_flush_tlb_one(), rip them out. RFC as I've only build-tested this on x86. For arm64 I've tested it on a software model. Any more testing would be welcome. These patches are based on rc7. For the arm64 and APEI patches: Tested-by: Tyler Baicar Verified on arm64. I no long

Re: [ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at mm/page_alloc.c:4150

2017-10-30 Thread Tyler Baicar
On 10/30/2017 1:46 PM, Linus Torvalds wrote: On Mon, Oct 30, 2017 at 10:20 AM, Linus Torvalds wrote: I will add a "might_sleep()" to ioremap_page_range() itself, so that we get this warning more reliably and much earlier. Right now it has been hidden by the fact that most of the time the time th

Re: [ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at mm/page_alloc.c:4150

2017-10-30 Thread Tyler Baicar
On 10/30/2017 10:06 AM, Borislav Petkov wrote: On Mon, Oct 30, 2017 at 10:01:52AM -0400, Tyler Baicar wrote: This is not as important for polling sources as it is for the interrupt sources since polling sources are regularly checked and shouldn't be used for fatal error scenarios. For inte

Re: [ghes_copy_tofrom_phys] BUG: sleeping function called from invalid context at mm/page_alloc.c:4150

2017-10-30 Thread Tyler Baicar
On 10/30/2017 7:05 AM, Borislav Petkov wrote: On Mon, Oct 30, 2017 at 12:18:35AM +0100, Fengguang Wu wrote: CC related developers for the BUG in v4.14-rc6. On Sun, Oct 29, 2017 at 11:51:55PM +0100, Fengguang Wu wrote: Hi Linus, Up to now we see the below boot error/warnings when testing v4.14

Re: [PATCH] Bug fix: Clear ack of GHES table which contain wrong Error status block, let new error can fill GHES table.

2017-10-30 Thread Tyler Baicar
On 10/29/2017 9:23 PM, Qiang Zheng wrote: In the current error status block processing flow, if a wrong format is detected, the GHES table ack is not cleared. This prevents a new error from being written to the GHES table, because UEFI needs to check the ack to know whether the error was handled by the OS. This patch solves the issue, no matte

Re: [PATCH] PCI/AER: update AER status string print to match other AER logs

2017-10-18 Thread Tyler Baicar
On 10/18/2017 6:14 AM, David Laight wrote: From: Tyler Baicar [mailto:tbai...@codeaurora.org] Sent: 17 October 2017 18:14 On 10/17/2017 12:00 PM, David Laight wrote: From: Tyler Baicar Sent: 17 October 2017 16:42 Currently the AER driver uses cper_print_bits() to print the AER status string

Re: [PATCH] efi: parse ARM error information value

2017-10-18 Thread Tyler Baicar
On 10/17/2017 3:30 PM, Andy Shevchenko wrote: On Tue, 2017-10-17 at 11:23 -0600, Tyler Baicar wrote: ARM errors just print out the error information value, then the value needs to be manually decoded as per the UEFI spec. Add decoding of the ARM error information value so that the kernel logs

[PATCH] efi: parse ARM error information value

2017-10-17 Thread Tyler Baicar
in UEFI 2.7 spec tables 263-265. Signed-off-by: Tyler Baicar --- drivers/firmware/efi/cper.c | 213 +++- include/linux/cper.h| 44 + 2 files changed, 255 insertions(+), 2 deletions(-) diff --git a/drivers/firmware/efi/cper.c b/drivers

Re: [PATCH] PCI/AER: update AER status string print to match other AER logs

2017-10-17 Thread Tyler Baicar
On 10/17/2017 12:00 PM, David Laight wrote: From: Tyler Baicar Sent: 17 October 2017 16:42 Currently the AER driver uses cper_print_bits() to print the AER status string. This causes the status string to not include the proper PCI device name prefix that the other AER prints include. Also, it

[PATCH] PCI/AER: update AER status string print to match other AER logs

2017-10-17 Thread Tyler Baicar
Layer, aer_agent=Receiver ID pcieport 0003:00:00.0: aer_status: 0x1000, aer_mask: 0xe000 pcieport 0003:00:00.0: Replay Timer Timeout pcieport 0003:00:00.0: aer_layer=Data Link Layer, aer_agent=Transmitter ID Signed-off-by: Tyler Baicar --- drivers/pci/pcie/aer/aerdrv_errprint.c | 15

[PATCH V2] acpi: apei: call into AER handling regardless of severity

2017-10-17 Thread Tyler Baicar
severity. Signed-off-by: Tyler Baicar --- drivers/acpi/apei/ghes.c | 76 +--- 1 file changed, 46 insertions(+), 30 deletions(-) diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index 3c3a37b..d7801bc 100644 --- a/drivers/acpi/apei

Re: [PATCH v5 1/2] acpi: apei: remove the unused dead-code for SEA/NMI notification type

2017-10-17 Thread Tyler Baicar
has no chance to be called. Hence, remove the unnecessary handling when CONFIG_ACPI_APEI_SEA is not defined. For the NMI notification, it has the same issue as SEA notification, so also remove the unused dead-code for it. Cc: Tyler Baicar Cc: James Morse Signed-off-by: Dongjiu Geng Tested-by

Re: [PATCH] ACPI / APEI: Convert timers to use timer_setup()

2017-10-17 Thread Tyler Baicar
y Luck Cc: Borislav Petkov Cc: Tyler Baicar Cc: Will Deacon Cc: James Morse Cc: "Jonathan (Zhixiong) Zhang" Cc: Shiju Jose Cc: linux-a...@vger.kernel.org Signed-off-by: Kees Cook Tested-by: Tyler Baicar Verified that the polled error sources still work with this timer setup. Thanks,

Re: [PATCH] PCI/AER: don't call recovery process for correctable errors

2017-10-11 Thread Tyler Baicar
On 10/2/2017 7:19 PM, Bjorn Helgaas wrote: On Mon, Aug 28, 2017 at 11:09:44AM -0600, Tyler Baicar wrote: Correctable errors do not need any software intervention, so avoid calling into the software recovery process for correctable errors. Signed-off-by: Tyler Baicar --- drivers/pci/pcie/aer

Re: [PATCH v2] acpi: apei: Add SEI notification type support for ARMv8

2017-09-27 Thread Tyler Baicar
On 9/27/2017 6:05 AM, gengdongjiu wrote: Tyler, Stephen On 2017/9/27 3:23, Tyler Baicar wrote: Signed-off-by: Dongjiu Geng Tested-by: Tyler Baicar Tested this functionality using SEA support. ++Stephen, Something to be aware of, this patch will conflict with https://lkml.org/lkml/2017/9

Re: [PATCH v2] acpi: apei: Add SEI notification type support for ARMv8

2017-09-26 Thread Tyler Baicar
not accurate, so EL3 firmware should identify the address to an invalid value. Signed-off-by: Dongjiu Geng Tested-by: Tyler Baicar Tested this functionality using SEA support. ++Stephen, Something to be aware of, this patch will conflict with https://lkml.org/lkml/2017/9/14/663 It may make

Re: [PATCH V3] acpi: apei: clear error status before acknowledging the error

2017-09-21 Thread Tyler Baicar
On 9/13/2017 8:40 AM, Baicar, Tyler wrote: On 8/29/2017 2:16 AM, Borislav Petkov wrote: On Mon, Aug 28, 2017 at 10:53:41AM -0600, Tyler Baicar wrote: Currently we acknowledge errors before clearing the error status. This could cause a new error to be populated by firmware in-between the error

[PATCH] acpi: apei: call into AER handling regardless of severity

2017-08-28 Thread Tyler Baicar
severity. Signed-off-by: Tyler Baicar --- drivers/acpi/apei/ghes.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index d661d45..5cab238 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -489,9

[PATCH] PCI/AER: don't call recovery process for correctable errors

2017-08-28 Thread Tyler Baicar
Correctable errors do not need any software intervention, so avoid calling into the software recovery process for correctable errors. Signed-off-by: Tyler Baicar --- drivers/pci/pcie/aer/aerdrv_core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/pci/pcie/aer

[PATCH V3] acpi: apei: clear error status before acknowledging the error

2017-08-28 Thread Tyler Baicar
tatus before acknowledging the errors. Also, make sure to acknowledge the error if the error status read fails. V3: Separate check for -ENOENT return value V2: Only send error ack if there was an error populated Remove curly braces that are no longer needed Signed-off-by: Tyler Baicar --- dr

[PATCH V2] acpi: apei: clear error status before acknowledging the error

2017-08-03 Thread Tyler Baicar
tatus before acknowledging the errors. Also, make sure to acknowledge the error if the error status read fails. V2: Only send error ack if there was an error populated Remove curly braces that are no longer needed Signed-off-by: Tyler Baicar --- drivers/acpi/apei/ghes.c | 9 +++-- 1

[PATCH] acpi: apei: fix GHES estatus iteration

2017-08-03 Thread Tyler Baicar
GHES estatus iteration to properly increment through the estatus blocks similar to how the CPER estatus printing iterates through them. Fixes: bbcc2e7b642e ("ras: acpi/apei: cper: add support for generic data v3 structure") Signed-off-by: Tyler Baicar Tested-by: Austin Christ --- dr

[PATCH] acpi: apei: clear error status before acknowledging the error

2017-07-28 Thread Tyler Baicar
tatus before acknowledging the errors. Also, make sure to acknowledge the error if the error status read fails. Signed-off-by: Tyler Baicar --- drivers/acpi/apei/ghes.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c

[PATCH V17 01/11] acpi: apei: read ack upon ghes record consumption

2017-05-19 Thread Tyler Baicar
eliminating the race condition. Add support for parsing of GHESv2 sub-tables as well. Signed-off-by: Tyler Baicar CC: Jonathan (Zhixiong) Zhang Reviewed-by: James Morse --- drivers/acpi/apei/ghes.c | 59 +--- drivers/acpi/apei/hest.c | 7 -- include/acpi

[PATCH V17 04/11] efi: parse ARM processor error

2017-05-19 Thread Tyler Baicar
Add support for ARM Common Platform Error Record (CPER). UEFI 2.6 specification adds support for ARM specific processor error information to be reported as part of the CPER records. This provides more detail for processor error logs. Signed-off-by: Tyler Baicar CC: Jonathan (Zhixiong) Zhang

[PATCH V17 06/11] acpi: apei: handle SEA notification type for ARMv8

2017-05-19 Thread Tyler Baicar
_t to map with in the same way as ghes_ioremap_pfn_irq(). Signed-off-by: Tyler Baicar CC: Jonathan (Zhixiong) Zhang Reviewed-by: James Morse Acked-by: Catalin Marinas --- arch/arm64/Kconfig| 2 ++ arch/arm64/mm/fault.c | 17 drivers/acpi/apei/Kconfig

[PATCH V17 07/11] acpi: apei: panic OS with fatal error status block

2017-05-19 Thread Tyler Baicar
. The OS should panic when a hardware error record is received with this severity. Call panic() after CPER data in error status block is printed if severity is fatal, before each error section is handled. Signed-off-by: Jonathan (Zhixiong) Zhang Signed-off-by: Tyler Baicar Reviewed-by: James Mors

[PATCH V17 09/11] ras: acpi / apei: generate trace event for unrecognized CPER section

2017-05-19 Thread Tyler Baicar
rated. Generate a trace event which contains the raw error data for non-standard section type error records. Signed-off-by: Tyler Baicar CC: Jonathan (Zhixiong) Zhang Tested-by: Shiju Jose --- drivers/acpi/apei/ghes.c | 27 +++ drivers/ras/ras.c | 10 +- in

[PATCH V17 11/11] arm/arm64: KVM: add guest SEA support

2017-05-19 Thread Tyler Baicar
message of an unsupported FSC would be printed and nothing else would happen. With this patch, the code gets routed to the APEI handling of SEAs in the host kernel to report the SEA information. Signed-off-by: Tyler Baicar Acked-by: Catalin Marinas Acked-by: Marc Zyngier Acked-by: Christoffer Dall

[PATCH V17 08/11] efi: print unrecognized CPER section

2017-05-19 Thread Tyler Baicar
then be decoded using vendor specific tools. Signed-off-by: Tyler Baicar CC: Jonathan (Zhixiong) Zhang Reviewed-by: James Morse --- drivers/firmware/efi/cper.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/c

[PATCH V17 10/11] trace, ras: add ARM processor error trace event

2017-05-19 Thread Tyler Baicar
section N.2.4.4. Signed-off-by: Tyler Baicar Acked-by: Steven Rostedt Reviewed-by: Xie XiuQi --- drivers/acpi/apei/ghes.c| 6 +- drivers/firmware/efi/cper.c | 1 + drivers/ras/ras.c | 6 ++ include/linux/ras.h | 3 +++ include/ras/ras_event.h | 45

[PATCH V17 02/11] ras: acpi/apei: cper: add support for generic data v3 structure

2017-05-19 Thread Tyler Baicar
The ACPI 6.1 spec adds a new revision of the generic error data entry structure. Add support to handle the new structure as well as properly verify and iterate through the generic data entries. Signed-off-by: Tyler Baicar CC: Jonathan (Zhixiong) Zhang --- drivers/acpi/apei/ghes.c| 11

[PATCH V17 00/11] Add UEFI 2.6 and ACPI 6.1 updates for RAS on ARM64

2017-05-19 Thread Tyler Baicar
kml.org/lkml/2016/2/5/544 Jonathan (Zhixiong) Zhang (1): acpi: apei: panic OS with fatal error status block Tyler Baicar (10): acpi: apei: read ack upon ghes record consumption ras: acpi/apei: cper: add support for generic data v3 structure cper: add timestamp print to CPER status printing

[PATCH V17 03/11] cper: add timestamp print to CPER status printing

2017-05-19 Thread Tyler Baicar
The ACPI 6.1 spec added a timestamp to the generic error data entry structure. Print the timestamp out when printing out the error information. Signed-off-by: Tyler Baicar CC: Jonathan (Zhixiong) Zhang --- drivers/firmware/efi/cper.c | 26 ++ 1 file changed, 26

[PATCH V17 05/11] arm64: exception: handle Synchronous External Abort

2017-05-19 Thread Tyler Baicar
specific SEA faults so that the new SEA handler is used. Signed-off-by: Tyler Baicar CC: Jonathan (Zhixiong) Zhang Reviewed-by: James Morse Acked-by: Catalin Marinas --- arch/arm64/include/asm/esr.h | 1 + arch/arm64/mm/fault.c| 45 ++-- 2 files

[PATCH V3] acpi: apei: check for pending errors when probing GHES entries

2017-05-18 Thread Tyler Baicar
wasn't present. V3: Check for pending errors of all GHES types Signed-off-by: Tyler Baicar --- drivers/acpi/apei/ghes.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index d0855c0..5347230 100644 --- a/drivers/acpi/apei/ghes.c

[PATCH V16 02/11] ras: acpi/apei: cper: add support for generic data v3 structure

2017-05-15 Thread Tyler Baicar
The ACPI 6.1 spec adds a new revision of the generic error data entry structure. Add support to handle the new structure as well as properly verify and iterate through the generic data entries. Signed-off-by: Tyler Baicar CC: Jonathan (Zhixiong) Zhang --- drivers/acpi/apei/ghes.c| 11

[PATCH V16 05/11] arm64: exception: handle Synchronous External Abort

2017-05-15 Thread Tyler Baicar
specific SEA faults so that the new SEA handler is used. Signed-off-by: Tyler Baicar CC: Jonathan (Zhixiong) Zhang Reviewed-by: James Morse Acked-by: Catalin Marinas --- arch/arm64/include/asm/esr.h | 1 + arch/arm64/mm/fault.c| 45 ++-- 2 files

[PATCH V16 07/11] acpi: apei: panic OS with fatal error status block

2017-05-15 Thread Tyler Baicar
. The OS should panic when a hardware error record is received with this severity. Call panic() after CPER data in error status block is printed if severity is fatal, before each error section is handled. Signed-off-by: Jonathan (Zhixiong) Zhang Signed-off-by: Tyler Baicar Reviewed-by: James Mors

[PATCH V16 11/11] arm/arm64: KVM: add guest SEA support

2017-05-15 Thread Tyler Baicar
message of an unsupported FSC would be printed and nothing else would happen. With this patch, the code gets routed to the APEI handling of SEAs in the host kernel to report the SEA information. Signed-off-by: Tyler Baicar Acked-by: Catalin Marinas Acked-by: Marc Zyngier Acked-by: Christoffer Dall

[PATCH V16 09/11] ras: acpi / apei: generate trace event for unrecognized CPER section

2017-05-15 Thread Tyler Baicar
rated. Generate a trace event which contains the raw error data for non-standard section type error records. Signed-off-by: Tyler Baicar CC: Jonathan (Zhixiong) Zhang Tested-by: Shiju Jose --- drivers/acpi/apei/ghes.c | 27 +++ drivers/ras/ras.c | 9 + in

[PATCH V16 10/11] trace, ras: add ARM processor error trace event

2017-05-15 Thread Tyler Baicar
section N.2.4.4. Signed-off-by: Tyler Baicar Acked-by: Steven Rostedt Reviewed-by: Xie XiuQi --- drivers/acpi/apei/ghes.c| 6 +- drivers/firmware/efi/cper.c | 1 + drivers/ras/ras.c | 6 ++ include/linux/ras.h | 3 +++ include/ras/ras_event.h | 45

[PATCH V16 08/11] efi: print unrecognized CPER section

2017-05-15 Thread Tyler Baicar
0 [ 140.739226] {1}[Hardware Error]: 0050: 0101 0001 0000 ... The raw data from the error can then be decoded using vendor specific tools. Signed-off-by: Tyler Baicar CC: Jonathan (Zhixiong) Zhang Reviewed-by: James

[PATCH V16 04/11] efi: parse ARM processor error

2017-05-15 Thread Tyler Baicar
Add support for ARM Common Platform Error Record (CPER). The UEFI 2.6 specification adds support for ARM specific processor error information to be reported as part of the CPER records. This provides more detail for processor error logs. Signed-off-by: Tyler Baicar CC: Jonathan (Zhixiong) Zhang

[PATCH V16 03/11] cper: add timestamp print to CPER status printing

2017-05-15 Thread Tyler Baicar
The ACPI 6.1 spec added a timestamp to the generic error data entry structure. Print the timestamp out when printing out the error information. Signed-off-by: Tyler Baicar CC: Jonathan (Zhixiong) Zhang --- drivers/firmware/efi/cper.c | 26 ++ 1 file changed, 26

[PATCH V16 06/11] acpi: apei: handle SEA notification type for ARMv8

2017-05-15 Thread Tyler Baicar
_t to map with in the same way as ghes_ioremap_pfn_irq(). Signed-off-by: Tyler Baicar CC: Jonathan (Zhixiong) Zhang Reviewed-by: James Morse Acked-by: Catalin Marinas --- arch/arm64/Kconfig| 2 ++ arch/arm64/mm/fault.c | 17 drivers/acpi/apei/Kconfig

[PATCH V16 01/11] acpi: apei: read ack upon ghes record consumption

2017-05-15 Thread Tyler Baicar
eliminating the race condition. Add support for parsing of GHESv2 sub-tables as well. Signed-off-by: Tyler Baicar CC: Jonathan (Zhixiong) Zhang Reviewed-by: James Morse --- drivers/acpi/apei/ghes.c | 59 +--- drivers/acpi/apei/hest.c | 7 -- include/acpi
