Re: [RFC PATCH v4 3/3] acpi: apei: Do not panic() on PCIe errors reported through GHES

2018-05-11 Thread Alex G.
On 05/11/2018 12:41 PM, Borislav Petkov wrote:
> On Fri, May 11, 2018 at 12:01:52PM -0500, Alex G. wrote:
>> I understand your concern with unhandled AER errors evolving into MCE's.
>> That's extremely rare, but when it happens you still panic due to the
>> MCE.
>
> I don't like leaving holes in t

Re: [RFC PATCH v4 3/3] acpi: apei: Do not panic() on PCIe errors reported through GHES

2018-05-11 Thread Borislav Petkov
On Fri, May 11, 2018 at 12:01:52PM -0500, Alex G. wrote:
> I understand your concern with unhandled AER errors evolving into MCE's.
> That's extremely rare, but when it happens you still panic due to the
> MCE.

I don't like leaving holes in the handling of PCIe errors. You need to handle only thos

Re: [RFC PATCH v4 3/3] acpi: apei: Do not panic() on PCIe errors reported through GHES

2018-05-11 Thread Alex G.
On 05/11/2018 11:29 AM, Borislav Petkov wrote:
> On Fri, May 11, 2018 at 11:12:25AM -0500, Alex G. wrote:
>>> I think *you* didn't get it: IS_ENABLED(CONFIG_ACPI_APEI_PCIEAER) is not
>>> enough of a check to confirm that there actually *is* an AER driver to
>>> handle the errors. If you really want

Re: [RFC PATCH v4 3/3] acpi: apei: Do not panic() on PCIe errors reported through GHES

2018-05-11 Thread Borislav Petkov
On Fri, May 11, 2018 at 11:12:25AM -0500, Alex G. wrote:
> > I think *you* didn't get it: IS_ENABLED(CONFIG_ACPI_APEI_PCIEAER) is not
> > enough of a check to confirm that there actually *is* an AER driver to
> > handle the errors. If you really want to make sure the driver is loaded
> > and functi
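The objection quoted here is that a compile-time check such as IS_ENABLED(CONFIG_ACPI_APEI_PCIEAER) only proves AER support was built in, not that a driver is actually bound and able to take errors at run time. The sketch below models that distinction in plain userspace C. The macro, flag, and functions are hypothetical illustrations and are not real kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_ACPI_APEI_PCIEAER):
 * a build-time constant that only says the support was compiled in. */
#define APEI_PCIEAER_BUILT_IN 1

/* Hypothetical run-time state: true only while an AER driver is bound
 * and able to accept errors. */
static bool aer_driver_bound = false;

/* Called (hypothetically) when the AER driver binds successfully. */
void aer_driver_register(void)
{
	aer_driver_bound = true;
}

/* Called (hypothetically) when the AER driver goes away. */
void aer_driver_unregister(void)
{
	assert(aer_driver_bound); /* catch an unbalanced unregister */
	aer_driver_bound = false;
}

/* An error may be deferred to AER only if support is built in AND a
 * driver is present right now; the build-time check alone is not enough. */
bool can_defer_to_aer(void)
{
	return APEI_PCIEAER_BUILT_IN && aer_driver_bound;
}
```

The build-time constant never changes while the system runs, whereas the bound-driver flag can, which is why a decision about whether to skip panic() would have to consult the latter.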

Re: [RFC PATCH v4 3/3] acpi: apei: Do not panic() on PCIe errors reported through GHES

2018-05-11 Thread Alex G.
On 05/11/2018 11:02 AM, Borislav Petkov wrote:
> On Fri, May 11, 2018 at 10:54:09AM -0500, Alex G. wrote:
>> That being clarified, should I replace "crackmonkey" with "broken" in
>> the commit message?
>
> Keep your opinion *outside* of commit messages - their goal is to
> explain *why* the chan

Re: [RFC PATCH v4 3/3] acpi: apei: Do not panic() on PCIe errors reported through GHES

2018-05-11 Thread Borislav Petkov
On Fri, May 11, 2018 at 10:54:09AM -0500, Alex G. wrote:
> That being clarified, should I replace "crackmonkey" with "broken" in
> the commit message?

Keep your opinion *outside* of commit messages - their goal is to explain *why* the change is being made in strictly technical language so that whe

Re: [RFC PATCH v4 3/3] acpi: apei: Do not panic() on PCIe errors reported through GHES

2018-05-11 Thread Alex G.
On 05/11/2018 10:40 AM, Borislav Petkov wrote:
> On Mon, Apr 30, 2018 at 04:33:52PM -0500, Alexandru Gagniuc wrote:
>> The policy was to panic() when GHES said that an error is "Fatal".
>> This logic is wrong for several reasons, as it doesn't take into
>> account what caused the error.
>>
>> PCI

Re: [RFC PATCH v4 3/3] acpi: apei: Do not panic() on PCIe errors reported through GHES

2018-05-11 Thread Borislav Petkov
On Mon, Apr 30, 2018 at 04:33:52PM -0500, Alexandru Gagniuc wrote:
> The policy was to panic() when GHES said that an error is "Fatal".
> This logic is wrong for several reasons, as it doesn't take into
> account what caused the error.
>
> PCIe fatal errors indicate that the link to a device is ei

[RFC PATCH v4 3/3] acpi: apei: Do not panic() on PCIe errors reported through GHES

2018-04-30 Thread Alexandru Gagniuc
The policy was to panic() when GHES said that an error is "Fatal". This logic is wrong for several reasons, as it doesn't take into account what caused the error.

PCIe fatal errors indicate that the link to a device is either unstable or unusable. They don't indicate that the machine is on fire, a
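The patch description contrasts two policies: panic on anything firmware marks "Fatal", versus taking the error source into account, since a fatal PCIe error describes a broken link rather than an unsafe machine. The following is a minimal userspace C sketch of that contrast, not the patch's actual code. The enums and functions are hypothetical, and the `aer_available` parameter is an assumption added to reflect the thread's concern that a handler must really be present before panic() is skipped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical severity levels, loosely modelled on firmware marking an
 * error recoverable or "Fatal"; these are not the kernel's enum values. */
enum err_sev { SEV_CORRECTED, SEV_RECOVERABLE, SEV_FATAL };

/* Hypothetical error sources carried in an error record section. */
enum err_src { SRC_MEMORY, SRC_PCIE, SRC_OTHER };

/* Old policy: panic whenever firmware says the error is "Fatal",
 * regardless of what caused it. */
bool old_policy_panics(enum err_sev sev)
{
	return sev == SEV_FATAL;
}

/* Sketched new policy: a fatal PCIe error is handed to AER-style
 * recovery when a handler is available; every other fatal error
 * still panics. */
bool new_policy_panics(enum err_sev sev, enum err_src src, bool aer_available)
{
	assert(sev <= SEV_FATAL); /* reject out-of-range severities */

	if (sev != SEV_FATAL)
		return false;		/* non-fatal errors never panic */
	if (src == SRC_PCIE && aer_available)
		return false;		/* let PCIe recovery handle the link */
	return true;			/* no safe handler: stay fatal */
}
```

With `aer_available` false, a fatal PCIe error falls back to the old behaviour, so no error class is left without a handler.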