[PATCH 0/8] Fix long standing AER Error Handling Issues

2021-10-04 Thread Naveen Naidu
- "Patch 3/8" is dependent on "Patch 2/8" in the series PATCH 5 - 7 - Deals with converging the various paths and brings more commonality between them - "Patch 6/8" depends on "Patch 1/8" PATCH 8: - Adds extra information in AER error logs. Thanks,

[PATCH 1/8] PCI/AER: Remove ID from aer_agent_string[]

2021-10-04 Thread Naveen Naidu
ceived: :00:03.0 pcieport :00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver) pcieport :00:03.0: device [1b36:000c] error status/mask=0040/e000 pcieport :00:03.0:[ 6] BadTLP Signed-off-by: Naveen Naidu --- drivers/pci/pcie/aer.c |

[PATCH 2/8] PCI: Cleanup struct aer_err_info

2021-10-04 Thread Naveen Naidu
/UNCOR Error Mask Since the lengths of the above registers are even, use u16 and u32 to represent their values. Also remove the __pad fields. "pahole" was run on the modified struct aer_err_info and the size remains unchanged. Signed-off-by: Naveen Naidu --- drivers/pci/pci.h | 8 +

[PATCH 4/8] PCI/DPC: Use pci_aer_clear_status() in dpc_process_error()

2021-10-04 Thread Naveen Naidu
dpc_process_error() clears both AER fatal and non-fatal status registers. Instead of clearing each status register via a different function call, use pci_aer_clear_status(). This helps clean up the code a bit. Signed-off-by: Naveen Naidu --- drivers/pci/pcie/dpc.c | 3 +-- 1 file changed, 1

[PATCH 5/8] PCI/DPC: Converge EDR and DPC Path of clearing AER registers

2021-10-04 Thread Naveen Naidu
when it's an unmasked uncorrectable error. This leads to two different behaviours for the same task (handling of DPC errors) in FFS systems and when native OS has control. Bring the same semantics for clearing the AER status register in EDR path and DPC path. Signed-off-by: Naveen

[PATCH 6/8] PCI/AER: Clear error device AER registers in aer_irq()

2021-10-04 Thread Naveen Naidu
pt handler deals with an interrupt, it must *always* clear the source of the interrupt. Signed-off-by: Naveen Naidu --- drivers/pci/pci.h | 13 ++- drivers/pci/pcie/aer.c | 245 - 2 files changed, 182 insertions(+), 76 deletions(-) diff --git a/d

[PATCH 7/8] PCI/ERR: Remove redundant clearing of AER register in pcie_do_recovery()

2021-10-04 Thread Naveen Naidu
isters are again cleared in pcie_do_recovery(). This is redundant. Remove redundant clearing of AER register in pcie_do_recovery() Signed-off-by: Naveen Naidu --- drivers/pci/pcie/err.c | 9 +++-- 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/drivers/pci/pcie/err.c b/driver

[PATCH 8/8] PCI/AER: Include DEVCTL in aer_print_error()

2021-10-04 Thread Naveen Naidu
: severity=Corrected, type=Data Link Layer, (Receiver) pcieport :00:03.0: device [1b36:000c] error status/mask=0040/e000, devctl=0x000f pcieport :00:03.0:[ 6] BadTLP Signed-off-by: Naveen Naidu --- drivers/pci/pci.h | 2 ++ drivers/pci/pcie/aer.c | 10 -- 2

[PATCH v3 0/8] Fix long standing AER Error Handling Issues

2021-10-04 Thread Naveen Naidu
Patch 3/8" is dependent on "Patch 2/8" in the series PATCH 5 - 7 - Deals with converging the various paths and brings more commonality between them - "Patch 6/8" depends on "Patch 1/8" PATCH 8: - Adds extra information in AER error logs. Thank

[PATCH v3 1/8] PCI/AER: Remove ID from aer_agent_string[]

2021-10-04 Thread Naveen Naidu
ceived: :00:03.0 pcieport :00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver) pcieport :00:03.0: device [1b36:000c] error status/mask=0040/e000 pcieport :00:03.0:[ 6] BadTLP Signed-off-by: Naveen Naidu --- drivers/pci/pcie/aer.c |

[PATCH v3 2/8] PCI: Cleanup struct aer_err_info

2021-10-04 Thread Naveen Naidu
/UNCOR Error Mask Since the lengths of the above registers are even, use u16 and u32 to represent their values. Also remove the __pad fields. "pahole" was run on the modified struct aer_err_info and the size remains unchanged. Signed-off-by: Naveen Naidu --- drivers/pci/pci.h | 8 +

[PATCH v3 3/8] PCI/DPC: Initialize info->id in dpc_process_error()

2021-10-04 Thread Naveen Naidu
d scalar variable (UNINIT) 8. uninit_use_in_call: Using uninitialized value info.id when calling aer_print_error. Initialize the "info->id" before passing it to aer_print_error() Fixes: 8aefa9b0d910 ("PCI/DPC: Print AER status in DPC event handling") Signed-off-by: Naveen Nai

[PATCH v3 4/8] PCI/DPC: Use pci_aer_clear_status() in dpc_process_error()

2021-10-04 Thread Naveen Naidu
dpc_process_error() clears both AER fatal and non-fatal status registers. Instead of clearing each status register via a different function call, use pci_aer_clear_status(). This helps clean up the code a bit. Signed-off-by: Naveen Naidu --- drivers/pci/pcie/dpc.c | 3 +-- 1 file changed, 1

[PATCH v3 5/8] PCI/DPC: Converge EDR and DPC Path of clearing AER registers

2021-10-04 Thread Naveen Naidu
when it's an unmasked uncorrectable error. This leads to two different behaviours for the same task (handling of DPC errors) in FFS systems and when native OS has control. Bring the same semantics for clearing the AER status register in EDR path and DPC path. Signed-off-by: Naveen

[PATCH v3 6/8] PCI/AER: Clear error device AER registers in aer_irq()

2021-10-04 Thread Naveen Naidu
ese errors. The main aim is that: When an interrupt handler deals with an interrupt, it must *always* clear the source of the interrupt. Signed-off-by: Naveen Naidu --- drivers/pci/pci.h | 13 ++- drivers/pci/pcie/aer.c | 245 - 2 files ch

[PATCH v3 7/8] PCI/ERR: Remove redundant clearing of AER register in pcie_do_recovery()

2021-10-04 Thread Naveen Naidu
isters are again cleared in pcie_do_recovery(). This is redundant. Remove redundant clearing of AER register in pcie_do_recovery() Signed-off-by: Naveen Naidu --- drivers/pci/pcie/err.c | 9 +++-- 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/drivers/pci/pcie/err.c b/driver

[PATCH v3 8/8] PCI/AER: Include DEVCTL in aer_print_error()

2021-10-04 Thread Naveen Naidu
: severity=Corrected, type=Data Link Layer, (Receiver) pcieport :00:03.0: device [1b36:000c] error status/mask=0040/e000, devctl=0x000f pcieport :00:03.0:[ 6] BadTLP Signed-off-by: Naveen Naidu --- drivers/pci/pci.h | 2 ++ drivers/pci/pcie/aer.c | 10 -- 2

[PATCH 0/6] MIPS: OCTEON: Remove redundant AER code

2021-10-04 Thread Naveen Naidu
in the series. Please note that "Patch 4/6" is dependent on "Patch 1/6". Thanks, Naveen Naidu Naveen Naidu (6): [PATCH 1/6] PCI/AER: Enable COR/UNCOR error reporting in set_device_error_reporting() [PATCH 2/6] MIPS: OCTEON: Remove redundant clearing of AER status reg

[PATCH 1/6] PCI/AER: Enable COR/UNCOR error reporting in set_device_error_reporting()

2021-10-04 Thread Naveen Naidu
service driver. Enable reporting of all correctable and uncorrectable errors during aer_probe() Signed-off-by: Naveen Naidu --- drivers/pci/pcie/aer.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 9784fdcf3006

[PATCH v4 0/8] Fix long standing AER Error Handling Issues

2021-10-05 Thread Naveen Naidu
Patch 3/8" is dependent on "Patch 2/8" in the series PATCH 5 - 7 - Deals with converging the various paths and brings more commonality between them - "Patch 6/8" depends on "Patch 1/8" PATCH 8: - Adds extra information in AER error logs. Thank

[PATCH v4 1/8] PCI/AER: Remove ID from aer_agent_string[]

2021-10-05 Thread Naveen Naidu
AER: Corrected error received: :00:03.0 pcieport :00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver) pcieport :00:03.0: device [1b36:000c] error status/mask=00000040/e000 pcieport :00:03.0:[ 6] BadTLP Signed-off-by: Naveen Naidu --- d

[PATCH v4 2/8] PCI: Cleanup struct aer_err_info

2021-10-05 Thread Naveen Naidu
/UNCOR Error Mask Since the lengths of the above registers are even, use u16 and u32 to represent their values. Also remove the __pad fields. "pahole" was run on the modified struct aer_err_info and the size remains unchanged. Signed-off-by: Naveen Naidu --- drivers/pci/pci.h | 8 +

[PATCH v4 3/8] PCI/DPC: Initialize info->id in dpc_process_error()

2021-10-05 Thread Naveen Naidu
d scalar variable (UNINIT) 8. uninit_use_in_call: Using uninitialized value info.id when calling aer_print_error. Initialize the "info->id" before passing it to aer_print_error() Fixes: 8aefa9b0d910 ("PCI/DPC: Print AER status in DPC event handling") Signed-off-by: Naveen Nai

[PATCH v4 4/8] PCI/DPC: Use pci_aer_clear_status() in dpc_process_error()

2021-10-05 Thread Naveen Naidu
dpc_process_error() clears both AER fatal and non-fatal status registers. Instead of clearing each status register via a different function call, use pci_aer_clear_status(). This helps clean up the code a bit. Signed-off-by: Naveen Naidu --- drivers/pci/pcie/dpc.c | 3 +-- 1 file changed, 1

[PATCH v4 5/8] PCI/DPC: Converge EDR and DPC Path of clearing AER registers

2021-10-05 Thread Naveen Naidu
when it's an unmasked uncorrectable error. This leads to two different behaviours for the same task (handling of DPC errors) in FFS systems and when native OS has control. Bring the same semantics for clearing the AER status register in EDR path and DPC path. Signed-off-by: Naveen

[PATCH v4 6/8] PCI/AER: Clear error device AER registers in aer_irq()

2021-10-05 Thread Naveen Naidu
clear the source of the interrupt. Signed-off-by: Naveen Naidu --- drivers/pci/pci.h | 13 ++- drivers/pci/pcie/aer.c | 249 - 2 files changed, 184 insertions(+), 78 deletions(-) diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 9b

[PATCH v4 7/8] PCI/ERR: Remove redundant clearing of AER register in pcie_do_recovery()

2021-10-05 Thread Naveen Naidu
isters are again cleared in pcie_do_recovery(). This is redundant. Remove redundant clearing of AER register in pcie_do_recovery() Signed-off-by: Naveen Naidu --- drivers/pci/pcie/err.c | 9 +++-- 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/drivers/pci/pcie/err.c b/driver

[PATCH v4 8/8] PCI/AER: Include DEVCTL in aer_print_error()

2021-10-05 Thread Naveen Naidu
: severity=Corrected, type=Data Link Layer, (Receiver) pcieport :00:03.0: device [1b36:000c] error status/mask=0040/e000, devctl=0x000f <-- devctl added to the error log pcieport :00:03.0:[ 6] BadTLP Signed-off-by: Naveen Naidu --- drivers/pci/pci.h | 2 ++ driv

[PATCH 00/22] PCI: Unify PCI error response checking

2021-10-11 Thread Naveen Naidu
_ERROR_RESPONSE along with 0x, so that it becomes easier to grep for faulty hardware reads. Thanks, Naveen Naveen Naidu (22): [PATCH 1/22] PCI: Add PCI_ERROR_RESPONSE and it's related defintions [PATCH 2/22] PCI: Unify PCI error response checking [PATCH 3/2

[PATCH 01/22] PCI: Add PCI_ERROR_RESPONSE and it's related defintions

2021-10-11 Thread Naveen Naidu
consistent and easier to find. Also add helper definitions SET_PCI_ERROR_RESPONSE and RESPONSE_IS_PCI_ERROR to make the code more readable. Signed-off-by: Naveen Naidu --- include/linux/pci.h | 9 + 1 file changed, 9 insertions(+) diff --git a/include/linux/pci.h b/include/linux/pci.h ind

[PATCH 17/22] PCI/DPC: Use RESPONSE_IS_PCI_ERROR() to check read from hardware

2021-10-11 Thread Naveen Naidu
helps unify PCI error response checking and make error checks consistent and easier to find. Compile tested only. Signed-off-by: Naveen Naidu --- drivers/pci/pcie/dpc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c ind

[PATCH v2 00/24] Unify PCI error response checking

2021-10-15 Thread Naveen Naidu
grep for faulty hardware reads. Changelog = v2: - Instead of using SET_PCI_ERROR_RESPONSE in all controller drivers to fabricate error response, only use them in PCI_OP_READ and PCI_USER_READ_CONFIG Naveen Naidu (24): [PATCH 1/24] PCI: Add PCI_ERROR_RESPONSE and it's

Re: [PATCH v2 00/24] Unify PCI error response checking

2021-10-15 Thread Naveen Naidu
On 15/10, Naveen Naidu wrote: > An MMIO read from a PCI device that doesn't exist or doesn't respond > causes a PCI error. There's no real data to return to satisfy the > CPU read, so most hardware fabricates ~0 data. > > This patch series adds PCI_ERROR_RESPON

[PATCH v2 19/24] PCI/DPC: Use RESPONSE_IS_PCI_ERROR() to check read from hardware

2021-10-15 Thread Naveen Naidu
helps unify PCI error response checking and make error checks consistent and easier to find. Compile tested only. Signed-off-by: Naveen Naidu --- drivers/pci/pcie/dpc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c ind

[PATCH v3 00/25] Unify PCI error response checking

2021-10-21 Thread Naveen Naidu
PCI_OP_READ and PCI_USER_READ_CONFIG Naveen Naidu (25): [Patch 1/25] PCI: Add PCI_ERROR_RESPONSE and it's related definitions [Patch 2/25] PCI: Set error response in config access defines when ops->read() fails [Patch 3/25] PCI: Use SET_PCI_ERROR_RESPONSE() when device not f

[PATCH v3 01/25] PCI: Add PCI_ERROR_RESPONSE and it's related definitions

2021-10-21 Thread Naveen Naidu
consistent and easier to find. Also add helper definitions SET_PCI_ERROR_RESPONSE and RESPONSE_IS_PCI_ERROR to make the code more readable. Suggested-by: Bjorn Helgaas Signed-off-by: Naveen Naidu --- include/linux/pci.h | 9 + 1 file changed, 9 insertions(+) diff --git a/include/linux/

[PATCH v3 02/25] PCI: Set error response in config access defines when ops->read() fails

2021-10-21 Thread Naveen Naidu
always set when the controller driver fails to read a config register from a device. This makes error response fabrication consistent and helps in removal of a lot of repeated code. Suggested-by: Rob Herring Reviewed-by: Rob Herring Signed-off-by: Naveen Naidu --- drivers/pci/access.c | 10

[PATCH v3 19/25] PCI/DPC: Use RESPONSE_IS_PCI_ERROR() to check read from hardware

2021-10-21 Thread Naveen Naidu
helps unify PCI error response checking and make error checks consistent and easier to find. Compile tested only. Signed-off-by: Naveen Naidu --- drivers/pci/pcie/dpc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c ind

Re: [PATCH v4 1/8] PCI/AER: Remove ID from aer_agent_string[]

2021-10-21 Thread Naveen Naidu
On 20/10, Bjorn Helgaas wrote: > On Tue, Oct 05, 2021 at 10:48:08PM +0530, Naveen Naidu wrote: > > Currently, we do not print the "id" field in the AER error logs. Yet the > > aer_agent_string[] has the word "id" in it. The AER error log looks > > like:

Re: [PATCH v4 3/8] PCI/DPC: Initialize info->id in dpc_process_error()

2021-10-21 Thread Naveen Naidu
On 20/10, Bjorn Helgaas wrote: > On Tue, Oct 05, 2021 at 10:48:10PM +0530, Naveen Naidu wrote: > > In the dpc_process_error() path, info->id isn't initialized before being > > passed to aer_print_error(). In the corresponding AER path, it is > > initialized in aer_

Re: [PATCH v4 4/8] PCI/DPC: Use pci_aer_clear_status() in dpc_process_error()

2021-10-21 Thread Naveen Naidu
On 20/10, Bjorn Helgaas wrote: > On Tue, Oct 05, 2021 at 10:48:11PM +0530, Naveen Naidu wrote: > > dpc_process_error() clears both AER fatal and non fatal status > > registers. Instead of clearing each status registers via a different > > function call use pci_aer_clear_st

Re: [PATCH v4 5/8] PCI/DPC: Converge EDR and DPC Path of clearing AER registers

2021-10-21 Thread Naveen Naidu
On 20/10, Bjorn Helgaas wrote: > [+cc Keith, Sinan, Oza] > > On Tue, Oct 05, 2021 at 10:48:12PM +0530, Naveen Naidu wrote: > > In the EDR path, AER registers are cleared *after* DPC error event is > > processed. The process stack in EDR is: > > > > edr_handle_

[PATCH v5 0/5] Fix long standing AER Error Handling Issues

2021-10-25 Thread Naveen Naidu
second issue - Patch 3 depends on Patch 2 in the series PATCH 4 - Fixes the bug in clearing of AER registers which leads to AER message spew [1] PATCH 5: - Adds extra information (devctl register) in AER error logs. - Patch 5 depends on Patch 4 of the series Thanks, Naveen Naidu

[PATCH v5 1/5] PCI/AER: Remove ID from aer_agent_string[]

2021-10-25 Thread Naveen Naidu
pe=Data Link Layer, (Receiver) pcieport :00:03.0: device [1b36:000c] error status/mask=0040/e000 pcieport :00:03.0:[ 6] BadTLP Link: https://lore.kernel.org/linux-pci/20211021170317.GA2700910@bhelgaas/T/#m618bda4e54042d95a1a83fccc01cdb423f7590dc Signed-off-by: Naveen Nai

[PATCH v5 2/5] PCI: Cleanup struct aer_err_info

2021-10-25 Thread Naveen Naidu
/UNCOR Error Mask Since the lengths of the above registers are even, use u16 and u32 to represent their values. Also remove the __pad fields. "pahole" was run on the modified struct aer_err_info and the size remains unchanged. Signed-off-by: Naveen Naidu --- drivers/pci/pci.h | 8 +

[PATCH v5 3/5] PCI/DPC: Initialize info.id in dpc_process_error()

2021-10-25 Thread Naveen Naidu
e passing it to aer_print_error() Fixes: 8aefa9b0d910 ("PCI/DPC: Print AER status in DPC event handling") Signed-off-by: Naveen Naidu --- drivers/pci/pcie/dpc.c | 16 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/drivers/pci/pcie/dpc.c b/dri

[PATCH v5 4/5] PCI/AER: Clear error device AER registers in aer_irq()

2021-10-25 Thread Naveen Naidu
clear the source of the interrupt. Signed-off-by: Naveen Naidu --- drivers/pci/pci.h | 13 ++- drivers/pci/pcie/aer.c | 249 - 2 files changed, 184 insertions(+), 78 deletions(-) diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 9b

[PATCH v5 5/5] PCI/AER: Include DEVCTL in aer_print_error()

2021-10-25 Thread Naveen Naidu
=0040/e000, devctl=0x000f <-- devctl added to the error log pcieport :00:03.0:[ 6] BadTLP Signed-off-by: Naveen Naidu --- drivers/pci/pci.h | 2 ++ drivers/pci/pcie/aer.c | 10 -- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pci.

Re: [PATCH v3 01/25] PCI: Add PCI_ERROR_RESPONSE and it's related definitions

2021-11-18 Thread Naveen Naidu
On 17/11, Bjorn Helgaas wrote: > On Thu, Oct 21, 2021 at 08:37:26PM +0530, Naveen Naidu wrote: > > An MMIO read from a PCI device that doesn't exist or doesn't respond > > causes a PCI error. There's no real data to return to satisfy the > > CPU read,

[PATCH v4 00/25] Unify PCI error response checking

2021-11-18 Thread Naveen Naidu
nstead of using SET_PCI_ERROR_RESPONSE in all controller drivers to fabricate error response, only use them in PCI_OP_READ and PCI_USER_READ_CONFIG Naveen Naidu (25): [PATCH v4 1/25] PCI: Add PCI_ERROR_RESPONSE and it's related definitions [PATCH v4 2/25] PCI: Set error respo

[PATCH v4 19/25] PCI/DPC: Use PCI_POSSIBLE_ERROR() to check read from hardware

2021-11-18 Thread Naveen Naidu
unify PCI error response checking and make error checks consistent and easier to find. Compile tested only. Signed-off-by: Naveen Naidu --- drivers/pci/pcie/dpc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c ind