The feature is not applicable to specific app platform.
v2: update the disablement condition and commit description
v3: move the setting to amdgpu_ras_check_supported
Signed-off-by: Tao Zhou
Reviewed-by: Hawking Zhang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 5 +
1 file changed, 5
The feature is not applicable to specific app platform.
v2: update the disablement condition and commit description
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 5 +
1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers
The feature is unsupported on specific APU.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 6 ++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index dbfc41ddc3c7..d46f216a33b1 100644
Return RMA status without message print.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 96e525ab9a84
Instead of printing GPU reset failed.
v2: add check for reset_context->src.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 --
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/
For the convenience of calling it globally.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 22 --
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 2 +-
drivers/gpu/drm/amd/amdgpu/gfx_v11_0_3.c | 2 +-
4
Instead of printing GPU reset failed.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 355c2478c4b6
GFX v9.4.3 uses mode1 reset, other ASICs choose mode2.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 10 --
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
b/drivers/gpu/drm/amd/amdkfd
Per FW requirement, replace mode2 with mode1.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
The fed status does indicate RAS fatal error.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 2 +-
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 2 +-
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 2 +-
3 files changed, 3 insertions(+), 3
Indicate fatal error for each RAS block and NBIO.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 1 +
drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu
PMFW needs the flag to know the reason for mode1.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 10 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 2 +-
drivers
Set the flag to true if bad page number reaches threshold.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c| 7 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h| 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 10 ++
drivers/gpu/drm/amd/amdgpu
Check it in mode1 reset.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 32 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 2 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h | 1 +
.../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c| 2
Check RMA status in bad page retirement flow.
v2: fix coding bugs in v1.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 28 +++-
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 8 +++
drivers/gpu/drm/amd/amdgpu/gfx_v11_0_3.c | 4 +++-
3 files
Reduce redundant code and user doesn't need to pay attention to RAS
details.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c| 13 ++--
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 14 ++---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
Check RMA status in bad page retirement flow.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 16 +---
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 7 +++
2 files changed, 16 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
Set the flag to true if bad page number reaches threshold.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c| 7 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h| 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 10 ++
drivers/gpu/drm/amd/amdgpu
Also make sure the value of msg[1].len is in the range of u16.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c | 10 +-
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c
b/drivers/gpu/drm/amd
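The note above about keeping msg[1].len within the range of u16 can be illustrated with a small user-space sketch. It is not taken from amdgpu_eeprom.c; the helper names clamp_to_u16_len and count_transfers are hypothetical, and the only assumption is that the length field being filled is 16 bits wide.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): return the largest chunk of
 * `remaining` bytes that still fits in a 16-bit length field.
 */
static uint16_t clamp_to_u16_len(size_t remaining)
{
    return remaining > UINT16_MAX ? UINT16_MAX : (uint16_t)remaining;
}

/* Count how many transfers are needed to move `total` bytes in u16-sized chunks. */
static size_t count_transfers(size_t total)
{
    size_t n = 0;

    while (total > 0) {
        total -= clamp_to_u16_len(total);
        n++;
    }
    return n;
}
```

Clamping before the assignment means a large request is split into several transfers instead of being silently truncated when stored in the narrower field.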
Avoid overflow issue.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c | 6 +++---
drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.h | 4 ++--
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eeprom.c
b/drivers/gpu/drm/amd/amdgpu
RAS TA will handle it, the interface is useless.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 1 -
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 105 ++---
drivers/gpu/drm/amd/amdgpu/umc_v12_0.h | 62 +--
3 files changed, 7 insertions
Check more possible ext error codes.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index f4be524b0dc1..be1f4efa9ef6
Add more possible ext error code.
v2: still use ext error code instead of UC bit.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
b
Check UC bit instead of ext error code.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
b/drivers/gpu/drm/amd/pm/swsmu/smu13
SDMA_CNTL is not set in some cases, driver configures it by itself.
v2: simplify code
Signed-off-by: Tao Zhou
Reviewed-by: Hawking Zhang
---
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 16 +++-
1 file changed, 3 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/amd
SDMA_CNTL is not set in some cases, driver configures it by itself.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 7 ---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2
And set the socket id.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/ta_ras_if.h | 1 +
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 14 +++---
2 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/ta_ras_if.h
b/drivers/gpu/drm/amd/amdgpu
Replace separate parameters with struct ta_ras_query_address_input.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 57 ++
1 file changed, 30 insertions(+), 27 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
b/drivers/gpu/drm/amd
Each RAS block has different requirement for gpu reset in poison
consumption handling.
Add support for mmhub RAS poison consumption handling.
v2: remove the mmhub poison support for kfd int v10.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 2 +-
drivers/gpu/drm
Support the query for both gfxhub and mmhub, also replace
xcc_id with hub_inst.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 17 -
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 +-
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 5
Add it for mmhub v1.8.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_mmhub.h | 2 ++
drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c | 15 +++
2 files changed, 17 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mmhub.h
b/drivers/gpu/drm/amd/amdgpu
Each RAS block has different requirement for gpu reset in poison
consumption handling.
Add support for mmhub RAS poison consumption handling.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 2 +-
drivers/gpu/drm
Support the query for both gfxhub and mmhub, also replace
xcc_id with hub_inst.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 17 -
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 +-
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 3
Add it for mmhub v1.8.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_mmhub.h | 2 ++
drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c | 15 +++
2 files changed, 17 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mmhub.h
b/drivers/gpu/drm/amd/amdgpu
Both RAS UE and deferred errors need page retirement.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
index 14ef7a24be7b
Let kfd interrupt handler process it.
v2: return 0 instead of 1 for fed error.
drop the usage of strcmp in interrupt handler.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 10 +-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd
Obtain it from ring entry.
v2: replace node id with logical xcc id.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c | 14 --
drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 14 --
2 files changed, 24 insertions(+), 4 deletions(-)
diff
Replace it with related interface in gfxhub functions.
v2: replace node id with xcc id.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 7 ---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 3 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h| 1 -
drivers/gpu/drm
Add UCE and FED bit definitions.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h | 4
1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h
b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h
Implement it for gfxhub 1.0 and 1.2.
v2: input logical xcc id for poison query interface.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h | 2 ++
drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 17 +
drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c | 15
Obtain it from ring entry.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c | 3 ++-
drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c
b/drivers
Let kfd interrupt handler process it.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 10 +-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 773725a92cf1..70defc394b7b
Replace it with related interface in gfxhub functions.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 7 ---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 3 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h| 1 -
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c| 12
Implement it for gfxhub 1.0 and 1.2.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h | 2 ++
drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 17 +
drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c | 15 +++
3 files changed, 34 insertions(+)
diff --git
Add UCE and FED bit definitions.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h | 4
1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h
b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h
Add help function to query and reset RAS UTCL2 poison status.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 14 ++
1 file changed, 14 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
index
Get UMC physical address from PSP in RAS error address conversion.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 46 ++
1 file changed, 39 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
b/drivers/gpu/drm/amd
Convert mca address to physical address or vice versa via RAS TA.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 25 +
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h | 3 +++
drivers/gpu/drm/amd/amdgpu/ta_ras_if.h | 36 +
3 files changed
Send ras disable feature command in fini.
Signed-off-by: Tao Zhou
Change-Id: I95f1d1e0a46fb613631e5cd77497e64c0551c4c7
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd
Support page retirement handling in debug mode.
v2: revert smu_v13_0_6_get_ecc_info directly.
Signed-off-by: Tao Zhou
Change-Id: I0aaa807d7fe87b3da0f023c380e57ab6dd446fcf
---
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 8 ++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a
This reverts commit affdce050ab4119a3cdf74d7faa8f1eb30f6f6aa.
We use debug mode flag instead of this interface.
Signed-off-by: Tao Zhou
Change-Id: I49eae821ce352d542143d68c05802634b4bf469d
---
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 8
1 file changed, 8 deletions
Support page retirement handling in debug mode.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 9 +++--
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 4 ++--
2 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd
Deferred error is also taken into account.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 5 +
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
index 10edf818acf5..2e0bd4312f2c
set_xgmi_plpd_mode may be unsupported and this isn't an error, no need to
print warning for it.
v2: add ret2 to save the status of psp_ras_trigger_error.
Suggested-by: lijo.la...@amd.com
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 14 --
1 file chang
Handle xgmi hive case.
Suggested-by: Hawking Zhang
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 11 ++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index
The UE register list is larger than the CE list.
Reported-by: yipeng.c...@amd.com
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 38 +
1 file changed, 38 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
b/drivers/gpu/drm/amd/amdgpu
set_xgmi_plpd_mode may be unsupported and this isn't an error, no need to
print warning for it.
Suggested-by: lijo.la...@amd.com
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/a
Reset/query RAS error status and count.
v2: use XGMI IP version instead of WAFL version.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 46 ++--
1 file changed, 43 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
The version can't be queried from discovery table.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 0b711ba
Switch from mode-1 reset to mode-2 for poison consumption.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 6 +-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
index
Not all platforms support RAS.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index c71321edf50b..a6cff4a31c54
Add DF block and RAS poison mode query for DF v4_6_2.
Signed-off-by: Tao Zhou
Reviewed-by: Stanley.Yang
---
drivers/gpu/drm/amd/amdgpu/Makefile | 3 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 4 +++
drivers/gpu/drm/amd/amdgpu/df_v4_6_2.c| 34
Enable it by default on APU platform.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 95c181cd1fea..a41cab0a2f9c
PMFW will be responsible for them.
v2: remove query interfaces.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 60 --
drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c | 143
2 files changed, 203 deletions(-)
diff --git a/drivers/gpu/drm/amd
Call amdgpu_ras_set_mca_debug_mode when we set mca debug mode in smu
v13_0_6.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
b/drivers/gpu/drm/amd/pm/swsmu
PMFW is responsible for RAS error reset in some conditions, driver can
skip the operation.
v2: add check for ras->in_recovery, it's set earlier than
amdgpu_in_reset.
v3: fix error in gpu reset check.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 10 +-
Record the debug mode status in RAS.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 21 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 5 +
2 files changed, 26 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd
Simplify the code.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++--
drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 9 ++---
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 7 ++-
drivers/gpu/drm/amd
Simplify the code architecture.
v2: reuse ras_reset_error_count in ras_reset_error_status.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 19 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 2 ++
2 files changed, 17 insertions(+), 4 deletions
PMFW will be responsible for it.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 22 ---
drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c | 86 -
2 files changed, 108 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
b/drivers/gpu/drm
PMFW is responsible for RAS error reset in some conditions, driver can
skip the operation.
v2: add check for ras->in_recovery, it's set earlier than
amdgpu_in_reset.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 20 ++--
1 file changed, 18 in
To simplify the code of amdgpu_ras_reset_error_status without logical
change.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 30 +++--
1 file changed, 8 insertions(+), 22 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu
Record the debug mode status in RAS.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 21 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 5 +
2 files changed, 26 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd
Call amdgpu_ras_set_mca_debug_mode when we set mca debug mode in smu
v13_0_6.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
b/drivers/gpu/drm/amd/pm/swsmu
Simplify the code architecture.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++--
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c| 17 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h| 2 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 4
To simplify the code of amdgpu_ras_reset_error_status without logical
change.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 29 +++--
1 file changed, 8 insertions(+), 21 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu
PMFW is responsible for RAS error reset in some conditions, driver can
skip the operation.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 18 --
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b
Call amdgpu_ras_set_mca_debug_mode when we set mca debug mode in smu
v13_0_6.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
b/drivers/gpu/drm/amd/pm/swsmu
Record the debug mode status in RAS.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 21 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 5 +
2 files changed, 26 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd
Simplify the code architecture.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++--
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c| 17 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h| 2 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 4
No need to perform the full reset operation in case of gpu reset
failure.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu
Increase the retry loops and replace the constant number with macro.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/psp_v13_0.c | 5 -
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v13_0.c
b/drivers/gpu/drm/amd/amdgpu/psp_v13_0.c
index
: replace sizeof with BITS_PER_TYPE, we should check bit number
instead of byte number.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
b/drivers/gpu
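The difference between a byte count and a bit count mentioned above can be sketched in user-space C. This is only an illustration under stated assumptions: BITS_PER_TYPE is redefined here as a stand-in for the kernel macro of the same name, and both helper functions are hypothetical, not code from amdgpu_ras_eeprom.c.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-in for the kernel's BITS_PER_TYPE() macro. */
#define BITS_PER_TYPE(type) (sizeof(type) * CHAR_BIT)

/*
 * Hypothetical buggy check: compares a bit index against the size of the
 * type in bytes, so only bits 0..3 of a 32-bit value are accepted.
 */
static bool bit_index_fits_bytes(size_t bit_index)
{
    return bit_index < sizeof(uint32_t);
}

/*
 * Hypothetical corrected check: compares against the number of bits,
 * so bits 0..31 of a 32-bit value are accepted.
 */
static bool bit_index_fits_bits(size_t bit_index)
{
    return bit_index < BITS_PER_TYPE(uint32_t);
}
```

With sizeof the limit is 4 for a 32-bit type, while BITS_PER_TYPE gives 32, which is the bound a bit-position check needs.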
Prepare for bad page retirement.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 4 +++-
drivers/gpu/drm/amd/amdgpu/umc_v12_0.h | 2 ++
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0
Print channel index for UMC v12.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 10 ++
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
index c6742dd863d4..7714c2ef2cdc
Prepare for bad page retirement.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 4 +++-
drivers/gpu/drm/amd/amdgpu/umc_v12_0.h | 2 ++
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0
.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
index 8ced4be784e0..1c4433f22f4b 100644
--- a/drivers
Print channel index for UMC v12.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 10 ++
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
index c6742dd863d4..7714c2ef2cdc
Print out row, column and bank value of UMC error address for UMC v12.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 12 +---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
b/drivers/gpu/drm/amd/amdgpu
Get UMC physical channel index according to node id, umc instance and
channel instance.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 1 +
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 14 ++
drivers/gpu/drm/amd/amdgpu/umc_v12_0.h | 5 +
3 files changed, 20
Convert MCA error address to physical address and find out all pages in
one physical row.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 5 ++
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 97 -
drivers/gpu/drm/amd/amdgpu/umc_v12_0.h | 64
Instead of using direct update, avoid touching unrelated fields.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
index
Register RAS fatal error interrupt and add handler.
v2: only register NBIO RAS for dGPU platform.
change nbio_v7_9_set_ras_controller_irq_state and
nbio_v7_9_set_ras_err_event_athub_irq_state
to dummy functions.
Signed-off-by: Tao Zhou
Reviewed-by: Hawking Zhang
---
drivers/gpu/drm
Register RAS fatal error interrupt and add handler.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4 +
drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c | 219
drivers/gpu/drm/amd/amdgpu/nbio_v7_9.h | 1 +
3 files changed, 224 insertions(+)
diff
Configure SQ watchdog timer setting.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 38 +
1 file changed, 38 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
index 9e3b835bdbb2
The address parameter of GFX RAS injection isn't related to XGMI node
number, keep it unchanged.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gp
No RAS irq is allowed.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c | 3 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c
b/drivers/gpu/drm/amd/amdgpu
mmhub_v1_8_mmea_cgtt_clk_cntl_reg is defined but not used.
Reported-by: kernel test robot
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c | 8
1 file changed, 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c
b/drivers/gpu/drm/amd/amdgpu
bad_page_threshold controls page retirement behavior and it should
also be checked.
v2: simplify the condition of bad page handling path.
Signed-off-by: Tao Zhou
---
.../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c| 19 ++-
1 file changed, 14 insertions(+), 5 deletions(-)
diff
Ignore ras umc bad page threshold by default, GPU initialization won't
be stopped in this mode.
v2: refine the description of bad_page_threshold.
Signed-off-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c