When a gpu in a hive is performing a ras reset, the other
gpus in the hive do not need to schedule recovery work
to reset the gpu.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 20 +++-
1 file changed, 19 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm
Before uninstalling gpu driver, flush all cached ras
bad pages to eeprom.
v2:
Put the same code into a function and reuse the function.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 35 -
1 file changed, 29 insertions(+), 6 deletions
:
1. Add the above description to code comments.
2. Reuse existing function.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 6 +-
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 18 ++
2 files changed, 23 insertions(+), 1 deletion(-)
diff --git
.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 14 +-
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 6 ++
2 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index
Before uninstalling gpu driver, flush all cached ras
bad pages to eeprom.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 17 +
1 file changed, 17 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu
The sysfs node disables error count queries during gpu reset.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/aldebaran.c     | 2 --
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c    | 3 +++
3 files changed, 5 insertions(+), 3 deletions(-)
diff
The sysfs node disables error count queries during gpu reset.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 15 +--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
The sysfs node disables error count queries during gpu reset.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 6 +-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index
Add mutex to protect ras shared memory.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c    | 124 ++---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h    | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c | 2 +
3 files changed, 87 insertions(+), 40 deletions
Add gpu reset check and exception handling for
page retirement.
v2:
Clear poison consumption messages cached in fifo after
non mode-1 reset.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 52 +
1 file changed, 52 insertions(+)
diff --git
-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 37 -
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 1 +
2 files changed, 18 insertions(+), 20 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index
1. The poison fifo is only used for poison consumption
requests.
2. Merge reset requests when the poison fifo caches multiple
poison consumption messages.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 56 -
drivers/gpu/drm/amd/amdgpu
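The merging of reset requests described in this patch can be pictured with a small user-space sketch. It is not the driver's code: `struct poison_msg`, its fields, and `merge_reset_requests` are hypothetical names chosen for illustration, and the assumption is that each cached message carries a bitmask of the reset types it asks for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message layout; not the driver's actual structure. */
struct poison_msg {
    uint32_t block;       /* RAS block that consumed the poison */
    uint32_t reset_flags; /* bitmask of requested reset types, 0 = none */
};

/*
 * Fold the reset requests of all cached messages into one bitmask so the
 * caller can issue a single reset instead of one reset per message.
 */
uint32_t merge_reset_requests(const struct poison_msg *msgs, size_t count)
{
    uint32_t merged = 0;

    assert(msgs != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        merged |= msgs[i].reset_flags;
    return merged;
}
```

With this shape, three cached messages that each request a reset result in one merged request rather than three back-to-back resets.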
Add variable to record the deferred error
number read by driver.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 62 ++---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 3 +-
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 4 +-
3 files changed, 48
Add completion to wait for ras reset to complete.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 11 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 1 +
2 files changed, 12 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm
Add gpu reset check and exception handling for
page retirement.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 43 +
1 file changed, 43 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu
1. The poison fifo is only used for poison consumption
requests.
2. Merge reset requests when the poison fifo caches multiple
poison consumption messages.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 58 +
drivers/gpu/drm/amd/amdgpu
-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 41 -
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 1 +
2 files changed, 21 insertions(+), 21 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index
Add variable to record the deferred error
number read by driver.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 62 ++---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 3 +-
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 4 +-
3 files changed, 48
If the gpu is recovering, clear all message reset flags
in the fifo and wait for the gpu to complete recovery.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 12
1 file changed, 12 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm
If the number of messages to be processed in the fifo exceeds
the threshold, stop waiting for the DE data to be ready.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 13 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 4 +++-
2 files changed
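The threshold rule in this patch description amounts to a small predicate. The sketch below is an illustration under assumptions: `struct retire_state`, `keep_waiting_for_de`, and the threshold parameter are hypothetical, and "exceeds" is read as strictly greater than.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the handler's state; not the driver's structure. */
struct retire_state {
    size_t pending_msgs; /* messages still queued in the fifo */
    bool de_ready;       /* DE data has become available */
};

/*
 * Decide whether the handler should keep waiting for the DE data.
 * Waiting ends once the data is ready, or once the backlog grows
 * beyond the threshold.
 */
bool keep_waiting_for_de(const struct retire_state *st, size_t threshold)
{
    assert(st != NULL);
    if (st->de_ready)
        return false;
    return st->pending_msgs <= threshold;
}
```

A backlog equal to the threshold still waits under this reading; only a larger backlog gives up.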
Add completion to wait for gpu to complete reset.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 12
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 1 +
2 files changed, 13 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm
To avoid resetting the gpu repeatedly, clear all
message reset flags in the fifo before the first
gpu reset.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 59 -
1 file changed, 58 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd
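The idea in this patch, clearing every cached reset flag before the first reset so that later messages cannot trigger another one, can be sketched in user space as follows. `struct fifo_msg`, `clear_all_reset_flags`, and `count_resets` are hypothetical names, and the reset itself is only counted, not performed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cached message; only the field this sketch needs. */
struct fifo_msg {
    uint32_t reset_flags; /* non-zero means the message wants a gpu reset */
};

/* Drop the reset request from every cached message. */
void clear_all_reset_flags(struct fifo_msg *msgs, size_t count)
{
    for (size_t i = 0; i < count; i++)
        msgs[i].reset_flags = 0;
}

/*
 * Walk the cached messages and return how many resets would be issued.
 * The flags are cleared before the first reset, so the remaining
 * messages no longer ask for one and the count never exceeds 1.
 */
unsigned int count_resets(struct fifo_msg *msgs, size_t count)
{
    unsigned int resets = 0;

    assert(msgs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (msgs[i].reset_flags == 0)
            continue;
        clear_all_reset_flags(msgs, count);
        resets++; /* a real handler would trigger the reset here */
    }
    return resets;
}
```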
1. Messages cannot be added to the fifo in gpu reset mode.
2. The thread is awakened only when the message has been
successfully saved to the fifo.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 16 ++--
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 18
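The two rules in this patch description can be illustrated with a small ring buffer. This is a sketch under assumptions, not the driver's implementation: `struct msg_fifo`, `FIFO_DEPTH`, `fifo_put`, and `submit_msg` are hypothetical, and a counter stands in for waking the consumer thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 4 /* hypothetical capacity */

struct msg_fifo {
    uint32_t slots[FIFO_DEPTH];
    size_t head;          /* index of the oldest message */
    size_t count;         /* number of queued messages */
    bool in_reset;        /* set while the gpu is resetting */
    unsigned int wakeups; /* stands in for waking the consumer thread */
};

/* Rule 1: refuse new messages while in reset mode (or when full). */
bool fifo_put(struct msg_fifo *f, uint32_t msg)
{
    assert(f != NULL);
    if (f->in_reset || f->count == FIFO_DEPTH)
        return false;
    f->slots[(f->head + f->count) % FIFO_DEPTH] = msg;
    f->count++;
    return true;
}

/* Rule 2: wake the consumer only if the message was actually saved. */
void submit_msg(struct msg_fifo *f, uint32_t msg)
{
    if (fifo_put(f, msg))
        f->wakeups++;
}
```

Tying the wakeup to the return value of the put avoids waking a thread that would find nothing new to process.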
Change log level.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
Add mutex to protect ras shared memory.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c    | 121 ++---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h    | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c | 2 +
3 files changed, 84 insertions(+), 40 deletions
Remove redundant function call.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 22 ++
1 file changed, 6 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index
Remove unused code.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 69 --
1 file changed, 69 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
index 8df84feaf046..12bae67be91c 100644
Fix ras mode2 reset failure in ras aca mode.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4
1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index edb3cd0cef96..11a70991152c 100644
Fix ras mode2 reset failure in ras aca mode for
sdma v4_4_2 and gfx v9_4_3.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 4
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 4
2 files changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
Use new interface to reserve bad page.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index d1a2ab944b7d
retired_page is a page frame and should be expanded
to the full address when querying status.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd
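Expanding a page frame to a full address is a left shift by the page-size exponent. The sketch below assumes 4 KiB pages (a shift of 12); `EXAMPLE_PAGE_SHIFT` and `pfn_to_addr` are hypothetical names, not the driver's macros or functions.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Convert a page frame number into the byte address of the page start. */
uint64_t pfn_to_addr(uint64_t pfn)
{
    /* Guard against shifting significant bits out of the 64-bit value. */
    assert(pfn <= (UINT64_MAX >> EXAMPLE_PAGE_SHIFT));
    return pfn << EXAMPLE_PAGE_SHIFT;
}
```

Comparing a raw page frame against a byte address without this conversion would make a retired page look like a different location.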
Support ACA logging of ecc errors.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 5 +
1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
index bd917eb6ea24..8df84feaf046 100644
--- a/drivers
Add poison consumption handler.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 43 ++---
1 file changed, 39 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index
Prepare to handle pasid poison consumption.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 9 -
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 5 +
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 20 ---
drivers/gpu/drm/amd/amdgpu
Add condition check for amdgpu_umc_fill_error_record.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 20 +---
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 2 +-
3 files changed, 19 insertions(+), 4 deletions
1. umc v12_0 logs ecc errors.
2. Reserve newly detected ecc error pages.
3. Add tag for bad pages, so that they can
be retired later.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 67 +
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 7
Retire bad pages for umc v12_0.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 57 +-
1 file changed, 55 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
index
Add delay work to retire bad pages.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 36 -
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 3 +++
4 files
Umc v12_0 converts error address.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 94 +-
drivers/gpu/drm/amd/amdgpu/umc_v12_0.h | 12
2 files changed, 105 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
b
Add interface to update umc v12_0 ecc status.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 9 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 6 +
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c
Add poison creation handler.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 74 +++--
1 file changed, 69 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index
Add interface to reserve bad page.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 19 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 4
2 files changed, 23 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd
Prepare for logging ecc errors.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 33 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 23 +
2 files changed, 56 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b
Add message fifo to handle RAS poison events.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 32 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 18 ++
2 files changed, 50 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu
Add new nodes for the addresses that are not in the
reserved_pages list and reservations_pending list.
V2:
Avoid repeated locking/unlocking.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 25 +---
1 file changed, 16 insertions(+), 9 deletions
Add new nodes for the addresses that are not in the
reserved_pages list and reservations_pending list.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 28 +---
1 file changed, 19 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu
Ras needs to be resumed during gpu reset for
gfx v9_4_3 sriov.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index afc0b4eb7f8e
/0x80
[ 484.496866] ? exc_page_fault+0x87/0x170
[ 484.496868] ? asm_exc_page_fault+0x8/0x30
[ 484.496871] entry_SYSCALL_64_after_hwframe+0x44/0xae
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 6 +-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git
Use asynchronous polling to handle umc_v12_0 poisoning.
v2:
1. Change function name.
2. Change the debugging information content.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 5 +
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 139 ++--
drivers
Add interface to check mca umc status.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c | 12 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_mca.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 4 +++-
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c  | 20
Support retiring multiple MCA error address pages in
one in-band query for umc v12_0.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 43 +---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 8 ++-
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 66
Preparing for asynchronous processing of umc page retirement.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 34 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 5
2 files changed, 39 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu
Add log info for umc_v12_0 and smu_v13_0_6.
v2:
Delete redundant logs.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 11 +++
drivers/gpu/drm/amd/amdkfd/kfd_events.c | 6 +-
2 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm
Add log info for umc_v12_0 and smu_v13_0_6.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 11 +++
drivers/gpu/drm/amd/amdkfd/kfd_events.c | 6 +-
.../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c| 13 +
3 files
Preparing for asynchronous processing of umc page retirement.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 34 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 5
2 files changed, 39 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu
Support retiring multiple MCA error address pages in
one in-band query for umc v12_0.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 43 +---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 8 ++-
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 66
Add interface to check mca umc status.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c | 12 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_mca.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 4 +++-
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c  | 20
Use asynchronous polling to handle umc_v12_0 poisoning.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 5 +
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 143 +++-
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 3 +
3 files changed, 120 insertions(+), 31
MCA supports recording umc address information.
V2:
Move err_addr variable from struct ras_err_node to
struct ras_err_info.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c | 13 +++--
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 22
Add umc page retirement for umc v12_0.
V2:
1. Changed umc page retirement check condition
to call umc_v12_0_is_uncorrectable_error.
2. Use memset to clear the contents of the umc
error address structure.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c | 56
smu v13_0_6 supports ecc info by default.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 8
1 file changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
Add poison mode check error condition for umc v12_0.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/umc_v12_0.c       | 20 ++-
drivers/gpu/drm/amd/amdgpu/umc_v12_0.h       | 4 ++--
.../drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 4 ++--
3 files changed, 19
Support saving bad pages after gpu ras reset for umc_v12_0.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 40 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 35 ++
drivers/gpu/drm
Enable ras for mp0 v13_0_6 sriov.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 7689395e44fd..378478cf9c21 100644
--- a/drivers
Mode1 reset needs to recover mp1 in fatal error case
for mp0 v13_0_10.
v2:
Define a macro to wrap psp function calls.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h | 5 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 ++
drivers/gpu/drm/amd/amdgpu/psp_v13_0.c
Mode1 reset needs to recover mp1 in fatal error case
for mp0 v13_0_10.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 3 +++
drivers/gpu/drm/amd/amdgpu/psp_v13_0.c | 24 +++-
3 files changed, 27
Fix incorrect vmhub index.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 5 -
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
index d04fc0f19a29..c0b588e5d6aa 100644
Fix printing empty string array.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
index c571f0d95994..d04fc0f19a29
not update the same version ras ta.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c | 20 +++-
1 file changed, 19 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c
index
Add ta initialization failure check condition.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c
index
When a fatal error occurs in ras poison mode, mode1 reset
is used to recover the gpu.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 11 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 1 +
2 files changed, 12 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu
The link object of mgr->reserved_pages is the blocks
variable in struct amdgpu_vram_reservation, not the
link variable in struct drm_buddy_block.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 7 ---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --
Perform mode2 reset for sdma fed error on gfx v11_0_3.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 8 +++-
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 5 +
drivers/gpu/drm/amd/amdgpu/gfx_v11_0_3.c | 14 +-
3 files changed, 25 insertions(+), 2
When the sdma ib ring test fails to detect an sdma
hang caused by an sdma fed error, force a soft
reset.
V2:
Add poison mode support check for special code
path.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 19 +++
1 file changed, 19 insertions
When the sdma ib ring test fails to detect an sdma
hang caused by an sdma fed error, force a soft
reset.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 16
1 file changed, 16 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
b/drivers/gpu
When gfx ras poison consumption causes gpu reset on gfx v11_0_3,
the sequence of gpu reset is "soft reset -> mode2 reset -> mode1 reset".
If the previous reset fails, fall back to the next reset.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdg
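The fallback order named in this patch description, soft reset, then mode2 reset, then mode1 reset, can be sketched as a chain of attempts. Everything here is a user-space illustration: `struct reset_sim` and the `try_*` functions are hypothetical stand-ins that simulate success or failure, not the driver's reset routines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simulator: which resets "work" and how many were tried. */
struct reset_sim {
    int soft_ok;
    int mode2_ok;
    int mode1_ok;
    unsigned int attempts;
};

typedef int (*reset_fn)(struct reset_sim *sim); /* 0 on success, -1 on failure */

int try_soft_reset(struct reset_sim *sim)
{
    sim->attempts++;
    return sim->soft_ok ? 0 : -1;
}

int try_mode2_reset(struct reset_sim *sim)
{
    sim->attempts++;
    return sim->mode2_ok ? 0 : -1;
}

int try_mode1_reset(struct reset_sim *sim)
{
    sim->attempts++;
    return sim->mode1_ok ? 0 : -1;
}

/*
 * Try each reset in order and stop at the first success.
 * Returns the index of the reset that succeeded (0 = soft, 1 = mode2,
 * 2 = mode1), or -1 if every reset failed.
 */
int reset_with_fallback(struct reset_sim *sim)
{
    static const reset_fn chain[] = {
        try_soft_reset, try_mode2_reset, try_mode1_reset,
    };

    assert(sim != NULL);
    for (size_t i = 0; i < sizeof(chain) / sizeof(chain[0]); i++) {
        if (chain[i](sim) == 0)
            return (int)i;
    }
    return -1;
}
```

Keeping the order in a table makes the escalation explicit: a heavier reset runs only after the lighter one before it has failed.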
Add variable to record gpu reset reason.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 3 +++
drivers/gpu/drm/amd/amdgpu/gfx_v11_0_3.c | 6 +-
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
b/drivers/gpu/drm
: recover vram bo from shadow done
[ 390.931067] amdgpu :63:00.0: amdgpu: GPU reset(1) succeeded!
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++
drivers/gpu/drm/amd/amdgpu
Add gfx v11_0_3 fed irq handling for sriov.
Signed-off-by: YiPeng Chai
Reviewed-by: Hawking Zhang
---
drivers/gpu/drm/amd/amdgpu/gfx_v11_0_3.c | 14 +++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0_3.c
b/drivers/gpu/drm/amd
Optimize redundant code in umc_v6_7.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/umc_v6_7.c | 162 +++---
1 file changed, 71 insertions(+), 91 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c
b/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c
index
Optimize redundant code in umc_v8_10.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 31
drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 7 +
drivers/gpu/drm/amd/amdgpu/umc_v8_10.c | 197 +---
3 files changed, 115 insertions(+), 120 deletions
Reinit mes ip block during reset on SRIOV.
Signed-off-by: YiPeng Chai
Reviewed-by: Hawking Zhang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index
Gfx v11_0_3 supports ras on SRIOV, so ras needs to be resumed
during reset.
Signed-off-by: YiPeng Chai
Reviewed-by: Hawking Zhang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b
Enable ras for mp0 v13_0_10 on SRIOV.
Signed-off-by: YiPeng Chai
Reviewed-by: Hawking Zhang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 63dfcc98152d
Optimize sdma ras block initialization code for sdma v4_0.
Signed-off-by: YiPeng Chai
Reviewed-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 21 +
1 file changed, 5 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
b/drivers
Add sdma ras function on sdma v6_0_3.
Signed-off-by: YiPeng Chai
Reviewed-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 35
drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h | 1 +
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 24
3 files changed
that the ras block supports ras function.
Signed-off-by: YiPeng Chai
Reviewed-by: Tao Zhou
Reviewed-by: Hawking Zhang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 17 -
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu
check. Even if the ras block
checked is not in the ras list, it will return a null
pointer and will have no effect.
Signed-off-by: YiPeng Chai
Reviewed-by: Tao Zhou
Reviewed-by: Hawking Zhang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 3 ---
1 file changed, 3 deletions(-)
diff --git
, the .hw_ops null
pointer check in amdgpu_ras_interrupt_poison_consumption_handler
needs to be adjusted.
Signed-off-by: YiPeng Chai
Reviewed-by: Hawking Zhang
Reviewed-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 9 +
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 4
V2:
Optimize gfx_v11_0_set_cp_ecc_error_state function.
V3:
Define macro constant for me pipe instance address interval.
V5:
Register and handle gfx cp ecc error irq on gfx v11_0_3.
V6:
Remove invalid intermediate function call.
Signed-off-by: YiPeng Chai
Reviewed-by: Hawking Zhang
Add gfx ras poison consumption irq handling on gfx v11_0_3.
V2:
Move ras poison consumption irq handling code of gfx
v11_0_3 to gfx_v11_0_3.c.
V5:
Create dedicated irq handler for RLC_GC_FED_INTERRUPT.
V6:
Remove invalid function call.
Signed-off-by: YiPeng Chai
Reviewed-by: Hawking
V2:
Add RLC_RLCS_FED_STATUS_0 and RLC_RLCS_FED_STATUS_1 register
offset and shift masks.
Signed-off-by: YiPeng Chai
Reviewed-by: Hawking Zhang
Reviewed-by: Tao Zhou
Reviewed-by: Alex Deucher
---
.../include/asic_reg/gc/gc_11_0_3_offset.h | 8 +++
.../include/asic_reg/gc
gfx_v11_0_3_ras_ops.
V4:
Revert changes in amdgpu_ras_interrupt_poison_consumption_handler.
V5:
1. Remove invalid include file in gfx_v11_0_3.c.
2. Reduce the number of parameters of amdgpu_gfx_ras_sw_init.
Signed-off-by: YiPeng Chai
Reviewed-by: Hawking Zhang
Reviewed-by: Tao Zhou
---
drivers/gpu/drm
853540] __x64_sys_delete_module+0x142/0x260
[ 304.853548] ? exit_to_user_mode_prepare+0x3e/0x190
[ 304.853555] do_syscall_64+0x38/0x90
[ 304.853562] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 2 +-
1 file chan
This patch enables mode-1 reset for RAS recovery in fatal error mode.
Signed-off-by: YiPeng Chai
Reviewed-by: Hawking Zhang
Reviewed-by: Tao Zhou
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c    | 7 ++-
2 files changed, 10 insertions
amdgpu_ras_set_error_query_ready is called at the start of
amdgpu_device_gpu_recover to disable ras error queries, but the
later code re-enables them only in the full reset path, not in
the soft reset path, the emergency restart path, or the path
that skips the hardware reset.
Signed-off-by: YiPeng Chai
Add umc channel index mapping table for umc_v8_10.
Signed-off-by: YiPeng Chai
Reviewed-by: Hawking Zhang
---
drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 5 -
drivers/gpu/drm/amd/amdgpu/umc_v8_10.c | 10 ++
drivers/gpu/drm/amd/amdgpu/umc_v8_10.h | 4
3 files changed, 18
.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index e55f106621ef..3deb716710e6 100644
--- a/drivers/gpu/drm/amd/amdgpu
condition checks, so the first
conditional check in amdgpu_pm_sysfs_fini can
be removed.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/pm/amdgpu_pm.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index 5e318b3f6c0f
pu_device_gpu_recover,
then amdgpu_fill_buffer will not be called when psp_suspend is
called.
2. Free psp ring memory in psp_sw_fini.
Signed-off-by: YiPeng Chai
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 -
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 1 +
drivers/gpu/drm/am