From: Monk Liu
Because a cached GTT buffer is snooped, coherence between CPU writes and
GPU fetches is guaranteed. The original code, however, uses WC + unsnooped
memory for the HIQ PQ (ring buffer), which introduces coherency issues:
the MEC fetches stale data from the PQ, leading to an MEC hang.
Signed-off-by: Monk Liu
-
: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 7 +--
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 +-
drivers/gpu/drm/amd/amdkfd/kfd_device.c | 2 +-
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +-
drivers/gpu/drm/amd/amdkfd
amdgpu_device_cache_pci_state for sriov.
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 383bbee87df5..64622dc57a6b 100644
--- a/drivers
programmed, leading to missed interrupts.
So skip pci_restore_state during device init.
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd
as adding mb() in unmap_queues_cpsch should be sufficient, remove the
volatile qualifier as recommended
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +-
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 2 +-
2 files changed, 2
make sure KFD_FENCE_INIT is written to fence_addr before
pm_send_query_status is called, to avoid a qcm fence timeout caused by
incorrect ordering.
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 1 +
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 2 +-
2
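The ordering these two fence patches rely on (an mb() rather than a volatile qualifier, and KFD_FENCE_INIT visible before pm_send_query_status submits work) can be modelled in user space with C11 atomics. This is a minimal sketch, not the amdkfd code: `struct fake_queue`, `submit_query_status()`, `firmware_complete()`, `FENCE_INIT` and `FENCE_DONE` are hypothetical, and `atomic_thread_fence(memory_order_seq_cst)` stands in for the kernel's mb().

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for the init and completion fence values. */
#define FENCE_INIT 0u
#define FENCE_DONE 1u

/* Hypothetical model of the shared fence word and a "packet submitted" flag. */
struct fake_queue {
    _Atomic uint64_t fence;   /* polled by the driver, written by "firmware" */
    _Atomic int submitted;    /* set once the query-status packet is queued */
};

/* Reset the fence, then publish the query. The full fence plays the role
 * of mb(): the FENCE_INIT store is ordered before the submission. */
static void submit_query_status(struct fake_queue *q)
{
    atomic_store_explicit(&q->fence, FENCE_INIT, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store_explicit(&q->submitted, 1, memory_order_relaxed);
}

/* Hypothetical firmware side: it only writes completion after observing the
 * submission. Because of the paired fences, the INIT store is ordered before
 * this store, so a completed fence is never clobbered by INIT. */
static void firmware_complete(struct fake_queue *q)
{
    if (atomic_load_explicit(&q->submitted, memory_order_relaxed)) {
        atomic_thread_fence(memory_order_seq_cst);
        atomic_store_explicit(&q->fence, FENCE_DONE, memory_order_relaxed);
    }
}

/* Returns 1 once the fence shows completion. */
static int query_done(struct fake_queue *q)
{
    return atomic_load_explicit(&q->fence, memory_order_acquire) == FENCE_DONE;
}
```

Without the barrier, the submission could become visible first and the completion value could be overwritten by the late INIT store, which is the kind of misordering that shows up as a fence timeout.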
ih process and
before entering full access under sriov, to avoid the full access time cost.
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/
when using the CPU to do page table updates at sriov runtime, MMIO
access is blocked, so KIQ has to be used to flush HDP.
Change WREG32_NO_KIQ to WREG32 to allow KIQ.
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/hdp_v4_0.c | 2 +-
drivers/gpu/drm/amd/amdgpu/hdp_v5_0.c | 2
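The HDP change depends on a WREG32-style write being able to route through KIQ when direct MMIO is unavailable, while the NO_KIQ variant always goes straight to MMIO. A minimal model of that dispatch, assuming a single "sriov runtime" flag decides the path; `struct fake_dev`, `reg_write()`, `reg_write_no_kiq()` and `kiq_write()` are hypothetical and the driver's real accessors check more conditions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model: a register file plus per-path counters. */
struct fake_dev {
    bool sriov_runtime;   /* VF at runtime: direct MMIO is blocked on real HW */
    uint32_t regs[16];
    unsigned mmio_writes;
    unsigned kiq_writes;
};

/* Hypothetical KIQ path: the driver would emit a register-write packet on
 * the KIQ ring; this model just records the write. */
static void kiq_write(struct fake_dev *d, uint32_t reg, uint32_t v)
{
    d->regs[reg] = v;
    d->kiq_writes++;
}

/* Direct MMIO path (always succeeds in this model). */
static void mmio_write(struct fake_dev *d, uint32_t reg, uint32_t v)
{
    d->regs[reg] = v;
    d->mmio_writes++;
}

/* WREG32-like: use KIQ when direct MMIO is unavailable. */
static void reg_write(struct fake_dev *d, uint32_t reg, uint32_t v)
{
    if (d->sriov_runtime)
        kiq_write(d, reg, v);
    else
        mmio_write(d, reg, v);
}

/* WREG32_NO_KIQ-like: always direct MMIO; on real hardware at sriov
 * runtime this write would be blocked, so the HDP flush is lost. */
static void reg_write_no_kiq(struct fake_dev *d, uint32_t reg, uint32_t v)
{
    mmio_write(d, reg, v);
}
```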
, and
may mess up the following reinit sequence on other gpus.
So extend the time to cover the maximum time needed to recover.
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/mxgpu_nv.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu
the inst passed to amdgpu_virt_rlcg_reg_rw should be the physical instance.
Fix the mismatched code.
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 18 +-
2 files changed, 11 insertions(+), 11
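Both "physical instance" fixes come down to not passing a logical index where a physical one is expected. A sketch assuming the present instances are described by a bitmask, so logical instance n is the position of the (n+1)-th set bit; `logical_to_physical()` is a hypothetical helper, not the driver's own mapping macro.

```c
#include <stdint.h>

/* Hypothetical helper: return the physical index of the (logical+1)-th set
 * bit in present_mask, or -1 if there are not that many instances. */
static int logical_to_physical(uint32_t present_mask, unsigned logical)
{
    for (int phys = 0; phys < 32; phys++) {
        if (present_mask & (1u << phys)) {
            if (logical == 0)
                return phys;
            logical--;
        }
    }
    return -1;
}
```

With a mask of 0xC, logical instance 0 is physical instance 2, so using the logical index directly would address instance 0, which is not present.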
the inst passed to reg read/write should be the physical instance.
Fix the mismatched code.
Signed-off-by: Victor Zhao
---
.../drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c | 6 ++---
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 2 +-
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 8
disable pp_power_profile_mode for sriov on gc11.0.3, as it is not
supported by the SMU
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/pm/amdgpu_pm.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index
- clear kiq ring after suspend/resume under sriov to avoid kiq ring
test failure
- update irq after resume to fix kiq interrupt loss
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 2 ++
2 files changed, 4 insertions
This reverts commit 3efc702897c54c95c332632157ab042e942512c7.
This commit reverts the AMDGPU_SKIP_MODE2_RESET change, as it conflicts
with the original design of the reset handler. It will be redesigned.
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 1 -
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +--
- refactor mode2 on v11.0.7 to align with aldebaran
- comment out using mode2 reset as the default for now; another
controller will be introduced to replace the previous reset_level_mask
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c | 23 ++---
1 file changed
This reverts commit 3ae992d5e1194a16e3d977076eb5722fa6e410d8.
This commit breaks the reset logic for aldebaran; revert it for now.
The mask will be moved inside the reset handler.
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 4
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 --
drivers/gpu
after ih is resumed and before the ib test.
Adjust the position of irq enabling on other reset paths accordingly.
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8
drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c | 1 +
2 files changed, 5 insertions(+), 4 deletions
overflow happens
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 +---
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 6 +-
2 files changed, 14 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm
Introduce the amdgpu_reset_level debugfs entry to help debug and
test specific types of reset. It also helps block unwanted types of
reset.
By default, mode2 reset will not be enabled.
v2: make this debugfs in adev and use debugfs_create_u32
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd
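The amdgpu_reset_level entry is described as selecting which reset types may run. A minimal sketch, assuming it holds a bitmask with one bit per reset method and that the default leaves mode2 out; the enum values, `DEFAULT_RESET_MASK` and `reset_allowed()` are hypothetical, not the driver's definitions. In the kernel, exposing such a u32 through debugfs_create_u32() (as v2 does) lets userspace overwrite the mask directly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reset methods; the driver's own enum differs. */
enum reset_method {
    RESET_SOFT = 0,
    RESET_MODE1 = 1,
    RESET_MODE2 = 2,
};

#define RESET_BIT(m) (1u << (m))

/* Hypothetical default: everything except mode2, matching
 * "By default, mode2 reset will not be enabled". */
#define DEFAULT_RESET_MASK (RESET_BIT(RESET_SOFT) | RESET_BIT(RESET_MODE1))

/* A u32 debugfs knob would simply overwrite this value. */
static uint32_t reset_level_mask = DEFAULT_RESET_MASK;

/* A reset method may run only if its bit is set in the mask. */
static bool reset_allowed(enum reset_method m)
{
    return (reset_level_mask & RESET_BIT(m)) != 0;
}
```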
Save and restore gfxhub regs as they will be reset during mode 2
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h | 2 +
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 26 +++
drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c | 72 +++
drivers/gpu
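Saving gfxhub registers across mode 2 follows a plain save/reset/restore pattern. A sketch over a fake register file, assuming a fixed list of offsets that the reset clears; `fake_mmio`, `saved_offsets`, `hub_save_regs()`, `hub_restore_regs()` and `fake_mode2_reset()` are hypothetical, and the offsets do not reflect which registers gfxhub_v2_1 actually saves.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_SAVED 3

/* Hypothetical register file standing in for MMIO space. */
static uint32_t fake_mmio[64];

/* Hypothetical offsets of registers that a mode 2 reset clears. */
static const size_t saved_offsets[NUM_SAVED] = { 4, 9, 17 };

struct hub_save {
    uint32_t val[NUM_SAVED];
};

/* Capture the registers before the reset. */
static void hub_save_regs(struct hub_save *s)
{
    for (size_t i = 0; i < NUM_SAVED; i++)
        s->val[i] = fake_mmio[saved_offsets[i]];
}

/* Write the captured values back after the reset. */
static void hub_restore_regs(const struct hub_save *s)
{
    for (size_t i = 0; i < NUM_SAVED; i++)
        fake_mmio[saved_offsets[i]] = s->val[i];
}

/* Models the side effect of mode 2 on these registers. */
static void fake_mode2_reset(void)
{
    for (size_t i = 0; i < NUM_SAVED; i++)
        fake_mmio[saved_offsets[i]] = 0;
}
```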
For some hangs caused by slow tests, the engine cannot be stopped, which
may cause resume failure after reset. In this case, force-halt the
engine by reverting the context addresses.
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h
-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 3 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 2 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 1 +
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 11 +--
5 files
- introduce AMDGPU_SKIP_MODE2_RESET flag
- let mode2 reset fall back to the default reset method if it fails
v2: move this part out from the asic specific part
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7
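"Let mode2 reset fall back to the default reset method if it fails" can be sketched as a small policy: try mode2 unless a skip flag is set, otherwise (or on failure) run the default method. The callback type, `struct reset_ctl`, `do_reset()` and the example callbacks are hypothetical; `skip_mode2` only loosely mirrors AMDGPU_SKIP_MODE2_RESET, whose exact semantics are not shown here.

```c
#include <stdbool.h>

/* Hypothetical reset callback: 0 on success, negative on error. */
typedef int (*reset_fn)(void *ctx);

struct reset_ctl {
    bool skip_mode2;    /* hypothetical analogue of AMDGPU_SKIP_MODE2_RESET */
    reset_fn mode2;     /* may be NULL if the ASIC has no mode2 handler */
    reset_fn fallback;  /* the ASIC's default reset method */
};

/* Try mode2 first (unless skipped); on failure, use the default method. */
static int do_reset(const struct reset_ctl *ctl, void *ctx)
{
    if (!ctl->skip_mode2 && ctl->mode2) {
        if (ctl->mode2(ctx) == 0)
            return 0;
    }
    return ctl->fallback(ctx);
}

/* Example callbacks for trying the policy out; each counts its calls. */
static int mode2_calls, fallback_calls;
static int mode2_fails(void *ctx) { (void)ctx; mode2_calls++; return -5; }
static int mode2_ok(void *ctx) { (void)ctx; mode2_calls++; return 0; }
static int default_ok(void *ctx) { (void)ctx; fallback_calls++; return 0; }
```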
To meet the requirements of the multi-container use case, which needs
a quicker reset that does not cause VRAM loss, add the Mode2
reset handler for sienna_cichlid.
v2: move skip mode2 flag part separately
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/Makefile | 2 +-
drivers
In the multi-container use case, reset time is important, so skip ring
tests and the cp halt wait during ip suspend for reset, as they are
going to fail and only add time to the reset.
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +-
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
For some hangs caused by slow tests, the engine cannot be stopped, which
may cause resume failure after reset. In this case, force-halt the
engine by reverting the context addresses.
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h
Introduce the amdgpu_reset_level debugfs entry to help debug and
test specific types of reset. It also helps block unwanted types of
reset.
By default, mode2 reset will not be enabled.
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 4
drivers/gpu/drm/amd/amdgpu
Save and restore gfxhub regs as they will be reset during mode 2
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h | 2 +
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 26 +++
drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c | 72 +++
drivers/gpu
handler for sienna_cichlid
- introduce AMDGPU_SKIP_MODE2_RESET flag
- let mode2 reset fall back to the default reset method if it fails
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/Makefile | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 1 +
drivers/gpu/drm/amd/amdgpu
add soft reset sequence for vcn v3_0
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 87 ++-
1 file changed, 85 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
index
To get more accurate engine hang detection, pass the hung job to
check_soft_reset to find the hung engine instead of checking
register status.
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2
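Passing the hung job lets the driver name the guilty engine directly instead of inferring it from busy bits. A sketch assuming each job carries its ring and each ring records which IP block owns it; `enum ip_type`, `struct ring`, `struct job` and `check_soft_reset_from_job()` are hypothetical, since the real check_soft_reset signature is not visible in this excerpt.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical IP block types that can be soft reset. */
enum ip_type { IP_GFX, IP_SDMA, IP_VCN, IP_COUNT };

struct ring { enum ip_type type; };
struct job  { const struct ring *ring; };

/* Mark only the IP block owning the hung job's ring as needing a soft
 * reset. With no job, nothing is marked and the caller decides what to do. */
static void check_soft_reset_from_job(const struct job *hung,
                                      bool hang[IP_COUNT])
{
    for (int i = 0; i < IP_COUNT; i++)
        hang[i] = false;
    if (hung && hung->ring)
        hang[hung->ring->type] = true;
}
```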
enable sdma v5_2 soft reset
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 79 +-
1 file changed, 78 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
index 4d4d1aa51b8a
add parameter soft_reset_enable to control whether soft reset is
enabled
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 9 +
drivers/gpu/drm/amd/amdgpu/nv.c | 5 -
3 files changed, 14 insertions(+), 1
to maintain the previous reset logic, add a module
parameter to control soft reset.
Victor Zhao (4):
drm/amdgpu: add param soft_reset_enable
drm/amdgpu: pass job to check soft reset
drm/amdgpu: add sdma v5_2 soft reset
drm/amdgpu: add vcn v3_0 soft reset
drivers/gpu/drm/amd/amdgpu
add passthrough mode detection under arm64 by reading the
CurrentEL register
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
index
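On AArch64, CurrentEL reports the exception level in bits [3:2], with the other bits RES0. The sketch below decodes that field from a raw value (reading it with `mrs` is not permitted from EL0, so the value is passed in). The `maybe_guest()` rule is an assumption, not necessarily the patch's logic: a kernel not at EL2 is treated as a possible guest, although a non-VHE host kernel also runs at EL1.

```c
#include <stdbool.h>
#include <stdint.h>

/* CurrentEL bits [3:2] hold the current exception level (0..3). */
static unsigned current_el_level(uint64_t currentel)
{
    return (unsigned)((currentel >> 2) & 0x3);
}

/* Assumed heuristic: a kernel not running at EL2 might be a guest.
 * This is only a hint, since a non-VHE host kernel also runs at EL1. */
static bool maybe_guest(uint64_t currentel)
{
    return current_el_level(currentel) != 2;
}
```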
with some specific ARM-based CPUs, add
a ttm parameter to control this.
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/ttm/ttm_module.c | 8 +++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/ttm/ttm_module.c b/drivers/gpu/drm/ttm/ttm_module.c
index e87f40674a4d
Under sriov, suspending an IP block fails because the return
value was not initialized.
v2: return 0 directly, matching the original code's semantics from
before this was broken out into a separate helper function, instead
of setting initial values
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu
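The bug behind these two suspend fixes is a helper whose result variable is only assigned on some paths, so when every step is skipped (as can happen under sriov) it returns stack garbage. A sketch of the v2 approach of returning 0 directly; `struct ip_block`, `skip_on_vf`, `suspend_blocks()` and the example callbacks are hypothetical, since the exact helper is not visible in this excerpt.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical IP block: some blocks are skipped entirely on a VF. */
struct ip_block {
    bool skip_on_vf;
    int (*suspend)(void);
};

/* Buggy shape, for comparison:
 *
 *   int r;
 *   for (...) {
 *       if (vf && blk->skip_on_vf)
 *           continue;
 *       r = blk->suspend();
 *       if (r)
 *           return r;
 *   }
 *   return r;   // uninitialized if every block was skipped
 */

/* Fixed shape: errors return immediately and success returns 0 directly,
 * so no path reads an uninitialized variable. */
static int suspend_blocks(const struct ip_block *blk, size_t n, bool vf)
{
    for (size_t i = 0; i < n; i++) {
        if (vf && blk[i].skip_on_vf)
            continue;
        int r = blk[i].suspend();
        if (r)
            return r;
    }
    return 0;
}

/* Example callbacks for trying the helper out. */
static int suspend_ok(void) { return 0; }
static int suspend_fail(void) { return -1; }
```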
Give initial values; otherwise suspending an IP block fails
under sriov
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
b/drivers/gpu/drm/amd/amdgpu
Add sdma single packet invalidation
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 8
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 4
drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 14 +-
drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 12
PSP clear_vf_fw feature is outdated and has been removed.
Remove the related functions.
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 32 -
drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h | 1 -
2 files changed, 33 deletions(-)
diff --git a/drivers
psp GFX_CTRL_CMD_ID_CONSUME_CMD differs between Windows and Linux;
according to psp, the Linux cmds are not correct.
v2: only correct GFX_CTRL_CMD_ID_CONSUME_CMD.
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git
psp GFX_CTRL_CMD_ID_CONSUME_CMD differs between Windows and Linux;
according to psp, the Linux cmds are not correct
Signed-off-by: Victor Zhao
---
drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h | 26 +
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm