[PATCH 2/2] drm/amdkfd: use cache GTT buffer for PQ and wb pool

2024-11-05 Thread Victor Zhao
From: Monk Liu As the cache GTT buffer is snooped, coherence between CPU writes and GPU fetches is guaranteed, but the original code uses WC + unsnooped for the HIQ PQ (ring buffer), which introduces coherency issues: MEC fetches stale data from the PQ, which leads to an MEC hang. Signed-off-by: Monk Liu -

[PATCH 1/2] drm/amdgpu: allow function to allocate normal GTT memory

2024-11-05 Thread Victor Zhao
: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 7 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_device.c| 2 +- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +- drivers/gpu/drm/amd/amdkfd

[PATCH] drm/amdgpu: skip amdgpu_device_cache_pci_state under sriov

2024-10-27 Thread Victor Zhao
amdgpu_device_cache_pci_state for sriov. Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 383bbee87df5..64622dc57a6b 100644 --- a/drivers

[PATCH] drm/amdgpu: skip pci_restore_state under sriov during device init

2024-10-23 Thread Victor Zhao
programmed and leading to missing interrupts. So skip pci_restore_state during device init. Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd

[PATCH] drm/amdkfd: remove extra use of volatile

2024-10-22 Thread Victor Zhao
As the added mb() should be sufficient in function unmap_queues_cpsch, remove the added volatile qualifier as recommended. Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 2 +- 2 files changed, 2

[PATCH] drm/amdkfd: fix the hang caused by the write reorder to fence_addr

2024-10-17 Thread Victor Zhao
Make sure KFD_FENCE_INIT is written to fence_addr before pm_send_query_status is called, to avoid the qcm fence timeout caused by incorrect ordering. Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 1 + drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 2 +- 2
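
The ordering requirement in this snippet can be illustrated outside the kernel. The following is a minimal userspace sketch using C11 atomics, not the driver code: `demo_fence`, `demo_send_query`, `demo_complete_query` and the two fence values are hypothetical names invented for illustration, and `atomic_thread_fence` stands in for the kernel's `mb()`. It only shows the pattern of writing the initial fence value, issuing a barrier, and then publishing the query.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are not the kernel's definitions. */
#define DEMO_FENCE_INIT      1u
#define DEMO_FENCE_COMPLETED 2u

struct demo_fence {
    uint64_t value;        /* plays the role of *fence_addr */
    atomic_int query_sent; /* plays the role of the submitted query packet */
};

/* Producer: write the initial fence value, then publish the query.
 * The release fence keeps the fence write from being reordered after
 * the store that makes the query visible. */
void demo_send_query(struct demo_fence *f)
{
    assert(f != NULL);
    f->value = DEMO_FENCE_INIT;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&f->query_sent, 1, memory_order_relaxed);
}

/* Consumer: returns 0 once the query is completed, -1 if no query was
 * published, -2 if the query is visible but the fence was not initialised
 * (the failure mode the barrier is meant to rule out). */
int demo_complete_query(struct demo_fence *f)
{
    assert(f != NULL);
    if (!atomic_load_explicit(&f->query_sent, memory_order_relaxed))
        return -1;
    atomic_thread_fence(memory_order_acquire);
    if (f->value != DEMO_FENCE_INIT)
        return -2;
    f->value = DEMO_FENCE_COMPLETED;
    return 0;
}
```

Calling `demo_send_query` and then `demo_complete_query` on the same object leaves `value` at `DEMO_FENCE_COMPLETED`. Without the release fence, a weakly ordered CPU could make the query visible before the fence write, which corresponds to the timeout described in the snippet.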

[PATCH] drm/amd/amdgpu: move drain_workqueue before shutdown is set

2024-08-25 Thread Victor Zhao
ih process and before enter full access under sriov to avoid full access time cost. Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/

[PATCH] drm/amd/amdgpu: allow use kiq to do hdp flush under sriov

2024-08-18 Thread Victor Zhao
When using the CPU to do page table updates under sriov runtime, mmio access is blocked, so kiq has to be used to flush hdp. Change WREG32_NO_KIQ to WREG32 to allow kiq. Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/hdp_v4_0.c | 2 +- drivers/gpu/drm/amd/amdgpu/hdp_v5_0.c | 2

[PATCH] drm/amd/sriov: extend NV_MAILBOX_POLL_MSG_TIMEDOUT

2024-08-07 Thread Victor Zhao
, and may mess up the following reinit sequence on other gpus. So extend the time to cover the maximum time needed to recover. Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/mxgpu_nv.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH] drm/amd/amdgpu: fix the inst passed to amdgpu_virt_rlcg_reg_rw

2024-05-20 Thread Victor Zhao
The inst passed to amdgpu_virt_rlcg_reg_rw should be the physical instance. Fix the mismatched code. Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 18 +- 2 files changed, 11 insertions(+), 11

[PATCH] drm/amd/amdgpu: fix the inst passed to reg read write under sriov

2024-05-20 Thread Victor Zhao
The inst passed to reg read/write should be the physical instance. Fix the mismatched code. Signed-off-by: Victor Zhao --- .../drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c | 6 ++--- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 2 +- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 8

[PATCH] drm/amd/pm: Disallow managing power profiles on SRIOV for gc11.0.3

2023-09-25 Thread Victor Zhao
disable pp_power_profile_mode for sriov on gc11.0.3 as not supported by smu Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/pm/amdgpu_pm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c b/drivers/gpu/drm/amd/pm/amdgpu_pm.c index

[PATCH] drm/amdgpu: fix for suspend/resume sequence under sriov

2022-11-02 Thread Victor Zhao
- clear kiq ring after suspend/resume under sriov to avoid kiq ring test failure - update irq after resume to fix kiq interrupt loss Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++ drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 2 ++ 2 files changed, 4 insertions

[PATCH 2/3] Revert "drm/amdgpu: let mode2 reset fallback to default when failure"

2022-10-13 Thread Victor Zhao
This reverts commit 3efc702897c54c95c332632157ab042e942512c7. This commit reverted the AMDGPU_SKIP_MODE2_RESET as it conflicts with the original design of reset handler. Will redesign it. --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 1 - drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +--

[PATCH 3/3] drm/amdgpu: Refactor mode2 reset logic for v11.0.7

2022-10-13 Thread Victor Zhao
- refactor mode2 on v11.0.7 to align with aldebaran - comment out using mode2 reset as default for now, will introduce another controller to replace previous reset_level_mask Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c | 23 ++--- 1 file changed

[PATCH 1/3] Revert "drm/amdgpu: add debugfs amdgpu_reset_level"

2022-10-13 Thread Victor Zhao
This reverts commit 3ae992d5e1194a16e3d977076eb5722fa6e410d8. This commit breaks the reset logic for aldebaran, revert it for now. Will move the mask inside the reset handler. --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 4 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 -- drivers/gpu

[PATCH 2/2] drm/amdgpu: move enable irq later to avoid race with ih resume

2022-09-14 Thread Victor Zhao
after ih resumed and before ib test. Adjusting the position of enable irq on other reset paths accordingly. Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c | 1 + 2 files changed, 5 insertions(+), 4 deletions

[PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow

2022-09-14 Thread Victor Zhao
overflow happens Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 +--- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 6 +- 2 files changed, 14 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm

[PATCH v2 3/6] drm/amdgpu: add debugfs amdgpu_reset_level

2022-07-28 Thread Victor Zhao
Introduce amdgpu_reset_level debugfs in order to help debug and test specific type of reset. Also helps blocking unwanted type of resets. By default, mode2 reset will not be enabled v2: make this debugfs in adev and use debugfs_create_u32 Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd

[PATCH v2 4/6] drm/amdgpu: save and restore gc hub regs

2022-07-28 Thread Victor Zhao
Save and restore gfxhub regs as they will be reset during mode 2 Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h| 2 + drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 26 +++ drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c | 72 +++ drivers/gpu

[PATCH v2 5/6] drm/amdgpu: revert context to stop engine before mode2 reset

2022-07-28 Thread Victor Zhao
For some hang caused by slow tests, engine cannot be stopped which may cause resume failure after reset. In this case, force halt engine by reverting context addresses Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h

[PATCH v2 6/6] drm/amdgpu: reduce reset time

2022-07-28 Thread Victor Zhao
-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 3 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 1 + drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c| 11 +-- 5 files

[PATCH v2 2/6] drm/amdgpu: let mode2 reset fallback to default when failure

2022-07-28 Thread Victor Zhao
- introduce AMDGPU_SKIP_MODE2_RESET flag - let mode2 reset fallback to default reset method if failed v2: move this part out from the asic specific part Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7
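
The fallback described in this snippet can be sketched as follows. This is a minimal userspace illustration under stated assumptions, not the amdgpu reset handler: `demo_reset_ctx`, `demo_gpu_reset`, the callback fields and the `DEMO_SKIP_MODE2_RESET` bit are hypothetical and only mirror the idea of a skip flag that is set once mode2 reset has failed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit, named after the idea of AMDGPU_SKIP_MODE2_RESET. */
#define DEMO_SKIP_MODE2_RESET (1UL << 0)

struct demo_reset_ctx {
    unsigned long flags;
    int (*mode2_reset)(void);   /* quicker reset, may fail */
    int (*default_reset)(void); /* default reset method */
};

/* Try mode2 reset first unless it is flagged to be skipped. If it fails,
 * set the skip flag and fall back to the default reset method. */
int demo_gpu_reset(struct demo_reset_ctx *ctx)
{
    assert(ctx != NULL && ctx->mode2_reset != NULL && ctx->default_reset != NULL);
    if (!(ctx->flags & DEMO_SKIP_MODE2_RESET)) {
        if (ctx->mode2_reset() == 0)
            return 0;
        ctx->flags |= DEMO_SKIP_MODE2_RESET;
    }
    return ctx->default_reset();
}

/* Trivial callbacks used to exercise both paths. */
int demo_reset_ok(void)   { return 0; }
int demo_reset_fail(void) { return -1; }
```

With `mode2_reset` set to `demo_reset_fail` and `default_reset` set to `demo_reset_ok`, `demo_gpu_reset` returns 0 and leaves the skip flag set, so a later call goes straight to the default method.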

[PATCH v2 1/6] drm/amdgpu: add mode2 reset for sienna_cichlid

2022-07-28 Thread Victor Zhao
To meet the requirement of the multi-container use case, which needs a quicker reset that does not cause VRAM loss, add the Mode2 reset handler for sienna_cichlid. v2: move the skip mode2 flag part into a separate patch Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/Makefile | 2 +- drivers

[PATCH 5/5] drm/amdgpu: reduce reset time

2022-07-22 Thread Victor Zhao
In multi container use case, reset time is important, so skip ring tests and cp halt wait during ip suspending for reset as they are going to fail and cost more time on reset Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c

[PATCH 4/5] drm/amdgpu: revert context to stop engine before mode2 reset

2022-07-22 Thread Victor Zhao
For some hang caused by slow tests, engine cannot be stopped which may cause resume failure after reset. In this case, force halt engine by reverting context addresses Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h

[PATCH 2/5] drm/amdgpu: add debugfs amdgpu_reset_level

2022-07-22 Thread Victor Zhao
Introduce amdgpu_reset_level debugfs in order to help debug and test specific type of reset. Also helps blocking unwanted type of resets. By default, mode2 reset will not be enabled Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 4 drivers/gpu/drm/amd/amdgpu

[PATCH 3/5] drm/amdgpu: save and restore gc hub regs

2022-07-22 Thread Victor Zhao
Save and restore gfxhub regs as they will be reset during mode 2 Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h| 2 + drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 26 +++ drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c | 72 +++ drivers/gpu

[PATCH 1/5] drm/amdgpu: add mode2 reset for sienna_cichlid

2022-07-22 Thread Victor Zhao
handler for sienna_cichlid - introduce AMDGPU_SKIP_MODE2_RESET flag - let mode2 reset fallback to default reset method if failed Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/Makefile | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 1 + drivers/gpu/drm/amd/amdgpu

[PATCH 4/4] drm/amdgpu: add vcn v3_0 soft reset

2022-03-09 Thread Victor Zhao
add soft reset sequence for vcn v3_0 Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 87 ++- 1 file changed, 85 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c index

[PATCH 2/4] drm/amdgpu: pass job to check soft reset

2022-03-09 Thread Victor Zhao
In order to get more accurate engine hang detection, pass the hang job to check_soft_reset to find the hang engine instead of check register status. Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2

[PATCH 3/4] drm/amdgpu: add sdma v5_2 soft reset

2022-03-09 Thread Victor Zhao
enable sdma v5_2 soft reset Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 79 +- 1 file changed, 78 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c index 4d4d1aa51b8a

[PATCH 1/4] drm/amdgpu: add param soft_reset_enable

2022-03-09 Thread Victor Zhao
add parameter soft_reset_enable to control the enablement of soft reset Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 9 + drivers/gpu/drm/amd/amdgpu/nv.c | 5 - 3 files changed, 14 insertions(+), 1

[PATCH 0/4] Patchset to enable soft reset

2022-03-09 Thread Victor Zhao
to maintain the previous reset logic, add a module parameter to control soft reset. Victor Zhao (4): drm/amdgpu: add param soft_reset_enable drm/amdgpu: pass job to check soft reset drm/amdgpu: add sdma v5_2 soft reset drm/amdgpu: add vcn v3_0 soft reset drivers/gpu/drm/amd/amdgpu

[PATCH] drm/amdgpu: add determine passthrough under arm64

2022-01-23 Thread Victor Zhao
add determine for passthrough mode under arm64 by reading CurrentEL register Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h index

[PATCH] drm/ttm: add workaround for some arm hardware issue

2021-12-21 Thread Victor Zhao
with some specific arm based cpu, adding a ttm parameter to control. Signed-off-by: Victor Zhao --- drivers/gpu/drm/ttm/ttm_module.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/ttm/ttm_module.c b/drivers/gpu/drm/ttm/ttm_module.c index e87f40674a4d

[PATCH] drm/amdgpu: fix r initial values

2021-04-27 Thread Victor Zhao
Under sriov, suspend of the IP block fails because the return value was not initialized. v2: return 0 directly, matching the semantics of the original code before it was broken out into a separate helper function, instead of setting initial values Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu
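
The pattern behind this fix is a helper whose status variable is assigned only inside a loop. A minimal sketch of the v2 approach, assuming a hypothetical `demo_crtc_list` type and `demo_suspend_helper` function that are not taken from the driver:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input: one status code per item the helper has to process. */
struct demo_crtc_list {
    int count;
    const int *unpin_status;
};

/* r is assigned only inside the loop. Ending with `return r;` would return
 * an indeterminate value when count == 0, so the success path returns 0
 * directly instead of relying on an initial value for r. */
int demo_suspend_helper(const struct demo_crtc_list *list)
{
    int i, r;

    assert(list != NULL);
    for (i = 0; i < list->count; i++) {
        r = list->unpin_status[i];
        if (r)
            return r; /* propagate the first failure */
    }
    return 0;
}
```

An empty list returns 0, and a list containing a nonzero status returns that status.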

[PATCH] drm/amdgpu: fix r initial values

2021-04-27 Thread Victor Zhao
Give initial values otherwise sriov will get suspend of IP block failed Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c b/drivers/gpu/drm/amd/amdgpu

[PATCH] drm/amdgpu: Add sdma single packet invalidation

2021-04-22 Thread Victor Zhao
Add sdma single packet invalidation Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 8 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 4 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 14 +- drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 12

[PATCH] drm/amdgpu/sriov: Remove clear vf fw support

2021-04-22 Thread Victor Zhao
PSP clear_vf_fw feature is outdated and has been removed. Remove the related functions. Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 32 - drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h | 1 - 2 files changed, 33 deletions(-) diff --git a/drivers

[PATCH v2] drm/amdgpu/psp: fix psp gfx ctrl cmds

2021-01-04 Thread Victor Zhao
psp GFX_CTRL_CMD_ID_CONSUME_CMD different for windows and linux, according to psp, linux cmds are not correct. v2: only correct GFX_CTRL_CMD_ID_CONSUME_CMD. Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git

[PATCH] drm/amdgpu/psp: fix psp gfx ctrl cmds

2021-01-04 Thread Victor Zhao
psp GFX_CTRL_CMD_ID_CONSUME_CMD different for windows and linux, according to psp, linux cmds are not correct Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdgpu/psp_gfx_if.h | 26 + 1 file changed, 14 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm