[PATCH] drm/amd/amdgpu embed hw_fence into amdgpu_job

2021-06-07 Thread Jack Zhang
the normal job submission by this method. 2. For ib_test and for submissions without a parent job, keep the legacy way of creating a separate hw fence. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 1 - drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +- drivers/gpu/drm/amd
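Embedding the fence in the job means the job can be recovered from a fence pointer alone, typically with `container_of`. The sketch below is a user-space model, not amdgpu code: `struct job`, `struct hw_fence`, the `embedded` flag and all three helpers are hypothetical stand-ins. It shows the two paths named above: normal submission reuses the embedded fence, while an ib_test-style submission without a parent job allocates a standalone fence.

```c
#include <stddef.h>
#include <stdlib.h>

/* Recover the enclosing struct from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct hw_fence {               /* hypothetical stand-in for a hw fence */
    unsigned long seqno;
    int embedded;               /* 1 if this fence lives inside a job */
};

struct job {                    /* hypothetical stand-in for amdgpu_job */
    int id;
    struct hw_fence hw_fence;   /* fence embedded in the job */
};

/* Normal submission: reuse the fence embedded in the parent job. */
struct hw_fence *fence_for_job(struct job *job, unsigned long seqno)
{
    job->hw_fence.seqno = seqno;
    job->hw_fence.embedded = 1;
    return &job->hw_fence;
}

/* ib_test / no parent job: allocate a separate fence the legacy way. */
struct hw_fence *fence_standalone(unsigned long seqno)
{
    struct hw_fence *f = malloc(sizeof(*f));
    if (f) {
        f->seqno = seqno;
        f->embedded = 0;
    }
    return f;
}

/* Given only the fence, find its owning job, or NULL if it is standalone. */
struct job *job_from_fence(struct hw_fence *f)
{
    return f->embedded ? container_of(f, struct job, hw_fence) : NULL;
}
```

The `embedded` flag is only this sketch's way of telling the two cases apart; the standalone fence must still be freed by whoever allocated it.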

[PATCH] drm/amd/amdgpu/sriov disable all ip hw status by default

2021-04-27 Thread Jack Zhang
Set every IP block's hw status to false before any hw_init, and set it to true only after that block's hw_init has executed. The old 5.9 branch has this change, but the 5.11 kernel somehow does not have this fix. Without this change, the gfx IB test fails during SRIOV TDR. Signed-off-by: Jack Zhang --- drivers/gpu
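A minimal sketch of the rule described above, assuming a simplified IP block with a `hw` flag and a `hw_init` callback (`struct ip_block`, `init_all_ip_blocks` and the two demo callbacks are hypothetical, not the amdgpu structures). All flags are cleared first, and a flag becomes true only once that block's hw_init has succeeded, so a failure leaves every later block marked as not initialized.

```c
#include <stdbool.h>
#include <stddef.h>

struct ip_block {                 /* hypothetical, simplified IP block */
    const char *name;
    bool hw;                      /* true only after hw_init has run */
    int (*hw_init)(struct ip_block *blk);
};

/* Clear every block's hw status first, then set it per block only when
 * that block's hw_init actually succeeds. Returns the first error. */
int init_all_ip_blocks(struct ip_block *blocks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        blocks[i].hw = false;

    for (size_t i = 0; i < n; i++) {
        int r = blocks[i].hw_init(&blocks[i]);
        if (r)
            return r;             /* this and later blocks keep hw == false */
        blocks[i].hw = true;
    }
    return 0;
}

/* Example callbacks for illustration only. */
static int demo_init_ok(struct ip_block *b) { (void)b; return 0; }
static int demo_init_fail(struct ip_block *b) { (void)b; return -1; }
```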

[PATCH] drm/amd/sriov no need to config GECC for sriov

2021-04-14 Thread Jack Zhang
There is no need to configure the GECC feature here for SRIOV; leave the configuration job to the host driver. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm

[PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak

2021-03-14 Thread Jack Zhang
Re-insert bailing jobs to avoid a memory leak. V2: move the re-insert step into the drm/scheduler logic. V3: add panfrost's return value for bailing jobs in case it hits the memleak issue. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++- drivers/gpu/drm/amd/amdgpu
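The leak pattern can be modeled in a few lines. This is a sketch under stated assumptions, not the drm/scheduler code: it assumes the timeout path takes the job off the scheduler's pending list before calling the driver, and that the driver reports "bailing" through a return value (`enum sched_stat`, `STAT_BAILING`, `handle_timeout` and the list helpers are all hypothetical). If a bailing job is not put back, nothing owns it and it is never freed.

```c
#include <stdbool.h>
#include <stddef.h>

enum sched_stat { STAT_NOMINAL, STAT_BAILING };   /* hypothetical status codes */

struct job {
    int id;
    struct job *next;
};

struct sched {
    struct job *pending;                      /* jobs still owned by the scheduler */
    enum sched_stat (*timedout_job)(struct job *job);
};

static void pending_remove(struct sched *s, struct job *job)
{
    for (struct job **pp = &s->pending; *pp; pp = &(*pp)->next) {
        if (*pp == job) {
            *pp = job->next;
            job->next = NULL;
            return;
        }
    }
}

static void pending_push_front(struct sched *s, struct job *job)
{
    job->next = s->pending;
    s->pending = job;
}

/* Timeout path: the job leaves the pending list while the driver handles
 * it. If the driver bails out (for example because another reset is
 * already in progress), put the job back so normal cleanup frees it. */
void handle_timeout(struct sched *s, struct job *job)
{
    pending_remove(s, job);
    if (s->timedout_job(job) == STAT_BAILING)
        pending_push_front(s, job);
}

bool pending_contains(const struct sched *s, const struct job *job)
{
    for (const struct job *j = s->pending; j; j = j->next)
        if (j == job)
            return true;
    return false;
}

/* Example driver callbacks for illustration only. */
enum sched_stat demo_bail(struct job *job) { (void)job; return STAT_BAILING; }
enum sched_stat demo_handled(struct job *job) { (void)job; return STAT_NOMINAL; }
```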

[PATCH v2] drm/scheduler re-insert Bailing job to avoid memleak

2021-03-11 Thread Jack Zhang
Re-insert bailing jobs to avoid a memory leak. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 8 ++-- drivers/gpu/drm/panfrost/panfrost_job.c| 2 +- drivers/gpu/drm/scheduler/sched_main.c | 8

[PATCH] drm/scheduler re-insert Bailing job to avoid memleak

2021-03-11 Thread Jack Zhang
Re-insert bailing jobs to avoid a memory leak. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 8 ++-- drivers/gpu/drm/scheduler/sched_main.c | 8 +++- include/drm/gpu_scheduler.h| 1 + 4

[PATCH v8] drm/amd/amdgpu implement tdr advanced mode

2021-03-11 Thread Jack Zhang
that, we do the normal resubmit step to resubmit the remaining jobs. 2. For a whole-GPU reset (VRAM lost), resubmit the old way. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 74 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 2 +- drivers/gpu/drm/scheduler
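Only the tail of the algorithm is visible in this snippet. The sketch below assumes that the truncated text describes a synchronous replay of the first pending job (called "step 0" here) before the normal resubmit; that assumption, the `hangs` field and every function name are hypothetical. It contrasts the two branches listed above: without VRAM loss, the first job is replayed alone and blamed only if it really hangs; with VRAM loss, all jobs are resubmitted the old way.

```c
#include <stdbool.h>
#include <stddef.h>

struct job {
    int id;
    bool hangs;        /* hypothetical: whether this job would time out */
    bool guilty;
};

/* Hypothetical hooks standing in for real submission/reset machinery. */
static int resets;
static bool submit_and_wait(struct job *j) { return !j->hangs; } /* true = signaled */
static void hw_reset(void) { resets++; }
static void submit(struct job *j) { (void)j; }

/* Returns the number of jobs resubmitted in the normal step. */
size_t recover(struct job *jobs, size_t n, bool vram_lost)
{
    size_t start = 0;

    if (!vram_lost && n > 0) {
        /* Assumed step 0: replay only the first pending job and wait. */
        if (!submit_and_wait(&jobs[0])) {
            jobs[0].guilty = true;   /* it really hung: blame it */
            hw_reset();
        }
        start = 1;                   /* the first job is handled either way */
    }

    /* Normal step: resubmit the remaining jobs (all of them after VRAM loss). */
    for (size_t i = start; i < n; i++)
        submit(&jobs[i]);
    return n - start;
}
```

The point of the extra step is that a job which merely timed out behind a hung job on a shared engine is not blamed.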

[PATCH v7] drm/amd/amdgpu implement tdr advanced mode

2021-03-10 Thread Jack Zhang
that, we do the normal resubmit step to resubmit the remaining jobs. 2. For a whole-GPU reset (VRAM lost), resubmit the old way. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 63 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 2 +- drivers/gpu/drm/scheduler

[PATCH v6] drm/amd/amdgpu implement tdr advanced mode

2021-03-10 Thread Jack Zhang
that, we do the normal resubmit step to resubmit the remaining jobs. 2. Re-insert the bailing job into mirror_list and leave it to be handled by the main reset thread. 3. For a whole-GPU reset (VRAM lost), resubmit the old way. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 72

[PATCH v4] drm/amd/amdgpu implement tdr advanced mode

2021-03-10 Thread Jack Zhang
that, we do the normal resubmit step to resubmit the remaining jobs. 2. Re-insert the bailing job into mirror_list and leave it to be handled by the main reset thread. 3. For a whole-GPU reset (VRAM lost), resubmit the old way. Signed-off-by: Jack Zhang Change-Id: I408357f10b9034caaa1b83610e19e514c5fbaaf2

[PATCH v3] drm/amd/amdgpu implement tdr advanced mode

2021-03-08 Thread Jack Zhang
resubmit in the old style. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 57 -- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 33 + 3 files changed, 88 insertions(+), 4 deletions

[PATCH v2] drm/amd/amdgpu implement tdr advanced mode

2021-03-06 Thread Jack Zhang
e ring's mirror_list that has valid sched jobs. V2: - fix a cherry-pick mistake in the bailing TDR handling. - do the affinity_group check based on the bad job's sched rather than the default "1", so that multiple affinity groups could be pre-defined in futu

[PATCH] drm/amd/amdgpu implement tdr advanced mode

2021-03-06 Thread Jack Zhang
e ring's mirror_list that has valid sched jobs. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 101 - drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 47 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_job.h

[PATCH] amdgpu/sriov Stop data exchange for wholegpu reset

2021-01-07 Thread Jack Zhang
reset notification from the PF, stop data exchange. Signed-off-by: Jingwen Chen Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 1 + drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c| 1 + drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c| 1 + 3 files changed, 3 insertions(+) diff --git

[PATCH 5/5] drm/amd/sriov skip vcn powergating and dec_ring_test

2020-07-14 Thread Jack Zhang
1. Skip the decode_ring test in VF, because VCN under SRIOV does not support direct register read/write. 2. Skip the powergating configuration in hw_fini, because VCN3.0 SRIOV doesn't support powergating. V2: delete unnecessary blank lines and refine the implementation. Signed-off-by: Jack Zhang --- drivers
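Both changes are early-outs keyed on "running as a VF". A minimal sketch, assuming a simplified device struct (`struct vcn_dev`, `dec_ring_test` and `vcn_hw_fini` are hypothetical names; the counters stand in for the real register test and powergating call):

```c
#include <stdbool.h>

struct vcn_dev {          /* hypothetical, simplified device state */
    bool is_sriov_vf;
    int pg_state;         /* 0 = ungated, 1 = gated */
    int ring_tests_run;
};

/* The decode ring test reads registers directly, which a VF cannot do,
 * so under SRIOV the test is skipped and reported as passing. */
int dec_ring_test(struct vcn_dev *dev)
{
    if (dev->is_sriov_vf)
        return 0;
    dev->ring_tests_run++;    /* stand-in for the real register read/write test */
    return 0;
}

/* hw_fini: VCN3.0 under SRIOV has no powergating, so leave it alone. */
void vcn_hw_fini(struct vcn_dev *dev)
{
    if (!dev->is_sriov_vf)
        dev->pg_state = 1;    /* stand-in for gating the block on bare metal */
}
```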

[PATCH 3/5] drm/amd/sriov add mmsch_v3 interface

2020-07-13 Thread Jack Zhang
For VCN3.0 SRIOV, the guest driver needs to communicate with mmsch to set up the World Switch for MM appropriately. This patch adds the interface for mmsch_v3.0. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/mmsch_v3_0.h | 130 1 file changed, 130 insertions

[PATCH 5/5] drm/amd/sriov skip vcn powergating and dec_ring_test

2020-07-13 Thread Jack Zhang
1. Skip the decode_ring test in VF, because VCN under SRIOV does not support direct register read/write. 2. Skip the powergating configuration in hw_fini, because VCN3.0 SRIOV doesn't support powergating. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 4 drivers/gpu/drm/amd

[PATCH 2/5] drm/amdgpu: optimize rlcg write for gfx_v10

2020-07-13 Thread Jack Zhang
On gfx10 boards other than nv12, use MMIO writes rather than RLCG writes. --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 26 +++--- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
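The selection rule reduces to a per-ASIC switch. A minimal sketch, assuming a hypothetical `enum asic` with an illustrative subset of gfx10 parts and a hypothetical `gfx10_write_path` helper: only nv12 keeps the RLCG-mediated path, and every other gfx10 board writes registers directly over MMIO.

```c
enum asic { ASIC_NAVI10, ASIC_NAVI12, ASIC_NAVI14 };   /* illustrative subset */

enum write_path { WRITE_MMIO, WRITE_RLCG };

/* Per the patch summary: nv12 keeps RLCG writes, other gfx10 boards use MMIO. */
enum write_path gfx10_write_path(enum asic chip)
{
    return chip == ASIC_NAVI12 ? WRITE_RLCG : WRITE_MMIO;
}
```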

[PATCH 4/5] drm/amd/sriov porting sriov cap to vcn3.0

2020-07-13 Thread Jack Zhang
. 4. Implementation of vcn_v3_0_start_sriov. V2: clean up some unnecessary function declarations. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 350 +++--- 1 file changed, 318 insertions(+), 32 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.

[PATCH 1/5] drm/amd/sriov skip jped ip block and close pgcg flags

2020-07-13 Thread Jack Zhang
For SIENNA_CICHLID SRIOV, JPEG and PG/CG are not supported. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/nv.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c index a7cfe3ac7cb6

[PATCH] drm/amdgpu fix incorrect sysfs remove behavior for xgmi

2020-05-18 Thread Jack Zhang
t Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 24 +--- 1 file changed, 17 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c index e9e59bc..3b46ea8 100644 --- a/drivers/gp

[PATCH] drm/amdgpu fix incorrect sysfs remove behavior for xgmi

2020-05-17 Thread Jack Zhang
t only needs to be removed once for an XGMI setup. 3. Remove the sysfs_link under hive->kobj by its target name. In amdgpu_xgmi_remove_device: 1. amdgpu_xgmi_sysfs_rem_dev_info needs to run for each device. 2. amdgpu_xgmi_sysfs_destroy needs to run only on the last device node. Signed-off-by: Jack Zhang --- dr

[PATCH 2/2] drm/amdgpu/sriov add amdgpu_amdkfd_pre_reset in gpu reset

2020-04-07 Thread Jack Zhang
[PATCH 2/2] kfd_pre_reset frees the mem_objs allocated by kfd_gtt_sa_allocate. Without this change, the SRIOV TDR code path never frees that memory, causing a memory leak. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++ 1 file changed, 2 insertions

[PATCH 1/2] drm/amdkfd Avoid destroy hqd when GPU is on reset

2020-04-07 Thread Jack Zhang
This reverts commit 8a468ab2d in order to split it into two separate patches, which makes it easier to understand. [PATCH 1/2] Port to gfx10 from commit 1b0bfcff463f390c4032ebe36a4d5fb777c00a4c. Originally, the MEC was touched before the GPU had been initialized. Signed-off-by: Jack Zhang

[PATCH 1/3] Revert "drm/amdgpu/sriov add amdgpu_amdkfd_pre_reset in gpu reset"

2020-04-07 Thread Jack Zhang
This reverts commit 8a468ab2d75a6b0bacfb5da6a9036642436fc666. [Reason]: Revert this patch in order to split it into two separate patches, which makes it easier to understand. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c | 3 --- drivers/gpu/drm/amd

[PATCH 3/3] drm/amdgpu/sriov add amdgpu_amdkfd_pre_reset in gpu reset

2020-04-07 Thread Jack Zhang
kfd_pre_reset frees the mem_objs allocated by kfd_gtt_sa_allocate. Without this change, the SRIOV TDR code path never frees that memory, causing a memory leak. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++ 1 file changed, 2 insertions(+) diff --git

[PATCH 1/3] Revert "drm/amdgpu/sriov add amdgpu_amdkfd_pre_reset in gpu reset"

2020-04-07 Thread Jack Zhang
This reverts commit 8a468ab2d75a6b0bacfb5da6a9036642436fc666. --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c | 3 --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 3 --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 -- 3 files changed, 8 deletions(-) diff --git

[PATCH 2/3] drm/amdkfd Avoid destroy hqd when GPU is on reset

2020-04-07 Thread Jack Zhang
Port to gfx10 from commit 1b0bfcff463f390c4032ebe36a4d5fb777c00a4c. Originally, the MEC was touched before the GPU had been initialized. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu
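The fix amounts to a guard at the top of the HQD (hardware queue descriptor) destroy path. A minimal sketch, assuming the guard checks an "in reset" flag and returns `-EIO` without touching the MEC; `struct gpu`, `in_reset`, `mec_writes` and `hqd_destroy` are hypothetical names, and the counter stands in for the real dequeue register sequence.

```c
#include <errno.h>
#include <stdbool.h>

struct gpu {                   /* hypothetical, simplified device state */
    bool in_reset;
    int mec_writes;            /* counts accesses to the MEC (compute) block */
};

/* Destroying an HQD pokes MEC registers. While the GPU is being reset,
 * the MEC has not been re-initialized yet, so refuse early instead. */
int hqd_destroy(struct gpu *g)
{
    if (g->in_reset)
        return -EIO;
    g->mec_writes++;           /* stand-in for the real dequeue sequence */
    return 0;
}
```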

[PATCH 3/3] drm/amdkfd Avoid destroy hqd when GPU is on reset

2020-04-07 Thread Jack Zhang
Port to gfx10 from commit 1b0bfcff463f390c4032ebe36a4d5fb777c00a4c. Originally, the MEC was touched before the GPU had been initialized. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH 2/3] drm/amdgpu/sriov add amdgpu_amdkfd_pre_reset in gpu reset

2020-04-07 Thread Jack Zhang
kfd_pre_reset frees the mem_objs allocated by kfd_gtt_sa_allocate. Without this change, the SRIOV TDR code path never frees that memory, causing a memory leak. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++ 1 file changed, 2 insertions(+) diff --git

[PATCH] drm/amdgpu/sriov add amdgpu_amdkfd_pre_reset in gpu reset

2020-04-02 Thread Jack Zhang
kfd_pre_reset frees the mem_objs allocated by kfd_gtt_sa_allocate. Without this change, the SRIOV TDR code path never frees that memory, causing a memory leak. v2: add a bugfix for the KIQ ring test failure. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c | 3

[PATCH] drm/amdgpu/sriov add amdgpu_amdkfd_pre_reset in gpu reset

2020-04-02 Thread Jack Zhang
kfd_pre_reset frees the mem_objs allocated by kfd_gtt_sa_allocate. Without this change, the SRIOV TDR code path never frees that memory, causing a memory leak. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++ 1 file changed, 2 insertions(+) diff --git

[PATCH] drm/amdkfd: kfree the wrong pointer

2020-04-01 Thread Jack Zhang
Originally, the code kfree'd the wrong pointer for mem_obj, which causes a memory leak under stress testing. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu
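One common shape of this bug, assumed here for illustration, is an allocator that returns its object through a double-pointer out-parameter: on the error path the object to free is `*mem_obj`, and freeing `mem_obj` (the address of the caller's pointer) leaves the real allocation unreferenced, so it leaks. The sketch uses user-space `calloc`/`free`; `struct mem_obj`, `sa_allocate` and `demo_fail_next` are hypothetical, and it only shows the corrected error path.

```c
#include <stdlib.h>

struct mem_obj { unsigned int offset; };   /* hypothetical */

static int demo_fail_next;   /* illustration only: force the error path */

/* Out-parameter allocator: the new object is returned through *mem_obj.
 * On the error path the object to free is *mem_obj, not mem_obj. */
int sa_allocate(struct mem_obj **mem_obj)
{
    *mem_obj = calloc(1, sizeof(**mem_obj));
    if (!*mem_obj)
        return -1;

    if (demo_fail_next) {      /* e.g. no free space found */
        free(*mem_obj);        /* correct: free the object itself */
        *mem_obj = NULL;
        return -1;
    }
    (*mem_obj)->offset = 0;    /* placeholder for real bookkeeping */
    return 0;
}
```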

[PATCH] drm/amdgpu/sriov refine vcn_v2_5_early_init func

2020-03-10 Thread Jack Zhang
Refine the assignment of vcn.num_vcn_inst, vcn.harvest_config and vcn.num_enc_rings in VF. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c | 35 ++- 1 file changed, 18 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH] drm/amdgpu/sriov set driver_table address in VF

2020-02-07 Thread Jack Zhang
. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c index 99ad4dd..b155f04 100644 --- a/drivers/gpu/drm/amd

[PATCH] drm/amdgpu/sriov Don't send msg when smu suspend

2020-02-05 Thread Jack Zhang
For SRIOV with pp_onevf_mode, do not send messages to set the SMU status, because the SMU doesn't support these messages under VF. In addition, smu_suspend should be skipped when pp_onevf_mode is disabled. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 --- drivers
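The three cases can be written as a small decision function. A minimal sketch, assuming that "skip smu_suspend when pp_onevf_mode is disabled" applies under SRIOV only; `struct smu_ctx`, its fields and this `smu_suspend` are hypothetical, with `msgs_sent` standing in for the status message sent to the SMU firmware.

```c
#include <stdbool.h>

struct smu_ctx {                /* hypothetical, simplified state */
    bool is_sriov;
    bool pp_onevf_mode;
    int msgs_sent;              /* messages sent to the SMU firmware */
    bool suspended;
};

/* SRIOV without pp_onevf_mode: skip suspend entirely.
 * SRIOV with pp_onevf_mode: suspend, but send no status message.
 * Bare metal: full sequence, including the message. */
void smu_suspend(struct smu_ctx *smu)
{
    if (smu->is_sriov && !smu->pp_onevf_mode)
        return;
    if (!smu->is_sriov)
        smu->msgs_sent++;       /* stand-in for the status message */
    smu->suspended = true;
}
```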

[PATCH] drm/amdgpu/sriov skip the update of SMU_TABLE_ACTIVITY_MONITOR_COEFF

2020-01-15 Thread Jack Zhang
There's no need to dump ACTIVITY_MONITOR_COEFF under VF, so skip the update of SMU_TABLE_ACTIVITY_MONITOR_COEFF under SRIOV VF. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/powerplay/arcturus_ppt.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers

[PATCH 1/2] amd/amdgpu/sriov enable onevf mode for ARCTURUS VF

2020-01-01 Thread Jack Zhang
and fix some indentation issues. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c| 3 +- drivers/gpu/drm/amd/amdgpu/soc15.c | 3 +- drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 98 -- 3 files changed, 56 insertions(+), 48 deletions

[PATCH 2/2] amd/amdgpu/sriov tdr enablement with pp_onevf_mode

2020-01-01 Thread Jack Zhang
Under SRIOV and pp_onevf mode: 1. use resume instead of hw_init for SMC recovery to avoid a potential memory leak. 2. add a return condition inside the SMC resume function for sriov_pp_onevf_mode and the pm_enabled param. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6

[PATCH 1/2] amd/amdgpu/sriov enable onevf mode for ARCTURUS VF

2020-01-01 Thread Jack Zhang
will do SMU hw_init and skip some steps in the normal SMU hw_init flow, because the host driver has already done them for the SMU. With this fix, a guest app can talk to the SMU and dump hw information from it. v2: refine the logic for pm_enabled. Skip hw_init by not changing pm_enabled. Signed-off-by: Jack Zhang

[PATCH] amd/amdgpu/sriov swSMU disable for sriov

2019-12-02 Thread Jack Zhang
initialized in the guest driver, swSMU cannot be declared as supported. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c b/drivers/gpu/drm/amd/powerplay

[PATCH 2/2] drm/amd/amdgpu/sriov skip RLCG s/r list for arcturus VF.

2019-11-20 Thread Jack Zhang
Since RLCG fw 2.1, the KMD driver loads extra fw for LIST_CNTL, GPM_MEM and SRM_MEM. We need to skip these three fw because all RLCG-related fw has already been loaded by the host driver. Without this change, the guest driver fails to load these three fw. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu
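A minimal sketch of the skip rule, assuming hypothetical firmware ids for the three save/restore-list images plus a catch-all `FW_OTHER` for anything this sketch does not model. On a VF the three images are skipped because the host driver has already loaded them; bare metal loads everything.

```c
#include <stdbool.h>

enum rlc_fw {                       /* hypothetical firmware ids */
    FW_OTHER,                       /* anything this sketch doesn't model */
    RLC_RESTORE_LIST_CNTL,
    RLC_RESTORE_LIST_GPM_MEM,
    RLC_RESTORE_LIST_SRM_MEM,
};

/* Since RLCG fw 2.1 the three save/restore-list images are loaded too.
 * A VF skips them because the host driver has already loaded them. */
bool should_load_rlc_fw(enum rlc_fw id, bool is_vf)
{
    if (!is_vf)
        return true;
    switch (id) {
    case RLC_RESTORE_LIST_CNTL:
    case RLC_RESTORE_LIST_GPM_MEM:
    case RLC_RESTORE_LIST_SRM_MEM:
        return false;
    default:
        return true;
    }
}
```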

[PATCH 1/2] drm/amd/amdgpu/sriov temporarily skip ras, dtm, hdcp for arcturus VF

2019-11-20 Thread Jack Zhang
Temporarily skip RAS, DTM and HDCP initialization and termination for Arcturus VF. These three features have not been enabled under SRIOV yet, and their bare-metal paths would make the guest driver fail to load. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 36

[PATCH] drm/amd/amdgpu/sriov ip block setting of Arcturus

2019-09-29 Thread Jack Zhang
Add the IP block setting for Arcturus SRIOV: 1. PSP needs to be initialized before IH. 2. SMU doesn't need to be initialized by the KMD driver. 3. Arcturus doesn't support DCE hardware, so register access to DCE must be skipped. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 10

[PATCH] drm/amdgpu/sriov: omit fbcon error under sriov or passthrough

2019-09-18 Thread Jack Zhang
In a virtual machine, there may be a qxl or cirrus graphics device serving as the default master fbcon device. So for a PF (passthrough mode) or an SRIOV VF, it is reasonable to unload the amdgpu driver; amdgpu doesn't have to be the only fbcon device in this case. Signed-off-by: Jack Zhang --- drivers

[PATCH] drm/amdgpu/sriov: add ring_stop before ring_create in psp v11 code

2019-09-10 Thread Jack Zhang
The psp v11 code is missing the ring stop in its ring create function (VMR), while the psp v3.1 code has it. This causes VM destroy1 and PSP ring creation to fail. For SRIOV VF, ring_stop should not be removed from the ring_create function. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
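A minimal sketch of the ordering, assuming a simplified ring (`struct psp_ring`, `psp_ring_stop` and `psp_ring_create_vf` are hypothetical names, not the psp_v11_0 functions). The assumption illustrated is that the VF path stops any ring that may still be active, for example one left over from a previous VM, before creating a new one, as the psp v3.1 path already did.

```c
#include <stdbool.h>

struct psp_ring {                 /* hypothetical, simplified PSP ring */
    bool created;
    int stops;
};

static void psp_ring_stop(struct psp_ring *ring)
{
    ring->created = false;
    ring->stops++;
}

/* VF ring creation: stop first, then create. */
int psp_ring_create_vf(struct psp_ring *ring)
{
    psp_ring_stop(ring);
    ring->created = true;         /* stand-in for the real create sequence */
    return 0;
}
```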

[PATCH] drm/amd/amdgpu: add sw_fini interface for df_funcs

2019-09-03 Thread Jack Zhang
time. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/df_v1_7.c | 5 + drivers/gpu/drm/amd/amdgpu/df_v3_6.c | 8 drivers/gpu/drm/amd/amdgpu/soc15.c | 3 +++ 4 files changed, 17 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu
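The diffstat shows the new hook being added to the `df_funcs` ops table, implemented in df_v1_7.c and df_v3_6.c, and called from soc15.c. A minimal sketch of that ops-table pattern, with hypothetical signatures: the `int *state` argument stands in for whatever a DF version sets up in sw_init, and `soc_sw_fini` is a hypothetical common caller that also tolerates a missing hook.

```c
#include <stddef.h>

struct df_funcs {                        /* hypothetical subset of an ops table */
    void (*sw_init)(int *state);
    void (*sw_fini)(int *state);         /* the new teardown hook */
};

static void df_v3_6_sw_init(int *state) { *state = 1; }   /* e.g. something created */
static void df_v3_6_sw_fini(int *state) { *state = 0; }   /* and removed again */
static void df_v1_7_sw_init(int *state) { (void)state; }  /* nothing to set up */
static void df_v1_7_sw_fini(int *state) { (void)state; }  /* nothing to undo */

static const struct df_funcs df_v3_6 = { df_v3_6_sw_init, df_v3_6_sw_fini };
static const struct df_funcs df_v1_7 = { df_v1_7_sw_init, df_v1_7_sw_fini };

/* Common sw_fini path: call the hook if this DF version provides one. */
void soc_sw_fini(const struct df_funcs *df, int *state)
{
    if (df->sw_fini)
        df->sw_fini(state);
}
```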

[PATCH] drm/amdgpu/sriov: fix Tonga load driver failed

2019-06-20 Thread Jack Zhang
Tonga SRIOV needs to use the SMU to load firmware. Remove the sriov flag because the default return value is zero. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c b/drivers/gpu/drm/amd