Re: [PATCH 1/5] drm/amd/amdgpu revert "implement tdr advanced mode"

2023-01-31 Thread Christian König
A later bad compute job can block a good gfx job. The gang submit/barrier approach makes sure that only one application at a time can use the gfx/compute block. So when application B makes a compute submission while a GFX submission of application A is still running, we will wait for that GFX

Re: [PATCH 1/5] drm/amd/amdgpu revert "implement tdr advanced mode"

2023-01-30 Thread Luben Tuikov
The series is, Acked-by: Luben Tuikov. We don't want the kernel to be in the business of retrying client's requests. Instead we want the kernel to provide a conduit for such requests to be sent, executed by the GPU, and a result returned. If the kernel cannot process requests for any reason, e.g.

Re: [PATCH 1/5] drm/amd/amdgpu revert "implement tdr advanced mode"

2023-01-28 Thread Yin, ZhenGuo (Chris)
Hi Christian, a later bad compute job can block a good gfx job, so the original TDR design finds the wrong guilty job (the good gfx job). Advanced TDR re-submits jobs in order to find the real guilty job (the bad compute job). After reverting this commit, how does the new gang-submit promise the isolat

[PATCH 1/5] drm/amd/amdgpu revert "implement tdr advanced mode"

2022-10-26 Thread Christian König
This reverts commit e6c6338f393b74ac0b303d567bb918b44ae7ad75. This feature basically re-submits one job after another to figure out which one was the one causing a hang. This is obviously incompatible with gang-submit which requires that multiple jobs run at the same time. It's also absolutely no