RE: [PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started

2024-06-05 Thread Liu, Shaoyun
oenig, Christian Sent: Tuesday, June 4, 2024 4:07 AM To: Liu, Shaoyun; Christian König; Li, Yunxiang (Teddy); amd-gfx@lists.freedesktop.org; Deucher, Alexander; Xiao, Hua Subject: Re: [PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started Hi Shaoyun, see inline. On 03.06.24 at

Re: [PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started

2024-06-04 Thread Christian König
nt: Monday, June 3, 2024 6:59 AM To: Liu, Shaoyun; Christian König; Li, Yunxiang (Teddy); amd-gfx@lists.freedesktop.org; Deucher, Alexander; Xiao, Hua Subject: Re: [PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started Hi Shaoyun, yes, my thinking goes in the same direction. The

RE: [PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started

2024-06-03 Thread Liu, Shaoyun
ddy); amd-gfx@lists.freedesktop.org; Deucher, Alexander; Xiao, Hua Subject: Re: [PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started Hi Shaoyun, yes, my thinking goes in the same direction. The basic problem here is that we are trying to stuff two different pieces of information into the same

Re: [PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started

2024-06-03 Thread Christian König
ay, May 29, 2024 11:19 AM To: Li, Yunxiang (Teddy); Koenig, Christian; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started On 29.05.24 at 16:48, Li, Yunxiang (Teddy) wrote: [AMD Official Use Only - AMD Internal Distribution Only] Ye

RE: [PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started

2024-05-31 Thread Liu, Shaoyun
d-gfx@lists.freedesktop.org Subject: Re: [PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started On 29.05.24 at 16:48, Li, Yunxiang (Teddy) wrote:
> [AMD Official Use Only - AMD Internal Distribution Only]
>
>> Yeah, I know. That's one of the reasons I've pointed out

Re: [PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started

2024-05-29 Thread Christian König
On 29.05.24 at 16:48, Li, Yunxiang (Teddy) wrote: [AMD Official Use Only - AMD Internal Distribution Only] Yeah, I know. That's one of the reasons I've pointed out on the patch adding that that this behavior is actually completely broken. If you run into issues with the MES because of this the

RE: [PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started

2024-05-29 Thread Li, Yunxiang (Teddy)
[AMD Official Use Only - AMD Internal Distribution Only]
> Yeah, I know. That's one of the reasons I've pointed out on the patch adding
> that that this behavior is actually completely broken.
>
> If you run into issues with the MES because of this then please suggest a
> revert of that patch.
I t

Re: [PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started

2024-05-29 Thread Christian König
On 29.05.24 at 16:31, Li, Yunxiang (Teddy) wrote: [Public] The problem is that we don't force-complete the non-scheduler rings, e.g. MES, KIQ etc... Try to remove this check here from the loop in amdgpu_device_pre_asic_reset(): if (!amdgpu_ring_sched_ready(ring))

RE: [PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started

2024-05-29 Thread Li, Yunxiang (Teddy)
[Public]
> The problem is that we don't force-complete the non-scheduler rings, e.g. MES,
> KIQ etc...
>
> Try to remove this check here from the loop in
> amdgpu_device_pre_asic_reset():
>
> if (!amdgpu_ring_sched_ready(ring))
>         continue;
Ah, I see. Thou

Re: [PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started

2024-05-29 Thread Christian König
On 29.05.24 at 15:44, Li, Yunxiang (Teddy) wrote: [AMD Official Use Only - AMD Internal Distribution Only] I don't think trying to add some reset handling here makes sense in the first place. Part of the reset/recovery procedure is to signal all fences and that includes the one we are waiting

RE: [PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started

2024-05-29 Thread Li, Yunxiang (Teddy)
[AMD Official Use Only - AMD Internal Distribution Only]
> I don't think trying to add some reset handling here makes sense in the first
> place.
> Part of the reset/recovery procedure is to signal all fences and that includes
> the one we are waiting for here.
> So this wait should return immedi

Re: [PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started

2024-05-29 Thread Christian König
On 29.05.24 at 15:22, Li, Yunxiang (Teddy) wrote: [Public] It's perfectly possible that the reset has already started before we enter the function. Yeah, this could and does happen, but it just means we are back to the old behavior. I guess I could use "can I take the read-side lock?" to te

RE: [PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started

2024-05-29 Thread Li, Yunxiang (Teddy)
[Public]
> It's perfectly possible that the reset has already started before we enter
> the function.
Yeah, this could and does happen, but it just means we are back to the old behavior. I guess I could use "can I take the read-side lock?" to test if the function is called outside of reset or

Re: [PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started

2024-05-28 Thread Christian König
On 28.05.24 at 19:23, Yunxiang Li wrote: If a reset is triggered, there's no point in waiting for the fence to come back anymore; it just makes the reset code wait for a long time for the reset_domain read lock to be dropped. This also makes our reply to host FLR fast enough so the host doesn't time out