RE: [PATCH 4/4] drm/amdgpu: prevent gpu access during reset recovery

2024-05-24 Thread Li, Yunxiang (Teddy)
[AMD Official Use Only - AMD Internal Distribution Only] Yes, the two places are (1) debugfs and (2) MI100's en/disable_debug_trap, and evidently someone is testing the debugfs interface, because there is a bug fix for a race condition in it. Teddy

Re: [PATCH 4/4] drm/amdgpu: prevent gpu access during reset recovery

2024-05-24 Thread Christian König
On 24.05.24 at 15:35, Li, Yunxiang (Teddy) wrote: [AMD Official Use Only - AMD Internal Distribution Only] If that is true you could in theory lower the locked area of the existing lock, but adding a new one is a strict no-go from my side. I'll try this; right now I see two places where this

RE: [PATCH 4/4] drm/amdgpu: prevent gpu access during reset recovery

2024-05-24 Thread Li, Yunxiang (Teddy)
[AMD Official Use Only - AMD Internal Distribution Only] > If that is true you could in theory lower the locked area of the existing > lock, but adding a new one is a strict no-go from my side. I'll try this; right now I see two places where this would be problematic because they are trying to

Re: [PATCH 4/4] drm/amdgpu: prevent gpu access during reset recovery

2024-05-24 Thread Christian König
On 23.05.24 at 17:35, Li, Yunxiang (Teddy) wrote: [Public] This takes a different lock than the reset_domain->sem. It is a separate reset_domain->gpu_sem that is only locked when we will actually do a reset; it is not taken in the skip_hw_reset path. Exactly that is what you should *not*

Re: [PATCH 4/4] drm/amdgpu: prevent gpu access during reset recovery

2024-05-23 Thread Alex Deucher
On Thu, May 23, 2024 at 11:32 AM Christian König wrote: > > On 23.05.24 at 13:36, Li, Yunxiang (Teddy) wrote: > > [AMD Official Use Only - AMD Internal Distribution Only] > > > >>> +void amdgpu_lock_hw_access(struct amdgpu_device *adev); > >>> +void amdgpu_unlock_hw_access(struct amdgpu_device

RE: [PATCH 4/4] drm/amdgpu: prevent gpu access during reset recovery

2024-05-23 Thread Li, Yunxiang (Teddy)
[Public] > > This takes a different lock than the reset_domain->sem. It is a > > separate reset_domain->gpu_sem that is only locked when we will actually do a > > reset; it is not taken in the skip_hw_reset path. > > Exactly that is what you should *not* do. Please don't add any new lock to >

Re: [PATCH 4/4] drm/amdgpu: prevent gpu access during reset recovery

2024-05-23 Thread Christian König
On 23.05.24 at 13:36, Li, Yunxiang (Teddy) wrote: [AMD Official Use Only - AMD Internal Distribution Only] +void amdgpu_lock_hw_access(struct amdgpu_device *adev); +void amdgpu_unlock_hw_access(struct amdgpu_device *adev); +int amdgpu_begin_hw_access(struct amdgpu_device *adev); +void

RE: [PATCH 4/4] drm/amdgpu: prevent gpu access during reset recovery

2024-05-23 Thread Li, Yunxiang (Teddy)
[AMD Official Use Only - AMD Internal Distribution Only] > > +void amdgpu_lock_hw_access(struct amdgpu_device *adev); > > +void amdgpu_unlock_hw_access(struct amdgpu_device *adev); > > +int amdgpu_begin_hw_access(struct amdgpu_device *adev); > > +void amdgpu_end_hw_access(struct amdgpu_device

Re: [PATCH 4/4] drm/amdgpu: prevent gpu access during reset recovery

2024-05-23 Thread Christian König
On 22.05.24 at 19:27, Yunxiang Li wrote: Random accesses to the GPU while it is not re-initialized can lead to a bad time. So add a rwsem to prevent such accesses. Normal accesses will now take the read lock for shared GPU access, reset takes the write lock for exclusive GPU access. Care needs

[PATCH 4/4] drm/amdgpu: prevent gpu access during reset recovery

2024-05-22 Thread Yunxiang Li
Random accesses to the GPU while it is not re-initialized can lead to a bad time. So add a rwsem to prevent such accesses. Normal accesses will now take the read lock for shared GPU access, reset takes the write lock for exclusive GPU access. Care needs to be taken so that the recovery thread does