[AMD Official Use Only - AMD Internal Distribution Only]
Yes, the two places are (1) debugfs and (2) MI100's en/disable_debug_trap.
Evidently someone is testing the debugfs interface, because there is a bug
fix for a race condition in it.
Teddy
On 24.05.24 at 15:35, Li, Yunxiang (Teddy) wrote:
[AMD Official Use Only - AMD Internal Distribution Only]
> If that is true you could in theory reduce the locked area of the existing
> lock, but adding a new one is a strict no-go from my side.
I'll try this. Right now I see two places where this would be problematic
because they are trying to
On 23.05.24 at 17:35, Li, Yunxiang (Teddy) wrote:
[Public]
This takes a different lock than reset_domain->sem. It is a separate
reset_domain->gpu_sem that is only locked when we will actually do a reset; it
is not taken in the skip_hw_reset path.
Exactly that is what you should *not*
On Thu, May 23, 2024 at 11:32 AM Christian König
wrote:
>
> On 23.05.24 at 13:36, Li, Yunxiang (Teddy) wrote:
> > [AMD Official Use Only - AMD Internal Distribution Only]
> >
> >>> +void amdgpu_lock_hw_access(struct amdgpu_device *adev);
> >>> +void amdgpu_unlock_hw_access(struct amdgpu_device
[Public]
> > This takes a different lock than reset_domain->sem. It is a separate
> > reset_domain->gpu_sem that is only locked when we will actually do a
> > reset; it is not taken in the skip_hw_reset path.
>
> Exactly that is what you should *not* do. Please don't add any new lock to
>
On 23.05.24 at 13:36, Li, Yunxiang (Teddy) wrote:
[AMD Official Use Only - AMD Internal Distribution Only]
> > +void amdgpu_lock_hw_access(struct amdgpu_device *adev);
> > +void amdgpu_unlock_hw_access(struct amdgpu_device *adev);
> > +int amdgpu_begin_hw_access(struct amdgpu_device *adev);
> > +void amdgpu_end_hw_access(struct amdgpu_device
On 22.05.24 at 19:27, Yunxiang Li wrote:
Random accesses to the GPU while it is not re-initialized can lead to a
bad time. So add a rwsem to prevent such accesses. Normal accesses will
now take the read lock for shared GPU access, reset takes the write lock
for exclusive GPU access.
Care needs to be taken so that the recovery thread does