[RFC v3 00/12] Define and use reset domain for GPU recovery in amdgpu

2022-01-25 Thread Andrey Grodzovsky
This patchset is based on earlier work by Boris[1] that made it possible to have an ordered workqueue at the driver level, which the different schedulers use to queue their timeout work. On top of that, I also serialized any GPU reset we trigger from within amdgpu code to go through the same o
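The serialization idea in the cover letter above can be illustrated with a small userspace sketch. This is not amdgpu or DRM scheduler code: `ResetDomain`, `queue_reset`, and the source labels are hypothetical names chosen for illustration, and a single-worker Python executor stands in for a kernel ordered workqueue. The sketch only shows the property being relied on, namely that reset requests from several schedulers' timeout handlers and from the driver itself run one at a time, in submission order.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ResetDomain:
    """Hypothetical stand-in for a driver-level ordered workqueue."""

    def __init__(self):
        # One worker means queued items run strictly one at a time, in FIFO
        # order; that is the behaviour an ordered workqueue provides.
        self._wq = ThreadPoolExecutor(max_workers=1)
        self._lock = threading.Lock()
        self._active = 0
        self.max_concurrent = 0
        self.log = []

    def queue_reset(self, source):
        # Every reset request, whatever its origin, goes through the same queue.
        return self._wq.submit(self._do_reset, source)

    def _do_reset(self, source):
        with self._lock:
            self._active += 1
            self.max_concurrent = max(self.max_concurrent, self._active)
        self.log.append(source)  # placeholder for the actual recovery work
        with self._lock:
            self._active -= 1
        return source

    def shutdown(self):
        self._wq.shutdown(wait=True)


def run_demo():
    domain = ResetDomain()
    sources = ["sched0-timeout", "sched1-timeout", "driver-triggered-reset"]
    futures = [domain.queue_reset(s) for s in sources]
    for f in futures:
        f.result()
    domain.shutdown()
    return domain.log, domain.max_concurrent
```

Calling `run_demo()` returns the order in which the requests were handled and the highest number of resets that were ever in flight at once; with a single worker the latter never exceeds one.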

Re: [RFC v3 00/12] Define and use reset domain for GPU recovery in amdgpu

2022-01-28 Thread Andrey Grodzovsky
Just a gentle ping in case people have more comments on this patch set. Especially the last 5 patches, as the first 7 are exactly the same as in V2 and we have mostly gone over them already. Andrey On 2022-01-25 17:37, Andrey Grodzovsky wrote: This patchset is based on earlier work by Boris[1] that allowed to have an or

Re: [RFC v3 00/12] Define and use reset domain for GPU recovery in amdgpu

2022-02-02 Thread Andrey Grodzovsky
Just another ping. With Shyun's help I was able to do some smoke testing on an XGMI SRIOV system (booting and triggering a hive reset), and for now it looks good. Andrey On 2022-01-28 14:36, Andrey Grodzovsky wrote: Just a gentle ping if people have more comments on this patch set ? Especially last 5 p

Re: [RFC v3 00/12] Define and use reset domain for GPU recovery in amdgpu

2022-02-08 Thread JingWen Chen
Hi Andrey, I have been testing your patch and it seems fine till now. Best Regards, Jingwen Chen On 2022/2/3 2:57 AM, Andrey Grodzovsky wrote: Just another ping, with Shyun's help I was able to do some smoke testing on XGMI SRIOV system (booting and triggering hive reset) and for now look

Re: [RFC v3 00/12] Define and use reset domain for GPU recovery in amdgpu

2022-02-09 Thread Andrey Grodzovsky
Thanks a lot! Andrey On 2022-02-09 01:06, JingWen Chen wrote: Hi Andrey, I have been testing your patch and it seems fine till now. Best Regards, Jingwen Chen On 2022/2/3 2:57 AM, Andrey Grodzovsky wrote: Just another ping, with Shyun's help I was able to do some smoke testing on XGMI SRIO