On 2021-08-03 11:02 a.m., Eric Huang wrote:
>
>
> On 2021-07-30 5:26 p.m., Felix Kuehling wrote:
>> On 2021-07-28 1:31 p.m., Eric Huang wrote:
>>> This fixes a gpu_recovery bug on multiple GPUs:
>>> when one GPU is reset, the application running on another
>>> GPU hangs, because KFD post
On 2021-07-28 1:31 p.m., Eric Huang wrote:
> This fixes a gpu_recovery bug on multiple GPUs:
> when one GPU is reset, the application running on another
> GPU hangs, because KFD post-reset doesn't restore the
> running process.
This will resume all processes, even those that were affected b
This fixes a gpu_recovery bug on multiple GPUs:
when one GPU is reset, the application running on another
GPU hangs, because KFD post-reset doesn't restore the
running process. It also fixes a bug in the function
kfd_process_evict_queues: when one GPU hangs, processes
running on other GPUs can