Public bug reported:

When running test cases for TheRock CI[1] on gfx1153 with the 6.14.0-1018-oem
kernel, we observed that the gfx1153 GPU may hang. When this happens, the amdgpu
driver prints the following messages repeatedly:

```
[  469.611126] amdgpu 0000:c4:00.0: amdgpu: No more SDMA queue to allocate
[  469.611175] amdgpu: process pid 3717 DQM create queue type 1 failed. ret -12
[  469.681611] amdgpu 0000:c4:00.0: amdgpu: No more SDMA queue to allocate
[  469.681668] amdgpu: process pid 3716 DQM create queue type 1 failed. ret -12
[  469.994214] amdgpu 0000:c4:00.0: amdgpu: No more SDMA queue to allocate
[  469.994247] amdgpu: process pid 3744 DQM create queue type 1 failed. ret -12
[  476.016596] amdgpu 0000:c4:00.0: amdgpu: No more SDMA queue to allocate
[  476.016643] amdgpu: process pid 3794 DQM create queue type 1 failed. ret -12
[  480.150401] amdgpu 0000:c4:00.0: amdgpu: No more SDMA queue to allocate
[  480.150435] amdgpu: process pid 3798 DQM create queue type 1 failed. ret -1
```
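For triage, it can help to tally these failures per process and per return code. The following is a minimal sketch, assuming the kernel log has been saved as text and that the failure lines match the format in the excerpt above. The name `summarize_queue_failures` and the `SAMPLE` string are hypothetical and only for illustration. Note that a return value of -12 corresponds to -ENOMEM on Linux.

```python
import re
from collections import Counter

# Matches the KFD failure line as it appears in the excerpt above.
# Assumption: other kernels may word this message differently.
FAIL_RE = re.compile(
    r"amdgpu: process pid (?P<pid>\d+) DQM create queue "
    r"type (?P<qtype>\d+) failed\. ret (?P<ret>-?\d+)"
)


def summarize_queue_failures(dmesg_text):
    """Count queue-creation failures per pid and per return code."""
    by_pid = Counter()
    by_ret = Counter()
    for line in dmesg_text.splitlines():
        match = FAIL_RE.search(line)
        if match is None:
            continue  # skip lines that are not queue-creation failures
        by_pid[int(match.group("pid"))] += 1
        by_ret[int(match.group("ret"))] += 1
    return by_pid, by_ret


# Two pairs of lines copied from the log excerpt in this report.
SAMPLE = """\
[  469.611126] amdgpu 0000:c4:00.0: amdgpu: No more SDMA queue to allocate
[  469.611175] amdgpu: process pid 3717 DQM create queue type 1 failed. ret -12
[  469.681611] amdgpu 0000:c4:00.0: amdgpu: No more SDMA queue to allocate
[  469.681668] amdgpu: process pid 3716 DQM create queue type 1 failed. ret -12
"""

if __name__ == "__main__":
    pids, rets = summarize_queue_failures(SAMPLE)
    for pid, count in sorted(pids.items()):
        print(f"pid {pid}: {count} failed queue creation(s)")
    for ret, count in sorted(rets.items()):
        print(f"ret {ret}: {count} occurrence(s)")
```

Feeding the full `dmesg` output to the function shows whether the failures come from a few long-running processes or from many short-lived ones, which may help narrow down the triggering workload.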

This issue is reproducible even with upstream linux-firmware (HEAD at
57303edc), which should contain all the latest firmware for gfx1153. The
ROCm sanity check did pass, though, so the hang is likely triggered only by
certain workloads. The root cause is currently under investigation.

[1] https://github.com/ROCm/TheRock

** Affects: linux-firmware (Ubuntu)
     Importance: Undecided
     Assignee: Leo Lin (0xff07)
         Status: New

https://bugs.launchpad.net/bugs/2146784

Title:
  gfx1153 hangs on certain test cases in TheRock CI
