Public bug reported:

When running test cases for TheRock CI[1] on gfx1153 with the
6.14.0-1018-oem kernel, we observed that the GPU may hang. When this
happens, the amdgpu driver prints the following messages repeatedly:

```
[  469.611126] amdgpu 0000:c4:00.0: amdgpu: No more SDMA queue to allocate
[  469.611175] amdgpu: process pid 3717 DQM create queue type 1 failed. ret -12
[  469.681611] amdgpu 0000:c4:00.0: amdgpu: No more SDMA queue to allocate
[  469.681668] amdgpu: process pid 3716 DQM create queue type 1 failed. ret -12
[  469.994214] amdgpu 0000:c4:00.0: amdgpu: No more SDMA queue to allocate
[  469.994247] amdgpu: process pid 3744 DQM create queue type 1 failed. ret -12
[  476.016596] amdgpu 0000:c4:00.0: amdgpu: No more SDMA queue to allocate
[  476.016643] amdgpu: process pid 3794 DQM create queue type 1 failed. ret -12
[  480.150401] amdgpu 0000:c4:00.0: amdgpu: No more SDMA queue to allocate
[  480.150435] amdgpu: process pid 3798 DQM create queue type 1 failed. ret -1
```
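
To help triage logs like the one above, the failures can be counted per process. The following is only a minimal sketch: the regular expression is derived from the message format quoted above, and the function name `summarize_failures` and the script name used below are hypothetical, not part of any amdgpu or ROCm tooling.

```python
import re
import sys
from collections import Counter

# Pattern derived from the "DQM create queue ... failed" lines quoted above.
# Assumption: other kernels may word this message differently.
FAIL_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+amdgpu: process pid (?P<pid>\d+) "
    r"DQM create queue type (?P<qtype>\d+) failed\. ret (?P<ret>-?\d+)"
)


def summarize_failures(dmesg_text):
    """Return (failure count per pid, first timestamp, last timestamp)."""
    per_pid = Counter()
    stamps = []
    for line in dmesg_text.splitlines():
        match = FAIL_RE.search(line)
        if match:
            per_pid[int(match.group("pid"))] += 1
            stamps.append(float(match.group("ts")))
    if not stamps:
        return per_pid, None, None
    return per_pid, min(stamps), max(stamps)


if __name__ == "__main__":
    # Reads kernel log text from standard input.
    counts, first, last = summarize_failures(sys.stdin.read())
    if first is None:
        print("no queue creation failures found")
    else:
        print(f"{sum(counts.values())} failures between {first} and {last}")
        for pid, n in sorted(counts.items()):
            print(f"  pid {pid}: {n}")
```

Assuming the script is saved as `summarize_sdma.py` (a hypothetical name), it could be run as `sudo dmesg | python3 summarize_sdma.py`.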

This issue is reproducible even with upstream linux-firmware (HEAD at
57303edc), which should contain all the latest firmware for gfx1153. The
ROCm sanity test did pass, though. The root cause is currently under
investigation.

[1] https://github.com/ROCm/TheRock

** Affects: linux-firmware (Ubuntu)
     Importance: Undecided
         Status: Invalid

** Changed in: linux-firmware (Ubuntu)
       Status: New => Invalid

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2146788

Title:
  gfx1153 hangs on certain test cases in TheRock CI

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-firmware/+bug/2146788/+subscriptions


-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
