Public bug reported:

When running test cases for TheRock CI [1] on gfx1153 with the 6.14.0-1018-oem kernel, we observed that the GPU may hang. When this happens, the amdgpu driver repeatedly prints the following messages:
```
[ 469.611126] amdgpu 0000:c4:00.0: amdgpu: No more SDMA queue to allocate
[ 469.611175] amdgpu: process pid 3717 DQM create queue type 1 failed. ret -12
[ 469.681611] amdgpu 0000:c4:00.0: amdgpu: No more SDMA queue to allocate
[ 469.681668] amdgpu: process pid 3716 DQM create queue type 1 failed. ret -12
[ 469.994214] amdgpu 0000:c4:00.0: amdgpu: No more SDMA queue to allocate
[ 469.994247] amdgpu: process pid 3744 DQM create queue type 1 failed. ret -12
[ 476.016596] amdgpu 0000:c4:00.0: amdgpu: No more SDMA queue to allocate
[ 476.016643] amdgpu: process pid 3794 DQM create queue type 1 failed. ret -12
[ 480.150401] amdgpu 0000:c4:00.0: amdgpu: No more SDMA queue to allocate
[ 480.150435] amdgpu: process pid 3798 DQM create queue type 1 failed. ret -1
```

This issue is also reproducible with upstream linux-firmware (HEAD at 57303edc), which should contain all the latest firmware for gfx1153. The ROCm sanity check did pass, though.

The root cause is currently under investigation.

[1] https://github.com/ROCm/TheRock

** Affects: linux-firmware (Ubuntu)
     Importance: Undecided
         Status: Invalid

** Changed in: linux-firmware (Ubuntu)
       Status: New => Invalid

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2146788

Title:
  gfx1153 hangs on certain test cases in TheRock CI
