Public bug reported:

When running test cases for TheRock CI[1] on gfx1153 with the 6.14.0-1018-oem kernel, we observed that the gfx1153 GPU may hang. When this happens, the amdgpu driver prints the following messages repeatedly:
```
[ 469.611126] amdgpu 0000:c4:00.0: amdgpu: No more SDMA queue to allocate
[ 469.611175] amdgpu: process pid 3717 DQM create queue type 1 failed. ret -12
[ 469.681611] amdgpu 0000:c4:00.0: amdgpu: No more SDMA queue to allocate
[ 469.681668] amdgpu: process pid 3716 DQM create queue type 1 failed. ret -12
[ 469.994214] amdgpu 0000:c4:00.0: amdgpu: No more SDMA queue to allocate
[ 469.994247] amdgpu: process pid 3744 DQM create queue type 1 failed. ret -12
[ 476.016596] amdgpu 0000:c4:00.0: amdgpu: No more SDMA queue to allocate
[ 476.016643] amdgpu: process pid 3794 DQM create queue type 1 failed. ret -12
[ 480.150401] amdgpu 0000:c4:00.0: amdgpu: No more SDMA queue to allocate
[ 480.150435] amdgpu: process pid 3798 DQM create queue type 1 failed. ret -1
```

This issue is reproducible even with upstream linux-firmware (HEAD at 57303edc), which should contain all the latest firmware for gfx1153. The ROCm sanity check did pass, though, so the hang is likely triggered only by certain workloads. The root cause is currently under investigation.

[1] https://github.com/ROCm/TheRock

** Affects: linux-firmware (Ubuntu)
     Importance: Undecided
     Assignee: Leo Lin (0xff07)
         Status: New

--
https://bugs.launchpad.net/bugs/2146784

Title:
  gfx1153 hangs on certain test cases in TheRock CI
