On 15.09.23 at 21:34, Philip Yang wrote:
On GFX v9.4.3, applications randomly fail with timeouts when XNACK is on,
with the dmesg log "amdgpu: IH soft ring buffer overflow 0x900, 0x900",
which means the retry CAM has more than 256 entries. After increasing the
IH soft ring to 512 entries, the test passed repeatedly with no IH soft
ring overflow message.
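The overflow described above can be illustrated with a toy model of a power-of-two ring. This is only a sketch, not the driver's code: the struct, the function names, the 32-byte entry size, and the burst of 300 faults are all assumptions chosen for illustration. An entry is dropped when advancing the write pointer would make it equal to the read pointer, which is the condition suggested by the two identical offsets in the log message.

```c
#include <stdint.h>

/* Assumed entry size in bytes; not stated in the thread. */
#define ENTRY_BYTES 32u

/* Hypothetical, simplified soft ring: byte offsets into a power-of-two buffer. */
struct soft_ring {
	uint32_t size;    /* ring size in bytes, must be a power of two */
	uint32_t rptr;    /* consumer offset (deferred worker) */
	uint32_t wptr;    /* producer offset (interrupt path) */
	uint32_t dropped; /* entries lost to overflow */
};

static void soft_ring_init(struct soft_ring *r, uint32_t size_bytes)
{
	r->size = size_bytes;
	r->rptr = 0;
	r->wptr = 0;
	r->dropped = 0;
}

/* Queue one entry. Returns 1 on success, 0 if the ring is full and the
 * entry is dropped (the advanced wptr would collide with rptr). */
static int soft_ring_push(struct soft_ring *r)
{
	uint32_t next = (r->wptr + ENTRY_BYTES) & (r->size - 1);

	if (next == r->rptr) {
		r->dropped++;
		return 0;
	}
	r->wptr = next;
	return 1;
}

/* Push a burst of entries while the consumer makes no progress and
 * report how many were dropped. */
static uint32_t burst_dropped(uint32_t size_bytes, uint32_t burst)
{
	struct soft_ring r;
	uint32_t i;

	soft_ring_init(&r, size_bytes);
	for (i = 0; i < burst; i++)
		soft_ring_push(&r);
	return r.dropped;
}
```

In this model, a burst of 300 entries loses 45 of them in an 8 KB (256-slot) ring and none in a 16 KB (512-slot) ring, which matches the direction of the fix even though the real fault pattern is not known from the thread.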
On 2023-07-07 at 10:14, Philip Yang wrote:
Retry faults are delegated to the IH soft ring and then processed by a
deferred worker. The current IH soft ring size, PAGE_SIZE, can store 128
entries, which may overflow and drop retry faults, causing the HW to get
stuck because the retry fault is not recovered.
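The entry counts quoted in this thread follow from a simple division. The sketch below assumes a 4096-byte PAGE_SIZE and a 32-byte ring entry; neither value is stated in the thread, and the entry size is inferred from the "PAGE_SIZE can store 128 entries" figure. The macro and function names are hypothetical.

```c
/* Assumed IH ring entry size in bytes, inferred from 4096 / 128. */
#define IH_ENTRY_BYTES_ASSUMED 32u

/* Number of whole entries that fit in a ring of ring_bytes bytes. */
static unsigned int ih_ring_capacity(unsigned int ring_bytes)
{
	return ring_bytes / IH_ENTRY_BYTES_ASSUMED;
}
```

Under these assumptions, a 4096-byte ring holds 128 entries, an 8192-byte ring holds 256, and a 16384-byte ring holds 512.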
[AMD Official Use Only - General]
> -----Original Message-----
> From: Yang, Philip
> Sent: Friday, July 7, 2023 10:15 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Kuehling, Felix; Joshi, Mukul; Yang, Philip
> Subject: [PATCH] drm/amdgpu: Increase IH soft ring
Retry faults are delegated to the IH soft ring and then processed by a
deferred worker. The current IH soft ring size, PAGE_SIZE, can store 128
entries, which may overflow and drop retry faults, causing the HW to get
stuck because the retry fault is not recovered.
Increase the IH soft ring size to the same size as the IH ring,