Re: [PATCH 1/3] drm/amdgpu: re-work VM syncing

2024-08-22 Thread Friedrich Vock
On 21.08.24 22:46, Felix Kuehling wrote: On 2024-08-21 08:03, Christian König wrote: Rework how VM operations synchronize to submissions. Provide an amdgpu_sync container to the backends instead of a reservation object and fill in the amdgpu_sync object in the higher layers of the code. No in

Re: [PATCH 1/3] drm/amdgpu: re-work VM syncing

2024-08-21 Thread Friedrich Vock
upcoming changes. This looks like a really nice cleanup! I'm not sure how much it's worth given that I'm not a maintainer, but this one and patch 3 are Reviewed-by: Friedrich Vock Patch 2 looks ok to me as well, but I'm not in the loop as to what problem this fixes. If

[PATCH v3 0/3] drm/amdgpu: Explicit sync for GEM VA operations

2024-08-19 Thread Friedrich Vock
In Vulkan, it is the application's responsibility to perform adequate synchronization before a sparse unmap, replace or BO destroy operation. This adds an option to AMDGPU_VA_OPs to disable redundant implicit sync that happens on sparse unmap or replace operations. This has seen a significant impr

[PATCH v3 1/3] drm/amdgpu: Don't implicit sync PRT maps.

2024-08-19 Thread Friedrich Vock
From: Tatsuyuki Ishi These are considered map operations rather than unmap, and there is no point in doing implicit synchronization here. Signed-off-by: Tatsuyuki Ishi --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm

[PATCH v3 3/3] drm/amdgpu: Bump amdgpu driver version.

2024-08-19 Thread Friedrich Vock
From: Tatsuyuki Ishi For detection of the new explicit sync functionality without having to try the ioctl. Signed-off-by: Tatsuyuki Ishi --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/

[PATCH v3 2/3] drm/amdgpu: Add optional explicit sync fences for GEM operations.

2024-08-19 Thread Friedrich Vock
DRM handle even between different kinds of drivers (radeonsi vs radv). Signed-off-by: Tatsuyuki Ishi Co-developed-by: Friedrich Vock Signed-off-by: Friedrich Vock --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c | 2 +- drivers/gpu/drm/

Re: [PATCH 1/3] drm/amdgpu: Forward soft recovery errors to userspace

2024-08-01 Thread Friedrich Vock
just push it into our internal kernel branch. Regards, Christian. 1: https://lore.kernel.org/all/bf23d5ed-9a6b-43e7-84ee-8cbfd0d60...@froggi.es/ Signed-off-by: Joshua Ashton Cc: Friedrich Vock Cc: Bas Nieuwenhuizen Cc: Christian König Cc: André Almeida Cc: sta...@vger.kernel.org ---   dr

Re: [PATCH 00/34] GC per queue reset

2024-07-25 Thread Friedrich Vock
On 24.07.24 11:20, Zhu, Jiadong wrote: [AMD Official Use Only - AMD Internal Distribution Only] -Original Message- From: amd-gfx On Behalf Of Alex Deucher Sent: Friday, July 19, 2024 9:40 PM To: Friedrich Vock Cc: Deucher, Alexander ; amd- g...@lists.freedesktop.org Subject: Re

Re: [PATCH 00/34] GC per queue reset

2024-07-18 Thread Friedrich Vock
Hi, On 18.07.24 16:06, Alex Deucher wrote: This adds preliminary support for GC per queue reset. In this case, only the jobs currently in the queue are lost. If this fails, we fall back to a full adapter reset. First of all, thank you so much for working on this! It's great to finally see pr

Re: [PATCH] drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell

2024-07-16 Thread Friedrich Vock
() for SDMA 5.2 and then allow it again in end_ring(). Enable this workaround while we are root causing the issue with the HW team. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/3440 Signed-off-by: Alex Deucher Looks like it works for me. Tested-by: Friedrich Vock Is there a particular

Re: [PATCH] drm/amdgpu: sysfs node disable query error count during gpu reset

2024-07-01 Thread Friedrich Vock
On 01.07.24 10:10, YiPeng Chai wrote: Sysfs node disable query error count during gpu reset. Can you elaborate a bit more? Usually the body shouldn't be a 1:1 copy of the summary phrase. Signed-off-by: YiPeng Chai --- drivers/gpu/drm/amd/amdgpu/aldebaran.c | 2 -- drivers/gpu/drm/amd

[PATCH] drm/amdgpu: Check if NBIO funcs are NULL in amdgpu_device_baco_exit

2024-05-14 Thread Friedrich Vock
The special case for VM passthrough doesn't check adev->nbio.funcs before dereferencing it. If GPUs that don't have an NBIO block are passed through, this leads to a NULL pointer dereference on startup. Signed-off-by: Friedrich Vock Fixes: 1bece222eab ("drm/amdgpu: Clear

Re: [RFC 1/5] drm/amdgpu: Fix migration rate limiting accounting

2024-05-13 Thread Friedrich Vock
On 09.05.24 11:19, Tvrtko Ursulin wrote: On 08/05/2024 20:08, Friedrich Vock wrote: On 08.05.24 20:09, Tvrtko Ursulin wrote: From: Tvrtko Ursulin The logic assumed any migration attempt worked and therefore would over- account the amount of data migrated during buffer re-validation. As a

Re: [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription

2024-05-13 Thread Friedrich Vock
Hi, On 02.05.24 16:23, Maarten Lankhorst wrote: Hey, [snip] For Xe, I've been looking at using cgroups. A small prototype is available at https://cgit.freedesktop.org/~mlankhorst/linux/log/?h=dumpcg To stimulate discussion, I've added amdgpu support as well. This should make it possible to is

Re: [RFC 1/5] drm/amdgpu: Fix migration rate limiting accounting

2024-05-08 Thread Friedrich Vock
Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Friedrich Vock --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 26 +- 1 file changed, 21 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c ind

Re: [RFC PATCH 10/18] drm/amdgpu: Don't add GTT to initial domains after failing to allocate VRAM

2024-04-25 Thread Friedrich Vock
On 25.04.24 08:25, Christian König wrote: On 24.04.24 18:57, Friedrich Vock wrote: This adds GTT to the "preferred domains" of this buffer object, which will also prevent any attempts at moving the buffer back to VRAM if there is space. If VRAM is full, GTT will already be c

Re: [RFC PATCH 16/18] drm/amdgpu: Implement SET_PRIORITY GEM op

2024-04-25 Thread Friedrich Vock
On 25.04.24 09:15, Christian König wrote: On 25.04.24 09:06, Friedrich Vock wrote: On 25.04.24 08:58, Christian König wrote: On 25.04.24 08:46, Friedrich Vock wrote: On 25.04.24 08:32, Christian König wrote: On 24.04.24 18:57, Friedrich Vock wrote: Used by userspace to adjust

Re: [RFC PATCH 16/18] drm/amdgpu: Implement SET_PRIORITY GEM op

2024-04-25 Thread Friedrich Vock
On 25.04.24 08:58, Christian König wrote: On 25.04.24 08:46, Friedrich Vock wrote: On 25.04.24 08:32, Christian König wrote: On 24.04.24 18:57, Friedrich Vock wrote: Used by userspace to adjust buffer priorities in response to changes in application demand and memory pressure. Yeah

Re: [RFC PATCH 16/18] drm/amdgpu: Implement SET_PRIORITY GEM op

2024-04-24 Thread Friedrich Vock
On 25.04.24 08:32, Christian König wrote: On 24.04.24 18:57, Friedrich Vock wrote: Used by userspace to adjust buffer priorities in response to changes in application demand and memory pressure. Yeah, that was discussed over and over again. One big design criterion is that we can't

[RFC PATCH 11/18] drm/ttm: Bump BO priority count

2024-04-24 Thread Friedrich Vock
For adjustable priorities by userspace, it is nice to have a bit more granularity. Signed-off-by: Friedrich Vock --- include/drm/ttm/ttm_resource.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h index

[RFC PATCH 17/18] drm/amdgpu: Implement EVICTED_VRAM query

2024-04-24 Thread Friedrich Vock
Used by userspace to gauge the severity of memory overcommit and make prioritization decisions based on it. Signed-off-by: Friedrich Vock --- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 3

[RFC PATCH 16/18] drm/amdgpu: Implement SET_PRIORITY GEM op

2024-04-24 Thread Friedrich Vock
Used by userspace to adjust buffer priorities in response to changes in application demand and memory pressure. Signed-off-by: Friedrich Vock --- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 20 include/uapi/drm/amdgpu_drm.h | 1 + 2 files changed, 21 insertions

[RFC PATCH 08/18] drm/amdgpu: Don't try moving BOs to preferred domain before submit

2024-04-24 Thread Friedrich Vock
TTM now takes care of moving buffers to the best possible domain. Signed-off-by: Friedrich Vock --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 2 - drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 191 + drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h | 4 - drivers/gpu/drm/amd

[RFC PATCH 01/18] drm/ttm: Add tracking for evicted memory

2024-04-24 Thread Friedrich Vock
These utilities will be used to keep track of what buffers have been evicted from any particular place, to try and decide when to try undoing the eviction. Signed-off-by: Friedrich Vock --- drivers/gpu/drm/ttm/ttm_device.c | 1 + drivers/gpu/drm/ttm/ttm_resource.c | 14

[RFC PATCH 07/18] drm/amdgpu: Add TTM uneviction control functions

2024-04-24 Thread Friedrich Vock
Try unevicting only VRAM/GTT BOs. Signed-off-by: Friedrich Vock --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 50 + 1 file changed, 50 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 64f5001a7dc5d

[RFC PATCH 10/18] drm/amdgpu: Don't add GTT to initial domains after failing to allocate VRAM

2024-04-24 Thread Friedrich Vock
This adds GTT to the "preferred domains" of this buffer object, which will also prevent any attempts at moving the buffer back to VRAM if there is space. If VRAM is full, GTT will already be chosen as a fallback. Signed-off-by: Friedrich Vock --- drivers/gpu/drm/amd/amdgpu/amdgpu_ge

[RFC PATCH 14/18] drm/ttm: Consider BOs placed in non-favorite locations evicted

2024-04-24 Thread Friedrich Vock
If we didn't get the favorite placement because it was full, we should try moving it into the favorite placement once there is space. Signed-off-by: Friedrich Vock --- drivers/gpu/drm/ttm/ttm_bo.c | 28 +++- 1 file changed, 27 insertions(+), 1 deletion(-) diff --

[RFC PATCH 13/18] drm/ttm: Implement ttm_bo_update_priority

2024-04-24 Thread Friedrich Vock
Used to dynamically adjust priorities of buffers at runtime, to react to changes in memory pressure/usage patterns. Signed-off-by: Friedrich Vock --- drivers/gpu/drm/ttm/ttm_bo.c | 17 + include/drm/ttm/ttm_bo.h | 2 ++ 2 files changed, 19 insertions(+) diff --git a

[RFC PATCH 12/18] drm/ttm: Do not evict BOs with higher priority

2024-04-24 Thread Friedrich Vock
This makes buffer eviction significantly more stable by avoiding ping-ponging caused by low-priority buffers evicting high-priority buffers and vice versa. Signed-off-by: Friedrich Vock --- drivers/gpu/drm/ttm/ttm_bo.c | 9 +++-- drivers/gpu/drm/ttm/ttm_resource.c | 5 +++-- include

[RFC PATCH 05/18] drm/ttm: Add option to evict no BOs in operation

2024-04-24 Thread Friedrich Vock
When undoing evictions because of decreased memory pressure, it makes no sense to try evicting other buffers. Signed-off-by: Friedrich Vock --- drivers/gpu/drm/ttm/ttm_bo.c | 2 ++ include/drm/ttm/ttm_bo.h | 2 ++ 2 files changed, 4 insertions(+) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b

[RFC PATCH 18/18] drm/amdgpu: Bump minor version

2024-04-24 Thread Friedrich Vock
Indicates support for EVICTED_VRAM queries and AMDGPU_GEM_OP_SET_PRIORITY Signed-off-by: Friedrich Vock --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu

[RFC PATCH 15/18] drm/amdgpu: Set a default priority for user/kernel BOs

2024-04-24 Thread Friedrich Vock
Reserve the highest priority for the kernel, and choose a balanced value as userspace default. Userspace is intended to be able to modify these later to mark buffers as important/unimportant. Signed-off-by: Friedrich Vock --- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c| 1 + drivers/gpu/drm/amd

[RFC PATCH 09/18] drm/amdgpu: Don't mark VRAM as a busy placement for VRAM|GTT resources

2024-04-24 Thread Friedrich Vock
We will never try evicting things from VRAM for these resources anyway. This affects TTM buffer uneviction logic, which would otherwise try to move these buffers into VRAM (clashing with VRAM-only allocations). Signed-off-by: Friedrich Vock --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 13

[RFC PATCH 04/18] drm/ttm: Add driver funcs for uneviction control

2024-04-24 Thread Friedrich Vock
Provides fine-grained control for drivers over which buffers should be considered when attempting to undo evictions. Signed-off-by: Friedrich Vock --- include/drm/ttm/ttm_device.h | 23 +++ 1 file changed, 23 insertions(+) diff --git a/include/drm/ttm/ttm_device.h b/include

[RFC PATCH 00/18] TTM interface for managing VRAM oversubscription

2024-04-24 Thread Friedrich Vock
objections/comments/questions about my proposed design? Thanks, Friedrich [1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6833 [2] https://gitlab.freedesktop.org/pixelcluster/mesa/-/tree/spilling Friedrich Vock (18): drm/ttm: Add tracking for evicted memory drm/ttm: Add per-BO

[RFC PATCH 02/18] drm/ttm: Add per-BO eviction tracking

2024-04-24 Thread Friedrich Vock
Make each buffer object aware of whether it has been evicted or not. Signed-off-by: Friedrich Vock --- drivers/gpu/drm/ttm/ttm_bo.c | 1 + include/drm/ttm/ttm_bo.h | 11 +++ 2 files changed, 12 insertions(+) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c

[RFC PATCH 03/18] drm/ttm: Implement BO eviction tracking

2024-04-24 Thread Friedrich Vock
For each buffer object, remember evictions and try undoing them if memory pressure gets lower again. Signed-off-by: Friedrich Vock --- drivers/gpu/drm/ttm/ttm_bo.c | 28 +++- drivers/gpu/drm/ttm/ttm_bo_util.c | 3 +++ 2 files changed, 30 insertions(+), 1 deletion

[RFC PATCH 06/18] drm/ttm: Add public buffer eviction/uneviction functions

2024-04-24 Thread Friedrich Vock
For now, they are only used internally inside TTM, but this will change with the introduction of dynamic buffer priorities. Signed-off-by: Friedrich Vock --- drivers/gpu/drm/ttm/ttm_bo.c | 168 ++- include/drm/ttm/ttm_bo.h | 6 ++ 2 files changed, 172

Re: [PATCH v3 2/5] drm:amdgpu: Enable IH ring1 for IH v6.1

2024-04-16 Thread Friedrich Vock
On 16.04.24 15:34, Sunil Khatri wrote: We need IH ring1 for handling the page fault interrupts which overflow the default ring in specific use cases. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/ih_v6_1.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --g

Re: [PATCH] drm/amdgpu: always force full reset for SOC21

2024-03-24 Thread Friedrich Vock
On 24.03.24 01:52, Alex Deucher wrote: There are cases where soft reset seems to succeed, but does not, so always use mode1/2 for now. Does "for now" mean that a proper fix is being worked on/will appear later? Immediately falling back to full resets is a really bad experience, and it's especi

Re: [PATCH] Documentation: add a page on amdgpu debugging

2024-03-13 Thread Friedrich Vock
On 13.03.24 22:01, Alex Deucher wrote: Covers GPU page fault debugging and adds a reference to umr. Signed-off-by: Alex Deucher --- Documentation/gpu/amdgpu/debugging.rst | 79 ++ Documentation/gpu/amdgpu/index.rst | 1 + 2 files changed, 80 insertions(+) creat

[PATCH] drm/amdgpu: Reset IH OVERFLOW_EN bit for IH 7.0

2024-03-09 Thread Friedrich Vock
IH 7.0 support landed shortly after the original patch for resetting the bit on all other generations, but without that patch applied. Cc: Christian König Cc: Alex Deucher Signed-off-by: Friedrich Vock --- drivers/gpu/drm/amd/amdgpu/ih_v7_0.c | 6 ++ 1 file changed, 6 insertions(+) diff

[PATCH v3 2/2] drm/amdgpu: Process fences on IH overflow

2024-01-23 Thread Friedrich Vock
l.org Signed-off-by: Friedrich Vock --- v2: Set ih->overflow to false after processing fence v3: Move everything related to fence processing on overflow to patch 2 drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c | 16 drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h | 2 ++ drivers/gpu/drm/am

[PATCH v3 1/2] drm/amdgpu: Reset IH OVERFLOW_CLEAR bit

2024-01-23 Thread Friedrich Vock
Allows us to detect subsequent IH ring buffer overflows as well. Cc: Joshua Ashton Cc: Alex Deucher Cc: Christian König Cc: sta...@vger.kernel.org Signed-off-by: Friedrich Vock --- v2: Reset CLEAR_OVERFLOW bit immediately after setting it v3: Move everything related to fence processing on

Re: [PATCH 1/2] drm/amdgpu: Reset IH OVERFLOW_CLEAR bit after writing rptr

2024-01-23 Thread Friedrich Vock
On 23.01.24 10:36, Christian König wrote: On 22.01.24 23:39, Joshua Ashton wrote: [SNIP] Most work submissions in practice submit more waves than the number of wave slots the GPU has. As far as I understand soft recovery, the only thing it does is kill all active waves. This frees up the

Re: [PATCH 1/2] drm/amdgpu: Reset IH OVERFLOW_CLEAR bit after writing rptr

2024-01-22 Thread Friedrich Vock
On 22.01.24 11:21, Friedrich Vock wrote: On 22.01.24 11:10, Christian König wrote: On 19.01.24 20:18, Felix Kuehling wrote: On 2024-01-18 07:07, Christian König wrote: On 18.01.24 00:44, Friedrich Vock wrote: On 18.01.24 00:00, Alex Deucher wrote: [SNIP] Right now, IH overflows

Re: [PATCH 1/2] drm/amdgpu: Reset IH OVERFLOW_CLEAR bit after writing rptr

2024-01-22 Thread Friedrich Vock
On 22.01.24 11:10, Christian König wrote: On 19.01.24 20:18, Felix Kuehling wrote: On 2024-01-18 07:07, Christian König wrote: On 18.01.24 00:44, Friedrich Vock wrote: On 18.01.24 00:00, Alex Deucher wrote: [SNIP] Right now, IH overflows, even if they occur repeatedly, only get

[PATCH v2 1/2] drm/amdgpu: Reset IH OVERFLOW_CLEAR bit

2024-01-18 Thread Friedrich Vock
Allows us to detect subsequent IH ring buffer overflows as well. Cc: Joshua Ashton Cc: Alex Deucher Cc: Christian König Cc: sta...@vger.kernel.org Signed-off-by: Friedrich Vock --- v2: Reset CLEAR_OVERFLOW bit immediately after setting it drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h | 2

[PATCH v2 2/2] drm/amdgpu: Process fences on IH overflow

2024-01-18 Thread Friedrich Vock
l.org Signed-off-by: Friedrich Vock --- v2: Set ih->overflow to false after processing fences drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c | 16 1 file changed, 16 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c index f3b0

Re: [PATCH 1/2] drm/amdgpu: Reset IH OVERFLOW_CLEAR bit after writing rptr

2024-01-17 Thread Friedrich Vock
On 18.01.24 00:00, Alex Deucher wrote: On Wed, Jan 17, 2024 at 7:36 AM Christian König wrote: On 16.01.24 11:31, Friedrich Vock wrote: On 16.01.24 08:03, Christian König wrote: On 15.01.24 12:18, Friedrich Vock wrote: [SNIP] +if (ih->overflow) { +tmp = RRE

Re: [PATCH 1/2] drm/amdgpu: Reset IH OVERFLOW_CLEAR bit after writing rptr

2024-01-16 Thread Friedrich Vock
On 16.01.24 08:03, Christian König wrote: On 15.01.24 12:18, Friedrich Vock wrote: Adding the original Ccs from the thread since they seemed to be missing in the reply. On 15.01.24 11:55, Christian König wrote: On 14.01.24 14:00, Friedrich Vock wrote: Allows us to detect subsequent

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Friedrich Vock
Re-sending as plaintext, sorry about that On 15.01.24 18:54, Michel Dänzer wrote: On 2024-01-15 18:26, Friedrich Vock wrote: [snip] The fundamental problem here is that not telling applications that something went wrong when you just canceled their work midway is an out-of-spec hack. When

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Friedrich Vock
On 15.01.24 18:54, Michel Dänzer wrote: On 2024-01-15 18:26, Friedrich Vock wrote: [snip] The fundamental problem here is that not telling applications that something went wrong when you just canceled their work midway is an out-of-spec hack. When there is a report of real-world apps breaking

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Friedrich Vock
On 15.01.24 18:09, Michel Dänzer wrote: On 2024-01-15 17:46, Joshua Ashton wrote: On 1/15/24 16:34, Michel Dänzer wrote: On 2024-01-15 17:19, Friedrich Vock wrote: On 15.01.24 16:43, Joshua Ashton wrote: On 1/15/24 15:25, Michel Dänzer wrote: On 2024-01-15 14:17, Christian König wrote: Am

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-15 Thread Friedrich Vock
On 15.01.24 16:43, Joshua Ashton wrote: On 1/15/24 15:25, Michel Dänzer wrote: On 2024-01-15 14:17, Christian König wrote: On 15.01.24 12:37, Joshua Ashton wrote: On 1/15/24 09:40, Christian König wrote: On 13.01.24 15:02, Joshua Ashton wrote: Without this feedback, the applicat

Re: [PATCH 2/2] drm/amdgpu: Process fences on IH overflow

2024-01-15 Thread Friedrich Vock
On 15.01.24 11:26, Christian König wrote: On 14.01.24 14:00, Friedrich Vock wrote: If the IH ring buffer overflows, it's possible that fence signal events were lost. Check each ring for progress to prevent job timeouts/GPU hangs due to the fences staying unsignaled despite the work

Re: [PATCH 1/2] drm/amdgpu: Reset IH OVERFLOW_CLEAR bit after writing rptr

2024-01-15 Thread Friedrich Vock
Adding the original Ccs from the thread since they seemed to be missing in the reply. On 15.01.24 11:55, Christian König wrote: On 14.01.24 14:00, Friedrich Vock wrote: Allows us to detect subsequent IH ring buffer overflows as well. Well that suggested handling here is certainly broken

[PATCH 2/2] drm/amdgpu: Process fences on IH overflow

2024-01-14 Thread Friedrich Vock
ff-by: Friedrich Vock --- drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c index f3b0aaf3ebc6..2a246db1d3a7 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c

[PATCH 1/2] drm/amdgpu: Reset IH OVERFLOW_CLEAR bit after writing rptr

2024-01-14 Thread Friedrich Vock
Allows us to detect subsequent IH ring buffer overflows as well. Cc: Joshua Ashton Cc: Alex Deucher Cc: sta...@vger.kernel.org Signed-off-by: Friedrich Vock --- drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h | 2 ++ drivers/gpu/drm/amd/amdgpu/cik_ih.c | 13 + drivers/gpu/drm/amd

Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

2024-01-13 Thread Friedrich Vock
GPU in a variety of ways, and still have Steam + Counter-Strike 2 + Gamescope (compositor) stay up and continue functioning on Steam Deck. Signed-off-by: Joshua Ashton Tested-by: Friedrich Vock Cc: Friedrich Vock Cc: Bas Nieuwenhuizen Cc: Christian König Cc: André Almeida Cc: sta

Re: [PATCH] drm/amdgpu: Enable tunneling on high-priority compute queues

2023-12-08 Thread Friedrich Vock
he rest of what's needed as well. Thanks, Friedrich Regards, Christian. On 08.12.23 09:19, Friedrich Vock wrote: Friendly ping on this one. Userspace side got merged, so would be great to land this patch too :) On 02.12.23 01:17, Friedrich Vock wrote: This improves latency if the GPU

Re: [PATCH] drm/amdgpu: Enable tunneling on high-priority compute queues

2023-12-08 Thread Friedrich Vock
Friendly ping on this one. Userspace side got merged, so would be great to land this patch too :) On 02.12.23 01:17, Friedrich Vock wrote: This improves latency if the GPU is already busy with other work. This is useful for VR compositors that submit highly latency-sensitive compositing work on

[PATCH] drm/amdgpu: Enable tunneling on high-priority compute queues

2023-12-01 Thread Friedrich Vock
/mesa/-/merge_requests/26462 Signed-off-by: Friedrich Vock --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 10 ++ drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 3 ++- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 3 ++- 4 files changed, 11

Re: [PATCH] drm/amdgpu: Always emit GDS switch when GDS/GWS/OA is used

2023-08-02 Thread Friedrich Vock
Gentle ping. Any updates on this yet? Thanks, Friedrich On 20.07.23 23:25, Friedrich Vock wrote: Hi, On 07.07.23 10:21, Christian König wrote: On 07.07.23 09:28, Friedrich Vock wrote: Hi Christian, On 07.07.23 08:56, Christian König wrote: On 07.07.23 08:28, Friedrich Vock wrote

Re: [PATCH] drm/amdgpu: Always emit GDS switch when GDS/GWS/OA is used

2023-07-20 Thread Friedrich Vock
Hi, On 07.07.23 10:21, Christian König wrote: On 07.07.23 09:28, Friedrich Vock wrote: Hi Christian, On 07.07.23 08:56, Christian König wrote: On 07.07.23 08:28, Friedrich Vock wrote: During gfxoff, the per-VMID GDS registers are reset and not restored afterwards. Hui? Since

Re: [PATCH] drm/amdgpu: Always emit GDS switch when GDS/GWS/OA is used

2023-07-07 Thread Friedrich Vock
Hi Christian, On 07.07.23 08:56, Christian König wrote: On 07.07.23 08:28, Friedrich Vock wrote: During gfxoff, the per-VMID GDS registers are reset and not restored afterwards. Hui? Since when? Those registers should be part of the saved ones. Have you found that by observation

[PATCH] drm/amdgpu: Always emit GDS switch when GDS/GWS/OA is used

2023-07-06 Thread Friedrich Vock
submission. Fixes: 56b0989e29 ("drm/amdgpu: fix GDS/GWS/OA switch handling") Cc: sta...@vger.kernel.org Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2530 Signed-off-by: Friedrich Vock --- drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 22 +++--- drivers/gpu/drm/

Re: [PATCH] drm/amdgpu: Use the TGID for trace_amdgpu_vm_update_ptes

2023-02-06 Thread Friedrich Vock
Hi, thanks for applying the patch! Do you think it'd also be possible to backport it to previous kernel versions or do you already plan to do that? Since it is a one-liner bugfix it shouldn't be too hard to backport. Thank you, Friedrich Vock On 06.02.23 21:26, Alex Deucher wrote

[PATCH] drm/amdgpu: Use the TGID for trace_amdgpu_vm_update_ptes

2023-02-02 Thread Friedrich Vock
(). For attributing events to specific threads, the thread id is also contained in the common fields of each trace event. Signed-off-by: Friedrich Vock --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu