[PATCH 2/2] drm/amdkfd: support the debugger during per-queue reset

2024-07-26 Thread Jonathan Kim
space will no longer be able to access reset queues. v2: move per-queue reset flag to this patch rebase based on patch 1 changes Signed-off-by: Jonathan Kim --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 31 --- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 1 + include/uapi

[PATCH 1/2] drm/amdkfd: support per-queue reset on gfx9

2024-07-26 Thread Jonathan Kim
call safe during power saving modes. clean up some other nitpicks. Signed-off-by: Jonathan Kim --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 2 + .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 4 +- .../drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c | 4 +- .../drm/amd/amdgpu

[PATCH] drm/amdkfd: allow users to target recommended SDMA engines

2024-07-24 Thread Jonathan Kim
and refactor sdma resource bit setting logic. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 16 ++ .../drm/amd/amdkfd/kfd_device_queue_manager.c | 38 +- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 5 +- .../amd/amdkfd

[PATCH] drm/amdkfd: fix debug watchpoints for logical devices

2024-07-22 Thread Jonathan Kim
The number of watchpoints should be set and constrained per logical partition device, not by the socket device. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 20 ++-- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 4 ++-- drivers/gpu/drm/amd/amdkfd

[PATCH] drm/amdkfd: allow users to target recommended SDMA engines

2024-07-18 Thread Jonathan Kim
Certain GPUs have better copy performance over xGMI on specific SDMA engines depending on the source and destination GPU. Allow users to create SDMA queues on these recommended engines. Close to 2x overall performance has been observed with this optimization. Signed-off-by: Jonathan Kim

[PATCH 1/2] drm/amdkfd: support per-queue reset on gfx9

2024-07-18 Thread Jonathan Kim
by SET_RESOURCES first to identify the user queue candidates to reset. Only signal reset events to processes that have had a queue reset. If queue reset fails, fall back to GPU reset. Signed-off-by: Jonathan Kim --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 1 + .../drm/amd/amdgpu

[PATCH 2/2] drm/amdkfd: support the debugger during per-queue reset

2024-07-18 Thread Jonathan Kim
space will no longer be able to access reset queues. Signed-off-by: Jonathan Kim --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 30 +++ include/uapi/linux/kfd_ioctl.h| 4 +++ 2 files changed, 29 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd

[PATCH] drm/amdgpu: increase mes submission timeout

2024-04-11 Thread Jonathan Kim
MES internally has a timeout allowance of 2 seconds. Increase driver timeout to 3 seconds to be safe. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu

[PATCH] drm/amdkfd: range check cp bad op exception interrupts

2024-03-13 Thread Jonathan Kim
Due to a CP interrupt bug, bad packet garbage exception codes are raised. Do a range check so that the debugger and runtime do not receive garbage codes. Update the user api to guard exception code type checking as well. Signed-off-by: Jonathan Kim Tested-by: Jesse Zhang --- .../gpu/drm/amd
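The range check described in this entry can be sketched as follows. This is a minimal illustration only: the `demo_` names and the numeric values are hypothetical and are not taken from the patch or from the KFD uapi header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical exception-code space; the real codes are defined by the KFD uapi. */
enum demo_exception_code {
	DEMO_EC_NONE = 0,
	DEMO_EC_QUEUE_BAD_PACKET = 1,
	DEMO_EC_QUEUE_BAD_OPCODE = 2,
	DEMO_EC_MAX
};

/* A code is valid only if it lies strictly between NONE and MAX. */
static bool demo_ec_is_valid(uint32_t ecode)
{
	return ecode > DEMO_EC_NONE && ecode < DEMO_EC_MAX;
}

/* Convert a code to its bit mask; out-of-range (garbage) codes yield an empty mask. */
static uint32_t demo_ec_to_mask(uint32_t ecode)
{
	if (!demo_ec_is_valid(ecode))
		return 0;
	return 1u << (ecode - 1);
}
```

With a guard like this, a garbage value read from the interrupt payload is dropped before it can be forwarded to the debugger or runtime.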

[PATCH] drm/amdkfd: fix process reference drop on debug ioctl

2024-02-21 Thread Jonathan Kim
Prevent dropping the KFD process reference at the end of a debug IOCTL call where the acquired process value is an error. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b
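The pattern this entry describes (release a reference only if it was actually acquired) might look like the sketch below. All `demo_` names are hypothetical, and the lookup returns NULL on failure for simplicity; the patch itself is a one-line change whose exact form is not shown in this snippet.

```c
#include <errno.h>
#include <stddef.h>

struct demo_process {
	int refcount;
};

/* Hypothetical lookup: takes a reference on success, returns NULL on failure. */
static struct demo_process *demo_lookup(struct demo_process *table, size_t n, size_t idx)
{
	if (idx >= n)
		return NULL;
	table[idx].refcount++;
	return &table[idx];
}

static void demo_unref(struct demo_process *p)
{
	p->refcount--;
}

/* Shared exit path: the reference is dropped only when the lookup succeeded. */
static int demo_debug_ioctl(struct demo_process *table, size_t n, size_t idx)
{
	struct demo_process *target = demo_lookup(table, n, idx);
	int ret = 0;

	if (!target) {
		ret = -ESRCH;
		goto out;
	}

	/* ... the debug operation on target would go here ... */

out:
	if (target)
		demo_unref(target);
	return ret;
}
```

Without the guard at `out`, the failure path would drop a reference that was never taken.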

[PATCH] drm/amdkfd: only flush mes process context if mes support is there

2023-12-13 Thread Jonathan Kim
Fix up on mes process context flush to prevent non-mes devices from spamming error messages or running into undefined behaviour during process termination. Fixes: 73204d028eb5 ("drm/amdkfd: fix mes set shader debugger process management") Signed-off-by: Jonathan Kim --- drivers/g

[PATCH] drm/amdkfd: fix mes set shader debugger process management

2023-12-11 Thread Jonathan Kim
that the flush call and the MES debugger calls use the same MES interface but are separated as KFD calls to avoid conflicting with each other. Signed-off-by: Jonathan Kim Tested-by: Alice Wong --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 31 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h

[PATCH] drm/amdgpu: update xgmi num links info post gc9.4.2

2023-11-17 Thread Jonathan Kim
GC IP 9.4.2 and up support TA reporting of the number of xGMI links between peers. Tested-by: Vignesh Chander Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH] drm/amdkfd: fix add queue process context clear without runtime enable

2023-09-12 Thread Jonathan Kim
handling and running KFD tests. The only time ADD_QUEUE.skip_process_ctx_clear is required is for debugger use cases where a debugged process is always runtime enabled when adding a queue. Tested-by: Shikai Guo Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 6

[PATCH] drm/amdkfd: fix add queue process context clear for hsa non-init cases

2023-09-12 Thread Jonathan Kim
adding a queue. Tested-by: Shikai Guo Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd

[PATCH] drm/amdkfd: fix double assign skip process context clear

2023-08-10 Thread Jonathan Kim
Remove redundant assignment when skipping process ctx clear. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd

[PATCH] drm/amdkfd: fix and enable ttmp setup for gfx11

2023-07-24 Thread Jonathan Kim
The MES cached process context must be cleared on adding any queue for the first time. For proper debug support, the MES will clear its cached process context on the first call to SET_SHADER_DEBUGGER. This allows TTMPs to be persistently enabled in a safe manner. Signed-off-by: Jonathan Kim

[PATCH] drm/amdkfd: enable cooperative groups for gfx11

2023-07-19 Thread Jonathan Kim
do not want these to be cooperative dispatches. v2: fix up indentation and comments. remove unnecessary perf warning on oversubscription. change 0 init to 0 memset to deal with padding. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 2 ++ drivers/gpu/drm

[PATCH 1/2] drm/amdkfd: fix trap handling work around for debugging

2023-07-14 Thread Jonathan Kim
Update the list of devices that require the cwsr trap handling workaround for debugging use cases. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 5 ++--- drivers/gpu/drm/amd/amdkfd/kfd_debug.h| 6 ++ drivers/gpu/drm/amd/amdkfd

[PATCH 2/2] drm/amdkfd: enable cooperative groups for gfx11

2023-07-14 Thread Jonathan Kim
do not want these to be cooperative dispatches. NOTE: FIXME MES FW enablement checks are a placeholder at the moment and will be updated when the binary revision number is finalized. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 2 +- drivers/gpu/drm

[PATCH] drm/amdkfd: report dispatch id always saved in ttmps after gc9.4.2

2023-07-11 Thread Jonathan Kim
-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c index 1a4cdee86759..eeedc3ddffeb 100644 --- a/drivers/gpu/drm/amd/am

[PATCH] drm/amdkfd: decrement queue count on mes queue destroy

2023-06-13 Thread Jonathan Kim
Queue count should decrement on queue destruction regardless of HWS support type. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b

[PATCH] drm/amdkfd: fix null queue check on debug setting exceptions

2023-06-12 Thread Jonathan Kim
Null check should be done on queue struct itself and not on the process queue list node. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd
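A minimal sketch of the distinction this entry draws, assuming a simplified singly linked list of hypothetical `demo_` types: while iterating, the list node itself is never NULL, so the meaningful check is on the queue pointer stored inside it.

```c
#include <stddef.h>

struct demo_queue {
	unsigned int id;
	unsigned int exception_mask;
};

/* Hypothetical per-process list node; q may be NULL for entries without a user queue. */
struct demo_queue_node {
	struct demo_queue *q;
	struct demo_queue_node *next;
};

/* Apply the mask to every node that carries a queue; return how many were updated. */
static size_t demo_set_exceptions(struct demo_queue_node *head, unsigned int mask)
{
	size_t updated = 0;

	for (struct demo_queue_node *n = head; n; n = n->next) {
		if (!n->q)	/* check the queue, not the node */
			continue;
		n->q->exception_mask = mask;
		updated++;
	}
	return updated;
}
```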

[PATCH] drm/amdkfd: fix and enable debugging for gfx11

2023-06-07 Thread Jonathan Kim
. Once the binaries have been created, this check may be subject to change. v2: do a trap_en safety check in case old mes doesn't accept unused trap_en d-word. remove unnecessary process termination work around. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c| 7

[PATCH] drm/amdkfd: optimize gfx off enable toggle for debugging

2023-06-07 Thread Jonathan Kim
access issues. Remove KFD GFX OFF enable toggle clutter by moving these calls into the KGD debug calls themselves. v2: toggle gfx off around address watch hi/lo settings as well. Signed-off-by: Jonathan Kim --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 4 +++ .../drm/amd/amdgpu

[PATCH] drm/amdkfd: fix and enable debugging for gfx11

2023-06-07 Thread Jonathan Kim
to change. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 5 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 4 ++- drivers/gpu/drm/amd/amdgpu/mes_v11_0.c| 1 + drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 31 ++- .../drm/amd/amdkfd

[PATCH] drm/amdkfd: fix vmfault signalling with additional data.

2023-06-07 Thread Jonathan Kim
Exception handling for vmfaults should be raised with additional data. Reported-by: Mukul Joshi Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 34 +++-- 1 file changed, 20 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd

[PATCH 31/33] drm/amdkfd: add debug queue snapshot operation

2023-05-25 Thread Jonathan Kim
. Also allow the debugger to clear exceptions when doing a snapshot. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 6 +++ .../drm/amd/amdkfd/kfd_device_queue_manager.c | 36 + .../drm/amd/amdkfd/kfd_device_queue_manager.h

[PATCH 30/33] drm/amdkfd: add debug query exception info operation

2023-05-25 Thread Jonathan Kim
of clearing the target exception on query. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 7 ++ drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 120 +++ drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 6 ++ 3 files changed, 133

[PATCH 32/33] drm/amdkfd: add debug device snapshot operation

2023-05-25 Thread Jonathan Kim
a subsequent successful call. v2: add num_xcc to device snapshot and fixup new kfd_node reference Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 7 ++- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 73 drivers/gpu/drm/amd/amdkfd/kfd_debug.h

[PATCH 29/33] drm/amdkfd: add debug query event operation

2023-05-25 Thread Jonathan Kim
Allow the debugger to query a single queue, device and process exception. The KFD should also return the GPU or Queue id of the exception. The debugger also has the option of clearing exceptions after being queried. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm

[PATCH 22/33] drm/amdkfd: update process interrupt handling for debug events

2023-05-25 Thread Jonathan Kim
. This is because the IV from SQ interrupts is packed into a new contiguous format unlike GFX9. To make this clear, a separate interrupt handling code file was created. v2: use new kfd_node struct in prototypes. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c

[PATCH 23/33] drm/amdkfd: add debug set exceptions enabled operation

2023-05-25 Thread Jonathan Kim
The debugger subscribes to notifications for requested exceptions on attach. Allow the debugger to change its subscription later on. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 3 ++ drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 36

[PATCH 33/33] drm/amdkfd: bump kfd ioctl minor version for debug api availability

2023-05-25 Thread Jonathan Kim
Bump the minor version to declare debugging capability is now available. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 1 - include/uapi/linux/kfd_ioctl.h | 3 ++- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git

[PATCH 25/33] drm/amdkfd: add debug wave launch mode operation

2023-05-25 Thread Jonathan Kim
Allow the debugger to set wave behaviour to either operate normally, halt at launch, trap on every instruction, terminate immediately or stall on allocation. v2: fixup with new kfd_node struct reference for mes check Signed-off-by: Jonathan Kim --- .../drm/amd/amdgpu

[PATCH 27/33] drm/amdkfd: add debug set and clear address watch points operation

2023-05-25 Thread Jonathan Kim
watch points are allocated or not. v2: fixup with new kfd_node struct reference for mes and watch point checks Signed-off-by: Jonathan Kim --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 51 +++ .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 2 + .../drm/amd/amdgpu

[PATCH 17/33] drm/amdkfd: apply trap workaround for gfx11

2023-05-25 Thread Jonathan Kim
engine, return the runtime status as enabled but with an error. In addition, like any other multi-process debug supported devices, disable trap temporary setup per-process to avoid performance impact from setup overhead. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm

[PATCH 21/33] drm/amdkfd: add debug trap enabled flag to tma

2023-05-25 Thread Jonathan Kim
From: Jay Cornwall Trap handler behavior will differ when a debugger is attached. Make the debug trap flag available in the trap handler TMA. Update it when the debug trap ioctl is invoked. Signed-off-by: Jay Cornwall Reviewed-by: Felix Kuehling Signed-off-by: Jonathan Kim Reviewed

[PATCH 15/33] drm/amdgpu: expose debug api for mes

2023-05-25 Thread Jonathan Kim
. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 32 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 20 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c| 12 +++ drivers/gpu/drm/amd/include

[PATCH 28/33] drm/amdkfd: add debug set flags operation

2023-05-25 Thread Jonathan Kim
cise at the cost of performance. This setting is not permitted on debug devices that support only a global setting of this option. Return the previous set flags to the debugger as well. v2: fixup with new kfd_node struct reference mes checks Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/am

[PATCH 11/33] drm/amdgpu: add gfx11 hw debug mode enable and disable calls

2023-05-25 Thread Jonathan Kim
Implement the per-device calls to enable or disable HW debug mode for GFX11. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c| 38 +++ 1 file changed, 38 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH 24/33] drm/amdkfd: add debug wave launch override operation

2023-05-25 Thread Jonathan Kim
be overridden or fully replaced. In order for the debugger to know what is permissible, return the supported override mask to the debugger along with the previously enabled overrides. v2: fixup with new kfd_node struct reference for mes check Signed-off-by: Jonathan Kim --- .../drm/amd

[PATCH 26/33] drm/amdkfd: add debug suspend and resume process queues operation

2023-05-25 Thread Jonathan Kim
suspend or resume queues). v2: fixup new kfd_node struct reference for mes fw check. also fixup missing EC_QUEUE_NEW flagging on newly created queue. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 5 + drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 1 + drivers

[PATCH 19/33] drm/amdkfd: add send exception operation

2023-05-25 Thread Jonathan Kim
. For runtime exceptions, this will unblock the runtime enable function which will be explained and implemented in a follow up patch. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- .../gpu/drm/amd/amdkfd/cik_event_interrupt.c | 4 +- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c

[PATCH 18/33] drm/amdkfd: add raise exception event function

2023-05-25 Thread Jonathan Kim
. For memory violation exceptions, extra exception data will be saved. The debugger will be able to query the saved exception states through a query operation that will be provided by follow-up patches. v2: use new kfd_node struct in prototype. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd

[PATCH 10/33] drm/amdgpu: add gfx9.4.2 hw debug mode enable and disable calls

2023-05-25 Thread Jonathan Kim
the required register values that the HWS needs to write on debug enable and disable. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 42 ++- 1 file changed, 41 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm

[PATCH 16/33] drm/amdkfd: add per process hw trap enable and disable functions

2023-05-25 Thread Jonathan Kim
functions are implemented in a follow up patch. v2: spot fix with new kfd_node references Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 5 + drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 148 ++- drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 29

[PATCH 20/33] drm/amdkfd: add runtime enable operation

2023-05-25 Thread Jonathan Kim
Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 143 ++- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 6 +- drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 4 + drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 1 + 4 files changed, 150 insertions(+), 4

[PATCH 13/33] drm/amdkfd: prepare map process for single process debug devices

2023-05-25 Thread Jonathan Kim
SET_RESOURCES so that a debugged process will never migrate away from its pinned VMID. The KFD is responsible for reserving and releasing this pinned VMID accordingly whenever the debugger attaches and detaches respectively. v2: spot fix ups using new kfd_node references Signed-off-by: Jonathan Kim

[PATCH 08/33] drm/amdkfd: fix kfd_suspend_all_processes

2023-05-25 Thread Jonathan Kim
Flush delayed restore work in kfd_suspend_all_queues instead of cancelling. Cancelling the work before it runs results in the queues becoming permanently disabled. Flushing the work ensures that the queue suspend/resume state stays balanced. Signed-off-by: Jonathan Kim Reviewed-by: Felix

[PATCH 07/33] drm/amdgpu: add gfx9.4.1 hw debug mode enable and disable calls

2023-05-25 Thread Jonathan Kim
changing the implicit wait count setting. Once set, resume all work. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 3 + .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 116 ++ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c

[PATCH 12/33] drm/amdgpu: add configurable grace period for unmap queues

2023-05-25 Thread Jonathan Kim
. v2: add null grace period function pointers to VI packet manager. Signed-off-by: Jonathan Kim --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 2 + .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 2 + .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 43 .../drm/amd/amdgpu

[PATCH 14/33] drm/amdgpu: prepare map process for multi-process debug devices

2023-05-25 Thread Jonathan Kim
. v2: spot fixup new kfd_node references Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_debug.h| 5 ++ .../drm/amd/amdkfd/kfd_device_queue_manager.c | 51 +++ .../drm/amd/amdkfd/kfd_device_queue_manager.h | 3 ++ .../drm/amd/amdkfd/kfd_packet_manager_v9.c

[PATCH 09/33] drm/amdgpu: add gfx10 hw debug mode enable and disable calls

2023-05-25 Thread Jonathan Kim
will be fixed for GFX11 onwards. Also remove a bunch of deprecated misplaced references for GFX10.3. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 96 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h| 28 .../drm/amd

[PATCH 05/33] drm/amdgpu: setup hw debug registers on driver initialization

2023-05-25 Thread Jonathan Kim
rder to correctly set this up, set the special reserved CP bit by default whenever the MQD is initialized. v2: add missing 0-init of SPI_GDBG_TRAP_DATA0/1 Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c| 26 +++ drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c

[PATCH 06/33] drm/amdgpu: add gfx9 hw debug mode enable and disable calls

2023-05-25 Thread Jonathan Kim
inheritance of that mode is upheld. Also ensure that exception overrides are reset to their original state prior to debug enable or disable. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 92 +++ .../gpu/drm/amd/amdgpu

[PATCH 04/33] drm/amdgpu: add kgd hw debug mode setting interface

2023-05-25 Thread Jonathan Kim
Introduce the required KGD debug calls that will execute hardware debug mode setting. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- .../gpu/drm/amd/include/kgd_kfd_interface.h | 34 +++ 1 file changed, 34 insertions(+) diff --git a/drivers/gpu/drm/amd/include

[PATCH 03/33] drm/amdkfd: prepare per-process debug enable and disable

2023-05-25 Thread Jonathan Kim
events will notify the debugger through a pollable FIFO file descriptor that the debugger provides to the KFD to manage. Finally on process termination of either the debugger or the target, debugging must be disabled if it has not been done so. Signed-off-by: Jonathan Kim Reviewed-by: Felix

[PATCH 01/33] drm/amdkfd: add debug and runtime enable interface

2023-05-25 Thread Jonathan Kim
coordinates exception handling with the HSA runtime. Usage is available in the kern docs at uapi/linux/kfd_ioctl.h. v2: add num_xcc to device snapshot entry. fixup missing EC_QUEUE_PACKET_RESERVED mask. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 48 ++ include

[PATCH 02/33] drm/amdkfd: display debug capabilities

2023-05-25 Thread Jonathan Kim
-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 101 -- drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 6 ++ include/uapi/linux/kfd_sysfs.h| 15 3 files changed, 117 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd

[PATCH 34/34] drm/amdkfd: optimize gfx off enable toggle for debugging

2023-03-27 Thread Jonathan Kim
access issues. Remove KFD GFX OFF enable toggle clutter by moving these calls into the KGD debug calls themselves. Signed-off-by: Jonathan Kim --- .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 7 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 33 ++- .../gpu/drm/amd/amdgpu

[PATCH 33/34] drm/amdkfd: bump kfd ioctl minor version for debug api availability

2023-03-27 Thread Jonathan Kim
Bump the minor version to declare debugging capability is now available. v2: bump to 1.13 after upstream rebase. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 1 - include/uapi/linux/kfd_ioctl.h | 3 ++- 2 files changed, 2

[PATCH 27/34] drm/amdkfd: add debug set and clear address watch points operation

2023-03-27 Thread Jonathan Kim
-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 51 +++ .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 2 + .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 78 ++ .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h| 8 ++ .../drm

[PATCH 21/34] drm/amdkfd: add debug trap enabled flag to tma

2023-03-27 Thread Jonathan Kim
flag setup on APUs Signed-off-by: Jay Cornwall Reviewed-by: Felix Kuehling Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 11 +++ drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_process.c | 15 +++ 3 files changed

[PATCH 25/34] drm/amdkfd: add debug wave launch mode operation

2023-03-27 Thread Jonathan Kim
and remove deprecated launch mode options Signed-off-by: Jonathan Kim --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 12 +++ .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 1 + .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 25 + .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h

[PATCH 26/34] drm/amdkfd: add debug suspend and resume process queues operation

2023-03-27 Thread Jonathan Kim
suspend or resume queues). v3: update safer copy context save header v2: add gfx11/mes support. prevent header copy on suspend from overwriting user fields. simplify resume_queues function. address other nit-picks Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 5

[PATCH 28/34] drm/amdkfd: add debug set flags operation

2023-03-27 Thread Jonathan Kim
flag for now. v2: add gfx11 support. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 + drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 58 drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 1 + 3 files changed, 61 insertions(+) diff --git a/drivers/gpu

[PATCH 20/34] drm/amdkfd: add runtime enable operation

2023-03-27 Thread Jonathan Kim
for runtime_enable. v2: fix up hierarchy of semantics in description. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 143 ++- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 6 +- drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 4 + drivers/gpu/drm/amd/amdkfd

[PATCH 19/34] drm/amdkfd: add send exception operation

2023-03-27 Thread Jonathan Kim
. For runtime exceptions, this will unblock the runtime enable function which will be explained and implemented in a follow up patch. v2: missing closing brace in set workaround function got fixed in patch 17. Signed-off-by: Jonathan Kim --- .../gpu/drm/amd/amdkfd/cik_event_interrupt.c | 4

[PATCH 32/34] drm/amdkfd: add debug device snapshot operation

2023-03-27 Thread Jonathan Kim
for queue and device snapshot. change device snapshot implementation to match queue snapshot implementation. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 7 ++- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 72 drivers

[PATCH 16/34] drm/amdkfd: add per process hw trap enable and disable functions

2023-03-27 Thread Jonathan Kim
fw checks. remove asic family name comments. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 5 + drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 148 ++- drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 29 + drivers/gpu/drm/amd/amdkfd/kfd_process.c

[PATCH 13/34] drm/amdkfd: prepare map process for single process debug devices

2023-03-27 Thread Jonathan Kim
-by: Jonathan Kim --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 93 +++ .../drm/amd/amdkfd/kfd_device_queue_manager.h | 5 + .../drm/amd/amdkfd/kfd_packet_manager_v9.c| 9 ++ .../gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h | 5 +- 4 files changed, 111 insertions(+), 1

[PATCH 07/34] drm/amdgpu: add gfx9.4.1 hw debug mode enable and disable calls

2023-03-27 Thread Jonathan Kim
-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 3 + .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 116 ++ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 +- 3 files changed, 121 insertions(+), 2 deletions(-) diff --git

[PATCH 30/34] drm/amdkfd: add debug query exception info operation

2023-03-27 Thread Jonathan Kim
of clearing the target exception on query. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 7 ++ drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 120 +++ drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 6 ++ 3 files changed, 133

[PATCH 31/34] drm/amdkfd: add debug queue snapshot operation

2023-03-27 Thread Jonathan Kim
buf_size arg to num_queues for clarity. fix minimum entry size calculation. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 6 +++ .../drm/amd/amdkfd/kfd_device_queue_manager.c | 36 + .../drm/amd/amdkfd

[PATCH 17/34] drm/amdkfd: apply trap workaround for gfx11

2023-03-27 Thread Jonathan Kim
application. disable debugging for now on gfx11 due to broken fw. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 2 + drivers/gpu/drm/amd/amdgpu/mes_v11_0.c| 7 +-- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 - drivers/gpu/drm/amd/amdkfd/kfd_debug.c

[PATCH 14/34] drm/amdgpu: prepare map process for multi-process debug devices

2023-03-27 Thread Jonathan Kim
. v3: remove unneeded comment. also add missing kfd_debug.h include in dqm file. v2: remove asic family code name comment in per vmid support check Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_debug.h| 5 ++ .../drm/amd/amdkfd/kfd_device_queue_manager.c | 51

[PATCH 15/34] drm/amdgpu: expose debug api for mes

2023-03-27 Thread Jonathan Kim
. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 32 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 20 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c| 12 +++ drivers/gpu/drm/amd/include

[PATCH 23/34] drm/amdkfd: add debug set exceptions enabled operation

2023-03-27 Thread Jonathan Kim
The debugger subscribes to notifications for requested exceptions on attach. Allow the debugger to change its subscription later on. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 3 ++ drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 36

[PATCH 24/34] drm/amdkfd: add debug wave launch override operation

2023-03-27 Thread Jonathan Kim
v3: v2 was reviewed but requesting re-review for added GFX11 support. v2: switch unsupported override mode return from EPERM to EINVAL to support unique EPERM on PTRACE failure. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 47

[PATCH 29/34] drm/amdkfd: add debug query event operation

2023-03-27 Thread Jonathan Kim
Allow the debugger to query a single queue, device and process exception. The KFD should also return the GPU or Queue id of the exception. The debugger also has the option of clearing exceptions after being queried. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm

[PATCH 11/34] drm/amdgpu: add gfx11 hw debug mode enable and disable calls

2023-03-27 Thread Jonathan Kim
Implement the per-device calls to enable or disable HW debug mode for GFX11. v2: remove unneeded ioctl reference and fix types and comment formats. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c| 38 +++ 1 file

[PATCH 22/34] drm/amdkfd: update process interrupt handling for debug events

2023-03-27 Thread Jonathan Kim
on queue create during -ERESTARTSYS. fix up macros naming for ECODE parsing. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 16 + drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 2 + drivers/gpu/drm/amd/amdkfd/Makefile | 1

[PATCH 18/34] drm/amdkfd: add raise exception event function

2023-03-27 Thread Jonathan Kim
-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 104 +++ drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 7 ++ drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 10 +++ drivers/gpu/drm/amd/amdkfd/kfd_process.c | 2 + 4 files changed, 123 insertions(+) diff --git a/drivers/gpu

[PATCH 08/34] drm/amdkfd: fix kfd_suspend_all_processes for gfx941 debugging

2023-03-27 Thread Jonathan Kim
-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c index 1e3795e7e18d..55a4ddd35e12 100644 --- a/drivers/gpu/drm/amd/amdkfd

[PATCH 06/34] drm/amdgpu: add gfx9 hw debug mode enable and disable calls

2023-03-27 Thread Jonathan Kim
lock renaming. add comments to explain ignored arguments for debug trap enable and disable. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 92 +++ .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h | 9 ++ 2 files changed

[PATCH 02/34] drm/amdkfd: display debug capabilities

2023-03-27 Thread Jonathan Kim
. - remove asic family code name comments in firmware support checking - add gfx11 requirements in fw support checks and debug props and caps Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 101 -- drivers/gpu/drm/amd/amdkfd

[PATCH 05/34] drm/amdgpu: setup hw debug registers on driver initialization

2023-03-27 Thread Jonathan Kim
init for gfx11. add trap on wave start and end registers for gfx11. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c| 26 +++ drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c| 1 + drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c

[PATCH 12/34] drm/amdgpu: add configurable grace period for unmap queues

2023-03-27 Thread Jonathan Kim
. v2: clarify purpose in the description of this patch Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 2 + .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 2 + .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 43 .../drm/amd

[PATCH 09/34] drm/amdgpu: add gfx10 hw debug mode enable and disable calls

2023-03-27 Thread Jonathan Kim
will be fixed for GFX11 onwards. Also remove a bunch of deprecated misplaced references for GFX10.3. v2: fix 'boundaray' typo in description and added gfx10 kgd2kfd header to avoid kern bot missing prototype complaint. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- .../drm/amd/amdgpu

[PATCH 10/34] drm/amdgpu: add gfx9.4.2 hw debug mode enable and disable calls

2023-03-27 Thread Jonathan Kim
the required register values that the HWS needs to write on debug enable and disable. v3: fix typo and comment format kern bot complaint. add back cu occupancy that was removed by mistake. v2: add commentary on unused restore_dbg_registers for debug enable. Signed-off-by: Jonathan Kim Reviewed

[PATCH 01/34] drm/amdkfd: add debug and runtime enable interface

2023-03-27 Thread Jonathan Kim
and disable). Also remove unneeded dbg flag option. Add revision and subvendor info to debug device snapshot entry. Add trap on wave start and end override option. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 48 ++ include/uapi/linux/kfd_ioctl.h | 667

[PATCH 04/34] drm/amdgpu: add kgd hw debug mode setting interface

2023-03-27 Thread Jonathan Kim
Introduce the required KGD debug calls that will execute hardware debug mode setting. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- .../gpu/drm/amd/include/kgd_kfd_interface.h | 34 +++ 1 file changed, 34 insertions(+) diff --git a/drivers/gpu/drm/amd/include

[PATCH 03/34] drm/amdkfd: prepare per-process debug enable and disable

2023-03-27 Thread Jonathan Kim
there's nothing to evict. change err code to EALREADY if attaching to an already attached process. move debug disable to release worker to avoid race with disable from ioctl call. v2: relax debug trap disable and PTRACE ATTACH requirement. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd

[PATCH 09/32] drm/amdgpu: add gfx9.4.2 hw debug mode enable and disable calls

2023-01-25 Thread Jonathan Kim
the required register values that the HWS needs to write on debug enable and disable. v2: add commentary on unused restore_dbg_registers for debug enable. Signed-off-by: Jonathan Kim --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 43 ++- 1 file changed, 41 insertions(+), 2

[PATCH 31/32] drm/amdkfd: add debug device snapshot operation

2023-01-25 Thread Jonathan Kim
for queue and device snapshot. change device snapshot implementation to match queue snapshot implementation. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 7 ++- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 72 drivers/gpu/drm/amd/amdkfd

[PATCH 25/32] drm/amdkfd: add debug suspend and resume process queues operation

2023-01-25 Thread Jonathan Kim
suspend or resume queues). v2: add gfx11/mes support. prevent header copy on suspend from overwriting user fields. simplify resume_queues function. address other nit-picks Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 5 + drivers/gpu/drm/amd/amdgpu

[PATCH 18/32] drm/amdkfd: add send exception operation

2023-01-25 Thread Jonathan Kim
. For runtime exceptions, this will unblock the runtime enable function which will be explained and implemented in a follow up patch. Signed-off-by: Jonathan Kim --- .../gpu/drm/amd/amdkfd/cik_event_interrupt.c | 4 +- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 5 ++ drivers/gpu/drm/amd

[PATCH 16/32] drm/amdkfd: add per process hw trap enable and disable functions

2023-01-25 Thread Jonathan Kim
functions are implemented in a follow up patch. v2: add gfx11 support. fix fw checks. remove asic family name comments. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 5 + drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 148 +- drivers/gpu/drm/amd

[PATCH 26/32] drm/amdkfd: add debug set and clear address watch points operation

2023-01-25 Thread Jonathan Kim
watch points are allocated or not. v3: add gfx11 support. cleanup gfx9 kgd calls to set and clear address watch. use per device spinlock to set watch points. fixup runlist refresh calls on set/clear address watch. v2: change dev_id arg to gpu_id for consistency Signed-off-by: Jonathan Kim
