[PATCH] drm/amdkfd: Copy HW exception data to user event

2023-11-16 Thread David Yat Sin
Fixes issue where user events of type KFD_EVENT_TYPE_HW_EXCEPTION do not have valid data Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd

[PATCH v3] drm/amdgpu: Change extended-scope MTYPE on GC 9.4.3

2023-11-09 Thread David Yat Sin
Change local memory type to MTYPE_UC on revision id 0 Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 7 +-- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 7 +-- 2 files changed, 10 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b

[PATCH v2] drm/amdgpu: Change extended-scope MTYPE on GC 9.4.3

2023-11-08 Thread David Yat Sin
Change local memory type to MTYPE_UC on revision id 0 Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 7 +-- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 8 +--- 2 files changed, 10 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b

[PATCH] drm/amdgpu: Change extended-scope MTYPE on gfx v9_4_3

2023-11-07 Thread David Yat Sin
Change local memory type on gfx943 to MTYPE_UC on revision id 0 Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 5 - drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 8 +--- 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH v3] drm/amdkfd: Add available memory ioctl

2022-06-09 Thread David Yat Sin
returned by this new ioctl is guaranteed to succeed, barring races with other allocating tasks. This IOCTL will be used by libhsakmt: https://www.mail-archive.com/amd-gfx@lists.freedesktop.org/msg75743.html Signed-off-by: Daniel Phillips Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdgpu

[PATCH v2] drm/amdkfd: Add available memory ioctl

2022-06-01 Thread David Yat Sin
returned by this new ioctl is guaranteed to succeed, barring races with other allocating tasks. This IOCTL will be used by libhsakmt: https://www.mail-archive.com/amd-gfx@lists.freedesktop.org/msg75743.html Signed-off-by: Daniel Phillips Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdgpu

[PATCH] drm/amdkfd: Add available memory ioctl

2022-06-01 Thread David Yat Sin
returned by this new ioctl is guaranteed to succeed, barring races with other allocating tasks. Signed-off-by: Daniel Phillips Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 1 + .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 37 +-- drivers/gpu/drm

[PATCH v3 2/2] drm/amdkfd: CRIU add support for GWS queues

2022-04-18 Thread David Yat Sin
Adding support to checkpoint/restore GWS (Global Wave Sync) queues. Signed-off-by: David Yat Sin Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 10 +++--- 2 files changed, 8 insertions

[PATCH v3 1/2] drm/amdkfd: Fix GWS queue count

2022-04-18 Thread David Yat Sin
dqm->gws_queue_count and pdd->qpd.mapped_gws_queue need to be updated each time the queue gets evicted. Fixes: b8020b0304c8 ("drm/amdkfd: Enable over-subscription with >1 GWS queue") Signed-off-by: David Yat Sin Reviewed-by: Felix Kuehling --- .../drm/amd/amdkfd/kfd_dev
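The diff itself is truncated in this listing, so the following is only a minimal sketch of the bookkeeping idea the description states: a per-device count of active GWS queues has to follow every eviction and restore, not just queue creation and destruction. All names here (`demo_queue`, `demo_counts`, `demo_evict`, `demo_restore`) are hypothetical and are not taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the driver's queue and counters. */
struct demo_queue {
    bool is_gws;     /* queue uses Global Wave Sync */
    bool is_active;  /* queue is currently mapped/active */
};

struct demo_counts {
    unsigned int gws_queue_count;  /* number of active GWS queues */
};

/* Evict a queue: only an active queue changes state, so the counter
 * is decremented at most once per eviction. */
static void demo_evict(struct demo_counts *c, struct demo_queue *q)
{
    if (!q->is_active)
        return;
    q->is_active = false;
    if (q->is_gws)
        c->gws_queue_count--;
}

/* Restore a queue: the mirror image of demo_evict(). */
static void demo_restore(struct demo_counts *c, struct demo_queue *q)
{
    if (q->is_active)
        return;
    q->is_active = true;
    if (q->is_gws)
        c->gws_queue_count++;
}
```

Tying the counter update to the active/inactive transition, rather than to the call itself, keeps repeated evictions of an already inactive queue from skewing the count.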

[PATCH v2 2/2] drm/amdkfd: CRIU add support for GWS queues

2022-04-18 Thread David Yat Sin
Adding support to checkpoint/restore GWS (Global Wave Sync) queues. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 10 +++--- 2 files changed, 8 insertions(+), 4 deletions(-) diff

[PATCH v2 1/2] drm/amdkfd: Fix GWS queue count

2022-04-18 Thread David Yat Sin
dqm->gws_queue_count and pdd->qpd.mapped_gws_queue need to be updated each time the queue gets evicted. Signed-off-by: David Yat Sin --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 83 +-- 1 file changed, 37 insertions(+), 46 deletions(-) diff --git a/drivers/gpu/d

[PATCH 2/2] drm/amdkfd: CRIU add support for GWS queues

2022-04-13 Thread David Yat Sin
Adding support to checkpoint/restore GWS (Global Wave Sync) queues. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 4 ++-- .../amd/amdkfd/kfd_process_queue_manager.c| 22 ++- 3 files

[PATCH 1/2] drm/amdkfd: Fix GWS queue count

2022-04-13 Thread David Yat Sin
Queue can be inactive during process termination. This would cause dqm->gws_queue_count to not be decremented. There can only be 1 GWS queue per device process so moving the logic out of loop. Signed-off-by: David Yat Sin --- .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c|

[PATCH] fixup! drm/amdkfd: CRIU export dmabuf handles for GTT BOs

2022-03-09 Thread David Yat Sin
Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 6 ++ include/uapi/linux/kfd_ioctl.h | 2 ++ 2 files changed, 8 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index e1e2362841f8

[PATCH] drm/amdkfd: Set handle to invalid for non GTT/VRAM BOs

2022-03-09 Thread David Yat Sin
Set dmabuf handle to invalid for BOs that cannot be accessed using SDMA during checkpoint/restore. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 8 ++-- include/uapi/linux/kfd_ioctl.h | 2 ++ 2 files changed, 8 insertions(+), 2 deletions(-) diff

[PATCH v2] drm/amdkfd: CRIU export dmabuf handles for GTT BOs

2022-03-08 Thread David Yat Sin
Export dmabuf handles for GTT BOs so that their contents can be accessed using SDMA during checkpoint/restore. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 12 include/uapi/linux/kfd_ioctl.h | 3 ++- 2 files changed, 10 insertions(+), 5

[PATCH] drm/amdkfd: CRIU export dmabuf handles for GTT BOs

2022-03-08 Thread David Yat Sin
Export dmabuf handles for GTT BOs so that their contents can be accessed using SDMA during checkpoint/restore. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd

[PATCH 1/2] drm/amdkfd: CRIU remove sync and TLB flush on restore

2022-03-08 Thread David Yat Sin
When the process is getting restored, the queues are not mapped yet, so there is no VMID assigned for this process and no TLBs to flush. Signed-off-by: David Yat Sin Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 30 +--- 1 file changed, 1

[PATCH 2/2] drm/amdkfd: CRIU Refactor restore BO function

2022-03-08 Thread David Yat Sin
Refactor CRIU restore BO to reduce indentation before adding support for IPC. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 271 +++ 1 file changed, 129 insertions(+), 142 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b

[PATCH] drm/amdkfd: Fix for possible integer overflow

2022-02-18 Thread David Yat Sin
Fix for possible integer overflow when doing addition. Reported-by: Dan Carpenter Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
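The one-line change is not visible in this listing, so the block below is not the actual fix. It is a small sketch of the general technique for guarding an unsigned addition against wrap-around in plain C. The helper name `checked_add_u32` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch: add two 32-bit values and
 * report whether the mathematical sum fits in 32 bits.
 * On success, stores the sum in *out and returns true.
 * On overflow, leaves *out untouched and returns false.
 */
static bool checked_add_u32(uint32_t a, uint32_t b, uint32_t *out)
{
    if (a > UINT32_MAX - b)
        return false;   /* a + b would exceed UINT32_MAX */
    *out = a + b;
    return true;
}
```

Inside the kernel, the `check_add_overflow()` helper from `<linux/overflow.h>` covers the same need; whether this particular patch uses it cannot be determined from the truncated snippet.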

[PATCH v3 11/16] drm/amdkfd: CRIU restore queue doorbell id

2021-09-29 Thread David Yat Sin
When re-creating queues during CRIU restore, restore the queue with the same doorbell id value used during CRIU dump. Signed-off-by: David Yat Sin --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 60 +-- 1 file changed, 41 insertions(+), 19 deletions(-) diff --git a/drivers
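Restoring a queue with the same doorbell id (and likewise the same queue id or sdma id in the neighbouring patches) implies an allocator that can claim one specific id instead of picking any free one. The sketch below illustrates that idea over a 64-entry bitmap. It is an assumption-laden illustration, not the driver's code: `claim_id`, `MAX_IDS`, and the bitmap layout are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_IDS 64u

/*
 * Hypothetical ID allocator over a 64-entry bitmap (bit set = ID in use).
 * If want is non-NULL, claim exactly *want (restore path); fail if that ID
 * is out of range or already taken.
 * If want is NULL, claim the lowest free ID (normal creation path).
 * Returns the claimed ID, or -1 on failure.
 */
static int claim_id(uint64_t *in_use, const unsigned int *want)
{
    if (want) {
        if (*want >= MAX_IDS || (*in_use & (UINT64_C(1) << *want)))
            return -1;
        *in_use |= UINT64_C(1) << *want;
        return (int)*want;
    }

    for (unsigned int id = 0; id < MAX_IDS; id++) {
        if (!(*in_use & (UINT64_C(1) << id))) {
            *in_use |= UINT64_C(1) << id;
            return (int)id;
        }
    }
    return -1;
}
```

Failing outright when the requested id is taken, rather than falling back to another id, matches the restore use case: a restored process expects the values it saw at dump time.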

[PATCH v3 15/16] drm/amdkfd: CRIU implement gpu_id remapping

2021-09-29 Thread David Yat Sin
-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 416 --- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 5 +- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 9 + drivers/gpu/drm/amd/amdkfd/kfd_process.c | 18 + 4 files

[PATCH v3 09/16] drm/amdkfd: CRIU restore queue ids

2021-09-29 Thread David Yat Sin
When re-creating queues during CRIU restore, restore the queue with the same queue id value used during CRIU dump. Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c | 2

[PATCH v3 13/16] drm/amdkfd: CRIU dump/restore queue control stack

2021-09-29 Thread David Yat Sin
Dump contents of queue control stacks on CRIU dump and restore them during CRIU restore. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c | 2 +- .../drm/amd/amdkfd

[PATCH v3 14/16] drm/amdkfd: CRIU dump and restore events

2021-09-29 Thread David Yat Sin
Add support to existing CRIU ioctls to save and restore events during CRIU checkpoint and restore. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 61 + drivers/gpu/drm/amd/amdkfd/kfd_events.c | 322 +-- drivers/gpu/drm/amd/amdkfd

[PATCH v3 12/16] drm/amdkfd: CRIU dump and restore queue mqds

2021-09-29 Thread David Yat Sin
Dump contents of queue MQDs on CRIU dump and restore them during CRIU restore. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c | 2 +- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 72

[PATCH v3 07/16] drm/amdkfd: CRIU Implement KFD pause ioctl

2021-09-29 Thread David Yat Sin
Introducing pause IOCTL. The CRIU amdgpu plugin needs to call AMDKFD_IOC_CRIU_PAUSE(pause = 1) before starting dump and AMDKFD_IOC_CRIU_PAUSE(pause = 0) when dump is complete. This ensures that the queues are not modified between each CRIU dump ioctl. Signed-off-by: David Yat Sin --- drivers
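The pause/dump/unpause ordering described above can be sketched as a small wrapper that guarantees the unpause call runs even when the dump fails. The ioctl argument structure is not shown in this listing, so the real calls are replaced by hypothetical callbacks (`pause_fn`, `dump_fn`). The `call_log` recorder exists only to make the example self-contained.

```c
#include <stdbool.h>

/* Hypothetical callback types standing in for the real ioctl calls. */
typedef int (*pause_fn)(void *ctx, bool pause);
typedef int (*dump_fn)(void *ctx);

/*
 * Pause the queues, run the dump, and always unpause afterwards.
 * Returns the first error encountered, or 0 on success.
 */
static int dump_with_queues_paused(void *ctx, pause_fn set_pause, dump_fn dump)
{
    int ret = set_pause(ctx, true);          /* corresponds to pause = 1 */
    if (ret)
        return ret;                          /* nothing paused, nothing to undo */

    ret = dump(ctx);                         /* the dump ioctls would run here */

    int unpause_ret = set_pause(ctx, false); /* corresponds to pause = 0 */
    return ret ? ret : unpause_ret;
}

/* Recording stand-ins used to exercise the wrapper without a device. */
struct call_log {
    int pause_calls;
    int unpause_calls;
    int dump_calls;
    int dump_result;   /* value the fake dump should return */
};

static int log_pause(void *ctx, bool pause)
{
    struct call_log *log = ctx;

    if (pause)
        log->pause_calls++;
    else
        log->unpause_calls++;
    return 0;
}

static int log_dump(void *ctx)
{
    struct call_log *log = ctx;

    log->dump_calls++;
    return log->dump_result;
}
```

Calling `dump_with_queues_paused(&log, log_pause, log_dump)` with a zero-initialised `struct call_log` issues one pause, one dump, and one unpause. Setting `dump_result` to an error code still produces the unpause call, and that error is what the wrapper returns.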

[PATCH v3 16/16] drm/amdkfd: CRIU export kfd bos as prime dmabuf objects

2021-09-29 Thread David Yat Sin
) to further process the sdma command submissions. With sDMA, we see huge improvement in checkpoint and restore operations compared to the generic pci based access via host data path. Suggested-by: Felix Kuehling Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd

[PATCH v3 08/16] drm/amdkfd: CRIU add queues support

2021-09-29 Thread David Yat Sin
Add support to existing CRIU ioctls to save number of queues and queue properties for each queue during checkpoint and re-create queues on restore. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 16 +- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 25

[PATCH v3 10/16] drm/amdkfd: CRIU restore sdma id for queues

2021-09-29 Thread David Yat Sin
When re-creating queues during CRIU restore, restore the queue with the same sdma id value used during CRIU dump. Signed-off-by: David Yat Sin --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 48 ++- .../drm/amd/amdkfd/kfd_device_queue_manager.h | 3 +- .../amd/amdkfd

[PATCH v3 04/16] drm/amdkfd: CRIU Implement KFD dumper ioctl

2021-09-29 Thread David Yat Sin
during a restore operation. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 20 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 2 + drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 188 ++- drivers/gpu/drm/amd/amdkfd

[PATCH v3 00/16] CHECKPOINT RESTORE WITH ROCm

2021-09-29 Thread David Yat Sin
-Restoring on a different system V1: Initial V2: Addressed review comments V3: Rebased on latest amd-staging-drm-next PS: There will be an upcoming V4 patch series with minor additions to the APIs to support HMM. David Yat Sin (9): drm/amdkfd: CRIU Implement KFD pause ioctl drm/amdkfd: CRIU add

[PATCH v3 01/16] x86/configs: CRIU update debug rock defconfig

2021-09-29 Thread David Yat Sin
From: Rajneesh Bhardwaj - Update debug config for Checkpoint-Restore (CR) support - Also include necessary options for CR with docker containers. Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- arch/x86/configs/rock-dbg_defconfig | 53 ++--- 1 file

[PATCH v3 05/16] drm/amdkfd: CRIU Implement KFD restore ioctl

2021-09-29 Thread David Yat Sin
a criu process, attach old IDR values to newly created BOs. This also adds the minimal gpu mapping support for a single gpu checkpoint restore use case. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 297 ++- 1 file

[PATCH v3 06/16] drm/amdkfd: CRIU Implement KFD resume ioctl

2021-09-29 Thread David Yat Sin
stage-4 of the restore process i.e. criu_resume ioctl is received, and the process is ready to be resumed. This ioctl is different from other KFD CRIU ioctls since it is called by the CRIU master restore process for all the target processes being resumed by CRIU. Signed-off-by: David Yat Sin Signed-off

[PATCH v3 02/16] drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs

2021-09-29 Thread David Yat Sin
elevated ptrace attached privileges and CAP_SYS_ADMIN capabilities attached with the file descriptors so modify KFD to allow such calls. (API redesigned by David Yat Sin) Suggested-by: Felix Kuehling Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd

[PATCH v3 03/16] drm/amdkfd: CRIU Implement KFD process_info ioctl

2021-09-29 Thread David Yat Sin
of CRIU dumper process. Also the pid of a process inside a container might be different than its global pid so return the ns pid. Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 44 +++- drivers/gpu/drm/amd/amdkfd

[PATCH v2 18/18] drm/amdkfd: CRIU export kfd bos as prime dmabuf objects

2021-08-24 Thread David Yat Sin
) to further process the sdma command submissions. With sDMA, we see huge improvement in checkpoint and restore operations compared to the generic pci based access via host data path. Suggested-by: Felix Kuehling Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd

[PATCH v2 09/18] drm/amdkfd: CRIU add queues support

2021-08-24 Thread David Yat Sin
Add support to existing CRIU ioctls to save number of queues and queue properties for each queue during checkpoint and re-create queues on restore. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 16 +- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 25

[PATCH v2 13/18] drm/amdkfd: CRIU dump and restore queue mqds

2021-08-24 Thread David Yat Sin
Dump contents of queue MQDs on CRIU dump and restore them during CRIU restore. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c | 2 +- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 70

[PATCH v2 05/18] drm/amdkfd: CRIU Implement KFD dumper ioctl

2021-08-24 Thread David Yat Sin
during a restore operation. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj (cherry picked from commit 1f114a541bd21873de905db64bb9efa673274d4b) (cherry picked from commit 20c435fad57d3201e5402e38ae778f1f0f84a09d) --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 20 +++ drivers/gpu

[PATCH v2 07/18] drm/amdkfd: CRIU Implement KFD resume ioctl

2021-08-24 Thread David Yat Sin
stage-4 of the restore process i.e. criu_resume ioctl is received, and the process is ready to be resumed. This ioctl is different from other KFD CRIU ioctls since it is called by the CRIU master restore process for all the target processes being resumed by CRIU. Signed-off-by: David Yat Sin Signed-off

[PATCH v2 06/18] drm/amdkfd: CRIU Implement KFD restore ioctl

2021-08-24 Thread David Yat Sin
a criu process, attach old IDR values to newly created BOs. This also adds the minimal gpu mapping support for a single gpu checkpoint restore use case. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj (cherry picked from commit 47bb685701c336d1fde7e91be93d9cabe89a4c1b) (cherry picked

[PATCH v2 11/18] drm/amdkfd: CRIU restore sdma id for queues

2021-08-24 Thread David Yat Sin
When re-creating queues during CRIU restore, restore the queue with the same sdma id value used during CRIU dump. Signed-off-by: David Yat Sin --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 48 ++- .../drm/amd/amdkfd/kfd_device_queue_manager.h | 3 +- .../amd/amdkfd

[PATCH v2 10/18] drm/amdkfd: CRIU restore queue ids

2021-08-24 Thread David Yat Sin
When re-creating queues during CRIU restore, restore the queue with the same queue id value used during CRIU dump. Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c | 2

[PATCH v2 01/18] x86/configs: CRIU update release defconfig

2021-08-24 Thread David Yat Sin
From: Rajneesh Bhardwaj Update rock-rel_defconfig for monolithic kernel release that enables CRIU support with kfd. Signed-off-by: Rajneesh Bhardwaj (cherry picked from commit 4a6d309a82648a23a4fc0add83013ac6db6187d5) Signed-off-by: David Yat Sin --- arch/x86/configs/rock-rel_defconfig | 13

[PATCH v2 17/18] Revert "drm/amdgpu: Remove verify_access shortcut for KFD BOs"

2021-08-24 Thread David Yat Sin
From: Rajneesh Bhardwaj This reverts commit 12ebe2b9df192a2a8580cd9ee3e9940c116913c8. This is just a temporary workaround and will be dropped later. Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 7 +++ 1 file changed, 7

[PATCH v2 03/18] drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs

2021-08-24 Thread David Yat Sin
elevated ptrace attached privileges and CAP_SYS_ADMIN capabilities attached with the file descriptors so modify KFD to allow such calls. (API redesign suggested by Felix Kuehling and implemented by David Yat Sin) Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj (cherry picked from commit

[PATCH v2 14/18] drm/amdkfd: CRIU dump/restore queue control stack

2021-08-24 Thread David Yat Sin
Dump contents of queue control stacks on CRIU dump and restore them during CRIU restore. (rajneesh: rebased to 5.11 and fixed merge conflict) Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd

[PATCH v2 12/18] drm/amdkfd: CRIU restore queue doorbell id

2021-08-24 Thread David Yat Sin
When re-creating queues during CRIU restore, restore the queue with the same doorbell id value used during CRIU dump. Signed-off-by: David Yat Sin --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 61 +-- 1 file changed, 41 insertions(+), 20 deletions(-) diff --git a/drivers

[PATCH v2 04/18] drm/amdkfd: CRIU Implement KFD process_info ioctl

2021-08-24 Thread David Yat Sin
) Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 44 +++- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_process.c | 14 3 files changed, 59 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd

[PATCH v2 08/18] drm/amdkfd: CRIU Implement KFD pause ioctl

2021-08-24 Thread David Yat Sin
Introducing pause IOCTL. The CRIU amdgpu plugin needs to call AMDKFD_IOC_CRIU_PAUSE(pause = 1) before starting dump and AMDKFD_IOC_CRIU_PAUSE(pause = 0) when dump is complete. This ensures that the queues are not modified between each CRIU dump ioctl. Signed-off-by: David Yat Sin --- drivers

[PATCH v2 15/18] drm/amdkfd: CRIU dump and restore events

2021-08-24 Thread David Yat Sin
Add support to existing CRIU ioctls to save and restore events during CRIU checkpoint and restore. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 61 + drivers/gpu/drm/amd/amdkfd/kfd_events.c | 322 +-- drivers/gpu/drm/amd/amdkfd

[PATCH v2 16/18] drm/amdkfd: CRIU implement gpu_id remapping

2021-08-24 Thread David Yat Sin
-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 409 --- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 5 +- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 10 + drivers/gpu/drm/amd/amdkfd/kfd_process.c | 18 + 4 files

[PATCH v2 02/18] x86/configs: CRIU update debug rock defconfig

2021-08-24 Thread David Yat Sin
From: Rajneesh Bhardwaj - Update debug config for Checkpoint-Restore (CR) support - Also include necessary options for CR with docker containers. Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- arch/x86/configs/rock-dbg_defconfig | 53 ++--- 1 file

[PATCH v2 00/18] CHECKPOINT RESTORE WITH ROCm

2021-08-24 Thread David Yat Sin
on a different system V2: Addressed review comments David Yat Sin (9): drm/amdkfd: CRIU Implement KFD pause ioctl drm/amdkfd: CRIU add queues support drm/amdkfd: CRIU restore queue ids drm/amdkfd: CRIU restore sdma id for queues drm/amdkfd: CRIU restore queue doorbell id drm/amdkfd: CRIU dump

[PATCH 18/18] drm/amdkfd: CRIU export kfd bos as prime dmabuf objects

2021-08-19 Thread David Yat Sin
) to further process the sdma command submissions. With sDMA, we see huge improvement in checkpoint and restore operations compared to the generic pci based access via host data path. Suggested-by: Felix Kuehling Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd

[PATCH 16/18] drm/amdkfd: CRIU implement gpu_id remapping

2021-08-19 Thread David Yat Sin
-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 400 +-- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 5 +- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 10 + drivers/gpu/drm/amd/amdkfd/kfd_process.c | 18 + 4 files

[PATCH 14/18] drm/amdkfd: CRIU dump/restore queue control stack

2021-08-19 Thread David Yat Sin
Dump contents of queue control stacks on CRIU dump and restore them during CRIU restore. (rajneesh: rebased to 5.11 and fixed merge conflict) Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 31 --- drivers/gpu

[PATCH 13/18] drm/amdkfd: CRIU dump and restore queue mqds

2021-08-19 Thread David Yat Sin
Dump contents of queue MQDs on CRIU dump and restore them during CRIU restore. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 53 ++ drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c | 2 +- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 70

[PATCH 17/18] Revert "drm/amdgpu: Remove verify_access shortcut for KFD BOs"

2021-08-19 Thread David Yat Sin
From: Rajneesh Bhardwaj This reverts commit 12ebe2b9df192a2a8580cd9ee3e9940c116913c8. This is just a temporary workaround and will be dropped later. Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 7 +++ 1 file changed, 7

[PATCH 15/18] drm/amdkfd: CRIU dump and restore events

2021-08-19 Thread David Yat Sin
Add support to existing CRIU ioctls to save and restore events during CRIU checkpoint and restore. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 130 +++- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 253 --- drivers/gpu/drm/amd/amdkfd

[PATCH 12/18] drm/amdkfd: CRIU restore queue doorbell id

2021-08-19 Thread David Yat Sin
When re-creating queues during CRIU restore, restore the queue with the same doorbell id value used during CRIU dump. Signed-off-by: David Yat Sin --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 61 +-- 1 file changed, 41 insertions(+), 20 deletions(-) diff --git a/drivers

[PATCH 10/18] drm/amdkfd: CRIU restore queue ids

2021-08-19 Thread David Yat Sin
When re-creating queues during CRIU restore, restore the queue with the same queue id value used during CRIU dump. Adding a new private structure queue_restore_data to store queue restore information. Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd

[PATCH 11/18] drm/amdkfd: CRIU restore sdma id for queues

2021-08-19 Thread David Yat Sin
When re-creating queues during CRIU restore, restore the queue with the same sdma id value used during CRIU dump. Signed-off-by: David Yat Sin --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 48 ++- .../drm/amd/amdkfd/kfd_device_queue_manager.h | 3 +- .../amd/amdkfd

[PATCH 09/18] drm/amdkfd: CRIU add queues support

2021-08-19 Thread David Yat Sin
Add support to existing CRIU ioctls to save number of queues and queue properties for each queue during checkpoint and re-create queues on restore. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 380 ++- drivers/gpu/drm/amd/amdkfd/kfd_priv.h

[PATCH 08/18] drm/amdkfd: CRIU Implement KFD pause ioctl

2021-08-19 Thread David Yat Sin
Introducing pause IOCTL. The CRIU amdgpu plugin needs to call AMDKFD_IOC_CRIU_PAUSE(pause = 1) before starting dump and AMDKFD_IOC_CRIU_PAUSE(pause = 0) when dump is complete. This ensures that the queues are not modified between each CRIU dump ioctl. Signed-off-by: David Yat Sin --- drivers

[PATCH 07/18] drm/amdkfd: CRIU Implement KFD resume ioctl

2021-08-19 Thread David Yat Sin
stage-4 of the restore process i.e. criu_resume ioctl is received, and the process is ready to be resumed. This ioctl is different from other KFD CRIU ioctls since it is called by the CRIU master restore process for all the target processes being resumed by CRIU. Signed-off-by: David Yat Sin Signed-off

[PATCH 06/18] drm/amdkfd: CRIU Implement KFD restore ioctl

2021-08-19 Thread David Yat Sin
a criu process, attach old IDR values to newly created BOs. This also adds the minimal gpu mapping support for a single gpu checkpoint restore use case. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj (cherry picked from commit 47bb685701c336d1fde7e91be93d9cabe89a4c1b) (cherry picked

[PATCH 02/18] x86/configs: CRIU update debug rock defconfig

2021-08-19 Thread David Yat Sin
From: Rajneesh Bhardwaj - Update debug config for Checkpoint-Restore (CR) support - Also include necessary options for CR with docker containers. Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- arch/x86/configs/rock-dbg_defconfig | 53 ++--- 1 file

[PATCH 05/18] drm/amdkfd: CRIU Implement KFD dumper ioctl

2021-08-19 Thread David Yat Sin
during a restore operation. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj (cherry picked from commit 1f114a541bd21873de905db64bb9efa673274d4b) (cherry picked from commit 20c435fad57d3201e5402e38ae778f1f0f84a09d) --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 20 +++ drivers/gpu

[PATCH 03/18] drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs

2021-08-19 Thread David Yat Sin
elevated ptrace attached privileges and CAP_SYS_ADMIN capabilities attached with the file descriptors so modify KFD to allow such calls. (API redesign suggested by Felix Kuehling and implemented by David Yat Sin) Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj (cherry picked from commit

[PATCH 01/18] x86/configs: CRIU update release defconfig

2021-08-19 Thread David Yat Sin
From: Rajneesh Bhardwaj Update rock-rel_defconfig for monolithic kernel release that enables CRIU support with kfd. Signed-off-by: Rajneesh Bhardwaj (cherry picked from commit 4a6d309a82648a23a4fc0add83013ac6db6187d5) Signed-off-by: David Yat Sin --- arch/x86/configs/rock-rel_defconfig | 13

[PATCH 04/18] drm/amdkfd: CRIU Implement KFD process_info ioctl

2021-08-19 Thread David Yat Sin
) Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 44 +++- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_process.c | 14 3 files changed, 59 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd

[PATCH 00/18] CHECKPOINT RESTORE WITH ROCm

2021-08-19 Thread David Yat Sin
on a different system David Yat Sin (9): drm/amdkfd: CRIU Implement KFD pause ioctl drm/amdkfd: CRIU add queues support drm/amdkfd: CRIU restore queue ids drm/amdkfd: CRIU restore sdma id for queues drm/amdkfd: CRIU restore queue doorbell id drm/amdkfd: CRIU dump and restore queue mqds drm/amdkfd