[Patch v2] drm/ttm: Allow direct reclaim to allocate local memory

2024-07-08 Thread Rajneesh Bhardwaj
compaction is disabled. (https://tinyurl.com/4f32f7rs) Cc: Dave Airlie Cc: Vlastimil Babka Cc: Daniel Vetter Reviewed-by: Christian König Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/ttm/ttm_pool.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/ttm

[PATCH] drm/ttm: Make ttm shrinkers NUMA aware

2024-04-08 Thread Rajneesh Bhardwaj
Otherwise the nid is always passed as 0 during memory reclaim so make TTM shrinkers NUMA aware. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/ttm/ttm_pool.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c

[PATCH] drm/ttm: Implement strict NUMA pool allocations

2024-03-22 Thread Rajneesh Bhardwaj
Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 8 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 7 ++- drivers/gpu/drm/ttm/tests/ttm_pool_test.c | 10 +- drivers/gpu/drm/ttm/ttm_device.c

[PATCH] drm/ttm: set max_active to recommended default

2023-11-11 Thread Rajneesh Bhardwaj
To maximize per-CPU execution contexts for the work items, use the recommended setting, i.e. WQ_DFL_ACTIVE (256). There is no apparent reason to throttle to 16 during process teardown. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/ttm/ttm_device.c | 2 +- 1 file changed, 1 insertion(+), 1

[Patch v3] drm/ttm: Schedule delayed_delete worker closer

2023-11-11 Thread Rajneesh Bhardwaj
-off-by: Rajneesh Bhardwaj --- Changes in v3: * Use WQ_UNBOUND to address the warning reported by CI pipeline. drivers/gpu/drm/ttm/ttm_bo.c | 8 +++- drivers/gpu/drm/ttm/ttm_device.c | 6 -- 2 files changed, 11 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c

[Patch v2] drm/ttm: Schedule delayed_delete worker closer

2023-11-08 Thread Rajneesh Bhardwaj
-off-by: Rajneesh Bhardwaj --- Changes in v2: - Absorbed the feedback provided by Christian in the commit message and the comment. drivers/gpu/drm/ttm/ttm_bo.c | 8 +++- drivers/gpu/drm/ttm/ttm_device.c | 3 ++- 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers

[PATCH] drm/ttm: Schedule delayed_delete worker closer

2023-11-07 Thread Rajneesh Bhardwaj
etc. This change helps USWC GTT allocations on NUMA systems (dGPU) and AMD APU platforms such as GFXIP9.4.3. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/ttm/ttm_bo.c | 10 +- drivers/gpu/drm/ttm/ttm_device.c | 3 ++- 2 files changed, 11 insertions(+), 2 deletions(-) diff

[Patch v2] drm/ttm: Use init_on_free to delay release TTM BOs

2023-07-07 Thread Rajneesh Bhardwaj
. Reviewed-by: Christian König . Signed-off-by: Rajneesh Bhardwaj --- Changes in v2: - Updated commit message as per Christian's feedback drivers/gpu/drm/ttm/ttm_bo.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 326a3d13a829

[PATCH] drm/ttm: Use init_on_free to early release TTM BOs

2023-07-05 Thread Rajneesh Bhardwaj
Early release TTM BOs when the kernel default setting is init_on_free to wipe out and reinitialize system memory chunks. This could potentially optimize performance when an application does a lot of malloc/free style allocations with unified system memory. Signed-off-by: Rajneesh Bhardwaj

[PATCH] drm/amdgpu: Fix recursive locking warning

2022-02-03 Thread Rajneesh Bhardwaj
: 7fbcc402d820 Cc: Christian König Cc: Felix Kuehling Cc: Alex Deucher Fixes: 627b92ef9d7c ("drm/amdgpu: Wipe all VRAM on free when RAS is enabled") Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(

[Patch v5 14/24] drm/amdkfd: CRIU checkpoint and restore events

2022-02-03 Thread Rajneesh Bhardwaj
From: David Yat Sin Add support to existing CRIU ioctl's to save and restore events during criu checkpoint and restore. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 70 +- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 272

[Patch v5 24/24] drm/amdkfd: Bump up KFD API version for CRIU

2022-02-03 Thread Rajneesh Bhardwaj
- Change KFD minor version to 7 for CRIU Proposed userspace changes: https://github.com/RadeonOpenCompute/criu Signed-off-by: Rajneesh Bhardwaj --- include/uapi/linux/kfd_ioctl.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/kfd_ioctl.h b/include

[Patch v5 19/24] drm/amdkfd: use user_gpu_id for svm ranges

2022-02-03 Thread Rajneesh Bhardwaj
Currently the SVM ranges use actual_gpu_id, but with Checkpoint Restore support it is possible that the SVM ranges can be resumed on another node where the actual_gpu_id may not be the same as the original (user_gpu_id) gpu id. So modify svm code to use user_gpu_id. Signed-off-by: Rajneesh Bhardwaj
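The distinction between the two ids can be illustrated with a small userspace sketch. This is not the KFD code: the struct `gpu_id_map` and the function `user_to_actual_gpu_id` are hypothetical names chosen for illustration, and the sketch only assumes that each restored device keeps the id userspace saw at checkpoint time alongside the id it has on the current node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pairing (not a KFD structure): the id userspace recorded at
 * checkpoint time, and the id the device has on the node it is restored on. */
struct gpu_id_map {
    uint32_t user_gpu_id;
    uint32_t actual_gpu_id;
};

/* Translate a user-visible gpu id into the actual id on this node.
 * Returns 0 and writes *actual on success, -1 if the id is unknown. */
int user_to_actual_gpu_id(const struct gpu_id_map *map, size_t n,
                          uint32_t user_gpu_id, uint32_t *actual)
{
    assert(map != NULL && actual != NULL);

    for (size_t i = 0; i < n; i++) {
        if (map[i].user_gpu_id == user_gpu_id) {
            *actual = map[i].actual_gpu_id;
            return 0;
        }
    }
    return -1; /* id was never registered for this process */
}
```

With such a table, state that userspace stored before the checkpoint (for example an SVM range attribute naming a GPU) stays valid after restore, because only the lookup result changes when the process lands on a different node.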

[Patch v5 15/24] drm/amdkfd: CRIU implement gpu_id remapping

2022-02-03 Thread Rajneesh Bhardwaj
the user ioctl's. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 468 -- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 45 +- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 11 + drivers/gpu/drm/amd/amdkfd

[Patch v5 22/24] drm/amdkfd: CRIU prepare for svm resume

2022-02-03 Thread Rajneesh Bhardwaj
During CRIU restore phase, the VMAs for the virtual address ranges are not at their final location yet so in this stage, only cache the data required to successfully resume the svm ranges during an imminent CRIU resume phase. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd

[Patch v5 21/24] drm/amdkfd: CRIU Save Shared Virtual Memory ranges

2022-02-03 Thread Rajneesh Bhardwaj
possible values for the max possible attribute types. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 +- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 95 drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 10 +++ 3 files changed, 108 insertions

[Patch v5 17/24] drm/amdkfd: CRIU checkpoint and restore xnack mode

2022-02-03 Thread Rajneesh Bhardwaj
Recoverable page faults are represented by the xnack mode setting inside a kfd process and are used to represent the device page faults. For CR, we don't consider negative values which are typically used for querying the current xnack mode without modifying it. Signed-off-by: Rajneesh Bhardwaj

[Patch v5 20/24] drm/amdkfd: CRIU Discover svm ranges

2022-02-03 Thread Rajneesh Bhardwaj
by extending the PROCESS_INFO op of the CRIU IOCTL to discover the svm ranges in the target process; future patches bring in the required support for checkpoint and restore for SVM ranges. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 12 +++-- drivers

[Patch v5 13/24] drm/amdkfd: CRIU checkpoint and restore queue control stack

2022-02-03 Thread Rajneesh Bhardwaj
From: David Yat Sin Checkpoint contents of queue control stacks on CRIU dump and restore them during CRIU restore. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c | 2

[Patch v5 23/24] drm/amdkfd: CRIU resume shared virtual memory ranges

2022-02-03 Thread Rajneesh Bhardwaj
the flags during restore as there might be some default flags set when the prange is created. Also handle the invalid PREFETCH attribute values saved during checkpoint by replacing them with another dummy KFD_IOCTL_SVM_ATTR_SET_FLAGS attribute. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm

[Patch v5 18/24] drm/amdkfd: CRIU allow external mm for svm ranges

2022-02-03 Thread Rajneesh Bhardwaj
ned-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 17 + 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index ffec25e642e2..d34508f5e88b 100644 --- a/drivers/gpu/drm/

[Patch v5 09/24] drm/amdkfd: CRIU restore queue ids

2022-02-03 Thread Rajneesh Bhardwaj
From: David Yat Sin When re-creating queues during CRIU restore, restore the queue with the same queue id value used during CRIU dump. Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd

[Patch v5 10/24] drm/amdkfd: CRIU restore sdma id for queues

2022-02-03 Thread Rajneesh Bhardwaj
From: David Yat Sin When re-creating queues during CRIU restore, restore the queue with the same sdma id value used during CRIU dump. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 48 ++- .../drm/amd/amdkfd

[Patch v5 11/24] drm/amdkfd: CRIU restore queue doorbell id

2022-02-03 Thread Rajneesh Bhardwaj
From: David Yat Sin When re-creating queues during CRIU restore, restore the queue with the same doorbell id value used during CRIU dump. Signed-off-by: David Yat Sin --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 60 +-- 1 file changed, 41 insertions(+), 19 deletions(-)

[Patch v5 16/24] drm/amdkfd: CRIU export BOs as prime dmabuf objects

2022-02-03 Thread Rajneesh Bhardwaj
command submissions. With sDMA, we see a huge improvement in checkpoint and restore operations compared to the generic PCI-based access via host data path. Suggested-by: Felix Kuehling Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 71

[Patch v5 02/24] drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs

2022-02-03 Thread Rajneesh Bhardwaj
and CAP_CHECKPOINT_RESTORE capabilities attached with the file descriptors so modify KFD to allow such calls. (API redesigned by David Yat Sin) Suggested-by: Felix Kuehling Reviewed-by: Felix Kuehling Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd

[Patch v5 06/24] drm/amdkfd: CRIU Implement KFD resume ioctl

2022-02-03 Thread Rajneesh Bhardwaj
process i.e. criu_resume ioctl op is received, and the process is ready to be resumed. This ioctl is different from other KFD CRIU ioctls since it is called by the CRIU master restore process for all the target processes being resumed by CRIU. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh

[Patch v5 12/24] drm/amdkfd: CRIU checkpoint and restore queue mqds

2022-02-03 Thread Rajneesh Bhardwaj
From: David Yat Sin Checkpoint contents of queue MQD's on CRIU dump and restore them during CRIU restore. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c | 2 +- .../drm

[Patch v5 07/24] drm/amdkfd: CRIU Implement KFD unpause operation

2022-02-03 Thread Rajneesh Bhardwaj
-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 37 +++- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_process.c | 1 + 3 files changed, 39 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b

[Patch v5 05/24] drm/amdkfd: CRIU Implement KFD restore ioctl

2022-02-03 Thread Rajneesh Bhardwaj
-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 298 ++- 1 file changed, 297 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index 17a937b7139f

[Patch v5 08/24] drm/amdkfd: CRIU add queues support

2022-02-03 Thread Rajneesh Bhardwaj
From: David Yat Sin Add support to existing CRIU ioctl's to save number of queues and queue properties for each queue during checkpoint and re-create queues on restore. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 110

[Patch v5 04/24] drm/amdkfd: CRIU Implement KFD checkpoint ioctl

2022-02-03 Thread Rajneesh Bhardwaj
. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 1 + .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 11 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 20 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 2 + drivers

[Patch v5 03/24] drm/amdkfd: CRIU Implement KFD process_info ioctl

2022-02-03 Thread Rajneesh Bhardwaj
process. Also the pid of a process inside a container might be different than its global pid so return the ns pid. Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 56 +++- 1 file changed, 55 insertions(+), 1 deletion

[Patch v5 01/24] x86/configs: CRIU update debug rock defconfig

2022-02-03 Thread Rajneesh Bhardwaj
- Update debug config for Checkpoint-Restore (CR) support - Also include necessary options for CR with docker containers. Reviewed-by: Felix Kuehling Signed-off-by: Rajneesh Bhardwaj --- arch/x86/configs/rock-dbg_defconfig | 53 ++--- 1 file changed, 34 insertions

[Patch v5 00/24] CHECKPOINT RESTORE WITH ROCm

2022-02-03 Thread Rajneesh Bhardwaj
gpu_id remapping Rajneesh Bhardwaj (15): x86/configs: CRIU update debug rock defconfig drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs drm/amdkfd: CRIU Implement KFD process_info ioctl drm/amdkfd: CRIU Implement KFD checkpoint ioctl drm/amdkfd: CRIU Implement KFD restore ioctl drm/amdkfd

[Patch v4 12/24] drm/amdkfd: CRIU restore queue doorbell id

2021-12-22 Thread Rajneesh Bhardwaj
From: David Yat Sin When re-creating queues during CRIU restore, restore the queue with the same doorbell id value used during CRIU dump. Signed-off-by: David Yat Sin --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 60 +-- 1 file changed, 41 insertions(+), 19 deletions(-)

[Patch v4 19/24] drm/amdkfd: CRIU allow external mm for svm ranges

2021-12-22 Thread Rajneesh Bhardwaj
ned-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 17 + 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 88360f23eb61..7c92116153fe 100644 --- a/drivers/gpu/drm/

[Patch v4 16/24] drm/amdkfd: CRIU implement gpu_id remapping

2021-12-22 Thread Rajneesh Bhardwaj
the user ioctl's. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 465 -- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 45 +- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 11 + drivers/gpu/drm/amd/amdkfd

[Patch v4 22/24] drm/amdkfd: CRIU Save Shared Virtual Memory ranges

2021-12-22 Thread Rajneesh Bhardwaj
possible values for the max possible attribute types. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 +- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 95 drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 10 +++ 3 files changed, 108 insertions

[Patch v4 23/24] drm/amdkfd: CRIU prepare for svm resume

2021-12-22 Thread Rajneesh Bhardwaj
During CRIU restore phase, the VMAs for the virtual address ranges are not at their final location yet so in this stage, only cache the data required to successfully resume the svm ranges during an imminent CRIU resume phase. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd

[Patch v4 14/24] drm/amdkfd: CRIU checkpoint and restore queue control stack

2021-12-22 Thread Rajneesh Bhardwaj
From: David Yat Sin Checkpoint contents of queue control stacks on CRIU dump and restore them during CRIU restore. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c | 2

[Patch v4 18/24] drm/amdkfd: CRIU checkpoint and restore xnack mode

2021-12-22 Thread Rajneesh Bhardwaj
Recoverable page faults are represented by the xnack mode setting inside a kfd process and are used to represent the device page faults. For CR, we don't consider negative values which are typically used for querying the current xnack mode without modifying it. Signed-off-by: Rajneesh Bhardwaj

[Patch v4 11/24] drm/amdkfd: CRIU restore sdma id for queues

2021-12-22 Thread Rajneesh Bhardwaj
From: David Yat Sin When re-creating queues during CRIU restore, restore the queue with the same sdma id value used during CRIU dump. Signed-off-by: David Yat Sin --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 48 ++- .../drm/amd/amdkfd/kfd_device_queue_manager.h | 3 +-

[Patch v4 21/24] drm/amdkfd: CRIU Discover svm ranges

2021-12-22 Thread Rajneesh Bhardwaj
by extending the PROCESS_INFO op of the CRIU IOCTL to discover the svm ranges in the target process; future patches bring in the required support for checkpoint and restore for SVM ranges. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 12 +++-- drivers

[Patch v4 15/24] drm/amdkfd: CRIU checkpoint and restore events

2021-12-22 Thread Rajneesh Bhardwaj
From: David Yat Sin Add support to existing CRIU ioctl's to save and restore events during criu checkpoint and restore. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 70 +- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 272 ---

[Patch v4 17/24] drm/amdkfd: CRIU export BOs as prime dmabuf objects

2021-12-22 Thread Rajneesh Bhardwaj
command submissions. With sDMA, we see a huge improvement in checkpoint and restore operations compared to the generic PCI-based access via host data path. Suggested-by: Felix Kuehling Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 71

[Patch v4 09/24] drm/amdkfd: CRIU add queues support

2021-12-22 Thread Rajneesh Bhardwaj
From: David Yat Sin Add support to existing CRIU ioctl's to save number of queues and queue properties for each queue during checkpoint and re-create queues on restore. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 110 -

[Patch v4 02/24] x86/configs: Add rock-rel_defconfig for amd-feature-criu branch

2021-12-22 Thread Rajneesh Bhardwaj
- Add rock-rel_defconfig for release builds. Signed-off-by: Rajneesh Bhardwaj --- arch/x86/configs/rock-rel_defconfig | 4927 +++ 1 file changed, 4927 insertions(+) create mode 100644 arch/x86/configs/rock-rel_defconfig diff --git a/arch/x86/configs/rock-rel_defconfig

[Patch v4 13/24] drm/amdkfd: CRIU checkpoint and restore queue mqds

2021-12-22 Thread Rajneesh Bhardwaj
From: David Yat Sin Checkpoint contents of queue MQD's on CRIU dump and restore them during CRIU restore. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c | 2 +-

[Patch v4 06/24] drm/amdkfd: CRIU Implement KFD restore ioctl

2021-12-22 Thread Rajneesh Bhardwaj
values to newly created BOs. This also adds the minimal gpu mapping support for a single gpu checkpoint restore use case. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 298 ++- 1 file changed, 297 insertions

[Patch v4 24/24] drm/amdkfd: CRIU resume shared virtual memory ranges

2021-12-22 Thread Rajneesh Bhardwaj
In CRIU resume stage, resume all the shared virtual memory ranges from the data stored inside the resuming kfd process during CRIU restore phase. Also setup xnack mode and free up the resources. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 10 + drivers

[Patch v4 20/24] drm/amdkfd: use user_gpu_id for svm ranges

2021-12-22 Thread Rajneesh Bhardwaj
Currently the SVM ranges use actual_gpu_id, but with Checkpoint Restore support it is possible that the SVM ranges can be resumed on another node where the actual_gpu_id may not be the same as the original (user_gpu_id) gpu id. So modify svm code to use user_gpu_id. Signed-off-by: Rajneesh Bhardwaj

[Patch v4 04/24] drm/amdkfd: CRIU Implement KFD process_info ioctl

2021-12-22 Thread Rajneesh Bhardwaj
the pid of a process inside a container might be different than its global pid so return the ns pid. Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 55 +++- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 2 + drivers

[Patch v4 10/24] drm/amdkfd: CRIU restore queue ids

2021-12-22 Thread Rajneesh Bhardwaj
From: David Yat Sin When re-creating queues during CRIU restore, restore the queue with the same queue id value used during CRIU dump. Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd

[Patch v4 07/24] drm/amdkfd: CRIU Implement KFD resume ioctl

2021-12-22 Thread Rajneesh Bhardwaj
process i.e. criu_resume ioctl is received, and the process is ready to be resumed. This ioctl is different from other KFD CRIU ioctls since it is called by the CRIU master restore process for all the target processes being resumed by CRIU. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj

[Patch v4 05/24] drm/amdkfd: CRIU Implement KFD checkpoint ioctl

2021-12-22 Thread Rajneesh Bhardwaj
. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 20 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 2 + drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 172 ++- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 3 +- 4

[Patch v4 08/24] drm/amdkfd: CRIU Implement KFD unpause operation

2021-12-22 Thread Rajneesh Bhardwaj
From: David Yat Sin Introduce the UNPAUSE op. After the CRIU amdgpu plugin performs a PROCESS_INFO op, the queues will stay in an evicted state. Once the plugin is done draining BO contents, it is safe to perform an UNPAUSE op for the queues to resume. Signed-off-by: David Yat Sin ---

[Patch v4 01/24] x86/configs: CRIU update debug rock defconfig

2021-12-22 Thread Rajneesh Bhardwaj
- Update debug config for Checkpoint-Restore (CR) support - Also include necessary options for CR with docker containers. Signed-off-by: Rajneesh Bhardwaj --- arch/x86/configs/rock-dbg_defconfig | 53 ++--- 1 file changed, 34 insertions(+), 19 deletions(-) diff --git

[Patch v4 03/24] drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs

2021-12-22 Thread Rajneesh Bhardwaj
privileges and CAP_CHECKPOINT_RESTORE capabilities attached with the file descriptors so modify KFD to allow such calls. (API redesigned by David Yat Sin) Suggested-by: Felix Kuehling Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 94

[Patch v4 00/24] CHECKPOINT RESTORE WITH ROCm

2021-12-22 Thread Rajneesh Bhardwaj
: CRIU checkpoint and restore queue mqds drm/amdkfd: CRIU checkpoint and restore queue control stack drm/amdkfd: CRIU checkpoint and restore events drm/amdkfd: CRIU implement gpu_id remapping Rajneesh Bhardwaj (15): x86/configs: CRIU update debug rock defconfig x86/configs: Add rock

[Patch v2] drm/amdgpu: Don't inherit GEM object VMAs in child process

2021-12-10 Thread Rajneesh Bhardwaj
only. Cc: Felix Kuehling Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- Changes in v2: * Addressed Christian's concerns for user space impact * Further reduced the scope to KFD BOs only drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 3 +++ 1 file changed, 3 insertions

[PATCH] drm/ttm: Don't inherit GEM object VMAs in child process

2021-12-08 Thread Rajneesh Bhardwaj
in the child which confuse CRIU when it mmaps on restore. Having this flag set for the render node VMAs helps. VMAs mapped via KFD already take care of this so this is needed only for the render nodes. Cc: Felix Kuehling Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers