compaction is
disabled. (https://tinyurl.com/4f32f7rs)
Cc: Dave Airlie
Cc: Vlastimil Babka
Cc: Daniel Vetter
Reviewed-by: Christian König
Signed-off-by: Rajneesh Bhardwaj
---
drivers/gpu/drm/ttm/ttm_pool.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/ttm
Otherwise the nid is always passed as 0 during memory reclaim, so
make the TTM shrinkers NUMA aware.
Signed-off-by: Rajneesh Bhardwaj
---
drivers/gpu/drm/ttm/ttm_pool.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
Signed-off-by: Rajneesh Bhardwaj
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 8
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 7 ++-
drivers/gpu/drm/ttm/tests/ttm_pool_test.c | 10 +-
drivers/gpu/drm/ttm/ttm_device.c
To maximize per-CPU execution contexts for the work items, use the
recommended setting, i.e. WQ_DFL_ACTIVE (256). There is no apparent
reason to throttle to 16 during process teardown.
Signed-off-by: Rajneesh Bhardwaj
---
drivers/gpu/drm/ttm/ttm_device.c | 2 +-
1 file changed, 1 insertion(+), 1
-off-by: Rajneesh Bhardwaj
---
Changes in v3:
* Use WQ_UNBOUND to address the warning reported by CI pipeline.
drivers/gpu/drm/ttm/ttm_bo.c | 8 +++-
drivers/gpu/drm/ttm/ttm_device.c | 6 --
2 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
-off-by: Rajneesh Bhardwaj
---
Changes in v2:
- Absorbed the feedback provided by Christian in the commit message and
the comment.
drivers/gpu/drm/ttm/ttm_bo.c | 8 +++-
drivers/gpu/drm/ttm/ttm_device.c | 3 ++-
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers
etc.
This change helps USWC GTT allocations on NUMA systems (dGPU) and AMD
APU platforms such as GFXIP9.4.3.
Signed-off-by: Rajneesh Bhardwaj
---
drivers/gpu/drm/ttm/ttm_bo.c | 10 +-
drivers/gpu/drm/ttm/ttm_device.c | 3 ++-
2 files changed, 11 insertions(+), 2 deletions(-)
diff
.
Reviewed-by: Christian König .
Signed-off-by: Rajneesh Bhardwaj
---
Changes in v2:
- Updated commit message as per Christian's feedback
drivers/gpu/drm/ttm/ttm_bo.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 326a3d13a829
Release TTM BOs early when the kernel default setting is init_on_free to
wipe out and reinitialize system memory chunks. This could potentially
optimize performance when an application does a lot of malloc/free style
allocations with unified system memory.
Signed-off-by: Rajneesh Bhardwaj
:
7fbcc402d820
Cc: Christian König
Cc: Felix Kuehling
Cc: Alex Deucher
Fixes: 627b92ef9d7c ("drm/amdgpu: Wipe all VRAM on free when RAS is
enabled")
Signed-off-by: Rajneesh Bhardwaj
---
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(
From: David Yat Sin
Add support to the existing CRIU ioctls to save and restore events during
CRIU checkpoint and restore.
Signed-off-by: David Yat Sin
Signed-off-by: Rajneesh Bhardwaj
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 70 +-
drivers/gpu/drm/amd/amdkfd/kfd_events.c | 272
- Change KFD minor version to 7 for CRIU
Proposed userspace changes:
https://github.com/RadeonOpenCompute/criu
Signed-off-by: Rajneesh Bhardwaj
---
include/uapi/linux/kfd_ioctl.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/kfd_ioctl.h b/include
Currently the SVM ranges use actual_gpu_id, but with Checkpoint Restore
support it is possible that the SVM ranges are resumed on another node
where the actual_gpu_id is not the same as the original (user_gpu_id)
gpu id. So modify the SVM code to use user_gpu_id.
Signed-off-by: Rajneesh Bhardwaj
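
The distinction between user_gpu_id and actual_gpu_id described above can be illustrated with a small userspace sketch. It is not the KFD implementation: the struct, the function name and the linear lookup are all hypothetical, and only the two field names come from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration only; these are not the KFD data structures. */
struct gpu_id_map_entry {
    uint32_t user_gpu_id;   /* id the application saw before checkpoint */
    uint32_t actual_gpu_id; /* id of the device on the node we restore on */
};

/*
 * Translate an id stored by user space into the id of the device that
 * backs it after restore. Returns 0 on success, -1 if the id is unknown.
 */
static int user_to_actual_gpu_id(const struct gpu_id_map_entry *map,
                                 size_t count, uint32_t user_gpu_id,
                                 uint32_t *actual_gpu_id)
{
    assert(map != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (map[i].user_gpu_id == user_gpu_id) {
            *actual_gpu_id = map[i].actual_gpu_id;
            return 0;
        }
    }
    return -1;
}
```

In this model, anything that user space stored (such as SVM range attributes) keeps referring to user_gpu_id, and the translation happens only at the point where a real device is needed.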
the user
ioctls.
Signed-off-by: David Yat Sin
Signed-off-by: Rajneesh Bhardwaj
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 468 --
drivers/gpu/drm/amd/amdkfd/kfd_events.c | 45 +-
drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 11 +
drivers/gpu/drm/amd/amdkfd
During the CRIU restore phase, the VMAs for the virtual address ranges are
not yet at their final location, so at this stage only cache the data
required to resume the SVM ranges during the imminent CRIU
resume phase.
Signed-off-by: Rajneesh Bhardwaj
---
drivers/gpu/drm/amd/amdkfd
possible values for the max possible attribute
types.
Signed-off-by: Rajneesh Bhardwaj
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 +-
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 95
drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 10 +++
3 files changed, 108 insertions
Recoverable device page faults are represented by the xnack mode setting
inside a KFD process. For CR,
we don't consider negative values, which are typically used for querying
the current xnack mode without modifying it.
Signed-off-by: Rajneesh Bhardwaj
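
A minimal userspace model of the behaviour described above, assuming that a non-negative request sets the mode and a negative request only queries it. All names are hypothetical; this is a sketch of the semantics, not the KFD ioctl code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-process state. */
struct proc_model {
    bool xnack_enabled;
};

/*
 * Assumed runtime semantics: requested >= 0 sets the mode, a negative
 * value leaves it untouched. The current mode (0 or 1) is returned.
 */
static int set_or_query_xnack(struct proc_model *p, int32_t requested)
{
    assert(p != NULL);
    if (requested >= 0)
        p->xnack_enabled = (requested != 0);
    return p->xnack_enabled ? 1 : 0;
}

/* For checkpoint, only the concrete mode is saved, never a query value. */
static uint32_t xnack_for_checkpoint(const struct proc_model *p)
{
    return p->xnack_enabled ? 1u : 0u;
}

/* On restore, anything other than 0 or 1 is rejected. */
static int xnack_from_checkpoint(struct proc_model *p, uint32_t saved)
{
    if (saved > 1u)
        return -1;
    p->xnack_enabled = (saved == 1u);
    return 0;
}
```

Because the checkpointed value is always 0 or 1, the restore side never has to interpret a "query" request.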
by extending the PROCESS_INFO op of the
CRIU IOCTL to discover the SVM ranges in the target process; future
patches bring in the required support for checkpoint and restore of
SVM ranges.
Signed-off-by: Rajneesh Bhardwaj
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 12 +++--
drivers
From: David Yat Sin
Checkpoint contents of queue control stacks on CRIU dump and restore them
during CRIU restore.
Signed-off-by: David Yat Sin
Signed-off-by: Rajneesh Bhardwaj
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +-
drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c | 2
the flags during restore as there might be
some default flags set when the prange is created. Also handle the
invalid PREFETCH attribute values saved during checkpoint by replacing
them with another dummy KFD_IOCTL_SVM_ATTR_SET_FLAGS attribute.
Signed-off-by: Rajneesh Bhardwaj
---
drivers/gpu/drm
ned-off-by: Rajneesh Bhardwaj
---
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 17 +
1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index ffec25e642e2..d34508f5e88b 100644
--- a/drivers/gpu/drm/
From: David Yat Sin
When re-creating queues during CRIU restore, restore the queue with the
same queue id value used during CRIU dump.
Signed-off-by: Rajneesh Bhardwaj
Signed-off-by: David Yat Sin
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +-
drivers/gpu/drm/amd/amdkfd
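
Restoring a queue with the same id it had at dump time means the allocator needs a path that claims a specific id instead of the first free one. The following is a minimal sketch of that idea with a hypothetical bitmap pool; the names and the 64-id limit are assumptions made for the example and do not reflect the KFD queue manager.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_IDS 64 /* hypothetical limit chosen for this sketch */

struct id_pool {
    uint64_t used; /* bit n set means id n is taken */
};

/* Normal creation path: hand out the first free id, or -1 if full. */
static int id_alloc_any(struct id_pool *pool)
{
    assert(pool != NULL);
    for (int id = 0; id < MAX_IDS; id++) {
        if (!(pool->used & (UINT64_C(1) << id))) {
            pool->used |= UINT64_C(1) << id;
            return id;
        }
    }
    return -1;
}

/*
 * Restore path: claim exactly the id recorded at dump time.
 * Fails with -1 if the id is out of range or already taken.
 */
static int id_alloc_exact(struct id_pool *pool, int id)
{
    assert(pool != NULL);
    if (id < 0 || id >= MAX_IDS)
        return -1;
    if (pool->used & (UINT64_C(1) << id))
        return -1;
    pool->used |= UINT64_C(1) << id;
    return id;
}
```

The same pattern would apply to any resource whose identifier is visible to user space and therefore has to survive a checkpoint unchanged.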
From: David Yat Sin
When re-creating queues during CRIU restore, restore the queue with the
same sdma id value used during CRIU dump.
Signed-off-by: David Yat Sin
Signed-off-by: Rajneesh Bhardwaj
---
.../drm/amd/amdkfd/kfd_device_queue_manager.c | 48 ++-
.../drm/amd/amdkfd
From: David Yat Sin
When re-creating queues during CRIU restore, restore the queue with the
same doorbell id value used during CRIU dump.
Signed-off-by: David Yat Sin
---
.../drm/amd/amdkfd/kfd_device_queue_manager.c | 60 +--
1 file changed, 41 insertions(+), 19 deletions(-)
command submissions.
With sDMA, we see a huge improvement in checkpoint and restore operations
compared to generic PCI-based access via the host data path.
Suggested-by: Felix Kuehling
Signed-off-by: Rajneesh Bhardwaj
Signed-off-by: David Yat Sin
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 71
and CAP_CHECKPOINT_RESTORE capabilities attached to
the file descriptors, so modify KFD to allow such calls.
(API redesigned by David Yat Sin)
Suggested-by: Felix Kuehling
Reviewed-by: Felix Kuehling
Signed-off-by: David Yat Sin
Signed-off-by: Rajneesh Bhardwaj
---
drivers/gpu/drm/amd/amdkfd
process i.e. criu_resume ioctl op is received, and the process is
ready to be resumed. This ioctl is different from other KFD CRIU ioctls
since it is called by the CRIU master restore process for all the target
processes being resumed by CRIU.
Signed-off-by: David Yat Sin
Signed-off-by: Rajneesh
From: David Yat Sin
Checkpoint contents of queue MQDs on CRIU dump and restore them during
CRIU restore.
Signed-off-by: David Yat Sin
Signed-off-by: Rajneesh Bhardwaj
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +-
drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c | 2 +-
.../drm
-by: Rajneesh Bhardwaj
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 37 +++-
drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 2 ++
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 1 +
3 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
b
-off-by: David Yat Sin
Signed-off-by: Rajneesh Bhardwaj
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 298 ++-
1 file changed, 297 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 17a937b7139f
From: David Yat Sin
Add support to the existing CRIU ioctls to save the number of queues and
the queue properties for each queue during checkpoint, and re-create the
queues on restore.
Signed-off-by: David Yat Sin
Signed-off-by: Rajneesh Bhardwaj
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 110
.
Signed-off-by: David Yat Sin
Signed-off-by: Rajneesh Bhardwaj
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 1 +
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 11 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 20 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 2 +
drivers
process. Also the pid of a process
inside a container might be different from its global pid, so return
the ns pid.
Signed-off-by: Rajneesh Bhardwaj
Signed-off-by: David Yat Sin
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 56 +++-
1 file changed, 55 insertions(+), 1 deletion
- Update debug config for Checkpoint-Restore (CR) support
- Also include necessary options for CR with docker containers.
Reviewed-by: Felix Kuehling
Signed-off-by: Rajneesh Bhardwaj
---
arch/x86/configs/rock-dbg_defconfig | 53 ++---
1 file changed, 34 insertions
gpu_id remapping
Rajneesh Bhardwaj (15):
x86/configs: CRIU update debug rock defconfig
drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs
drm/amdkfd: CRIU Implement KFD process_info ioctl
drm/amdkfd: CRIU Implement KFD checkpoint ioctl
drm/amdkfd: CRIU Implement KFD restore ioctl
drm/amdkfd
From: David Yat Sin
When re-creating queues during CRIU restore, restore the queue with the
same doorbell id value used during CRIU dump.
Signed-off-by: David Yat Sin
---
.../drm/amd/amdkfd/kfd_device_queue_manager.c | 60 +--
1 file changed, 41 insertions(+), 19 deletions(-)
ned-off-by: Rajneesh Bhardwaj
---
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 17 +
1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 88360f23eb61..7c92116153fe 100644
--- a/drivers/gpu/drm/
the user
ioctls.
Signed-off-by: David Yat Sin
Signed-off-by: Rajneesh Bhardwaj
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 465 --
drivers/gpu/drm/amd/amdkfd/kfd_events.c | 45 +-
drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 11 +
drivers/gpu/drm/amd/amdkfd
possible values for the max possible attribute
types.
Signed-off-by: Rajneesh Bhardwaj
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 +-
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 95
drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 10 +++
3 files changed, 108 insertions
During the CRIU restore phase, the VMAs for the virtual address ranges are
not yet at their final location, so at this stage only cache the data
required to resume the SVM ranges during the imminent CRIU
resume phase.
Signed-off-by: Rajneesh Bhardwaj
---
drivers/gpu/drm/amd/amdkfd
From: David Yat Sin
Checkpoint contents of queue control stacks on CRIU dump and restore them
during CRIU restore.
Signed-off-by: David Yat Sin
Signed-off-by: Rajneesh Bhardwaj
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +-
drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c | 2
Recoverable device page faults are represented by the xnack mode setting
inside a KFD process. For CR,
we don't consider negative values, which are typically used for querying
the current xnack mode without modifying it.
Signed-off-by: Rajneesh Bhardwaj
From: David Yat Sin
When re-creating queues during CRIU restore, restore the queue with the
same sdma id value used during CRIU dump.
Signed-off-by: David Yat Sin
---
.../drm/amd/amdkfd/kfd_device_queue_manager.c | 48 ++-
.../drm/amd/amdkfd/kfd_device_queue_manager.h | 3 +-
by extending the PROCESS_INFO op of the
CRIU IOCTL to discover the SVM ranges in the target process; future
patches bring in the required support for checkpoint and restore of
SVM ranges.
Signed-off-by: Rajneesh Bhardwaj
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 12 +++--
drivers
From: David Yat Sin
Add support to the existing CRIU ioctls to save and restore events during
CRIU checkpoint and restore.
Signed-off-by: David Yat Sin
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 70 +-
drivers/gpu/drm/amd/amdkfd/kfd_events.c | 272 ---
command submissions.
With sDMA, we see a huge improvement in checkpoint and restore operations
compared to generic PCI-based access via the host data path.
Suggested-by: Felix Kuehling
Signed-off-by: Rajneesh Bhardwaj
Signed-off-by: David Yat Sin
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 71
From: David Yat Sin
Add support to the existing CRIU ioctls to save the number of queues and
the queue properties for each queue during checkpoint, and re-create the
queues on restore.
Signed-off-by: David Yat Sin
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 110 -
- Add rock-rel_defconfig for release builds.
Signed-off-by: Rajneesh Bhardwaj
---
arch/x86/configs/rock-rel_defconfig | 4927 +++
1 file changed, 4927 insertions(+)
create mode 100644 arch/x86/configs/rock-rel_defconfig
diff --git a/arch/x86/configs/rock-rel_defconfig
From: David Yat Sin
Checkpoint contents of queue MQDs on CRIU dump and restore them during
CRIU restore.
Signed-off-by: David Yat Sin
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +-
drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c | 2 +-
values to newly
created BOs. This also adds the minimal gpu mapping support for a single
gpu checkpoint restore use case.
Signed-off-by: David Yat Sin
Signed-off-by: Rajneesh Bhardwaj
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 298 ++-
1 file changed, 297 insertions
In the CRIU resume stage, resume all the shared virtual memory ranges from
the data stored inside the resuming KFD process during the CRIU restore
phase. Also set up the xnack mode and free the resources.
Signed-off-by: Rajneesh Bhardwaj
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 10 +
drivers
Currently the SVM ranges use actual_gpu_id, but with Checkpoint Restore
support it is possible that the SVM ranges are resumed on another node
where the actual_gpu_id is not the same as the original (user_gpu_id)
gpu id. So modify the SVM code to use user_gpu_id.
Signed-off-by: Rajneesh Bhardwaj
the pid of a process
inside a container might be different from its global pid, so return
the ns pid.
Signed-off-by: Rajneesh Bhardwaj
Signed-off-by: David Yat Sin
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 55 +++-
drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 2 +
drivers
From: David Yat Sin
When re-creating queues during CRIU restore, restore the queue with the
same queue id value used during CRIU dump.
Signed-off-by: Rajneesh Bhardwaj
Signed-off-by: David Yat Sin
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +-
drivers/gpu/drm/amd/amdkfd
process i.e. criu_resume ioctl is received, and the process is
ready to be resumed. This ioctl is different from other KFD CRIU ioctls
since it is called by the CRIU master restore process for all the target
processes being resumed by CRIU.
Signed-off-by: David Yat Sin
Signed-off-by: Rajneesh Bhardwaj
.
Signed-off-by: David Yat Sin
Signed-off-by: Rajneesh Bhardwaj
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 20 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 2 +
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 172 ++-
drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 3 +-
4
From: David Yat Sin
Introduce the UNPAUSE op. After the CRIU amdgpu plugin performs a
PROCESS_INFO op, the queues stay in an evicted state. Once the plugin is
done draining BO contents, it is safe to perform an UNPAUSE op for the
queues to resume.
Signed-off-by: David Yat Sin
---
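
The ordering described above (PROCESS_INFO evicts the queues, the plugin drains BO contents, then UNPAUSE resumes the queues) can be modelled as a tiny state machine. This is a userspace sketch with hypothetical names. The check that refuses UNPAUSE before draining is an assumption of the model about plugin-side ordering, not something the commit message says the kernel enforces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum queue_state { QUEUES_RUNNING, QUEUES_EVICTED };

/* Hypothetical model of one dump session as seen by the plugin. */
struct dump_model {
    enum queue_state queues;
    bool bos_drained;
};

/* PROCESS_INFO leaves the queues evicted so BO contents stay stable. */
static void op_process_info(struct dump_model *d)
{
    assert(d != NULL);
    d->queues = QUEUES_EVICTED;
    d->bos_drained = false;
}

/* Draining only makes sense while the queues are evicted. */
static int drain_bo_contents(struct dump_model *d)
{
    if (d->queues != QUEUES_EVICTED)
        return -1;
    d->bos_drained = true;
    return 0;
}

/* UNPAUSE resumes the queues once draining has finished. */
static int op_unpause(struct dump_model *d)
{
    if (d->queues != QUEUES_EVICTED || !d->bos_drained)
        return -1;
    d->queues = QUEUES_RUNNING;
    return 0;
}
```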
- Update debug config for Checkpoint-Restore (CR) support
- Also include necessary options for CR with docker containers.
Signed-off-by: Rajneesh Bhardwaj
---
arch/x86/configs/rock-dbg_defconfig | 53 ++---
1 file changed, 34 insertions(+), 19 deletions(-)
diff --git
privileges and CAP_CHECKPOINT_RESTORE capabilities attached to
the file descriptors, so modify KFD to allow such calls.
(API redesigned by David Yat Sin)
Suggested-by: Felix Kuehling
Signed-off-by: David Yat Sin
Signed-off-by: Rajneesh Bhardwaj
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 94
: CRIU checkpoint and restore queue mqds
drm/amdkfd: CRIU checkpoint and restore queue control stack
drm/amdkfd: CRIU checkpoint and restore events
drm/amdkfd: CRIU implement gpu_id remapping
Rajneesh Bhardwaj (15):
x86/configs: CRIU update debug rock defconfig
x86/configs: Add rock
only.
Cc: Felix Kuehling
Signed-off-by: David Yat Sin
Signed-off-by: Rajneesh Bhardwaj
---
Changes in v2:
* Addressed Christian's concerns for user space impact
* Further reduced the scope to KFD BOs only
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 3 +++
1 file changed, 3 insertions
in the child which
confuse CRIU when it mmaps on restore. Having this flag set for the
render node VMAs helps. VMAs mapped via KFD already take care of this so
this is needed only for the render nodes.
Cc: Felix Kuehling
Signed-off-by: David Yat Sin
Signed-off-by: Rajneesh Bhardwaj
---
drivers