On 2024-06-05 05:14, Christian König wrote:
On 04.06.24 at 20:08, Felix Kuehling wrote:
On 2024-06-03 22:13, Al Viro wrote:
Using drm_gem_prime_handle_to_fd() to set dmabuf up and insert it into
descriptor table, only to have it looked up by file descriptor and
remove it from descriptor
case left is
SVM and that is most likely not recoverable in any way when VRAM is
lost.
I agree. The series is
Acked-by: Felix Kuehling
Signed-off-by: Christian König
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 4 -
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 87
-off-by: Yunxiang Li
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 9 +++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
b/drivers/gpu/drm/amd/amdkfd
values to this function. I don't think C compilers are that strict. You
could pass a random integer to the function. That said, this function
only has two callers, and both of them use a proper enum value.
Signed-off-by: Jesse Zhang
Acked-by: Felix Kuehling
---
drivers/gpu/drm/amd
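The remark above about enum strictness can be illustrated with a small sketch. C lets a caller pass any integer where an enum parameter is expected, so a function that indexes by the value may still want a range check. The enum and function names here are hypothetical; the real ones are not shown in this thread.

```c
#include <stddef.h>

/* Hypothetical enum; the one touched by the patch is not shown in this thread. */
enum queue_kind {
	QUEUE_KIND_COMPUTE,
	QUEUE_KIND_SDMA,
	QUEUE_KIND_COUNT
};

/* Returns a name for a known enumerator, or NULL for an out-of-range value. */
static const char *queue_kind_name(enum queue_kind kind)
{
	static const char *const names[QUEUE_KIND_COUNT] = {
		"compute",
		"sdma",
	};

	/* Compare as int so the check works whether the enum is signed or unsigned. */
	if ((int)kind < 0 || (int)kind >= QUEUE_KIND_COUNT)
		return NULL;

	return names[kind];
}

/* C accepts this call: the plain integer 42 converts to the enum type implicitly. */
static const char *lookup_out_of_range(void)
{
	return queue_kind_name(42);
}
```

When every caller passes a proper enumerator, as the review notes for the two callers in question, the range check is purely defensive.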
On 2024-06-03 04:49, Jesse Zhang wrote:
idr_for_each_entry ensures that mem is not NULL during the loop.
So there is no need to check mem again.
Signed-off-by: Jesse Zhang
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 5 -
1 file changed, 5 deletions
28 12:20:12 2023 -0400
drm/amdkfd: drop IOMMUv2 support
Now that we use the dGPU path for all APUs, drop the
IOMMUv2 support.
v2: drop the now unused queue manager functions for gfx7/8 APUs
Reviewed-by: Felix Kuehling
Acked-by: Christian König
Tested-by: Mike
references into the corresponding
slots of descriptor table, or drop all those file references and
free the unused descriptors.
Signed-off-by: Al Viro
Thank you for the patches and the explanation. One minor nit-pick
inline. With that fixed, this patch is
Reviewed-by: Felix Kuehling
I can
the descriptor table alone.
Signed-off-by: Al Viro
This patch looks good to me on the amdgpu side. For the DRM side I'm
adding dri-devel.
Acked-by: Felix Kuehling
---
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index
384).
if (!pcrat_image || avail_size < VCRAT_SIZE_FOR_GPU)
return -EINVAL;
Ok, I missed that. Makes sense. Maybe mention it in the commit
description that kfd_create_vcrat_image_gpu itself checks the avail_size
at the start. The patch is
Reviewed-by: Felix Ku
On 2024-05-30 22:51, Jesse Zhang wrote:
To fix the warning about unused value,
remove the use_static and use the parameter is_static directly.
Signed-off-by: Jesse Zhang
Suggested-by: Felix Kuehling
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
e alu ops for gfx12")
Signed-off-by: Jesse Zhang
Suggested-by: Felix Kuehling
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
b/drivers/gpu/drm/
On 2024-05-31 2:52, Christian König wrote:
> On 31.05.24 at 00:02, Felix Kuehling wrote:
>> On 2024-05-28 13:23, Yunxiang Li wrote:
>>> These functions are missing the lock for reset domain.
>>>
>>> Signed-off-by: Yunxiang Li
>>> ---
On 2024-05-28 13:23, Yunxiang Li wrote:
These functions are missing the lock for reset domain.
Signed-off-by: Yunxiang Li
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 4 +++-
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 8 ++--
On 2024-05-29 23:49, Jesse Zhang wrote:
The variable uncached is set to false, so the condition on uncached can never be true.
So remove the dead code; the mapping flags get AMDGPU_VM_MTYPE_UC set in the
else branch.
Signed-off-by: Jesse Zhang
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd
On 2024-05-30 10:12, Christian König wrote:
On 30.05.24 at 05:50, Jesse Zhang wrote:
To fix the warning about unused value, comment out the variable
use_static.
Commenting out variables with // will just get you another warning
from checkpatch.
Christian.
Signed-off-by: Jesse Zhang
On 2024-05-29 23:47, Jesse Zhang wrote:
The expression caps | HSA_CAP_TRAP_DEBUG_PRECISE_MEMORY_OPERATIONS_SUPPORTED
is always 1/true regardless of the values of its operands.
Signed-off-by: Jesse Zhang
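A minimal sketch of why the OR expression described in this commit message is always true, and of what a capability test presumably intends instead. The macro name and bit position are hypothetical and chosen only for illustration; whether the actual fix is a bitwise AND is an assumption, since the diff is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position, chosen only for this sketch. */
#define CAP_PRECISE_MEMORY_OPS_SUPPORTED (UINT32_C(1) << 5)

/* Buggy form: OR-ing with a non-zero constant always yields a non-zero value. */
static bool precise_mem_ops_supported_buggy(uint32_t caps)
{
	return (caps | CAP_PRECISE_MEMORY_OPS_SUPPORTED) != 0;
}

/* Assumed intent: AND masks out every other bit, so the result depends on caps. */
static bool precise_mem_ops_supported(uint32_t caps)
{
	return (caps & CAP_PRECISE_MEMORY_OPS_SUPPORTED) != 0;
}
```

With caps equal to 0, the first function still reports true while the second reports false.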
Please add a Fixes tag. I think this is the commit that introduced the
problem:
On 2024-05-29 23:47, Jesse Zhang wrote:
When copying the information from the user fails, it will goto exit.
But the variable i remains at 0, and do i-- will overflow.
i-- may underflow, but the loop will still exit. Why is the underflow a
problem?
Signed-off-by: Jesse Zhang
---
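The question about the underflow can be made concrete with a short sketch. It assumes the cleanup loop has the common form `while (i--)` and that i is unsigned; neither detail is shown in this thread. The function names are hypothetical.

```c
#include <limits.h>

/* Counts how many times the body of a `while (i--)` cleanup loop runs. */
static unsigned int cleanup_iterations(unsigned int i)
{
	unsigned int runs = 0;

	/* The condition tests the old value of i, then decrements it. */
	while (i--)
		runs++;

	return runs;
}

/* Returns the value left in i once the same loop has finished. */
static unsigned int index_after_cleanup(unsigned int i)
{
	while (i--)
		;

	return i;
}

/* Reports whether i ended up wrapped around to UINT_MAX. */
static int index_wrapped(unsigned int i)
{
	return index_after_cleanup(i) == UINT_MAX;
}
```

Starting from i == 0 the body never runs, and i wraps to UINT_MAX only after the loop has already exited. Unsigned wraparound is well defined in C, so under these assumptions the wrapped value is harmless as long as i is not used afterwards.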
On 2024-05-29 23:48, Jesse Zhang wrote:
If the rewind flag is set, it should return the final result of
setting mes debug mode or refresh the run list.
No. We're rewinding because an error occurred. We want to return that
error, not the success probably returned by refreshing the runlist.
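The error-handling pattern described in this reply can be sketched as follows. This is an illustration under stated assumptions, not the driver code: both function names are hypothetical, and the two integer parameters merely simulate the outcomes of the apply step and the rewind step.

```c
/* Hypothetical stand-in for undoing a partially applied change. */
static int rewind_changes(int rewind_result)
{
	return rewind_result;
}

/*
 * Applies a change; on failure, rewinds and reports the original error.
 * apply_result and rewind_result simulate the outcomes of the two steps.
 */
static int apply_with_rewind(int apply_result, int rewind_result)
{
	if (apply_result) {
		/* The rewind outcome is deliberately not propagated. */
		(void)rewind_changes(rewind_result);
		return apply_result;
	}

	return 0;
}
```

Even when the rewind itself succeeds (returns 0), the caller still sees the error that triggered it.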
On 2024-05-29 23:50, Jesse Zhang wrote:
Since the value of avail_size is at least VCRAT_SIZE_FOR_GPU(16384),
minus struct crat_header(40UL) and struct crat_subtype_compute(40UL) it cannot
be less than 0.
Signed-off-by: Jesse Zhang
---
drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 6 --
1
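The arithmetic in the commit message above can be checked with a short sketch. The three sizes are the ones quoted in the message; the macro and function names, and the use of a signed long, are assumptions made for illustration.

```c
/* Sizes as quoted in the commit message above. */
#define VCRAT_SIZE_FOR_GPU_BYTES        16384L
#define CRAT_HEADER_BYTES               40L
#define CRAT_SUBTYPE_COMPUTE_BYTES      40L

/* Space left after reserving the header and one compute subtype entry. */
static long remaining_after_header_and_compute(long avail_size)
{
	return avail_size - CRAT_HEADER_BYTES - CRAT_SUBTYPE_COMPUTE_BYTES;
}
```

For the minimum avail_size of 16384 this leaves 16384 - 40 - 40 = 16304 bytes, so a "less than 0" check after these two subtractions cannot trigger.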
pts at HW access that
detect an error or time out, which may get the HW into a worse state or delay
the actual reset.
At a minimum, I'd recommend testing this with /sys/kernel/debug/hang_hws on a
pre-MES GPU, while some ROCm workload is running.
Reviewed-by: Felix Kuehling
> ---
> driv
-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
index d62cfa4e2d2b..2c9a0aa41e2d 100644
--- a/drivers/gpu/drm/amd/amdgpu
code.
v2: clean up a few more places (Lang)
Signed-off-by: Alex Deucher
This is a lot cleaner, thanks. I was looking for something like this
when I reviewed the original patch but missed it. I found it now in
amdgpu_discovery_set_ip_blocks (I think).
Acked-by: Felix Kuehling
---
On 2024-05-20 5:14, Shane Xiao wrote:
> This patch changes the implementation of AMDGPU_PTE_MTYPE_GFX12,
> clear the bits before setting the new one.
> This fixes a potential issue where GFX12 sets memory to NC.
>
> v2: Clear mtype field before setting the new one (Alex)
>
> Signed-off-by:
415,6 +415,10 @@ static int amdgpu_amdkfd_bo_validate(struct amdgpu_bo
> *bo, uint32_t domain,
>"Called with userptr BO"))
> return -EINVAL;
>
> + /* bo has been pined, not need validate it */
pined -> pinned
With those typos fixed,
Fixes tag. It should be a
single line and no single quotes. Other than that, the patch is
Reviewed-by: Felix Kuehling
Signed-off-by: Sreekant Somasekharan
---
drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/
Changed commit header to reflect the above
v3: Use crc16 as suggested-by: Lijo Lazar
Ensure that gpu_id != 0
Signed-off-by: Harish Kasiviswanathan
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 40 +++
1 file changed, 34
On 2024-05-07 07:58, Thomas Zimmermann wrote:
Implement struct drm_client_funcs with the respective helpers and
remove the custom code from the emulation. The generic helpers are
equivalent in functionality.
Signed-off-by: Thomas Zimmermann
---
drivers/gpu/drm/radeon/radeon_fbdev.c | 66
On 2024-05-06 17:10, Harish Kasiviswanathan wrote:
On 2024-05-06 16:30, Felix Kuehling wrote:
On 2024-05-03 18:06, Harish Kasiviswanathan wrote:
gpu_id needs to be unique for user space to identify GPUs via KFD
interface. In the current implementation there is a very small
probability
On 2024-05-03 18:06, Harish Kasiviswanathan wrote:
gpu_id needs to be unique for user space to identify GPUs via KFD
interface. In the current implementation there is a very small
probability of having non unique gpu_ids.
v2: Add check to confirm if gpu_id is unique. If not unique, find one
/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -45,6 +45,7 @@
Can you remove #include and "amdgpu_dma_buf.h" here?
Or is it still needed by something else left in kfd_chardev.c?
Other than that, this patch is
Reviewed-by: Felix Kuehling
#include "k
urn EAGAIN to application if
hmm_range_fault returns EBUSY, then userspace libdrm and Thunk will call
ioctl again.
Change the EAGAIN message to a debug message, as this is not an error.
Signed-off-by: Philip Yang
Assuming this passes your stress testing without CPU stall warnings,
this patch is
Reviewed-by: Fe
On 2024-04-16 10:04, Tvrtko Ursulin wrote:
>
> On 01/04/2024 18:58, Felix Kuehling wrote:
>>
>> On 2024-04-01 12:56, Tvrtko Ursulin wrote:
>>>
>>> On 01/04/2024 17:37, Felix Kuehling wrote:
>>>> On 2024-04-01 11:09, Tvrtko Ursulin wrote:
>
-by: Tvrtko Ursulin
Reviewed-by: Christian König # v1
Reviewed-by: Felix Kuehling # v2
I'm waiting for Christian to review patches 1 and 3. Then I can apply
the whole series.
Regards,
Felix
---
drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 3 +--
drivers/gpu/drm/amd/amdgpu
On 2024-05-01 21:08, Harish Kasiviswanathan wrote:
> No functional change. This will help in moving gpu_id creation to next
> step while still being able to identify the correct GPU
>
> Signed-off-by: Harish Kasiviswanathan
Reviewed-by: Felix Kuehling
> ---
> drivers/
On 2024-05-01 21:08, Harish Kasiviswanathan wrote:
> gpu_id needs to be unique for user space to identify GPUs via KFD
> interface. Do a single pass search to detect collision. If
> detected, increment gpu_id by one.
>
> The probability of collisions is very low. Hence, no more complexity is
>
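The single-pass collision handling described in the commit message above can be sketched roughly as follows. This is not the KFD implementation: the function name, the array-based lookup, and the 32-bit id type are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Walks the existing ids once and increments the candidate on each match.
 * A single pass cannot notice a collision with an entry it has already
 * visited, so the result is only guaranteed unique for favourable orderings.
 */
static uint32_t single_pass_gpu_id(uint32_t candidate,
				   const uint32_t *existing, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (existing[i] == candidate)
			candidate++;
	}

	return candidate;
}
```

With existing ids {5, 6, 9} and candidate 5, the pass ends at 7, which is unique. With {6, 5} it ends at 6, which collides with the entry visited first; that case is what a follow-up uniqueness check would have to catch.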
-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 9 +++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 386875e6eb96..481cb958e165 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
On 2024-05-01 14:34, Felix Kuehling wrote:
On 2024-04-30 19:29, Ramesh Errabolu wrote:
Analysis of code by Coverity, a static code analyser, has identified
a resource leak in the symbol hmm_range. This leak occurs when one of
the prior steps before it is released encounters an error
On 2024-04-30 19:29, Ramesh Errabolu wrote:
Analysis of code by Coverity, a static code analyser, has identified
a resource leak in the symbol hmm_range. This leak occurs when one of
the prior steps before it is released encounters an error.
Signed-off-by: Ramesh Errabolu
---
hen device and host can effectively share system memory.
v2: Report local_mem_size_private as 0. (Felix)
Signed-off-by: Lang Yu
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 5 +
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 20 ++-
Reviewed-by: Christian König # v1
Reviewed-by: Felix Kuehling
I also ran kfdtest on a multi-GPU system just to make sure this didn't
break our multi-GPU support. BTW, I had to fix up some things when I
tried to apply your patch to the current amd-staging-drm-next branch.
That branch
offset VMEM_MODIFIERS
offset:256*3
s_waitcnt vmcnt(0)
end
base-commit: cf743996352e327f483dc7d66606c90276f57380
Reviewed-by: Jay Cornwall
Acked-by: Felix Kuehling
Do you need me to submit the patch to amd-staging-drm-next?
Thanks,
Felix
On 2024-04-29 06:38, Yu, Lang wrote:
[Public]
-Original Message-
From: Kuehling, Felix
Sent: Saturday, April 27, 2024 6:45 AM
To: Yu, Lang ; amd-gfx@lists.freedesktop.org
Cc: Yang, Philip ; Koenig, Christian
; Zhang, Yifan ; Liu,
Aaron
Subject: Re: [PATCH 2/2] drm/amdkfd: Allow
On 2024-04-29 5:43, Tvrtko Ursulin wrote:
On 26/04/2024 23:24, Felix Kuehling wrote:
On 2024-04-26 12:43, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin
When commit b453e42a6e8b ("drm/amdgpu: Add new placement for preemptible
SG BOs") added a new TTM region it missed to notice the
On 2024-04-29 9:45, Tvrtko Ursulin wrote:
On 29/04/2024 12:11, Christian König wrote:
On 29.04.24 at 11:43, Tvrtko Ursulin wrote:
On 26/04/2024 23:24, Felix Kuehling wrote:
On 2024-04-26 12:43, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin
When commit b453e42a6e8b ("drm/amdgpu: Ad
On 2024-04-26 04:37, Lang Yu wrote:
Small APUs (i.e., consumer and embedded products) usually have a small
carveout device memory, which can't satisfy the memory allocation
requirements of most compute workloads.
We can't even run a Basic MNIST Example with a default 512MB carveout.
On 2024-04-26 04:37, Lang Yu wrote:
The default ttm_tt_pages_limit is 1/2 of system memory.
It is prone to out of memory with such a configuration.
Indiscriminately allowing the violation of all memory limits is not a
good solution. It will lead to poor performance once you actually reach
"drm/amdgpu: Add new placement for preemptible SG BOs")
Cc: Felix Kuehling
Cc: Christian König
---
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
b/drivers/gpu/drm/amd/am
-off-by: Lancelot SIX
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 8
1 file changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 58c1fe542193..451bb058cc62 100644
--- a/drivers
this doesn't break existing user mode. It only makes it fail in
a more obvious way. If that's the case, the patch is
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 5 +
1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
b
The series is
Reviewed-by: Felix Kuehling
On 2024-04-24 11:27, Philip Yang wrote:
This patch series implements a new KFD memory alloc flag for best-effort contiguous
VRAM allocation, to support peer direct access by RDMA devices with limited
scatter-gather DMA capability.
v2: rebase on patch ("
On 2024-04-23 11:28, Philip Yang wrote:
An RDMA device with limited scatter-gather ability requires contiguous VRAM
buffer allocation for RDMA peer direct support.
Add a new KFD alloc memory flag and store it as the BO alloc flag
AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS. When pinning this BO to export it for RDMA
as contiguous VRAM, allow TTM to evict KFD BOs from the same process. This
evicts the user queues first and restores the queues later, after the
contiguous VRAM allocation.
Signed-off-by: Philip Yang
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 ++-
1 file changed
On 2024-04-23 11:28, Philip Yang wrote:
If the BO pages pinned for RDMA are not contiguous in VRAM, evict them to
system memory first to free the VRAM space, then allocate contiguous
VRAM space, and then move it from system memory back to VRAM.
Signed-off-by: Philip Yang
---
On 2024-04-23 11:28, Philip Yang wrote:
TTM contiguous VRAM allocation may take more than 1 second to evict BOs
for a larger RDMA buffer. Because the KFD restore BO worker reserves all
KFD BOs, TTM cannot take the remaining KFD BO locks to evict them;
this causes TTM to fail to alloc
] ? __pfx_kthread+0x10/0x10
[ 57.794184] ret_from_fork_asm+0x1b/0x30
Signed-off-by: Lang Yu
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
On 2024-04-23 14:56, Mukul Joshi wrote:
Subtract the VRAM pinned memory when checking for available memory
in amdgpu_amdkfd_reserve_mem_limit function since that memory is not
available for use.
Signed-off-by: Mukul Joshi
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu
On 2024-04-23 01:50, Christian König wrote:
On 22.04.24 at 21:45, Yunxiang Li wrote:
A reset request from KFD is missing a check for whether a reset is already in
progress; this causes a second reset to be triggered right after the
previous one finishes. Add the check to align with the other reset
sources.
Acked-by: Alex Deucher
Reviewed-by: Felix Kuehling
Signed-off-by: Yunxiang Li
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
b/drivers/gpu/drm/amd/amdgpu
.
Signed-off-by: Mukul Joshi
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 16 +++-
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 2 +-
2 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
b/drivers/gpu
Handle the case that the restore worker was already scheduled by another
eviction while the restore was in progress.
Fixes: 9a1c1339abf9 ("drm/amdkfd: Run restore_workers on freezable WQs")
Signed-off-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 6 +++---
1 file
separate loops in amdgpu_amdkfd_restore_process_bos. (Felix)
1.Validate BOs
2.Validate VM (and DMABuf attachments)
3.Update page tables for the BOs validated above
Fixes: 2fdba514ad5a ("drm/amdgpu: Auto-validate DMABuf imports in compute VMs")
Signed-off-by: Lang Yu
Reviewed
Make SVM BOs more likely to get evicted than other BOs. These BOs
opportunistically use available VRAM, but can fall back relatively
seamlessly to system memory. It also avoids SVM migrations evicting
other, more important BOs as they will evict other SVM allocations
first.
Signed-off-by: Felix
+ "MES_SCH_API_UPDATE_ROOT_PAGE_TABLE",
+ "MES_SCH_API_AMD_LOG",
Maybe drop the prefixes. They don't add any information value and only
bloat the log messages and module binary size. Other than that, the patch is
Acked-by: Felix Kuehling
+};
+
+static const
Handle case that dma_fence_get_rcu_safe returns NULL.
If restore work is already scheduled, only update its timer. The same
work item cannot be queued twice, so undo the extra queue eviction.
Fixes: 9a1c1339abf9 ("drm/amdkfd: Run restore_workers on freezable WQs")
Signed-off-by: Feli
On 2024-04-16 13:02, Chen, Xiaogang wrote:
On 4/15/2024 2:49 PM, Felix Kuehling wrote:
make rock-dbg_defconfig
make savedefconfig
cp defconfig arch/x86
alling dma_fence_signal and dma_fence_put with zero fences to rely
on checking parameters in DMA API.
Cc: Alex Deucher
Cc: Christian Koenig
Cc: Xiaogang Chen
Cc: Felix Kuehling
Signed-off-by: Vitaly Prosyak
---
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 10 ++
1 file changed, 6 insertions
make rock-dbg_defconfig
make savedefconfig
cp defconfig arch/x86/configs/rock-dbg_defconfig
This also enables UBSAN, which can help catch some types of bugs at
compile time.
Signed-off-by: Felix Kuehling
---
arch/x86/configs/rock-dbg_defconfig | 46 +
1 file changed
Fix memory leak due to a leaked mmget reference on an error handling
code path that is triggered when attempting to create KFD processes
while a GPU reset is in progress.
Fixes: 0ab2d7532b05 ("drm/amdkfd: prepare per-process debug enable and disable")
CC: Xiaogang Chen
Signed-off
On 2024-04-08 3:55, Christian König wrote:
On 07.04.24 at 06:52, Lang Yu wrote:
When VM is in evicting state, amdgpu_vm_update_range would return
-EBUSY.
Then restore_process_worker runs into a dead loop.
Fixes: 2fdba514ad5a ("drm/amdgpu: Auto-validate DMABuf imports in
compute VMs")
. if the process has not been
evicted before doing recover, it will be restored, then caused page
fault.
Signed-off-by: Zhigang Luo
This patch is
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_device.c | 17 ++---
1 file changed, 6 insertions(+), 11 deletions(-)
diff
On 2024-04-01 17:53, Zhigang Luo wrote:
If more than one device is resetting in parallel, the first
device will call kfd_suspend_all_processes() to evict all processes
on all devices, and this call takes time to finish. Other devices will
start reset and recovery without waiting. If the
On 2024-04-01 12:56, Tvrtko Ursulin wrote:
On 01/04/2024 17:37, Felix Kuehling wrote:
On 2024-04-01 11:09, Tvrtko Ursulin wrote:
On 28/03/2024 20:42, Felix Kuehling wrote:
On 2024-03-28 12:03, Tvrtko Ursulin wrote:
Hi Felix,
I had one more thought while browsing around the amdgpu CRIU
On 2024-04-01 11:09, Tvrtko Ursulin wrote:
On 28/03/2024 20:42, Felix Kuehling wrote:
On 2024-03-28 12:03, Tvrtko Ursulin wrote:
Hi Felix,
I had one more thought while browsing around the amdgpu CRIU plugin.
It appears it relies on the KFD support being compiled in and
/dev/kfd present
a few weeks.
Regards,
Felix
Regards,
Tvrtko
On 15/03/2024 18:36, Tvrtko Ursulin wrote:
On 15/03/2024 02:33, Felix Kuehling wrote:
On 2024-03-12 5:45, Tvrtko Ursulin wrote:
On 11/03/2024 14:48, Tvrtko Ursulin wrote:
Hi Felix,
On 06/12/2023 21:23, Felix Kuehling wrote:
Executive
, the patch is
Reviewed-by: Felix Kuehling
+ /* VF MMIO access (except mailbox range) from CPU
+* will be blocked during sriov runtime
+*/
+ adev->virt.caps |= AMDGPU_VF_MMIO_ACCESS_PROTECT;
+
amdgpu_gmc_noretry_set(adev);
On 2024-03-26 11:52, Alex Deucher wrote:
This adds allocation latency, but aligns better with user
expectations. The latency should improve with the drm buddy
clearing patches that Arun has been working on.
If we submit this before the clear-page-tracking patches are in, this
will cause
On 2024-03-25 19:33, Liu, Shaoyun wrote:
[AMD Official Use Only - General]
It can cause a page fault when the log size exceeds the page size.
I'd consider that a breaking change in the firmware that should be
avoided. Is there a way the updated driver can tell the FW the log size
that
On 2024-03-26 12:04, Alam, Dewan wrote:
[AMD Official Use Only - General]
Looping in +@Zhang, Zhaochen
CAM control register can only be written by PF. VF can only read the register.
In SRIOV VF, the write won't work.
In SRIOV case, CAM's enablement is controlled by the host. Hence, we think
On 2024-03-26 10:53, Philip Yang wrote:
On 2024-03-25 14:45, Felix Kuehling wrote:
On 2024-03-22 15:57, Zhigang Luo wrote:
it will cause page fault after device recovered if there is a
process running.
Signed-off-by: Zhigang Luo
Change-Id: Ib1eddb56b69ecd41fe703abd169944154f48b0cd
On 2024-03-22 12:49, shaoyunl wrote:
From MES version 0x54, the log entries increased, which requires the log buffer
size to be increased. 16k is the maximum size agreed
What happens when you run the new firmware on an old kernel that only
allocates 4KB?
Regards,
Felix
Signed-off-by:
On 2024-03-22 15:57, Zhigang Luo wrote:
it will cause page fault after device recovered if there is a process running.
Signed-off-by: Zhigang Luo
Change-Id: Ib1eddb56b69ecd41fe703abd169944154f48b0cd
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++
1 file changed, 2 insertions(+)
diff
On 2024-03-20 18:52, Mukul Joshi wrote:
Destroy the high priority workqueue that handles interrupts
during KFD node cleanup.
Signed-off-by: Mukul Joshi
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c | 2 ++
1 file changed, 2 insertions(+)
diff --git
Tested-by: Jesse Zhang
Reviewed-by: Felix Kuehling
---
.../gpu/drm/amd/amdkfd/kfd_int_process_v10.c| 3 ++-
.../gpu/drm/amd/amdkfd/kfd_int_process_v11.c| 3 ++-
drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 3 ++-
include/uapi/linux/kfd_ioctl.h | 17
On 2024-03-18 16:12, Felix Kuehling wrote:
On 2024-03-15 14:17, Mukul Joshi wrote:
Check cgroup permissions when returning DMA-buf info and
based on cgroup check return the id of the GPU that has
access to the BO.
Signed-off-by: Mukul Joshi
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4
On 2024-03-20 15:09, Joshi, Mukul wrote:
[AMD Official Use Only - General]
-Original Message-
From: Kuehling, Felix
Sent: Monday, March 18, 2024 4:13 PM
To: Joshi, Mukul ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdkfd: Check cgroup when returning DMABuf info
On
On 2024-03-15 14:17, Mukul Joshi wrote:
Check cgroup permissions when returning DMA-buf info and
based on cgroup check return the id of the GPU that has
access to the BO.
Signed-off-by: Mukul Joshi
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 ++--
1 file changed, 2 insertions(+), 2
On 2024-03-15 7:37, Christian Göttsche wrote:
Use the newly added capable_any function in appropriate cases, where a
task is required to have any of two capabilities.
Reorder CAP_SYS_ADMIN last.
Signed-off-by: Christian Göttsche
Acked-by: Alexander Gordeev (s390 portion)
Acked-by: Felix
On 2024-03-12 5:45, Tvrtko Ursulin wrote:
On 11/03/2024 14:48, Tvrtko Ursulin wrote:
Hi Felix,
On 06/12/2023 21:23, Felix Kuehling wrote:
Executive Summary: We need to add CRIU support to DRM render nodes
in order to maintain CRIU support for ROCm application once they
start relying
int32_t inst)
+{
+ if (doorbell_id) {
+ struct device *dev = node->adev->dev;
+
+ if (KFD_GC_VERSION(node) == IP_VERSION(9, 4, 3))
Could this be made more generic? E.g.:
if (node->adev->xcp_mgr && node->adev->xcp_mgr->num_x
On 2024-03-11 11:14, Sasha Levin wrote:
From: Prike Liang
[ Upstream commit c671ec01311b4744b377f98b0b4c6d033fe569b3 ]
GPU resets can now be performed successfully on the Raven
series, and a GPU reset is required for the S3 suspend abort case.
So GPU reset can now be enabled for S3
On 2024-03-13 5:41, Lijo Lazar wrote:
Check if the device is present in the bus before trying to recover. It
could be that device itself is lost from the bus in some hang
situations.
Signed-off-by: Lijo Lazar
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 ++
1
On 2024-03-13 13:43, Dewan Alam wrote:
IH Retry CAM should be enabled by register reads instead of always being set to
true.
This explanation sounds odd. Your code is still writing the register
first. What's the reason for reading back the register? I assume it's
not needed for enabling the
causes VM clear to SDMA
before SDMA init. Add the condition in drm client creation, on top of v1,
to guard against drm client creation being called multiple times.
Signed-off-by: Ahmad Rehman
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 4 ++--
drivers/gpu/drm
On 2024-03-11 12:33, Christian König wrote:
On 11.03.24 at 16:33, Felix Kuehling wrote:
On 2024-03-11 11:25, Joshi, Mukul wrote:
[AMD Official Use Only - General]
-Original Message-
From: Christian König
Sent: Monday, March 11, 2024 2:50 AM
To: Joshi, Mukul ; amd-gfx
On 2024-03-11 11:25, Joshi, Mukul wrote:
[AMD Official Use Only - General]
-Original Message-
From: Christian König
Sent: Monday, March 11, 2024 2:50 AM
To: Joshi, Mukul ; amd-gfx@lists.freedesktop.org
Cc: Kuehling, Felix
Subject: Re: [PATCH] drm/amdgpu: Handle duplicate BOs during
e log. validation can fail
intermittently and rescheduling the worker is there to handle it.
With that fixed, the patch is
Reviewed-by: Felix Kuehling
goto validate_map_fail;
+ }
/* Update mappings not managed by KFD */
list_for_each_entry(peer_vm, &process_info->vm_list_head,
On 2024-03-07 1:39, Sharma, Shashank wrote:
On 07/03/2024 00:54, Felix Kuehling wrote:
On 2024-03-06 09:41, Shashank Sharma wrote:
From: Christian König
The problem is that when (for example) 4k pages are replaced
with a single 2M page we need to wait for change to be flushed
out
(f->dependency) in tlb_fence_work (Christian)
- move the misplaced fence_create call to the end (Philip)
V5: - free the f->dependency properly (Christian)
Cc: Christian Koenig
Cc: Felix Kuehling
Cc: Rajneesh Bhardwaj
Cc: Alex Deucher
Reviewed-by: Shashank Sharma
Signed-off-by:
On 2024-03-05 14:49, Dhume, Samir wrote:
[AMD Official Use Only - General]
-Original Message-
From: Kuehling, Felix
Sent: Monday, March 4, 2024 6:47 PM
To: Dhume, Samir ; amd-gfx@lists.freedesktop.org
Cc: Lazar, Lijo ; Wan, Gavin ;
Liu, Leo ; Deucher, Alexander
Subject: Re: [PATCH
nly memory, instead of having to be dynamically
allocated at boot time.
Cc: Greg Kroah-Hartman
Suggested-by: Greg Kroah-Hartman
Signed-off-by: Ricardo B. Marliere
The patch looks good to me. Do you want me to apply this to Alex's
amd-staging-drm-next?
Reviewed-by: Felix Kuehling
---
d