On 10/4/2024 04:42, Lancelot SIX wrote:
+ * Copyright 2024 Advanced Micro Devices, Inc.
I am not really sure about the year policy in the kernel, but all the
content here is dated from before 2024. The vast majority is taken from
the cwsr_trap_handler_gfx10.asm file (copyright started in 201
gfx12 derivatives will have substantially different trap handler
implementations from gfx10/gfx11. Add a separate source file for
gfx12+ and remove unneeded conditional code.
No functional change.
Signed-off-by: Jay Cornwall
Cc: Lancelot Six
Cc: Jonathan Kim
---
.../amd/amdkfd
s
register. Both of these fields can assert while the wavefront is
running the trap handler.
Signed-off-by: Jay Cornwall
Cc: Lancelot Six
---
.../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 16 +---
.../amd/amdkfd/cwsr_trap_handler_gfx10.asm| 38 ++-
2 files changed, 38
On 5/29/2024 16:07, Lancelot SIX wrote:
On 29/05/2024 20:35, Jay Cornwall wrote:
A wavefront may deallocate its VGPRs at the end of a program while
waiting for memory transactions to complete. If it subsequently
receives a context save exception it will be unable to save,
since this requires
intermittent VM faults under context switching load.
V2: Use S_ENDPGM instead of S_ENDPGM_SAVED for performance counters
Signed-off-by: Jay Cornwall
Cc: Lancelot Six
---
.../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 695 +-
.../amd/amdkfd/cwsr_trap_handler_gfx10.asm| 17 +
2 files
intermittent VM faults under context switching load.
Signed-off-by: Jay Cornwall
Cc: Lancelot Six
---
.../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 744 +-
.../amd/amdkfd/cwsr_trap_handler_gfx10.asm| 13 +
2 files changed, 386 insertions(+), 371 deletions(-)
diff --git a/drivers
On 5/23/2024 13:37, Lancelot SIX wrote:
@@ -622,8 +638,15 @@ L_SAVE_HWREG:
#if ASIC_FAMILY >= CHIP_GFX12
// Ensure no further changes to barrier or LDS state.
+ // STATE_PRIV.BARRIER_COMPLETE may change up to this point.
s_barrier_signal -2
s_barrier_wait -2
+
+ /
ONTEXT,HOST_TRAP} when
restoring this register. Both of these fields can assert while the
wavefront is running the trap handler.
Signed-off-by: Jay Cornwall
Cc: Lancelot Six
---
.../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 1191 +
.../amd/amdkfd/cwsr_trap_handler_gfx10.asm| 55
Newer assemblers reject S_WAITCNT. All instances of S_WAITCNT can be
replaced by S_WAITCNT 0 (< gfx12) or S_WAIT_IDLE (>= gfx12) since
there is no concurrency of different memory instruction classes.
Signed-off-by: Jay Cornwall
Cc: Lancelot Six
---
.../gpu/drm/amd/amdkfd/cwsr_trap_han
Source and binary have become mismatched during branch activity.
Signed-off-by: Jay Cornwall
Cc: Lancelot Six
---
.../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 57 ---
1 file changed, 24 insertions(+), 33 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd
et:256*2
+ buffer_load_dword v3, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS
offset:256*3
s_waitcnt vmcnt(0)
end
base-commit: cf743996352e327f483dc7d66606c90276f57380
Reviewed-by: Jay Cornwall
On 3/1/2024 00:35, Kim, Jonathan wrote:
> The range check should probably flag any exception prefixed as
> EC_QUEUE_PACKET_* as valid defined in kfd_dbg_trap_exception_code:
> https://github.com/torvalds/linux/blob/master/include/uapi/linux/kfd_ioctl.h#L857
> + Jay to confirm this is the correct
On 2/23/2024 16:08, Laurent Morichetti wrote:
> Similarly to gfx9, gfx10.1 drops vector stores when an xnack error is
> raised. To work around this issue, use scalar stores instead of vector
> stores when trapsts.xnack_error == 1.
>
> Signed-off-by: Laurent Morichetti
Reviewed-by: Jay Cornwall
s to second-level trap handler".
Besides that:
Reviewed-by: Jay Cornwall
On 1/15/2024 13:07, Jay Cornwall wrote:
> This instruction has no functional difference to S_ENDPGM
> but allows performance counters to track save events correctly.
>
> Signed-off-by: Jay Cornwall
> ---
> drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h | 14 +++---
This instruction has no functional difference to S_ENDPGM
but allows performance counters to track save events correctly.
Signed-off-by: Jay Cornwall
---
drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h | 14 +++---
.../gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm | 2 +-
.../gpu
On 1/11/2024 11:25, Kim, Jonathan wrote:
>> This looks OK. The compiler must be warning about a potential problem
>> here, not a definite one.
>>
>> Question for Jon, how does the firmware encode the error code in the
>> context ID? I see these macros:
>>
>> #define KFD_DEBUG_CP_BAD_OP_ECODE_MASK
in restore_process_bos
>
> Signed-off-by: Felix Kuehling
> Acked-by: Christian König
> Tested-by: Emily Deng
> Signed-off-by: Alex Deucher
>
>
> FWIW, I built a plain 6.6 kernel, and was not able to reproduce the
> crash with some simple tests.
>
> Regards,
> Felix
>
>
>>
>> So I agree, let's revert it.
>>
>> Reviewed-by: Jay Cornwall
}
>> +
>> +/* dGPUs: the reserved space for kernel
>> + * before SVM
>> + */
>> +pdd->qpd.cwsr_base = SVM_CWSR_BASE;
>> +pdd->qpd.ib_base = SVM_IB_BASE;
>> }
>>
>> dev_dbg(kfd_device, "node id %u\n", id);
>> --
>> 2.42.0
>>
I saw a segfault issue in Mesa yesterday. Not sure about the others, but I
don't know how to make this change while keeping compatibility with older UMDs.
So I agree, let's revert it.
Reviewed-by: Jay Cornwall
On 11/8/2023 18:23, Laurent Morichetti wrote:
> The trap handler could be entered with pending VALU exceptions, so
> clear the exception state before issuing vector instructions.
>
> Signed-off-by: Laurent Morichetti
Reviewed-by: Jay Cornwall
When MES is oversubscribed it may not frequently check for new
command submissions from the driver if the scheduling load is high.
Response latency as high as 5 seconds has been observed.
Enable a flag which adds a check for new commands between
scheduling quantums.
Signed-off-by: Jay Cornwall
Cc
Previously asymptomatic because high 32 bits were zero.
Fixes: 615222cfed20 ("drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole")
Signed-off-by: Jay Cornwall
---
drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/g
SMEM instructions can reach addresses above 47 bits but require
bit 47 to be sign-extended through bits [63:48].
This allows the TMA to be relocated in a following patch.
Signed-off-by: Jay Cornwall
---
.../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 58 ---
.../amd/amdkfd
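The sign-extension requirement described in the message above can be illustrated with a small C sketch. This is not code from the patch: `sign_extend_48` and `is_canonical_48` are hypothetical helper names, and the sketch only shows the arithmetic of replicating bit 47 through bits [63:48].

```c
#include <assert.h>
#include <stdint.h>

/* The arithmetic below relies on a 64-bit unsigned type. */
static_assert(sizeof(uint64_t) == 8, "uint64_t must be 64 bits wide");

/*
 * Hypothetical helper (not from the patch): take the low 48 bits of an
 * address and replicate bit 47 through bits [63:48].
 */
static inline uint64_t sign_extend_48(uint64_t addr)
{
	const uint64_t low48 = addr & 0x0000FFFFFFFFFFFFULL;
	const uint64_t sign = 1ULL << 47;

	/* (x ^ s) - s sign-extends x from the bit selected by s. */
	return (low48 ^ sign) - sign;
}

/* Returns 1 if addr is already in sign-extended (canonical) form. */
static inline int is_canonical_48(uint64_t addr)
{
	return sign_extend_48(addr) == addr;
}
```

For example, an address with bit 47 set and bits [63:48] clear, such as 0x0000800000000000, is not in canonical form; the helper turns it into 0xFFFF800000000000, while addresses below bit 47 pass through unchanged.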
NULL
access with a small offset.
Signed-off-by: Jay Cornwall
---
drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 30 ++--
1 file changed, 15 insertions(+), 15 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
index
Some changes have been lost during rebases. Rebuild sources.
Signed-off-by: Jay Cornwall
---
.../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 741 +-
1 file changed, 371 insertions(+), 370 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
b/drivers/gpu
On Wed, Jun 30, 2021, at 05:10, YuBiao Wang wrote:
> [Why]
> GPU timing counters are read via KIQ under sriov, which will introduce
> a delay.
>
> [How]
> It could be directly read by MMIO.
>
> v2: Add additional check to prevent carryover issue.
> v3: Only check for carryover once to prevent
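The carryover concern mentioned in v2 and v3 is a common one when a 64-bit counter is read as two 32-bit registers. The following is a minimal sketch of the usual pattern, not the code from this patch: `read_counter64`, `reg_read_fn`, and the `fake_counter` simulation are hypothetical names introduced only for illustration, and the real driver's register accessors are not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register accessor type; ctx stands in for the device. */
typedef uint32_t (*reg_read_fn)(void *ctx);

/*
 * Read a 64-bit counter exposed as two 32-bit halves. If the high half
 * changed between the two high reads, the low half wrapped in between,
 * so the low half is re-read once and paired with the newer high half.
 */
static uint64_t read_counter64(reg_read_fn read_lo, reg_read_fn read_hi,
			       void *ctx)
{
	uint32_t hi, lo, hi2;

	assert(read_lo != NULL && read_hi != NULL);

	hi = read_hi(ctx);
	lo = read_lo(ctx);
	hi2 = read_hi(ctx);

	if (hi != hi2) {
		/* Carryover detected: check only once, then accept. */
		lo = read_lo(ctx);
		hi = hi2;
	}

	return ((uint64_t)hi << 32) | lo;
}

/* Simulated free-running counter that advances on every register read. */
struct fake_counter {
	uint64_t value;
	uint64_t step;
};

static uint32_t fake_read_lo(void *ctx)
{
	struct fake_counter *c = ctx;
	uint32_t v = (uint32_t)c->value;

	c->value += c->step;
	return v;
}

static uint32_t fake_read_hi(void *ctx)
{
	struct fake_counter *c = ctx;
	uint32_t v = (uint32_t)(c->value >> 32);

	c->value += c->step;
	return v;
}
```

With the simulated counter starting at 0xFFFFFFFF, a naive high-then-low read would combine the old high half with the wrapped low half; the second high read catches the change and the result stays monotonic.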
Trap handler is set per-process per-device and is unrelated
to queue management.
Move implementation closer to TMA setup code.
Signed-off-by: Jay Cornwall
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 6 +
.../drm/amd/amdkfd/kfd_device_queue_manager.c | 22
0 causes instruction fetch stall at cache line boundary under some
conditions on Navi10. A non-zero prefetch is the preferred default
in any case.
Fixes soft hang in Luxmark.
Signed-off-by: Jay Cornwall
---
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v10.c | 5 +++--
1 file changed, 3
Trap temporary GPRs are not currently saved/restored on ASICs
without scalar store instructions. They contain data useful to a
user-mode debugger.
Use vector store instructions to save TTMPs on these ASICs.
Signed-off-by: Jay Cornwall
Cc: Laurent Morichetti
---
.../gpu/drm/amd/amdkfd
IB_STS bits are saved/restored in both PC and ttmp11 along different
code paths. Use ttmp11 on both paths to remove redundant code.
Signed-off-by: Jay Cornwall
Cc: Laurent Morichetti
---
.../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 764 +-
.../amd/amdkfd
ATC and MTYPE fields do not exist in gfx9 or later.
Signed-off-by: Jay Cornwall
Cc: Laurent Morichetti
---
.../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 93 ++-
.../amd/amdkfd/cwsr_trap_handler_gfx10.asm| 28 +-
.../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm | 30
Save first_wave bit from exec_hi to ttmp1. This allows the high bits
of exec_lo/exec_hi (which hold a 48-bit address) to be cleared in a
follow-up patch.
Signed-off-by: Jay Cornwall
Cc: Laurent Morichetti
---
.../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 596 +-
.../amd
Signed-off-by: Jay Cornwall
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 8073fcd..9f90448 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm
On Thu, Aug 1, 2019, at 13:47, Alex Deucher wrote:
> From: Jay Cornwall
>
> Following bitmap layout logic introduced by:
> "drm/amdgpu: support get_cu_info for Arcturus".
>
> v2: squash in fixup for gfx_v9_0.c (Alex)
There's a second patch to squash, whic
On Wed, Sep 26, 2018, at 08:53, Christian König wrote:
> This allows us to filter out VM faults in the GMC code.
>
> Signed-off-by: Christian König
The KFD needs to receive notification of unhandled VM faults, i.e. when demand
paging is disabled or the address is not pageable. It propagates this to
On Mon, Oct 2, 2017, at 08:22, Kuehling, Felix wrote:
> Is the "new debug trap handler" already working? It seems right now I'm
> breaking the "old" debugger backend test. However, given the current
> status of that debugger, I guess we can disable those tests for now?
>
> Can you speak on behalf
On Tue, Aug 22, 2017, at 16:17, Felix Kuehling wrote:
> Thanks Alex!
>
> Jay, do you think this is enough? This bumps the number of concurrent
> operations on KIQ to 4 by default.
I'm not sure what the best number is. Up to 8 KFD processes is common
(beyond that performance drops off due to VMID
Dead code.
Change-Id: I2383e0b541ed55288570b6a0ec8a0d49cdd4df89
Signed-off-by: Jay Cornwall
---
drivers/gpu/drm/radeon/radeon_kfd.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c
b/drivers/gpu/drm/radeon/radeon_kfd.c
index 719ea51..8f8c7c1 100644
--- a
oversubscribed runlist.
v2: Remove unused num_mec field to avoid duplicate logic
v3: Separate num_mec removal into separate patches
Change-Id: I9e7bba2cc1928b624e3eeb1edb06fdb602e5294f
Signed-off-by: Jay Cornwall
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +-
1 file changed, 1 insertion(+), 1
Dead code.
Change-Id: Ic0bb1bcca87e96bc5e8fa9894727b0de152e8818
Signed-off-by: Jay Cornwall
---
drivers/gpu/drm/amd/amdkfd/kfd_device.c | 4
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 7 ---
2 files changed, 11 deletions(-)
diff --git a/drivers/gpu/drm/amd
Dead code.
Change-Id: I9575aa73b5741b80dc340f953cc773385c92b2be
Signed-off-by: Jay Cornwall
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 1 -
drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 3 ---
2 files changed, 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu
oversubscribed runlist.
v2: Remove unused num_mec field to avoid duplicate logic
Change-Id: Ic4a139c04b8a6d025fbb831a0a67e98728bfe461
Signed-off-by: Jay Cornwall
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 3 +--
drivers/gpu/drm/amd/amdkfd/kfd_device.c | 4
drivers/gpu
On Thu, Jul 13, 2017, at 13:36, Andres Rodriguez wrote:
> On 2017-07-12 02:26 PM, Jay Cornwall wrote:
> > The number of compute queues available to the KFD was erroneously
> > calculated as 64. Only the first MEC can execute compute queues and
> > it has 32 queue slots.
>
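The arithmetic behind the 64 versus 32 figures quoted above can be sketched as follows. The message only states the totals; the split into 2 MECs, 4 pipes per MEC, and 8 queues per pipe is an assumption made for illustration, and both function names are hypothetical rather than taken from the driver.

```c
#include <assert.h>

/*
 * Hypothetical: count queue slots across every MEC. With the assumed
 * topology (2 MECs x 4 pipes x 8 queues) this gives the erroneous 64.
 */
static unsigned int queue_slots_all_mecs(unsigned int num_mec,
					 unsigned int pipes_per_mec,
					 unsigned int queues_per_pipe)
{
	return num_mec * pipes_per_mec * queues_per_pipe;
}

/*
 * Hypothetical: count only the slots of the first MEC, since only the
 * first MEC can execute compute queues according to the message.
 */
static unsigned int queue_slots_first_mec(unsigned int num_mec,
					  unsigned int pipes_per_mec,
					  unsigned int queues_per_pipe)
{
	assert(num_mec >= 1);
	return pipes_per_mec * queues_per_pipe;
}
```

Under those assumed numbers the first function returns 64 and the second returns 32, which matches the before and after values in the commit message.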
oversubscribed runlist.
Change-Id: Ic4a139c04b8a6d025fbb831a0a67e98728bfe461
Signed-off-by: Jay Cornwall
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
b/drivers/gpu/drm/amd/amdgpu
On Wed, Jul 12, 2017, at 12:37, Felix Kuehling wrote:
> On 17-07-12 11:59 AM, Alex Deucher wrote:
> > On Wed, Jul 12, 2017 at 1:40 AM, Felix Kuehling
> > wrote:
> >> Any comments?
> >>
> >> I believe this is a nice stability improvement. In case of VM faults
> >> they don't take down the whole GP
On Wed, Mar 1, 2017, at 16:28, Zeng, Oak wrote:
> COMPUTE_PGM* registers are per pipe per queue - each queue of each pipe
> has a copy of those registers.
COMPUTE_* are ADC registers. These are instantiated once per pipe. The
values they hold correspond to the most recent values written from the
On Wed, Nov 23, 2016, at 14:27, Alex Deucher wrote:
> It's per queue not per pipe.
Are you sure? I was under the impression that EOP queues were per-pipe
on Gfx7 and per-queue on Gfx8 onwards (to support context save/restore).
It's also hinted at by the register name (HPD == Hardware Pipe
Descript
On 2016-08-10 11:10, Alex Deucher wrote:
On Wed, Aug 3, 2016 at 2:39 PM, Jay Cornwall wrote:
fence_put was called on an uninitialized variable.
Signed-off-by: Jay Cornwall
Can you commit this internally or do you need one of us to?
Alex
I'm less familiar with the amdgpu branches
On 2016-08-09 11:35, Christian König wrote:
Am 09.08.2016 um 17:49 schrieb Jay Cornwall:
On 2016-08-09 07:52, Christian König wrote:
From: Christian König
We align to 64KB, but when userspace aligns even more we can easily
use more.
Signed-off-by: Christian König
---
drivers/gpu/drm/amd
ag_end != end) {
Would this change not direct larger fragments away from the BigK TLB
partition?
My understanding was VM_L2_CNTL3.L2_CACHE_BIGK_FRAGMENT_SIZE is an exact
match and not a minimum size. I can't find any immediate documentation
on that t
fence_put was called on an uninitialized variable.
Signed-off-by: Jay Cornwall
---
drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index b11f4e8