Re: [PATCH] drm/amdkfd: Move gfx12 trap handler to separate file

2024-10-04 Thread Jay Cornwall
On 10/4/2024 04:42, Lancelot SIX wrote: + * Copyright 2024 Advanced Micro Devices, Inc. I am not really sure about the year policy in the kernel, but all the content here is dated from before 2024.  The vast majority is taken from the cwsr_trap_handler_gfx10.asm file (copyright started in 201

[PATCH] drm/amdkfd: Move gfx12 trap handler to separate file

2024-10-03 Thread Jay Cornwall
gfx12 derivatives will have substantially different trap handler implementations from gfx10/gfx11. Add a separate source file for gfx12+ and remove unneeded conditional code. No functional change. Signed-off-by: Jay Cornwall Cc: Lancelot Six Cc: Jonathan Kim --- .../amd/amdkfd

[PATCH] drm/amdkfd: Extend gfx12 trap handler fix to gfx10/11

2024-06-05 Thread Jay Cornwall
s register. Both of these fields can assert while the wavefront is running the trap handler. Signed-off-by: Jay Cornwall Cc: Lancelot Six --- .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 16 +--- .../amd/amdkfd/cwsr_trap_handler_gfx10.asm| 38 ++- 2 files changed, 38

Re: [PATCH v2] drm/amdkfd: Handle deallocated VPGRs in gfx11+ trap handler

2024-05-29 Thread Jay Cornwall
On 5/29/2024 16:07, Lancelot SIX wrote: On 29/05/2024 20:35, Jay Cornwall wrote: A wavefront may deallocate its VGPRs at the end of a program while waiting for memory transactions to complete. If it subsequently receives a context save exception it will be unable to save, since this requires

[PATCH v2] drm/amdkfd: Handle deallocated VPGRs in gfx11+ trap handler

2024-05-29 Thread Jay Cornwall
intermittent VM faults under context switching load. V2: Use S_ENDPGM instead of S_ENDPGM_SAVED for performance counters Signed-off-by: Jay Cornwall Cc: Lancelot Six --- .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 695 +- .../amd/amdkfd/cwsr_trap_handler_gfx10.asm| 17 + 2 files

[PATCH] drm/amdkfd: Handle deallocated VPGRs in gfx10+ trap handler

2024-05-28 Thread Jay Cornwall
intermittent VM faults under context switching load. Signed-off-by: Jay Cornwall Cc: Lancelot Six --- .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 744 +- .../amd/amdkfd/cwsr_trap_handler_gfx10.asm| 13 + 2 files changed, 386 insertions(+), 371 deletions(-) diff --git a/drivers

Re: [PATCH 3/3] drm/amdkfd: gfx12 context save/restore trap handler fixes

2024-05-23 Thread Jay Cornwall
On 5/23/2024 13:37, Lancelot SIX wrote: @@ -622,8 +638,15 @@ L_SAVE_HWREG:   #if ASIC_FAMILY >= CHIP_GFX12   // Ensure no further changes to barrier or LDS state. +    // STATE_PRIV.BARRIER_COMPLETE may change up to this point.   s_barrier_signal    -2   s_barrier_wait    -2 + +    /

[PATCH 3/3] drm/amdkfd: gfx12 context save/restore trap handler fixes

2024-05-23 Thread Jay Cornwall
ONTEXT,HOST_TRAP} when restoring this register. Both of these fields can assert while the wavefront is running the trap handler. Signed-off-by: Jay Cornwall Cc: Lancelot Six --- .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 1191 + .../amd/amdkfd/cwsr_trap_handler_gfx10.asm| 55

[PATCH 2/3] drm/amdkfd: Replace deprecated gfx12 trap handler instructions

2024-05-23 Thread Jay Cornwall
Newer assemblers reject S_WAITCNT. All instances of S_WAITCNT can be replaced by S_WAITCNT 0 (< gfx12) or S_WAIT_IDLE (>= gfx12) since there is no concurrency of different memory instruction classes. Signed-off-by: Jay Cornwall Cc: Lancelot Six --- .../gpu/drm/amd/amdkfd/cwsr_trap_han

[PATCH 1/3] drm/amdkfd: Sync trap handler binary with source

2024-05-23 Thread Jay Cornwall
Source and binary have become mismatched during branch activity. Signed-off-by: Jay Cornwall Cc: Lancelot Six --- .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 57 --- 1 file changed, 24 insertions(+), 33 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd

Re: [PATCH] drm/amdkfd: update buffer_{store,load}_* modifiers for gfx940

2024-04-29 Thread Jay Cornwall
et:256*2 + buffer_load_dword v3, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS offset:256*3 s_waitcnt vmcnt(0) end base-commit: cf743996352e327f483dc7d66606c90276f57380 Reviewed-by: Jay Cornwall

Re: [PATCH] drm/amdkfd: fix shift out of bounds about gpu debug

2024-03-01 Thread Jay Cornwall
On 3/1/2024 00:35, Kim, Jonathan wrote: > The range check should probably flag any exception prefixed as > EC_QUEUE_PACKET_* as valid defined in kfd_dbg_trap_exception_code: > https://github.com/torvalds/linux/blob/master/include/uapi/linux/kfd_ioctl.h#L857 > + Jay to confirm this is the correct

Re: [PATCH] drm/amdkfd: Use SQC when TCP would fail in gfx10.1 context save

2024-02-26 Thread Jay Cornwall
On 2/23/2024 16:08, Laurent Morichetti wrote: > Similarly to gfx9, gfx10.1 drops vector stores when an xnack error is > raised. To work around this issue, use scalar stores instead of vector > stores when trapsts.xnack_error == 1. > > Signed-off-by: Laurent Morichetti Reviewed-by: Jay Cornwall

Re: [PATCH] amdkfd: fix the cwsr trap handler for gfx11

2024-01-31 Thread Jay Cornwall
s to second-level trap handler". Besides that: Reviewed-by: Jay Cornwall

Re: [PATCH] drm/amdkfd: Use S_ENDPGM_SAVED in trap handler

2024-01-24 Thread Jay Cornwall
On 1/15/2024 13:07, Jay Cornwall wrote: > This instruction has no functional difference to S_ENDPGM > but allows performance counters to track save events correctly. > > Signed-off-by: Jay Cornwall > --- > drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h | 14 +++---

[PATCH] drm/amdkfd: Use S_ENDPGM_SAVED in trap handler

2024-01-15 Thread Jay Cornwall
This instruction has no functional difference to S_ENDPGM but allows performance counters to track save events correctly. Signed-off-by: Jay Cornwall --- drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h | 14 +++--- .../gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm | 2 +- .../gpu

Re: [PATCH] drm/amdkfd: Fix the shift-out-of-bounds warning

2024-01-11 Thread Jay Cornwall
On 1/11/2024 11:25, Kim, Jonathan wrote: >> This looks OK. The compiler must be warning about a potential problem >> here, not a definite one. >> >> Question for Jon, how does the firmware encode the error code in the >> context ID? I see these macros: >> >> #define KFD_DEBUG_CP_BAD_OP_ECODE_MASK

Re: [PATCH] Revert "drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole"

2024-01-03 Thread Jay Cornwall
in restore_process_bos > >     Signed-off-by: Felix Kuehling >     Acked-by: Christian König >     Tested-by: Emily Deng >     Signed-off-by: Alex Deucher > > > FWIW, I built a plain 6.6 kernel, and was not able to reproduce the > crash with some simple tests. > > Regards, >   Felix > > >> >> So I agree, let's revert it. >> >> Reviewed-by: Jay Cornwall

Re: [PATCH] Revert "drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole"

2024-01-03 Thread Jay Cornwall
} >> + >> +/* dGPUs: the reserved space for kernel >> + * before SVM >> + */ >> +pdd->qpd.cwsr_base = SVM_CWSR_BASE; >> +pdd->qpd.ib_base = SVM_IB_BASE; >> } >> >> dev_dbg(kfd_device, "node id %u\n", id); >> -- >> 2.42.0 >> I saw a segfault issue in Mesa yesterday. Not sure about the others, but I don't know how to make this change while maintaining compatibility with older UMDs. So I agree, let's revert it. Reviewed-by: Jay Cornwall

Re: [PATCH] drm/amdkfd: Clear the VALU exception state in the trap handler

2023-11-08 Thread Jay Cornwall
On 11/8/2023 18:23, Laurent Morichetti wrote: > The trap handler could be entered with pending VALU exceptions, so > clear the exception state before issuing vector instructions. > > Signed-off-by: Laurent Morichetti Reviewed-by: Jay Cornwall

[PATCH] drm/amdgpu: Improve MES responsiveness during oversubscription

2023-10-04 Thread Jay Cornwall
When MES is oversubscribed it may not frequently check for new command submissions from driver if the scheduling load is high. Response latency as high as 5 seconds has been observed. Enable a flag which adds a check for new commands between scheduling quantums. Signed-off-by: Jay Cornwall Cc

[PATCH] drm/amdkfd: Add missing tba_hi programming on aldebaran

2023-08-09 Thread Jay Cornwall
Previously asymptomatic because high 32 bits were zero. Fixes: 615222cfed20 ("drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole") Signed-off-by: Jay Cornwall --- drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/g
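As a rough illustration of why a missing high-half write can go unnoticed, the sketch below splits a 64-bit address into two 32-bit halves. The struct and function names are hypothetical and are not taken from the patch or from the driver; the point is only that leaving the high half at zero is harmless until the address moves above the low 4 GiB.

```c
#include <stdint.h>

/* Hypothetical pair of 32-bit fields holding one 64-bit address. */
struct addr_halves {
    uint32_t lo; /* bits [31:0]  */
    uint32_t hi; /* bits [63:32] */
};

/* Split a 64-bit address into its low and high 32-bit halves. */
static struct addr_halves split_addr(uint64_t addr)
{
    struct addr_halves h;
    h.lo = (uint32_t)(addr & 0xFFFFFFFFu);
    h.hi = (uint32_t)(addr >> 32);
    return h;
}

/* Rebuild the 64-bit address from both halves. */
static uint64_t join_addr(struct addr_halves h)
{
    return ((uint64_t)h.hi << 32) | h.lo;
}

/* Rebuild the address as if the high half had never been programmed. */
static uint64_t join_lo_only(struct addr_halves h)
{
    return (uint64_t)h.lo;
}
```

For an address below 4 GiB, join_lo_only() and join_addr() agree, so the omission has no visible effect; for a relocated high address they differ.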

[PATCH 2/3] drm/amdkfd: Sign-extend TMA address in trap handler

2023-07-31 Thread Jay Cornwall
SMEM instructions can reach addresses above 47 bits but require bit 47 to be sign-extended through bits [63:48]. This allows the TMA to be relocated in a following patch. Signed-off-by: Jay Cornwall --- .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 58 --- .../amd/amdkfd
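The sign-extension rule described here can be sketched in C. This is only an illustration of the arithmetic, not the trap handler code (which is written in shader assembly); the helper name sign_extend_bit47 is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: copy bit 47 of addr into bits [63:48],
 * keeping bits [46:0] unchanged.
 */
static uint64_t sign_extend_bit47(uint64_t addr)
{
    const uint64_t low48 = addr & 0x0000FFFFFFFFFFFFULL; /* bits [47:0] */
    const uint64_t bit47 = (addr >> 47) & 1ULL;

    /* Fill bits [63:48] with ones when bit 47 is set, zeros otherwise. */
    return bit47 ? (low48 | 0xFFFF000000000000ULL) : low48;
}
```

Under this sketch, 0x0000800000000000 (only bit 47 set) becomes 0xFFFF800000000000, while an address with bit 47 clear keeps zeros in bits [63:48].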

[PATCH 3/3] drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole

2023-07-31 Thread Jay Cornwall
NULL access with a small offset. Signed-off-by: Jay Cornwall --- drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 30 ++-- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c index

[PATCH 1/3] drm/amdkfd: Sync trap handler binaries with source

2023-07-31 Thread Jay Cornwall
Some changes have been lost during rebases. Rebuild sources. Signed-off-by: Jay Cornwall --- .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 741 +- 1 file changed, 371 insertions(+), 370 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu

Re: [PATCH 1/1] drm/amdgpu: Read clock counter via MMIO to reduce delay (v4)

2021-06-30 Thread Jay Cornwall
On Wed, Jun 30, 2021, at 05:10, YuBiao Wang wrote: > [Why] > GPU timing counters are read via KIQ under sriov, which will introduce > a delay. > > [How] > It could be directly read by MMIO. > > v2: Add additional check to prevent carryover issue. > v3: Only check for carryover once to prevent

[PATCH] drm/amdkfd: Move set_trap_handler out of dqm->ops

2021-03-04 Thread Jay Cornwall
Trap handler is set per-process per-device and is unrelated to queue management. Move implementation closer to TMA setup code. Signed-off-by: Jay Cornwall --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 6 + .../drm/amd/amdkfd/kfd_device_queue_manager.c | 22

[PATCH] drm/amdkfd: Use same SQ prefetch setting as amdgpu

2020-10-19 Thread Jay Cornwall
0 causes instruction fetch stall at cache line boundary under some conditions on Navi10. A non-zero prefetch is the preferred default in any case. Fixes soft hang in Luxmark. Signed-off-by: Jay Cornwall --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v10.c | 5 +++-- 1 file changed, 3

[PATCH 4/4] drm/amdkfd: Save TTMPs on all ASICs in gfx10 trap handler

2020-10-01 Thread Jay Cornwall
Trap temporary GPRs are not currently saved/restored on ASICs without scalar store instructions. They contain data useful to a user-mode debugger. Use vector store instructions to save TTMPs on these ASICs. Signed-off-by: Jay Cornwall Cc: Laurent Morichetti --- .../gpu/drm/amd/amdkfd

[PATCH 2/4] drm/amdkfd: Remove duplicated code from trap handler

2020-10-01 Thread Jay Cornwall
IB_STS bits are saved/restored in both PC and ttmp11 along different code paths. Use ttmp11 on both paths to remove redundant code. Signed-off-by: Jay Cornwall Cc: Laurent Morichetti --- .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 764 +- .../amd/amdkfd

[PATCH 1/4] drm/amdkfd: Remove legacy code from trap handler

2020-10-01 Thread Jay Cornwall
ATC and MTYPE fields do not exist in gfx9 or later. Signed-off-by: Jay Cornwall Cc: Laurent Morichetti --- .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 93 ++- .../amd/amdkfd/cwsr_trap_handler_gfx10.asm| 28 +- .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm | 30

[PATCH 3/4] drm/amdkfd: Move first_wave bit in gfx10 trap handler

2020-10-01 Thread Jay Cornwall
Save first_wave bit from exec_hi to ttmp1. This allows the high bits of exec_lo/exec_hi (which hold a 48-bit address) to be cleared in a follow-up patch. Signed-off-by: Jay Cornwall Cc: Laurent Morichetti --- .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 596 +- .../amd

[PATCH] drm/amdgpu: Update Arcturus golden registers

2019-11-20 Thread Jay Cornwall
Signed-off-by: Jay Cornwall --- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c index 8073fcd..9f90448 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c +++ b/drivers/gpu/drm

Re: [PATCH 2/2] drm/amdkfd: Extend CU mask to 8 SEs (v2)

2019-08-01 Thread Jay Cornwall
On Thu, Aug 1, 2019, at 13:47, Alex Deucher wrote: > From: Jay Cornwall > > Following bitmap layout logic introduced by: > "drm/amdgpu: support get_cu_info for Arcturus". > > v2: squash in fixup for gfx_v9_0.c (Alex) There's a second patch to squash, whic

Re: [PATCH 02/12] drm/amdgpu: send IVs to the KFD only after processing them

2018-09-26 Thread Jay Cornwall
On Wed, Sep 26, 2018, at 08:53, Christian König wrote: > This allows us to filter out VM faults in the GMC code. > > Signed-off-by: Christian König The KFD needs to receive notification of unhandled VM faults; when demand paging is disabled or the address is not pageable. It propagates this to

Re: KFD event handling questions

2017-10-03 Thread Jay Cornwall
On Mon, Oct 2, 2017, at 08:22, Kuehling, Felix wrote: > Is the "new debug trap handler" already working? It seems right now I'm > breaking the "old" debugger backend test. However, given the current > status of that debugger, I guess we can disable those tests for now? > > Can you speak on behalf

Re: [PATCH] drm/amdgpu: set sched_hw_submission higher for KIQ

2017-08-22 Thread Jay Cornwall
On Tue, Aug 22, 2017, at 16:17, Felix Kuehling wrote: > Thanks Alex! > > Jay, do you think this is enough? This bumps the number of concurrent > operations on KIQ to 4 by default. I'm not sure what the best number is. Up to 8 KFD processes is common (beyond that performance drops off due to VMID

[PATCH 3/4] drm/radeon: Remove initialization of shared_resources.num_mec

2017-07-13 Thread Jay Cornwall
Dead code. Change-Id: I2383e0b541ed55288570b6a0ec8a0d49cdd4df89 Signed-off-by: Jay Cornwall --- drivers/gpu/drm/radeon/radeon_kfd.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c b/drivers/gpu/drm/radeon/radeon_kfd.c index 719ea51..8f8c7c1 100644 --- a

[PATCH v3 1/4] drm/amdgpu: Fix KFD oversubscription by tracking queues correctly

2017-07-13 Thread Jay Cornwall
oversubscribed runlist. v2: Remove unused num_mec field to avoid duplicate logic v3: Separate num_mec removal into separate patches Change-Id: I9e7bba2cc1928b624e3eeb1edb06fdb602e5294f Signed-off-by: Jay Cornwall --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +- 1 file changed, 1 insertion(+), 1

[PATCH 2/4] drm/amdkfd: Remove unused references to shared_resources.num_mec

2017-07-13 Thread Jay Cornwall
Dead code. Change-Id: Ic0bb1bcca87e96bc5e8fa9894727b0de152e8818 Signed-off-by: Jay Cornwall --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 4 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 7 --- 2 files changed, 11 deletions(-) diff --git a/drivers/gpu/drm/amd

[PATCH 4/4] drm/amdgpu: Remove unused field kgd2kfd_shared_resources.num_mec

2017-07-13 Thread Jay Cornwall
Dead code. Change-Id: I9575aa73b5741b80dc340f953cc773385c92b2be Signed-off-by: Jay Cornwall --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 1 - drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 3 --- 2 files changed, 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH v2] drm/amdgpu: Fix KFD oversubscription by tracking queues correctly

2017-07-13 Thread Jay Cornwall
oversubscribed runlist. v2: Remove unused num_mec field to avoid duplicate logic Change-Id: Ic4a139c04b8a6d025fbb831a0a67e98728bfe461 Signed-off-by: Jay Cornwall --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 3 +-- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 4 drivers/gpu

Re: [PATCH] drm/amdgpu: Fix KFD oversubscription by tracking queues correctly

2017-07-13 Thread Jay Cornwall
On Thu, Jul 13, 2017, at 13:36, Andres Rodriguez wrote: > On 2017-07-12 02:26 PM, Jay Cornwall wrote: > > The number of compute queues available to the KFD was erroneously > > calculated as 64. Only the first MEC can execute compute queues and > > it has 32 queue slots. >

[PATCH] drm/amdgpu: Fix KFD oversubscription by tracking queues correctly

2017-07-12 Thread Jay Cornwall
oversubscribed runlist. Change-Id: Ic4a139c04b8a6d025fbb831a0a67e98728bfe461 Signed-off-by: Jay Cornwall --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu

Re: [PATCH 05/12] drm/amdgpu: Send no-retry XNACK for all fault types

2017-07-12 Thread Jay Cornwall
On Wed, Jul 12, 2017, at 12:37, Felix Kuehling wrote: > On 17-07-12 11:59 AM, Alex Deucher wrote: > > On Wed, Jul 12, 2017 at 1:40 AM, Felix Kuehling > > wrote: > >> Any comments? > >> > >> I believe this is a nice stability improvement. In case of VM faults > >> they don't take down the whole GP

Re: [PATCH] drm/amdgpu: Added more hqd debug messages

2017-03-02 Thread Jay Cornwall
On Wed, Mar 1, 2017, at 16:28, Zeng, Oak wrote: > COMPUTE_PGM* registers are per pipe per queue - each queue of each pipe > has a copy of those registers. COMPUTE_* are ADC registers. These are instantiated once per pipe. The values they hold correspond to the most recent values written from the

Re: [PATCH] drm/amdgpu/gfx7: move eop programming per queue

2016-11-23 Thread Jay Cornwall
On Wed, Nov 23, 2016, at 14:27, Alex Deucher wrote: > It's per queue not per pipe. Are you sure? I was under the impression that EOP queues were per-pipe on Gfx7 and per-queue on Gfx8 onwards (to support context save/restore). It's also hinted at by the register name (HPD == Hardware Pipe Descript

Re: [PATCH] drm/amdgpu: Fix memory trashing if UVD ring test fails

2016-08-10 Thread Jay Cornwall
On 2016-08-10 11:10, Alex Deucher wrote: On Wed, Aug 3, 2016 at 2:39 PM, Jay Cornwall wrote: fence_put was called on an uninitialized variable. Signed-off-by: Jay Cornwall Can you commit this internally or do you need one of us to? Alex I'm less familiar with the amdgpu branches

Re: [PATCH 6/6] drm/amdgpu: use more than 64KB fragment size if possible

2016-08-09 Thread Jay Cornwall
On 2016-08-09 11:35, Christian König wrote: Am 09.08.2016 um 17:49 schrieb Jay Cornwall: On 2016-08-09 07:52, Christian König wrote: From: Christian König We align to 64KB, but when userspace aligns even more we can easily use more. Signed-off-by: Christian König --- drivers/gpu/drm/amd

Re: [PATCH 6/6] drm/amdgpu: use more than 64KB fragment size if possible

2016-08-09 Thread Jay Cornwall
ag_end != end) { Would this change not direct larger fragments away from the BigK TLB partition? My understanding was VM_L2_CNTL3.L2_CACHE_BIGK_FRAGMENT_SIZE is an exact match and not a minimum size. I can't find any immediate documentation on that t

[PATCH] drm/amdgpu: Fix memory trashing if UVD ring test fails

2016-08-03 Thread Jay Cornwall
fence_put was called on an uninitialized variable. Signed-off-by: Jay Cornwall --- drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c index b11f4e8