Re: [PATCH] drm/amdkfd: Fix compile error if HMM support not enabled

2024-07-26 Thread Felix Kuehling
On 2024-07-25 19:25, Philip Yang wrote: Fixes the errors below if the kernel config does not enable HMM support: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_queue.c:107:26: error: implicit declaration of function 'svm_range_from_addr' drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_queue.c:107:24: error: assignment
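For readers unfamiliar with this class of build failure: an "implicit declaration" error typically appears when a function is only declared under a config option but is still called when that option is off. One common remedy is a static inline stub for the disabled case. The sketch below only illustrates that general pattern; it is not taken from the patch, and `DEMO_HAVE_SVM`, `struct demo_range`, `demo_range_from_addr` and `demo_validate_addr` are hypothetical names invented for the example.

```c
#include <stddef.h>

/* Hypothetical stand-in for a range object; not the real kernel type. */
struct demo_range {
	unsigned long start;
	unsigned long last;
};

#ifdef DEMO_HAVE_SVM
/* Feature enabled: the real lookup is implemented elsewhere. */
struct demo_range *demo_range_from_addr(unsigned long addr);
#else
/* Feature disabled: a stub keeps callers compiling and reports "no range". */
static inline struct demo_range *demo_range_from_addr(unsigned long addr)
{
	(void)addr;
	return NULL;
}
#endif

/* Caller compiles in both configurations; returns 0 if a range exists, -1 otherwise. */
static inline int demo_validate_addr(unsigned long addr)
{
	return demo_range_from_addr(addr) ? 0 : -1;
}
```

With `DEMO_HAVE_SVM` undefined, `demo_validate_addr()` always returns -1, so the caller needs no `#ifdef` of its own.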

Re: [PATCH 1/2] drm/amdkfd: support per-queue reset on gfx9

2024-07-26 Thread Alex Deucher
On Fri, Jul 26, 2024 at 11:31 AM Jonathan Kim wrote: > > Support per-queue reset for GFX9. The recommendation is for the driver > to target reset the HW queue via a SPI MMIO register write. > > Since this requires pipe and HW queue info and MEC FW is limited to > doorbell reports of hung queues a

RE: [PATCH 2/2] drm/amdkfd: support the debugger during per-queue reset

2024-07-26 Thread Kim, Jonathan
[Public] > -----Original Message----- > From: Alex Deucher > Sent: Friday, July 26, 2024 2:57 PM > To: Kim, Jonathan > Cc: amd-gfx@lists.freedesktop.org; Kuehling, Felix > ; Deucher, Alexander > > Subject: Re: [PATCH 2/2] drm/amdkfd: support the debugger during per- > queue reset > > Caution: T

Re: [PATCH 2/2] drm/amdgpu: trigger ip dump before suspend of IP's

2024-07-26 Thread Khatri, Sunil
On 7/27/2024 12:13 AM, Alex Deucher wrote: On Fri, Jul 26, 2024 at 1:16 PM Khatri, Sunil wrote: On 7/26/2024 8:36 PM, Lazar, Lijo wrote: On 7/26/2024 8:11 PM, Khatri, Sunil wrote: On 7/26/2024 7:53 PM, Khatri, Sunil wrote: On 7/26/2024 7:18 PM, Lazar, Lijo wrote: On 7/26/2024 6:42 PM, Al

Re: [PATCH 2/2] drm/amdkfd: support the debugger during per-queue reset

2024-07-26 Thread Alex Deucher
On Fri, Jul 26, 2024 at 11:40 AM Jonathan Kim wrote: > > In order to allow ROCm GDB to handle reset queues, raise an > EC_QUEUE_RESET exception so that the debugger can subscribe and > query this exception. > > Reset queues should still be considered suspendable with a status > flag of KFD_DBG_QUE

Re: [PATCH 2/2] drm/amdgpu: trigger ip dump before suspend of IP's

2024-07-26 Thread Alex Deucher
On Fri, Jul 26, 2024 at 1:16 PM Khatri, Sunil wrote: > > > On 7/26/2024 8:36 PM, Lazar, Lijo wrote: > > > > On 7/26/2024 8:11 PM, Khatri, Sunil wrote: > >> On 7/26/2024 7:53 PM, Khatri, Sunil wrote: > >>> On 7/26/2024 7:18 PM, Lazar, Lijo wrote: > On 7/26/2024 6:42 PM, Alex Deucher wrote: > >

Re: [PATCH 2/2] drm/amdgpu: trigger ip dump before suspend of IP's

2024-07-26 Thread Khatri, Sunil
On 7/26/2024 8:36 PM, Lazar, Lijo wrote: On 7/26/2024 8:11 PM, Khatri, Sunil wrote: On 7/26/2024 7:53 PM, Khatri, Sunil wrote: On 7/26/2024 7:18 PM, Lazar, Lijo wrote: On 7/26/2024 6:42 PM, Alex Deucher wrote: On Fri, Jul 26, 2024 at 8:48 AM Sunil Khatri wrote: Problem: IP dump right now

[PATCH v4] drm/amdkfd: Change kfd/svm page fault drain handling

2024-07-26 Thread Xiaogang . Chen
From: Xiaogang Chen When an app unmaps VM ranges (munmap), kfd/svm starts draining pending page faults and does not handle any incoming page faults of this process until a deferred work item is executed by the default system wq. The period of "not handling page faults" can be long and is unpredictable. That is ad

[PATCH 2/2] drm/amdkfd: support the debugger during per-queue reset

2024-07-26 Thread Jonathan Kim
In order to allow ROCm GDB to handle reset queues, raise an EC_QUEUE_RESET exception so that the debugger can subscribe and query this exception. Reset queues should still be considered suspendable with a status flag of KFD_DBG_QUEUE_RESET_MASK. However they should not be resumable since user spac

[PATCH 1/2] drm/amdkfd: support per-queue reset on gfx9

2024-07-26 Thread Jonathan Kim
Support per-queue reset for GFX9. The recommendation is for the driver to target reset the HW queue via a SPI MMIO register write. Since this requires pipe and HW queue info and MEC FW is limited to doorbell reports of hung queues after an unmap failure, scan the HW queue slots defined by SET_RES

Re: [PATCH] drm/amdgpu: always allocate cleared VRAM for GEM allocations

2024-07-26 Thread Alex Deucher
On Fri, Jul 26, 2024 at 9:50 AM Alex Deucher wrote: > > This adds allocation latency, but aligns better with user > expectations. The latency should improve with the drm buddy > clearing patches that Arun has been working on. > > Signed-off-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/am

RE: [PATCH 00/39] DC Patches July 25, 2024

2024-07-26 Thread Wheeler, Daniel
[Public] Hi all, This week this patchset was tested on the following systems: * Lenovo ThinkBook T13s Gen4 with AMD Ryzen 5 6600U * MSI Gaming X Trio RX 6800 * Gigabyte Gaming OC RX 7900 XTX These systems were tested on the following display/connection types: * eD

Re: [PATCH 2/2] drm/amdgpu: trigger ip dump before suspend of IP's

2024-07-26 Thread Lazar, Lijo
On 7/26/2024 8:11 PM, Khatri, Sunil wrote: > > On 7/26/2024 7:53 PM, Khatri, Sunil wrote: >> >> On 7/26/2024 7:18 PM, Lazar, Lijo wrote: >>> >>> On 7/26/2024 6:42 PM, Alex Deucher wrote: On Fri, Jul 26, 2024 at 8:48 AM Sunil Khatri wrote: > Problem: > IP dump right now is don

Re: [PATCH 2/2] drm/amdgpu: trigger ip dump before suspend of IP's

2024-07-26 Thread Khatri, Sunil
On 7/26/2024 7:53 PM, Khatri, Sunil wrote: On 7/26/2024 7:18 PM, Lazar, Lijo wrote: On 7/26/2024 6:42 PM, Alex Deucher wrote: On Fri, Jul 26, 2024 at 8:48 AM Sunil Khatri wrote: Problem: IP dump right now is done post suspend of all IP's which for some IP's could change power state and sof

Re: [PATCH v2 2/2] drm/radeon: convert bios_hardcoded_edid to drm_edid

2024-07-26 Thread Alex Deucher
Applied the series. Thanks! Alex On Fri, Jul 26, 2024 at 9:40 AM Thomas Weißschuh wrote: > > Instead of manually passing around 'struct edid *' and its size, > use 'struct drm_edid', which encapsulates a validated combination of > both. > > As the drm_edid_ can handle NULL gracefully, the expli

Re: [PATCH 2/2] drm/amdgpu: trigger ip dump before suspend of IP's

2024-07-26 Thread Khatri, Sunil
On 7/26/2024 7:18 PM, Lazar, Lijo wrote: On 7/26/2024 6:42 PM, Alex Deucher wrote: On Fri, Jul 26, 2024 at 8:48 AM Sunil Khatri wrote: Problem: IP dump right now is done post suspend of all IP's which for some IP's could change power state and software state too which we do not want to refl

Re: [PATCH] drm/sched: add optional errno to drm_sched_start()

2024-07-26 Thread Daniel Vetter
On Fri, Jul 26, 2024 at 09:55:50AM +0200, Christian König wrote: > The current implementation of drm_sched_start uses a hardcoded > -ECANCELED to dispose of a job when the parent/hw fence is NULL. > This results in drm_sched_job_done being called with -ECANCELED for > each job with a NULL parent in

Re: [PATCH -next] drm/amd/display: Use ARRAY_SIZE for array length

2024-07-26 Thread Alex Deucher
Applied. Thanks! On Fri, Jul 26, 2024 at 5:55 AM Jiapeng Chong wrote: > > Use of macro ARRAY_SIZE to calculate array size minimizes > the redundant code and improves code reusability. > > ./drivers/gpu/drm/amd/display/dc/spl/dc_spl_scl_easf_filters.c:1552:57-58: > WARNING: Use ARRAY_SIZE. > ./d
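As a brief illustration of the warning quoted above, the sketch below contrasts a hand-written element count with an `ARRAY_SIZE`-style macro. It is a user-space approximation: the macro here is a simplified definition (the kernel's version adds a compile-time check that its argument really is an array), and `filter_taps` and both helper functions are hypothetical names, not code from the patched file.

```c
#include <stddef.h>

/* Simplified user-space version of the macro; the kernel's adds an array-type check. */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical coefficient table used only for this example. */
static const int filter_taps[] = { 1, 2, 3, 4, 5 };

/* Open-coded element count: the form the static checker warns about. */
static inline size_t filter_tap_count_manual(void)
{
	return sizeof(filter_taps) / sizeof(filter_taps[0]);
}

/* Same count expressed through the macro. */
static inline size_t filter_tap_count(void)
{
	return ARRAY_SIZE(filter_taps);
}
```

Both helpers return the same value; the macro form just removes the repeated `sizeof` expression.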

[PATCH] drm/amd/display: Handle null 'stream_status' in 'planes_changed_for_existing_stream'

2024-07-26 Thread Srinivasan Shanmugam
This commit adds a null check for 'stream_status' in the function 'planes_changed_for_existing_stream'. Previously, the code assumed 'stream_status' could be null, but did not handle the case where it was actually null. This could lead to a null pointer dereference. Reported by smatch: drivers/gpu

Re: [PATCH v7 1/2] drm/buddy: Add start address support to trim function

2024-07-26 Thread Paneer Selvam, Arunpravin
On 7/24/2024 8:42 PM, Jani Nikula wrote: On Tue, 23 Jul 2024, Arunpravin Paneer Selvam wrote: - Add a new start parameter in trim function to specify exact address from where to start the trimming. This would help us in situations like if drivers would like to do address alignment

Re: [PATCH 2/2] drm/amdgpu: trigger ip dump before suspend of IP's

2024-07-26 Thread Lazar, Lijo
On 7/26/2024 6:42 PM, Alex Deucher wrote: > On Fri, Jul 26, 2024 at 8:48 AM Sunil Khatri wrote: >> >> Problem: >> IP dump right now is done post suspend of >> all IP's which for some IP's could change power >> state and software state too which we do not want >> to reflect in the dump as it mig

[PATCH] drm/amdgpu: always allocate cleared VRAM for GEM allocations

2024-07-26 Thread Alex Deucher
This adds allocation latency, but aligns better with user expectations. The latency should improve with the drm buddy clearing patches that Arun has been working on. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 4 1 file changed, 4 insertions(+) diff --git a/d

Re: [PATCH] drm/amdkfd: Fix missing error code in kfd_queue_acquire_buffers

2024-07-26 Thread Philip Yang
The kfdtest user queue validation cases don't cover those error condition paths, thanks for catching it. This patch is Reviewed-by: Philip Yang On 2024-07-26 02:47, Srinivasan Shanmugam wrote: The fix involves setting 'err' to '-EINVAL' befo

Re: [PATCH 1/2] drm/amdgpu: print VCN instance dump for valid instance

2024-07-26 Thread Alex Deucher
On Fri, Jul 26, 2024 at 8:48 AM Sunil Khatri wrote: > > VCN dump is dependent on power state of the ip. Dump is > valid if VCN was powered up at the time of ip dump. > > Signed-off-by: Sunil Khatri Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 28 +-

Re: [PATCH 2/2] drm/amdgpu: trigger ip dump before suspend of IP's

2024-07-26 Thread Alex Deucher
On Fri, Jul 26, 2024 at 8:48 AM Sunil Khatri wrote: > > Problem: > IP dump right now is done post suspend of > all IP's which for some IP's could change power > state and software state too which we do not want > to reflect in the dump as it might not be same at > the time of hang. > > Solution: >

Re: [PATCH] drm/amdgpu/pm: update documentation on memory clock

2024-07-26 Thread Alex Deucher
On Thu, Jul 25, 2024 at 11:11 PM Feng, Kenneth wrote: > > [AMD Official Use Only - AMD Internal Distribution Only] > > Hi Alex, > I know that G6 MCLK = 2*UCLK. > May I know how did you get the data that effective_memory_clock = > memory_controller_clock * 1? I swear someone on the pplib team tol

Re: [PATCH] drm/radeon/evergreen_cs: fix int overflow errors in cs track offsets

2024-07-26 Thread Alex Deucher
On Fri, Jul 26, 2024 at 3:05 AM Christian König wrote: > > Am 25.07.24 um 20:09 schrieb Nikita Zhandarovich: > > Several cs track offsets (such as 'track->db_s_read_offset') > > either are initialized with or plainly take big enough values that, > > once shifted 8 bits left, may be hit with intege

[PATCH 2/2] drm/amdgpu: trigger ip dump before suspend of IP's

2024-07-26 Thread Sunil Khatri
Problem: IP dump right now is done post suspend of all IP's which for some IP's could change power state and software state too which we do not want to reflect in the dump as it might not be same at the time of hang. Solution: IP should be dumped as close to the HW state when the GPU was in hung s

[PATCH 1/2] drm/amdgpu: print VCN instance dump for valid instance

2024-07-26 Thread Sunil Khatri
VCN dump is dependent on power state of the ip. Dump is valid if VCN was powered up at the time of ip dump. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 28 +-- 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/amd/

Re: [PATCH] drm/sched: add optional errno to drm_sched_start()

2024-07-26 Thread Matthew Brost
On Fri, Jul 26, 2024 at 09:55:50AM +0200, Christian König wrote: > The current implementation of drm_sched_start uses a hardcoded > -ECANCELED to dispose of a job when the parent/hw fence is NULL. > This results in drm_sched_job_done being called with -ECANCELED for > each job with a NULL parent in

[PATCH] drm/sched: add optional errno to drm_sched_start()

2024-07-26 Thread Christian König
The current implementation of drm_sched_start uses a hardcoded -ECANCELED to dispose of a job when the parent/hw fence is NULL. This results in drm_sched_job_done being called with -ECANCELED for each job with a NULL parent in the pending list, making it difficult to distinguish between recovery me

[PATCH] drm/radeon/evergreen_cs: fix int overflow errors in cs track offsets

2024-07-26 Thread Nikita Zhandarovich
Several cs track offsets (such as 'track->db_s_read_offset') either are initialized with or plainly take big enough values that, once shifted 8 bits left, may be hit with integer overflow if the resulting values end up going over u32 limit. Some debug prints take this into account (see according d
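The overflow described here can be reproduced in isolation. The sketch below shows, under stated assumptions, how a 32-bit value shifted left by 8 bits wraps, and how widening to 64 bits before the shift preserves the result. The helper names are hypothetical and this is not the code from the patch, which is truncated in this listing.

```c
#include <stdint.h>

/* Shift performed in 32 bits: the top 8 bits of the input are lost. */
static inline uint32_t offset_bytes_u32(uint32_t offset_256b)
{
	return offset_256b << 8;
}

/* Widen first, then shift: the full result fits in 64 bits. */
static inline uint64_t offset_bytes_u64(uint32_t offset_256b)
{
	return (uint64_t)offset_256b << 8;
}
```

For an input of `0x01000000`, the 32-bit version wraps to 0 while the 64-bit version yields `0x100000000`, which is why a bounds check done on the 32-bit result can pass incorrectly.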

[bug report] drm/amd/display: DML2.1 resynchronization

2024-07-26 Thread Dan Carpenter
Hello Chaitanya Dhere, This is a semi-automatic email about new static checker warnings. Commit 2563391e57b5 ("drm/amd/display: DML2.1 resynchronization") from Jul 2, 2024, leads to the following Smatch complaint: drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/dml21/src/dml2_core/dml2_core_d

[bug report] drm/amdkfd: Validate queue cwsr area and eop buffer size

2024-07-26 Thread Dan Carpenter
Hello Philip Yang, Commit 629568d25fea ("drm/amdkfd: Validate queue cwsr area and eop buffer size") from Jun 26, 2024 (linux-next), leads to the following Smatch static checker warning: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_queue.c:265 kfd_queue_acquire_buffers() warn: missing

Re: [PATCH] drm/amd/display: Add null check for pipe_ctx->plane_state in dcn20_program_pipe

2024-07-26 Thread Chung, ChiaHsuan (Tom)
Reviewed-by: Tom Chung On 7/25/2024 10:54 AM, Srinivasan Shanmugam wrote: This commit addresses a null pointer dereference issue in the `dcn20_program_pipe` function. The issue could occur when `pipe_ctx->plane_state` is null. The fix adds a check to ensure `pipe_ctx->plane_state` is not null

Re: [PATCH] drm/amd/display: Add null check for top_pipe_to_program in commit_planes_for_stream

2024-07-26 Thread Chung, ChiaHsuan (Tom)
Reviewed-by: Tom Chung On 7/25/2024 10:54 AM, Srinivasan Shanmugam wrote: This commit addresses a null pointer dereference issue in the `commit_planes_for_stream` function at line 4140. The issue could occur when `top_pipe_to_program` is null. The fix adds a check to ensure `top_pipe_to_progra

Re: [PATCH] drm/radeon/evergreen_cs: fix int overflow errors in cs track offsets

2024-07-26 Thread Christian König
I strongly suggest to revert that again. See my other mail. Christian. Am 25.07.24 um 22:59 schrieb Alex Deucher: Applied. Thanks! Alex On Thu, Jul 25, 2024 at 2:20 PM Nikita Zhandarovich wrote: Several cs track offsets (such as 'track->db_s_read_offset') either are initialized with or pla

Re: [PATCH] drm/radeon/evergreen_cs: fix int overflow errors in cs track offsets

2024-07-26 Thread Christian König
Am 25.07.24 um 20:09 schrieb Nikita Zhandarovich: Several cs track offsets (such as 'track->db_s_read_offset') either are initialized with or plainly take big enough values that, once shifted 8 bits left, may be hit with integer overflow if the resulting values end up going over u32 limit. Some