RE: [PATCH] drm/amdgpu: Add debug mask to disable CE logs

2025-06-09 Thread Zhang, Hawking
[AMD Official Use Only - AMD Internal Distribution Only] Reviewed-by: Hawking Zhang Regards, Hawking -Original Message- From: Liu, Xiang(Dean) Sent: Friday, June 6, 2025 12:11 To: amd-gfx@lists.freedesktop.org Cc: Zhang, Hawking ; Liu, Xiang(Dean) Subject: [PATCH] drm/amdgpu: Add debug

Re: [PATCH v2] drm/amd/display: Fix exception handling in dm_validate_stream_and_context()

2025-06-09 Thread kernel test robot
its/Markus-Elfring/drm-amd-display-Fix-exception-handling-in-dm_validate_stream_and_context/20250609-151039 base: next-20250606 patch link: https://lore.kernel.org/r/da489521-7786-4716-8fb8-d79b3c08d93c%40web.de patch subject: [PATCH v2] drm/amd/display: Fix exception h

RE: [PATCH] drm/amdgpu/mes: add compatibility checks for set_hw_resource_1

2025-06-09 Thread Liu, Shaoyun
[AMD Official Use Only - AMD Internal Distribution Only] Maybe it's better to use the MES FW version. Here is what I got from the MES release. For nv3x: 0x50 For nv4x: 0x4b Regards Shaoyun.liu -Original Message- From: Deucher, Alexander Sent: Monday, June 9, 2025 11:34 AM To: amd

[PATCH] drm/amdgpu/mes: add compatibility checks for set_hw_resource_1

2025-06-09 Thread Alex Deucher
Seems some older MES firmware versions do not properly support this packet. Add back some of the compatibility checks. Fixes: f81cd793119e ("drm/amd/amdgpu: Fix MES init sequence") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4295 Cc: Shaoyun Liu Signed-off-by: Alex Deucher --- drivers

Re: [PATCH 29/29] drm/amdgpu/vcn5: re-emit unprocessed state on ring reset

2025-06-09 Thread Sundararaju, Sathishkumar
This patch series is :- Reviewed-by: Sathishkumar S One nit-pick, amdgpu_ring_backup_unprocessed_commands function could use amdgpu_fence instead of dma_fence as argument. And JPEG/VCN changes in this series are also :- Tested-by: Sathishkumar S Note: JPEG5 and JPEG4_0_5 reset fails due to

[PATCH AUTOSEL 6.1 02/16] amd/amdkfd: fix a kfd_process ref leak

2025-06-09 Thread Sasha Levin
From: Yifan Zhang [ Upstream commit 90237b16ec1d7afa16e2173cc9a664377214cdd9 ] This patch is to fix a kfd_process ref leak. Signed-off-by: Yifan Zhang Reviewed-by: Philip Yang Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- **YES** This commit should be backported to stable kern

[PATCH AUTOSEL 6.6 02/18] amd/amdkfd: fix a kfd_process ref leak

2025-06-09 Thread Sasha Levin
From: Yifan Zhang [ Upstream commit 90237b16ec1d7afa16e2173cc9a664377214cdd9 ] This patch is to fix a kfd_process ref leak. Signed-off-by: Yifan Zhang Reviewed-by: Philip Yang Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- **YES** This commit should be backported to stable kern

[PATCH AUTOSEL 6.12 04/23] amd/amdkfd: fix a kfd_process ref leak

2025-06-09 Thread Sasha Levin
From: Yifan Zhang [ Upstream commit 90237b16ec1d7afa16e2173cc9a664377214cdd9 ] This patch is to fix a kfd_process ref leak. Signed-off-by: Yifan Zhang Reviewed-by: Philip Yang Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- **YES** This commit should be backported to stable kern

[PATCH AUTOSEL 6.14 04/29] amd/amdkfd: fix a kfd_process ref leak

2025-06-09 Thread Sasha Levin
From: Yifan Zhang [ Upstream commit 90237b16ec1d7afa16e2173cc9a664377214cdd9 ] This patch is to fix a kfd_process ref leak. Signed-off-by: Yifan Zhang Reviewed-by: Philip Yang Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- **YES** This commit should be backported to stable kern

[PATCH AUTOSEL 6.15 05/35] amd/amdkfd: fix a kfd_process ref leak

2025-06-09 Thread Sasha Levin
From: Yifan Zhang [ Upstream commit 90237b16ec1d7afa16e2173cc9a664377214cdd9 ] This patch is to fix a kfd_process ref leak. Signed-off-by: Yifan Zhang Reviewed-by: Philip Yang Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- **YES** This commit should be backported to stable kern

Re: [PATCH 06/29] drm/amdgpu: update ring reset function signature

2025-06-09 Thread Alex Deucher
On Mon, Jun 9, 2025 at 8:43 AM Sundararaju, Sathishkumar wrote: > > > > On 6/6/2025 9:30 PM, Alex Deucher wrote: > > On Fri, Jun 6, 2025 at 7:41 AM Christian König > > wrote: > >> On 6/6/25 08:43, Alex Deucher wrote: > >>> Going forward, we'll need more than just the vmid. Everything > >>> we n

RE: [PATCH 00/23] DC Patches June 04, 2025

2025-06-09 Thread Wheeler, Daniel
[Public] Hi all, This week this patchset was tested on 4 systems, two dGPU and two APU based, and tested across multiple display and connection types. APU * Single Display eDP -> 1080p 60hz, 1920x1200 165hz * Single Display DP (SST DSC) -> 4k144hz, 4k240hz * Multi displa

Re: [REGRESSION] amdgpu fails to load external RX 580 since PCI: Allow relaxed bridge window tail sizing for optional resources

2025-06-09 Thread Ilpo Järvinen
On Mon, 9 Jun 2025, r...@r26.me wrote: > Hello, > > I have an external Radeon RX580 on my machine connected via Thunderbolt, and > since upgrading from 6.14.1 the setup stopped working. Dmesg showed warning > from > resource sanity check, followed by a stack trace > https://pastebin.com/njR55rQ

[PATCH] drm/amdkfd: register HMM dev memory to DMA-able range first

2025-06-09 Thread francisco_flynn
HMM device memory is allocated at the top of iomem_resource. When iomem_resource is larger than the GPU device's DMA mask, max_pfn is also updated after devm_memremap_pages and exceeds the device's DMA mask. When multiple cards on the system need to be initialized, ttm_device_init would be called with us

Re: [PATCH] drm/amdkfd: register HMM dev memory to DMA-able range first

2025-06-09 Thread Felix Kuehling
On 2025-06-09 5:36, francisco_flynn wrote: > HMM device memory is allocated at the top of > iomem_resource, when iomem_resource is larger than > GPU device's dma mask, after devm_memremap_pages, > max_pfn will also be updated and exceed device's > dma mask, when there are multiple cards on system >

Re: [PATCH 06/29] drm/amdgpu: update ring reset function signature

2025-06-09 Thread Sundararaju, Sathishkumar
On 6/6/2025 9:30 PM, Alex Deucher wrote: On Fri, Jun 6, 2025 at 7:41 AM Christian König wrote: On 6/6/25 08:43, Alex Deucher wrote: Going forward, we'll need more than just the vmid. Everything we need is currently in the amdgpu job structure, so just pass that in. Please don't the job is

RE: [PATCH] drm/amdgpu: Suspend IH during mode-2 reset

2025-06-09 Thread Kamal, Asad
[AMD Official Use Only - AMD Internal Distribution Only] Reviewed-by: Asad Kamal Thanks & Regards Asad -Original Message- From: Lazar, Lijo Sent: Monday, June 9, 2025 10:11 AM To: amd-gfx@lists.freedesktop.org Cc: Zhang, Hawking ; Deucher, Alexander ; Kamal, Asad Subject: [PATCH] drm

[PATCH v3 7/7] drm/syncobj: Add a fast path to drm_syncobj_array_find

2025-06-09 Thread Tvrtko Ursulin
Running the Cyberpunk 2077 benchmark we can observe that the lookup helper is relatively hot, but 97% of the calls are for a single object. (~3% for two points, and never more than three points. While a more trivial workload like vkmark under Plasma is even more skewed to single point lookups.)
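
The preview above is cut off before the patch describes how its fast path works. As a purely illustrative userspace sketch, assuming the usual approach of keeping a small on-stack array for the common case and only allocating from the heap for larger requests, the pattern could look like the following. The names lookup_one(), lookup_sum() and FAST_PATH_MAX are hypothetical and are not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical threshold: requests up to this size never touch the heap. */
#define FAST_PATH_MAX 4

/* Hypothetical stand-in for resolving one handle; it just doubles the value. */
static uint64_t lookup_one(uint32_t handle)
{
	return (uint64_t)handle * 2;
}

/*
 * Resolve `count` handles into a scratch array and sum the results.
 * Small requests use the on-stack buffer (fast path); larger ones
 * fall back to malloc() (slow path). Returns 0 on success, -1 on
 * allocation failure.
 */
static int lookup_sum(const uint32_t *handles, size_t count, uint64_t *sum_out)
{
	uint64_t stack_buf[FAST_PATH_MAX];
	uint64_t *buf = stack_buf;
	uint64_t sum = 0;
	size_t i;

	if (count > FAST_PATH_MAX) {
		buf = malloc(count * sizeof(*buf));
		if (!buf)
			return -1;
	}

	for (i = 0; i < count; i++)
		buf[i] = lookup_one(handles[i]);

	for (i = 0; i < count; i++)
		sum += buf[i];

	/* Only the slow path owns heap memory. */
	if (buf != stack_buf)
		free(buf);

	*sum_out = sum;
	return 0;
}
```

With the call distribution quoted above (almost all calls for one object, never more than three), a threshold this small would keep essentially every call on the stack-only path.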

[PATCH v3 6/7] drm/syncobj: Add a fast path to drm_syncobj_array_wait_timeout

2025-06-09 Thread Tvrtko Ursulin
Running the Cyberpunk 2077 benchmark we can observe that waiting on DRM syncobjs is relatively hot, but 96% of the calls are for a single object. (~4% for two points, and never more than three points. While a more trivial workload like vkmark under Plasma is even more skewed to single point wait

[PATCH v3 5/7] drm/syncobj: Avoid temporary allocation in drm_syncobj_timeline_signal_ioctl

2025-06-09 Thread Tvrtko Ursulin
We can avoid one of the two temporary allocations if we read the userspace supplied timeline points as we go along. The only new complication is to unwind unused fence chains on the error path, but even that code was already present in the function. Signed-off-by: Tvrtko Ursulin Reviewed-by: Maí

[PATCH v3 3/7] drm/syncobj: Avoid one temporary allocation in drm_syncobj_array_find

2025-06-09 Thread Tvrtko Ursulin
The drm_syncobj_array_find() helper is used from many userspace ioctl entry points with the task of looking up userspace handles to internal objects. We can easily avoid one temporary allocation by making it read the handles as it is looking them up. Signed-off-by: Tvrtko Ursulin Reviewed-by: Maíra
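
To illustrate the idea in the preview above (reading each handle at the moment it is looked up, instead of first copying all handles into a temporary array), here is a minimal userspace sketch. It is not the kernel code: fetch_handle() is a hypothetical stand-in for a per-element user copy, and resolve() is a hypothetical lookup that treats handle 0 as invalid.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for copying one handle out of a caller buffer. */
static int fetch_handle(const void *src, size_t index, uint32_t *out)
{
	memcpy(out, (const unsigned char *)src + index * sizeof(*out),
	       sizeof(*out));
	return 0;
}

/* Hypothetical lookup: handle 0 is invalid, others map to a fake object id. */
static int resolve(uint32_t handle, uint64_t *obj)
{
	if (handle == 0)
		return -1;
	*obj = 0x1000 + handle;
	return 0;
}

/*
 * Resolve `count` handles in a single pass. Each handle is fetched right
 * before it is resolved, so no temporary array of handles is needed.
 * Returns 0 on success, -1 on the first failure.
 */
static int resolve_all(const void *user_handles, size_t count, uint64_t *objs)
{
	size_t i;

	for (i = 0; i < count; i++) {
		uint32_t handle;

		if (fetch_handle(user_handles, i, &handle))
			return -1;
		if (resolve(handle, &objs[i]))
			return -1;
	}
	return 0;
}
```

The trade-off in this sketch is one small fetch per element instead of one bulk copy, in exchange for dropping an allocation and its error path.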

[PATCH v3 4/7] drm/syncobj: Use put_user in drm_syncobj_query_ioctl

2025-06-09 Thread Tvrtko Ursulin
Since the query loop is using copy_to_user() to write out a single u64 at a time it feels more natural (and is a tiny bit more compact) to replace it with put_user(). An access_ok() check is added to the input checking for an early bailout in case of a bad buffer passed in. Signed-off-by: Tvrtko Urs

[PATCH v3 1/7] drm/syncobj: Remove unhelpful helper

2025-06-09 Thread Tvrtko Ursulin
A helper which fails to consolidate the code and instead just forks into two copies of the code based on a boolean parameter is not very helpful or readable. Let's just remove it; the proof of the pudding is the net smaller code. Signed-off-by: Tvrtko Ursulin Reviewed-by: Maíra Canal --- v2: * Assi

[PATCH v3 2/7] drm/syncobj: Do not allocate an array to store zeros when waiting

2025-06-09 Thread Tvrtko Ursulin
When waiting on syncobjs the current code allocates a temporary array only to fill it up with all zeros. We can avoid that by relying on the allocated entry array already being zero allocated. For the timeline mode we can fetch the timeline point values as we populate the entries array so also do

[PATCH v3.1 0/7] A few drm_syncobj optimisations

2025-06-09 Thread Tvrtko Ursulin
[All reviewed, sending for more acks.] A small set of drm_syncobj optimisations which should make things a tiny bit more efficient on the CPU side of things. Improvement seems to be around 1.5%* more FPS if observed with "vkgears -present-mailbox" on a Steam Deck Plasma desktop, but I am reluctan

[PATCH v5 3/4] drm/i915: Protect access to driver and timeline name

2025-06-09 Thread Tvrtko Ursulin
Protect the access to driver and timeline name which otherwise could be freed as the dma-fence exporter is signalling fences. Now that the safe access is handled in the dma-fence API, the external callers such as sync_file, and our internal code paths, we can drop the similar protection from i915_fenc

[PATCH v5 0/4] Fixing some dma-fence use-after-free

2025-06-09 Thread Tvrtko Ursulin
Hi all, tl;dr: Xe and probably some other drivers can tear down the internal state referenced by an exported sync_file fence which then causes a null pointer dereference on accessing said fence. IGT that exploits the problem: https://patchwork.freedesktop.org/patch/642709/?series=146211&rev=2 It

[PATCH v5 1/4] dma-fence: Add safe access helpers and document the rules

2025-06-09 Thread Tvrtko Ursulin
Dma-fence objects currently suffer from a potential use after free problem where fences exported to userspace and other drivers can outlive the exporting driver, or the associated data structures. The discussion on how to address this concluded that adding reference counting to all the involved ob

[PATCH v5 4/4] drm/xe: Make dma-fences compliant with the safe access rules

2025-06-09 Thread Tvrtko Ursulin
Xe can free some of the data pointed to by the dma-fences it exports. Most notably the timeline name can get freed if userspace closes the associated submit queue. At the same time the fence could have been exported to a third party (for example a sync_fence fd) which will then cause a use-after-

[PATCH v5 2/4] sync_file: Protect access to driver and timeline name

2025-06-09 Thread Tvrtko Ursulin
Protect the access to driver and timeline name which otherwise could be freed as the dma-fence exporter is signalling fences. Signed-off-by: Tvrtko Ursulin --- drivers/dma-buf/sync_file.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_fil

[REGRESSION] amdgpu fails to load external RX 580 since PCI: Allow relaxed bridge window tail sizing for optional resources

2025-06-09 Thread rio
Hello, I have an external Radeon RX580 on my machine connected via Thunderbolt, and since upgrading from 6.14.1 the setup stopped working. Dmesg showed warning from resource sanity check, followed by a stack trace https://pastebin.com/njR55rQW. Relevant snippet: [ 12.134907] amdgpu :06:00.0

[PATCH v2] drm/amd/display: Fix exception handling in dm_validate_stream_and_context()

2025-06-09 Thread Markus Elfring
From: Markus Elfring Date: Mon, 9 Jun 2025 08:21:16 +0200 The label “cleanup” was used to jump to another pointer check despite the detail in the implementation of the function “dm_validate_stream_and_context” that it was determined already that corresponding variables still contained null po

Re: [PATCH v3 2/5] PCI: Put PCIe ports with downstream devices into D3 at hibernate

2025-06-09 Thread kernel test robot
drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Mario-Limonciello/PM-Use-hibernate-flows-for-system-power-off/20250609-105658 bas