[AMD Official Use Only]
Thanks for confirming. I think I know the root cause.
Let me prepare an official patch for you.
BR
Evan
> -----Original Message-----
> From: Arthur Marsh
> Sent: Friday, April 1, 2022 8:19 PM
> To: Quan, Evan
> Cc: Deucher, Alexander ; Koenig, Christian
> ; Feng, Ke
Use rcu_read_lock to read p->event_idr concurrently with other readers
and writers. Use p->event_mutex only for creating and destroying events
and in kfd_wait_on_events.
Protect the contents of the kfd_event structure with a per-event
spinlock that can be taken inside the rcu_read_lock critical se
On 2022-03-31 04:53, Christoph Hellwig wrote:
- page = vm_normal_page(vma, addr, pte);
+ page = vm_normal_lru_page(vma, addr, pte);
Why can't this deal with ZONE_DEVICE pages? It certainly has
nothing to do with an LRU, I think. In fact being able to have
stats that count say the numb
For VG20 + XGMI bridge, all mapping PTEs are cached in TC. This may leave
stale invalid PTEs in TC because one cache line covers 8 pages. We always
need to flush the TLB after updating a mapping.
Signed-off-by: Philip Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 ++
1 file changed, 6 insertions(+)
diff
On 2022-04-01 04:24, Christian König wrote:
On 31.03.22 at 16:37, Felix Kuehling wrote:
On 2022-03-31 at 02:27, Christian König wrote:
On 30.03.22 at 22:51, philip yang wrote:
On 2022-03-30 05:00, Christian König wrote:
Testing the valid bit is not enough to figure out if we
need to in
tree/branch:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: e5071887cd2296a7704dbcd10c1cedf0f11cdbd5 Add linux-next specific
files for 20220401
Error/Warning reports:
https://lore.kernel.org/linux-media/202203171537.svhye362-...@intel.com
https
[AMD Official Use Only]
Hi Paul and Harry,
Thanks for reviewing the patch. The commit message has been revised as per
your comments in v2.
From: Paul Menzel
Sent: Friday, April 1, 2022 1:46 AM
To: Zhang, Dingchen (David)
Cc: amd-gfx@lists.freedesktop.org ;
dri-de...@lists.freedesktop.org ;
Dear Alex Deucher,
I was only asking where the unneeded semicolon came from when I added the
Fixes info. Thanks to your reminder, I understand now. Thank you.
Original Message
From: Alex Deucher
Date: April 1, 2022 21:26
To: Paul Menzel
Cc: 白浩文, David Airlie, "Pan, Xinhui", LKML, Mailing list - DRI
developers, amd-gfx
Applied. Thanks!
Alex
On Fri, Apr 1, 2022 at 3:23 AM Haowen Bai wrote:
>
> report by coccicheck:
> drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c:1951:2-3: Unneeded semicolon
>
> Fixes: c543dcbe4237 ("drm/amdgpu/vcn: Add VCN ras error query support")
>
> Signed-off-by: Haowen Bai
> ---
> V1->V2: change
Applied. Thanks!
Alex
On Thu, Mar 24, 2022 at 9:46 AM Aashish Sharma wrote:
>
> Fix the kernel test robot warning below:
>
> drivers/gpu/drm/amd/amdgpu/../display/dmub/inc/dmub_cmd.h:2893:12:
> warning: variable 'temp' set but not used [-Wunused-but-set-variable]
>
> Replaced the assignment to
SMU takes clock limits in MHz units. socclk and fclk were
using 10 kHz units in some cases. Switch to MHz units.
Fixes higher than required SoC clocks.
Fixes: 97cf32996c46d9 ("drm/amd/pm: Removed fixed clock in auto mode DPM")
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/pm/powerplay/hw
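
To illustrate the unit mismatch described in the patch above, here is a minimal sketch of converting a clock value from 10 kHz units to MHz before handing it to an interface that expects MHz. The helper and struct names are hypothetical and are not taken from the powerplay code; only the unit relationship (1 MHz = 100 x 10 kHz) is a fact.

```c
#include <stdint.h>

/* Hypothetical helper: convert a clock given in 10 kHz units to MHz.
 * 1 MHz equals 100 units of 10 kHz, so this is a division by 100. */
static inline uint32_t clk_10khz_to_mhz(uint32_t clk_10khz)
{
    return clk_10khz / 100;
}

/* Hypothetical stand-in for a firmware request that expects MHz. */
struct toy_limits_mhz {
    uint32_t socclk;
    uint32_t fclk;
};

/* Build the request from values that are stored in 10 kHz units. */
static struct toy_limits_mhz toy_make_limits(uint32_t socclk_10khz,
                                             uint32_t fclk_10khz)
{
    struct toy_limits_mhz limits = {
        .socclk = clk_10khz_to_mhz(socclk_10khz),
        .fclk = clk_10khz_to_mhz(fclk_10khz),
    };
    return limits;
}
```

Passing the 10 kHz value through unconverted would request a limit 100 times larger than intended, which is consistent with the "higher than required SoC clocks" symptom the commit message mentions.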
From: Alex Deucher
[ Upstream commit 9dff13f9edf755a15f6507874185a3290c1ae8bb ]
The driver has a fallback so make the message informational
rather than a warning. The driver has a fallback if the
Component Resource Association Table (CRAT) is missing, so
make this informational now.
Bug: https:
From: Xin Xiong
[ Upstream commit dfced44f122c54a48ecc8db516bb6a295a1b ]
This issue takes place in an error path in
amdgpu_cs_fence_to_handle_ioctl(). When `info->in.what` falls into
default case, the function simply returns -EINVAL, forgetting to
decrement the reference count of a dma_fence
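
The kind of leak described above can be sketched with a toy reference counter. This is not the amdgpu code: toy_fence, toy_fence_get, toy_fence_put and the TOY_WHAT_* values are hypothetical stand-ins, and the toy drops its reference on every path, whereas a real handler may hand the reference over on success. The point is only that the default case must release the reference before returning -EINVAL.

```c
#include <errno.h>

/* Hypothetical stand-in for a reference-counted fence object. */
struct toy_fence {
    int refcount;
};

static struct toy_fence *toy_fence_get(struct toy_fence *f)
{
    f->refcount++;
    return f;
}

static void toy_fence_put(struct toy_fence *f)
{
    f->refcount--;
}

enum { TOY_WHAT_A = 0, TOY_WHAT_B = 1 };

/* Takes a reference, dispatches on 'what', and releases the reference
 * on every path, including the default (error) case. */
static int toy_fence_to_handle(struct toy_fence *src, int what)
{
    struct toy_fence *fence = toy_fence_get(src);
    int ret;

    switch (what) {
    case TOY_WHAT_A:
    case TOY_WHAT_B:
        ret = 0;
        break;
    default:
        /* A bare 'return -EINVAL;' here would skip the put below. */
        ret = -EINVAL;
        break;
    }

    toy_fence_put(fence);
    return ret;
}
```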
On 4/1/2022 7:32 PM, Alex Deucher wrote:
Seems to cause reboots or hangs on some systems.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1924
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1953
Fixes: daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)")
Signed-of
From: Alex Deucher
[ Upstream commit 9dff13f9edf755a15f6507874185a3290c1ae8bb ]
The driver has a fallback so make the message informational
rather than a warning. The driver has a fallback if the
Component Resource Association Table (CRAT) is missing, so
make this informational now.
Bug: https:
From: Rajneesh Bhardwaj
[ Upstream commit 447c7997b62a5115ba4da846dcdee4fc12298a6a ]
Noticed the below warning while running a pytorch workload on vega10
GPUs. Change to trylock to avoid conflicts with already held reservation
locks.
[ +0.03] WARNING: possible recursive locking detected
[
From: Xin Xiong
[ Upstream commit dfced44f122c54a48ecc8db516bb6a295a1b ]
This issue takes place in an error path in
amdgpu_cs_fence_to_handle_ioctl(). When `info->in.what` falls into
default case, the function simply returns -EINVAL, forgetting to
decrement the reference count of a dma_fence
From: Alex Deucher
[ Upstream commit 9dff13f9edf755a15f6507874185a3290c1ae8bb ]
The driver has a fallback so make the message informational
rather than a warning. The driver has a fallback if the
Component Resource Association Table (CRAT) is missing, so
make this informational now.
Bug: https:
From: Rajneesh Bhardwaj
[ Upstream commit 447c7997b62a5115ba4da846dcdee4fc12298a6a ]
Noticed the below warning while running a pytorch workload on vega10
GPUs. Change to trylock to avoid conflicts with already held reservation
locks.
[ +0.03] WARNING: possible recursive locking detected
[
From: Xin Xiong
[ Upstream commit dfced44f122c54a48ecc8db516bb6a295a1b ]
This issue takes place in an error path in
amdgpu_cs_fence_to_handle_ioctl(). When `info->in.what` falls into
default case, the function simply returns -EINVAL, forgetting to
decrement the reference count of a dma_fence
From: Dale Zhao
[ Upstream commit 047db281c026de5971cedb5bb486aa29bd16a39d ]
[Why]
To allow the eDP hot-plug feature, the stream signal may change to VIRTUAL
on plug-out and back to eDP on plug-in. The OS will still call setPathMode
with the same timing for each plug event, but eDP gets no stream update as we
From: Alex Deucher
[ Upstream commit 9dff13f9edf755a15f6507874185a3290c1ae8bb ]
The driver has a fallback so make the message informational
rather than a warning. The driver has a fallback if the
Component Resource Association Table (CRAT) is missing, so
make this informational now.
Bug: https:
From: Rajneesh Bhardwaj
[ Upstream commit 447c7997b62a5115ba4da846dcdee4fc12298a6a ]
Noticed the below warning while running a pytorch workload on vega10
GPUs. Change to trylock to avoid conflicts with already held reservation
locks.
[ +0.03] WARNING: possible recursive locking detected
[
From: Philip Yang
[ Upstream commit ac7c48c0cce00d03b3c95fddcccb0a45257e33e3 ]
SVM ioctls take the proper svms->lock to handle race conditions, so there is
no need to take the process mutex to serialize ioctls. This also fixes a circular locking
warning:
WARNING: possible circular locking dependency detected
Poss
From: Nicholas Kazlauskas
[ Upstream commit b80ddeb29d9df449f875f0b6f5de08d7537c02b8 ]
[Why]
If the DPCD caps specify a PSR version newer than PSR_VERSION_1 then
we fall back to using PSR_VERSION_1 in amdgpu_dm_set_psr_caps.
This gets overridden with the raw DPCD value in amdgpu_dm_link_setup_p
From: Yongzhi Liu
[ Upstream commit 5d5c6dba2b43e28845d7d7ed32a36802329a5f52 ]
[why]
Resource release is needed on the error handling path
to prevent memory leak.
[how]
Fix this by adding kfree on the error handling path.
Reviewed-by: Harry Wentland
Signed-off-by: Yongzhi Liu
Signed-off-by:
From: Xin Xiong
[ Upstream commit dfced44f122c54a48ecc8db516bb6a295a1b ]
This issue takes place in an error path in
amdgpu_cs_fence_to_handle_ioctl(). When `info->in.what` falls into
default case, the function simply returns -EINVAL, forgetting to
decrement the reference count of a dma_fence
From: Dale Zhao
[ Upstream commit 047db281c026de5971cedb5bb486aa29bd16a39d ]
[Why]
To allow the eDP hot-plug feature, the stream signal may change to VIRTUAL
on plug-out and back to eDP on plug-in. The OS will still call setPathMode
with the same timing for each plug event, but eDP gets no stream update as we
From: Alex Deucher
[ Upstream commit 9dff13f9edf755a15f6507874185a3290c1ae8bb ]
The driver has a fallback so make the message informational
rather than a warning. The driver has a fallback if the
Component Resource Association Table (CRAT) is missing, so
make this informational now.
Bug: https:
From: Sung Joon Kim
[ Upstream commit 3b853c316c9321e195414a6fb121d1c2d45b1e87 ]
[why]
In LTTPR non-transparent mode, we need
to reset the cached lane settings before performing
link training on the next PHY repeater. Otherwise,
the cached lane settings will be used for the next
clock recovery e
From: Rajneesh Bhardwaj
[ Upstream commit 447c7997b62a5115ba4da846dcdee4fc12298a6a ]
Noticed the below warning while running a pytorch workload on vega10
GPUs. Change to trylock to avoid conflicts with already held reservation
locks.
[ +0.03] WARNING: possible recursive locking detected
[
From: Philip Yang
[ Upstream commit 6225bb3a88d22594aacea2485dc28ca12d596721 ]
kfd_process_notifier_release flushes svm_range_restore_work,
which calls svm_range_list_lock_and_flush_work to flush the deferred_list
work. But if the deferred_list work's mmput releases the last user, it will
call exit_mmap -> no
From: Philip Yang
[ Upstream commit 367c9b0f1b8750a704070e7ae85234d591290434 ]
svm_deferred_list work should continue to handle deferred_range_list,
which may be split into child ranges, to avoid a child range leak, and remove
the ranges' mmu interval notifier to avoid an mm mm_count leak. So taking mm
referenc
From: Philip Yang
[ Upstream commit ac7c48c0cce00d03b3c95fddcccb0a45257e33e3 ]
SVM ioctls take the proper svms->lock to handle race conditions, so there is
no need to take the process mutex to serialize ioctls. This also fixes a circular locking
warning:
WARNING: possible circular locking dependency detected
Poss
From: Nicholas Kazlauskas
[ Upstream commit b80ddeb29d9df449f875f0b6f5de08d7537c02b8 ]
[Why]
If the DPCD caps specify a PSR version newer than PSR_VERSION_1 then
we fall back to using PSR_VERSION_1 in amdgpu_dm_set_psr_caps.
This gets overridden with the raw DPCD value in amdgpu_dm_link_setup_p
From: Yongzhi Liu
[ Upstream commit 5d5c6dba2b43e28845d7d7ed32a36802329a5f52 ]
[why]
Resource release is needed on the error handling path
to prevent memory leak.
[how]
Fix this by adding kfree on the error handling path.
Reviewed-by: Harry Wentland
Signed-off-by: Yongzhi Liu
Signed-off-by:
From: Xin Xiong
[ Upstream commit dfced44f122c54a48ecc8db516bb6a295a1b ]
This issue takes place in an error path in
amdgpu_cs_fence_to_handle_ioctl(). When `info->in.what` falls into
default case, the function simply returns -EINVAL, forgetting to
decrement the reference count of a dma_fence
From: Dale Zhao
[ Upstream commit 047db281c026de5971cedb5bb486aa29bd16a39d ]
[Why]
To allow the eDP hot-plug feature, the stream signal may change to VIRTUAL
on plug-out and back to eDP on plug-in. The OS will still call setPathMode
with the same timing for each plug event, but eDP gets no stream update as we
From: Alex Deucher
[ Upstream commit 9dff13f9edf755a15f6507874185a3290c1ae8bb ]
The driver has a fallback so make the message informational
rather than a warning. The driver has a fallback if the
Component Resource Association Table (CRAT) is missing, so
make this informational now.
Bug: https:
From: Sung Joon Kim
[ Upstream commit 3b853c316c9321e195414a6fb121d1c2d45b1e87 ]
[why]
In LTTPR non-transparent mode, we need
to reset the cached lane settings before performing
link training on the next PHY repeater. Otherwise,
the cached lane settings will be used for the next
clock recovery e
From: Rajneesh Bhardwaj
[ Upstream commit 447c7997b62a5115ba4da846dcdee4fc12298a6a ]
Noticed the below warning while running a pytorch workload on vega10
GPUs. Change to trylock to avoid conflicts with already held reservation
locks.
[ +0.03] WARNING: possible recursive locking detected
[
From: "Tianci.Yin"
[ Upstream commit 7270e8957eb9aacf5914605d04865f3829a14bce ]
[why]
During rmmod, kfd sends the CP a dequeue request, but the
request gets no response, and the error message "cp
queue pipe 4 queue 0 preemption failed" is printed.
[how]
Performing kfd suspending after disab
From: Philip Yang
[ Upstream commit 6225bb3a88d22594aacea2485dc28ca12d596721 ]
kfd_process_notifier_release flushes svm_range_restore_work,
which calls svm_range_list_lock_and_flush_work to flush the deferred_list
work. But if the deferred_list work's mmput releases the last user, it will
call exit_mmap -> no
From: Philip Yang
[ Upstream commit 367c9b0f1b8750a704070e7ae85234d591290434 ]
svm_deferred_list work should continue to handle deferred_range_list,
which may be split into child ranges, to avoid a child range leak, and remove
the ranges' mmu interval notifier to avoid an mm mm_count leak. So taking mm
referenc
From: Philip Yang
[ Upstream commit ac7c48c0cce00d03b3c95fddcccb0a45257e33e3 ]
SVM ioctls take the proper svms->lock to handle race conditions, so there is
no need to take the process mutex to serialize ioctls. This also fixes a circular locking
warning:
WARNING: possible circular locking dependency detected
Poss
From: Nicholas Kazlauskas
[ Upstream commit b80ddeb29d9df449f875f0b6f5de08d7537c02b8 ]
[Why]
If the DPCD caps specify a PSR version newer than PSR_VERSION_1 then
we fall back to using PSR_VERSION_1 in amdgpu_dm_set_psr_caps.
This gets overridden with the raw DPCD value in amdgpu_dm_link_setup_p
From: Yongzhi Liu
[ Upstream commit 5d5c6dba2b43e28845d7d7ed32a36802329a5f52 ]
[why]
Resource release is needed on the error handling path
to prevent memory leak.
[how]
Fix this by adding kfree on the error handling path.
Reviewed-by: Harry Wentland
Signed-off-by: Yongzhi Liu
Signed-off-by:
From: Xin Xiong
[ Upstream commit dfced44f122c54a48ecc8db516bb6a295a1b ]
This issue takes place in an error path in
amdgpu_cs_fence_to_handle_ioctl(). When `info->in.what` falls into
default case, the function simply returns -EINVAL, forgetting to
decrement the reference count of a dma_fence
From: Dale Zhao
[ Upstream commit 047db281c026de5971cedb5bb486aa29bd16a39d ]
[Why]
To allow the eDP hot-plug feature, the stream signal may change to VIRTUAL
on plug-out and back to eDP on plug-in. The OS will still call setPathMode
with the same timing for each plug event, but eDP gets no stream update as we
From: Eric Huang
[ Upstream commit f61c40c0757a79bcf744314df606c2bc8ae6a729 ]
SDMA FW fixes the hang issue for adding heavy-weight TLB
flush on Arcturus, so we can enable it.
Signed-off-by: Eric Huang
Acked-by: Alex Deucher
Reviewed-by: Felix Kuehling
Signed-off-by: Alex Deucher
Signed-off-
Seems to cause reboots or hangs on some systems.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1924
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1953
Fixes: daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)")
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/p
On Fri, Apr 1, 2022 at 7:53 AM Lazar, Lijo wrote:
>
>
>
> On 3/31/2022 8:26 PM, Alex Deucher wrote:
> > Seems to cause reboots or hangs on some systems.
> >
> > Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1924
> > Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1953
> > Fixes: daf8
On Fri, Apr 1, 2022 at 1:54 AM Paul Menzel wrote:
>
> Dear Haowen,
>
>
> Thank you for your patch.
>
> On 31.03.22 at 07:56, Haowen Bai wrote:
>
> In the commit message summary, please use:
>
> Remove unneeded semicolon
>
> > report by coccicheck:
> > drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c:1951:2-
[why & how]
As per eDP 1.5 spec, add the below two DPCD bit fields for PSR-SU
support and capability:
1. DP_PSR2_WITH_Y_COORD_ET_SUPPORTED
2. DP_PSR2_SU_AUX_FRAME_SYNC_NOT_NEEDED
changes in v2
--
* fixed the typo
* explicitly list what DPCD bit fields are added
Signed-off-by: Dav
This ensures userspace cannot prematurely clean-up the client before
it is fully initialised which has been proven to cause issues in the
past.
Cc: Felix Kuehling
Cc: Alex Deucher
Cc: "Christian König"
Cc: "Pan, Xinhui"
Cc: David Airlie
Cc: Daniel Vetter
Cc: amd-gfx@lists.freedesktop.org
Cc:
Hi Evan, this is what was logged (filtering for drm and amdgpu) when I
blacklisted amdgpu then manually did:
modprobe amdgpu si_support=1 gpu_recovery=1
Apr 1 18:31:14 am64 kernel: [0.00] Command line:
BOOT_IMAGE=/vmlinuz-5.17.0+ root=UUID=39706f53-7c27-4310-b22a-36c7b042d1a1 ro
amdgp
Hi, short answer is that with both patches applied, I am successfully running
the amdgpu kernel module on radeonsi (plasma desktop on X.org).
I confirmed that CONFIG_LOCKDEP_SUPPORT=y is enabled in the kernel.
With the first patch applied and remotely connecting to the machine and
loading amdgpu
On Fri, 01 Apr 2022, Lee Jones wrote:
> This ensures userspace cannot prematurely clean-up the client before
> it is fully initialised which has been proven to cause issues in the
> past.
>
> Cc: Felix Kuehling
> Cc: Alex Deucher
> Cc: "Christian König"
> Cc: "Pan, Xinhui"
> Cc: David Airlie
On 3/31/2022 8:26 PM, Alex Deucher wrote:
Seems to cause reboots or hangs on some systems.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1924
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1953
Fixes: daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)")
Signed-o
[AMD Official Use Only]
Yes, as Christian mentioned, enabling CONFIG_LOCKDEP_SUPPORT will help
debug such deadlock issues.
Meanwhile, can you give the following change (dropping the lock protections in
amdgpu_dpm_compute_clocks) a try?
diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
b/drivers/g
Hi Arthur,
Apart from blacklisting amdgpu, I generally advise SSHing into the affected
system from another computer if you have a problem like this.
In addition to what Evan said, I suggest that you enable
CONFIG_LOCKDEP_SUPPORT in your kernel configuration. This will yield
warnings in your s
On 31.03.22 at 16:37, Felix Kuehling wrote:
On 2022-03-31 at 02:27, Christian König wrote:
On 30.03.22 at 22:51, philip yang wrote:
On 2022-03-30 05:00, Christian König wrote:
Testing the valid bit is not enough to figure out if we
need to invalidate the TLB or not.
During eviction it is
On 4/1/22 1:54 PM, Paul Menzel wrote:
> Dear Haowen,
>
>
> Thank you for your patch.
>
> On 31.03.22 at 07:56, Haowen Bai wrote:
>
> In the commit message summary, please use:
>
> Remove unneeded semicolon
>
>> report by coccicheck:
>> drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c:1951:2-3: Unneeded semicolon
report by coccicheck:
drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c:1951:2-3: Unneeded semicolon
Fixes: c543dcbe4237 ("drm/amdgpu/vcn: Add VCN ras error query support")
Signed-off-by: Haowen Bai
---
V1->V2: change title; change Fixed info;
drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c | 2 +-
1 file changed,
With this, we can support more CG flags.
Signed-off-by: Evan Quan
Acked-by: Alex Deucher
Reviewed-by: Hawking Zhang
Change-Id: Iccf13c2f9c570ca6a4654291fc4876556125c3b8
--
v1->v2:
- amdgpu_debugfs_gca_config_read: add a new rev to
support CG flag upper 32 bits(Alex)
v2->v3:
- use '%llx'
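
As a rough illustration of why widening the CG flags to 64 bits calls for a '%llx' format, the sketch below prints a flag word whose upper 32 bits are in use. TOY_CG_FLAG_HIGH and format_cg_flags are hypothetical names for this example only; the actual flag definitions and debugfs layout are not shown in the message above.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flag that lives in the upper 32 bits of the flag word. */
#define TOY_CG_FLAG_HIGH (1ULL << 32)

/* Format a 64-bit flag word as hex. The cast matches the '%llx'
 * conversion, which expects an unsigned long long argument. */
static int format_cg_flags(char *buf, size_t len, uint64_t flags)
{
    return snprintf(buf, len, "0x%llx", (unsigned long long)flags);
}
```

Storing the same value in a 32-bit variable would silently drop TOY_CG_FLAG_HIGH, which is why both the storage type and the format specifier have to change together.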
[AMD Official Use Only]
Hi Arthur,
Can you try blacklisting the amdgpu module first and then loading the driver manually?
Hopefully that will give you a chance to observe the errors reported by the driver.
BR
Evan
> -----Original Message-----
> From: Arthur Marsh
> Sent: Thursday, March 31, 2022 12:27 PM
>