Re: [PATCH v3] drm/amdgpu: Call amdgpu_device_unmap_mmio() if device is unplugged to prevent crash in GPU initialization failure

2021-12-17 Thread Christian König
On 17.12.21 at 03:26, Leslie Shi wrote: [Why] In amdgpu_driver_load_kms, when amdgpu_device_init returns error during driver modprobe, it will start the error handle path immediately and call into amdgpu_device_unmap_mmio as well to release mapped VRAM. However, in the following release callba

[PATCH] drm/amdkfd: correct sdma queue number in kfd device init

2021-12-17 Thread Guchun Chen
The SDMA queue number is not correct on some ASICs, such as vega20; this patch ensures the setting stays the same as before the code refactor. Additionally, improve the code to use a switch case that lists IP versions to fill in the kfd device_info structure. This keeps consistency with the IP parse code in amdgpu_discovery.c.

Re: [Bug Report] Desktop monitor sleep regression

2021-12-17 Thread Imre Deak
On Fri, Dec 17, 2021 at 03:46:21PM +0100, Thorsten Leemhuis wrote: > added some CCs Geert added in his reply > > On 07.12.21 08:20, Thorsten Leemhuis wrote: > > > > [TLDR: adding this regression to regzbot; most of this mail is compiled > > from a few templates paragraphs some of you might have s

Re: [PATCH v3] drm/amdgpu: Call amdgpu_device_unmap_mmio() if device is unplugged to prevent crash in GPU initialization failure

2021-12-17 Thread Andrey Grodzovsky
Reviewed-by: Andrey Grodzovsky Andrey On 2021-12-17 3:49 a.m., Christian König wrote: On 17.12.21 at 03:26, Leslie Shi wrote: [Why] In amdgpu_driver_load_kms, when amdgpu_device_init returns error during driver modprobe, it will start the error handle path immediately and call into amdgpu_

RE: [PATCH] drm/amdkfd: correct sdma queue number in kfd device init

2021-12-17 Thread Sider, Graham
[Public] > -----Original Message----- > From: Chen, Guchun > Sent: Friday, December 17, 2021 9:31 AM > To: amd-gfx@lists.freedesktop.org; Deucher, Alexander > ; Sider, Graham > ; Kuehling, Felix ; > Kim, Jonathan > Cc: Chen, Guchun > Subject: [PATCH] drm/amdkfd: correct sdma queue number in kfd

RE: [PATCH] drm/amdkfd: correct sdma queue number in kfd device init

2021-12-17 Thread Kim, Jonathan
[AMD Official Use Only] Are safeguards required for KFD interrupt initialization to fail gracefully in the event of a non-assignment? Same would apply when KGD forwards interrupts to the KFD (although the KFD device reference might not exist at this point if the above comment is handled well so

[PATCH 1/4] drm/amdgpu: Increase potential product_name to 64 characters

2021-12-17 Thread Kent Russell
Having seen at least one 42-character product_name, bump the number up to 64, and put that definition into amdgpu.h to make future adjustments simpler. Signed-off-by: Kent Russell --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 3 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c | 12 +
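A minimal sketch of why a single shared length definition helps: every buffer and every bounded copy is sized from one macro, so bumping it to 64 is a one-line change. The macro, struct, and function names below are hypothetical illustrations, not the definitions added to amdgpu.h; the copy uses plain C snprintf, which always NUL-terminates and truncates over-long input.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical shared definition; the real name in amdgpu.h may differ. */
#define PRODUCT_NAME_LEN 64

struct fru_info {
	char product_name[PRODUCT_NAME_LEN];
};

/* Copy a name read from the FRU into the fixed-size buffer.
 * snprintf truncates safely and always NUL-terminates.
 * Returns the number of characters actually stored. */
static size_t store_product_name(struct fru_info *info, const char *name)
{
	snprintf(info->product_name, sizeof(info->product_name), "%s", name);
	return strlen(info->product_name);
}
```

With a 64-byte buffer a 42-character name is stored whole, while anything longer than 63 characters is cut to 63 plus the terminator.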

[PATCH 2/4] drm/amdgpu: Enable unique_id for Aldebaran

2021-12-17 Thread Kent Russell
It's supported, so support the unique_id sysfs file Signed-off-by: Kent Russell --- drivers/gpu/drm/amd/pm/amdgpu_pm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c b/drivers/gpu/drm/amd/pm/amdgpu_pm.c index 082539c70fd4..dfefb147ac2c 1

[PATCH 3/4] drm/amdgpu: Only overwrite serial if field is empty

2021-12-17 Thread Kent Russell
On Aldebaran, the serial may be obtained from the FRU. Only overwrite the serial with the unique_id if the serial is empty. This will support printing serial numbers for mGPU devices where there are 2 unique_ids for the 2 GPUs, but only one serial number for the board Signed-off-by: Kent Russell
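The rule in the patch above (fall back to the unique_id only when the FRU did not provide a serial) can be sketched as follows. The struct, field sizes, and function name are hypothetical stand-ins for illustration; only the "check for an empty string before overwriting" logic comes from the patch description.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SERIAL_LEN 20 /* hypothetical size: 16 hex digits + NUL fit */

struct board_info {
	char serial[SERIAL_LEN]; /* may already be filled from the FRU */
	uint64_t unique_id;      /* per-GPU ID */
};

/* Only write the unique_id into serial when serial is empty, so a
 * board-level FRU serial shared by both GPUs is preserved. */
static void fill_serial(struct board_info *b)
{
	if (b->serial[0] == '\0')
		snprintf(b->serial, sizeof(b->serial), "%" PRIx64, b->unique_id);
}
```

On an mGPU board both GPUs keep the single board serial read from the FRU; a device without one still reports its unique_id.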

[PATCH 4/4] drm/amdgpu: Access the FRU on Aldebaran

2021-12-17 Thread Kent Russell
This is supported, although the offset is different from VG20, so fix that with a variable and enable getting the product name and serial number from the FRU. Do this for all SKUs since all SKUs have the FRU Signed-off-by: Kent Russell --- drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c | 13

[PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling

2021-12-17 Thread sashank saye
For the Aldebaran chip passthrough case, we need to inform the SMU about special handling for SBR. On older chips we send LightSBR to the SMU; this enables the same for Aldebaran. The slight difference compared to previous chips is that on Aldebaran, the SMU does a heavy reset on SBR. Hence, the word Heavy instead of Lig

RE: [PATCH] drm/amdkfd: correct sdma queue number in kfd device init

2021-12-17 Thread Kim, Jonathan
> -----Original Message----- > From: Sider, Graham > Sent: December 17, 2021 10:06 AM > To: Chen, Guchun ; amd- > g...@lists.freedesktop.org; Deucher, Alexander > ; Kuehling, Felix > ; Kim, Jonathan > Subject: RE: [PATCH] drm/amdkfd: correct sdma queue number in kfd > device init > > [Public]

Re: [Bug Report] Desktop monitor sleep regression

2021-12-17 Thread Thorsten Leemhuis
added some CCs Geert added in his reply On 07.12.21 08:20, Thorsten Leemhuis wrote: > > [TLDR: adding this regression to regzbot; most of this mail is compiled > from a few templates paragraphs some of you might have seen already.] > > Hi, this is your Linux kernel regression tracker speaking.

Re: [PATCH] drm/amdgpu: Check the memory can be accesssed by ttm_device_clear_dma_mappings.

2021-12-17 Thread Alex Deucher
On Thu, Dec 16, 2021 at 10:25 AM Surbhi Kakarya wrote: > > If the event guard is enabled and VF doesn't receive an ack from PF for full > access, > the guest driver load crashes. > This is caused due to the call to ttm_device_clear_dma_mappings with > non-initialized > mman during driver tear do

Re: [PATCH] drm/amdgpu: Try To using WARN() instead BUG() avoid kernel panic

2021-12-17 Thread Deucher, Alexander
[Public] I think these are pretty fundamental errors. You should never hit them in practice and if you do, I think a BUG is fine. Alex From: ZhiJie.Zhang Sent: Thursday, December 16, 2021 9:38 PM To: Koenig, Christian ; Deucher, Alexander ; amd-gfx@lists.free

RE: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling

2021-12-17 Thread Liu, Shaoyun
[AMD Official Use Only] First, the name heavy SBR is confusing when you need to go through the light SBR code path. Second, we originally introduced light SBR because on older ASICs, FW cannot synchronize the reset on the devices within the hive, so it depends on the driver to sync t

RE: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling

2021-12-17 Thread Saye, Sashank
[AMD Official Use Only] Hi Shaoyun, Yes, From SMU FW point of view they do see a difference between Bare metal and passthrough case for SBR. For baremetal they get it as a PCI reset whereas passthrough case they get it as a BIF reset. Now within BIF reset they would need to differentiate betwee

RE: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling

2021-12-17 Thread Liu, Shaoyun
[AMD Official Use Only] Ok, sounds reasonable. I'm ok with the function name change. Another concern: from the driver side, before it starts the IP init, it will check the SMU clock to determine whether the ASIC needs a reset from the driver side. For your case, the hypervisor will trigger the

RE: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling

2021-12-17 Thread Saye, Sashank
[AMD Official Use Only] Yeah, after SMU does the mode 1 reset, the clock is cleared, hence when the driver boots after that, it will look like a regular cold boot. Regards Sashank -----Original Message----- From: Liu, Shaoyun Sent: Friday, December 17, 2021 12:07 PM To: Saye, Sashank ; amd-gf

Re: [Bug Report] Desktop monitor sleep regression

2021-12-17 Thread Thorsten Leemhuis
On 17.12.21 15:52, Imre Deak wrote: > On Fri, Dec 17, 2021 at 03:46:21PM +0100, Thorsten Leemhuis wrote: >> added some CCs Geert added in his reply >> >> On 07.12.21 08:20, Thorsten Leemhuis wrote: >>> >>> [TLDR: adding this regression to regzbot; most of this mail is compiled >>> from a few tem

[PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling

2021-12-17 Thread sashank saye
For the Aldebaran chip passthrough case, we need to inform the SMU about special handling for SBR. On older chips we send LightSBR to the SMU; this enables the same for Aldebaran. The slight difference compared to previous chips is that on Aldebaran, the SMU does a heavy reset on SBR. Hence, the word Heavy instead of Lig

Re: [PATCH 4/4] drm/amdgpu: Access the FRU on Aldebaran

2021-12-17 Thread Deucher, Alexander
[AMD Official Use Only] Series is: Reviewed-by: Alex Deucher From: amd-gfx on behalf of Kent Russell Sent: Friday, December 17, 2021 10:31 AM To: amd-gfx@lists.freedesktop.org Cc: Russell, Kent Subject: [PATCH 4/4] drm/amdgpu: Access the FRU on Aldebaran Thi

RE: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling

2021-12-17 Thread Liu, Shaoyun
[AMD Official Use Only] From your explanation, it seems the SMU always needs this special handling for SBR in passthrough mode, but in the code, it only applies to the XGMI configuration. Should you change that as well? Two comments inline. Regards Shaoyun.liu -----Original Message----- Fro

[PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling

2021-12-17 Thread sashank saye
For the Aldebaran chip passthrough case, we need to inform the SMU about special handling for SBR. On older chips we send LightSBR to the SMU; this enables the same for Aldebaran. The slight difference compared to previous chips is that on Aldebaran, the SMU does a heavy reset on SBR. Hence, the word Heavy instead of Lig

[PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling

2021-12-17 Thread sashank saye
For the Aldebaran chip passthrough case, we need to inform the SMU about special handling for SBR. On older chips we send LightSBR to the SMU; this enables the same for Aldebaran. The slight difference compared to previous chips is that on Aldebaran, the SMU does a heavy reset on SBR. Hence, the word Heavy instead of Lig

RE: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling

2021-12-17 Thread Liu, Shaoyun
[AMD Official Use Only] Comment inline. -----Original Message----- From: amd-gfx On Behalf Of sashank saye Sent: Friday, December 17, 2021 1:19 PM To: amd-gfx@lists.freedesktop.org Cc: Saye, Sashank Subject: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling Fo

[PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling

2021-12-17 Thread sashank saye
For the Aldebaran chip passthrough case, we need to inform the SMU about special handling for SBR. On older chips we send LightSBR to the SMU; this enables the same for Aldebaran. The slight difference compared to previous chips is that on Aldebaran, the SMU does a heavy reset on SBR. Hence, the word Heavy instead of Lig

RE: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling

2021-12-17 Thread Liu, Shaoyun
[AMD Official Use Only] Reviewed by: Shaoyun.liu -----Original Message----- From: amd-gfx On Behalf Of sashank saye Sent: Friday, December 17, 2021 1:56 PM To: amd-gfx@lists.freedesktop.org Cc: Saye, Sashank Subject: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr han

[PATCH] drm/amd/display: Fix USB4 null pointer dereference in update_psp_stream_config

2021-12-17 Thread Nicholas Kazlauskas
[Why] A porting error on a previous patch left the block of code that causes the crash from a NULL pointer dereference. More specifically, we try to access link_enc before it's assigned in the USB4 case in the following assignment: config.dio_output_idx = link_enc->transmitter - TRANSMITTER_UNIPH
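The bug class described above (reading link_enc->transmitter before link_enc has been assigned) is usually fixed by resolving the encoder first and guarding against NULL before the dereference. The sketch below shows only that ordering; the enum, struct, and function names are simplified hypothetical stand-ins, not the display-core types or the actual fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for display-core types. */
enum transmitter { TX_UNIPHY_A = 0, TX_UNIPHY_B, TX_UNIPHY_C };

struct link_encoder {
	enum transmitter transmitter;
};

struct psp_config {
	int dio_output_idx;
};

/* The caller resolves link_enc first; this function refuses to
 * dereference it when no encoder has been assigned. */
static int fill_psp_config(struct psp_config *config,
			   const struct link_encoder *link_enc)
{
	if (link_enc == NULL)
		return -1; /* nothing to program; avoid a NULL dereference */
	config->dio_output_idx = (int)(link_enc->transmitter - TX_UNIPHY_A);
	return 0;
}
```

Returning an error leaves the config untouched instead of crashing when the encoder lookup fails.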

Re: [PATCH] drm/amd/display: Fix USB4 null pointer dereference in update_psp_stream_config

2021-12-17 Thread Harry Wentland
On 2021-12-17 14:25, Nicholas Kazlauskas wrote: > [Why] > A porting error on a previous patch left the block of code that > causes the crash from a NULL pointer dereference. > > More specifically, we try to access link_enc before it's assigned in > the USB4 case in the following assignment: > > c

Re: Re: Re: Reply: Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8

2021-12-17 Thread Alex Deucher
If you could get me a copy of the vbios image from a problematic board, that would be helpful. In the meantime, I've applied the patch. Alex On Thu, Dec 16, 2021 at 9:38 PM 周宗敏 wrote: > Dear Alex: > > > >Is the issue reproducible with the same board in bare metal on x86? Or > does it only happ

[PATCH 00/19] DC Patches December 17, 2021

2021-12-17 Thread Rodrigo Siqueira
This DC patchset brings improvements in multiple areas. In summary, we highlight: - Fixes and improvements in the LTTPR code - Improve z-state - Fix null pointer check - Improve communication with s0i2 - Update multiple-display split policy - Add missing registers Cc: Daniel Wheeler Thanks Siqu

[PATCH 02/19] drm/amd/display: Limit max link cap with LTTPR caps

2021-12-17 Thread Rodrigo Siqueira
From: George Shen [Why] Max link rate should be limited to the maximum link rate supported by any connected LTTPR, including when operating in transparent mode. [How] Include transparent mode when factoring in LTTPR max supported link rate. Reviewed-by: Wesley Chalmers Acked-by: Rodrigo

[PATCH 03/19] drm/amd/display: Refactor vendor specific link training sequence

2021-12-17 Thread Rodrigo Siqueira
From: "Shen, George" [Why] Current implementation is not scalable and retrofits the existing standard link training code for purposes outside of its original design. [How] Refactor vendor specific link training sequence into its own separate function to be called instead of the standard link tra

[PATCH 01/19] drm/amd/display: fix B0 TMDS deepcolor no dislay issue

2021-12-17 Thread Rodrigo Siqueira
From: Charlene Liu [why] On B0, PHY C maps to F and D maps to G. The driver uses the logical instance and dmub does the remap, but the driver still needs to use the right PHY instance to access the right HW. [how] Use the physical instance when programming PHY registers. [note] Could move resync_control programming to dmub next. Reviewed-by:

[PATCH 07/19] drm/amd/display: Send s0i2_rdy in stream_count == 0 optimization

2021-12-17 Thread Rodrigo Siqueira
From: Nicholas Kazlauskas [Why] Otherwise SMU won't mark Display as idle when trying to perform s2idle. [How] Mark the bit in the dcn31 codepath, doesn't apply to older ASIC. It needed to be split from phy refclk off to prevent entering s2idle when PSR was engaged but driver was not ready. Fix

[PATCH 06/19] drm/amd/display: Fix check for null function ptr

2021-12-17 Thread Rodrigo Siqueira
From: Alvin Lee [Why] Bug fix for null function ptr (should check for NULL instead of not NULL) [How] Fix if condition Reviewed-by: Samson Tam Acked-by: Rodrigo Siqueira Signed-off-by: Alvin Lee --- drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c | 4 ++-- 1 file changed, 2 insertions(+), 2
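The inverted-condition bug described above ("should check for NULL instead of not NULL") follows a common pattern: an optional hook must be checked for absence before being called. The sketch below uses a hypothetical hook table, not the real DMUB structure, and shows the buggy shape in a comment next to the corrected one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical table of optional hardware hooks. */
struct hw_funcs {
	bool (*is_hw_init)(void);
};

/* Buggy shape (inverted test): returns early when the hook EXISTS
 * and calls it when it is NULL:
 *     if (funcs->is_hw_init) return false;
 *     return funcs->is_hw_init();
 */

/* Fixed shape: return early only when the hook is missing. */
static bool query_hw_init(const struct hw_funcs *funcs)
{
	if (funcs->is_hw_init == NULL)
		return false;
	return funcs->is_hw_init();
}

static bool always_true(void)
{
	return true;
}
```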

[PATCH 04/19] drm/amd/display: Block z-states when stutter period exceeds criteria

2021-12-17 Thread Rodrigo Siqueira
From: Nicholas Kazlauskas [Why] Stutter period won't be less than 5000.0, but if PSR is enabled then we can potentially enter Z9 when MPO is enabled. SMU will try to enter Z9 too early in these cases (before PSR is enabled) and we'll see underflow. [How] Block z-states (z9, z10) until we can ad

[PATCH 09/19] drm/amd/display: Set optimize_pwr_state for DCN31

2021-12-17 Thread Rodrigo Siqueira
From: Nicholas Kazlauskas [Why] We'll exit optimized power state to do link detection but we won't enter back into the optimized power state. This could potentially block s2idle entry depending on the sequencing, but it also means we're losing some power during the transition period. [How] Hook

[PATCH 08/19] drm/amd/display: Remove CR AUX RD Interval limit for LTTPR

2021-12-17 Thread Rodrigo Siqueira
From: George Shen [Why] DP spec specifies that DPRX shall use the read interval in the TRAINING_AUX_RD_INTERVAL_PHY_REPEATER LTTPR DPCD register. This register's bit definition is the same as the AUX read interval register for DPRX. [How] Remove logic which forces AUX read interval to 100us for

[PATCH 12/19] drm/amd/display: Undo ODM combine

2021-12-17 Thread Rodrigo Siqueira
From: Martin Leung Undo ODM Combine regression causing pipe allocation issues. Reviewed-by: Aric Cyr Acked-by: Rodrigo Siqueira Signed-off-by: Martin Leung --- .../gpu/drm/amd/display/dc/core/dc_resource.c | 81 +-- .../drm/amd/display/dc/dcn30/dcn30_resource.c | 11 -

[PATCH 05/19] drm/amd/display: Added power down for DCN10

2021-12-17 Thread Rodrigo Siqueira
From: "Lai, Derek" [Why] The change of setting a timer callback on boot for 10 seconds is still working, just lacked power down for DCN10. [How] Added power down for DCN10. Reviewed-by: Anthony Koo Acked-by: Rodrigo Siqueira Signed-off-by: Derek Lai --- drivers/gpu/drm/amd/display/dc/dcn10/

[PATCH 13/19] drm/amd/display: [FW Promotion] Release 0.0.98

2021-12-17 Thread Rodrigo Siqueira
From: Anthony Koo Reviewed-by: Aric Cyr Acked-by: Rodrigo Siqueira Signed-off-by: Anthony Koo --- drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h b/drivers/gpu/drm/amd/disp

[PATCH 11/19] drm/amd/display: Add reg defs for DCN303

2021-12-17 Thread Rodrigo Siqueira
From: Wesley Chalmers [WHY] These registers are currently missing from the DCN303 header files Reviewed-by: George Shen Acked-by: Rodrigo Siqueira Signed-off-by: Wesley Chalmers --- .../drm/amd/display/dc/dcn303/dcn303_dccg.h | 20 +-- 1 file changed, 18 insertions(+), 2 de

[PATCH 15/19] drm/amd/display: define link res and make it accessible to all link interfaces

2021-12-17 Thread Rodrigo Siqueira
From: Wenjing Liu [why] There will be a series of re-arch changes in Link Resource Management. There are more and more muxable link resource objects, and the resources are insufficient for a one-to-one allocation to all links created. Therefore a link resource sharing logic is required to determine w

[PATCH 14/19] drm/amd/display: 3.2.167

2021-12-17 Thread Rodrigo Siqueira
From: Aric Cyr This version brings along the following: - Fixes and improvements in the LTTPR code - Improve z-state - Fix null pointer check - Improve communication with s0i2 - Update multiple-display split policy - Add missing registers Acked-by: Rodrigo Siqueira Signed-off-by: Aric Cyr ---

[PATCH 10/19] drm/amd/display: Changed pipe split policy to allow for multi-display pipe split

2021-12-17 Thread Rodrigo Siqueira
From: Angus Wang [WHY] Current implementation of pipe split policy prevents pipe split with multiple displays connected, which caused the MCLK speed to be stuck at max [HOW] Changed the pipe split policies so that pipe split is allowed for multi-display configurations Reviewed-by: Aric Cyr Ack

[PATCH 16/19] drm/amd/display: populate link res in both detection and validation

2021-12-17 Thread Rodrigo Siqueira
From: Wenjing Liu [why] This commit is to populate link res in preparation for the next commit. The next commit will replace all existing code to use link res instead Reviewed-by: Jun Lei Acked-by: Rodrigo Siqueira Signed-off-by: Wenjing Liu --- drivers/gpu/drm/amd/display/dc/core/dc_link.c

[PATCH 17/19] drm/amd/display: access hpo dp link encoder only through link resource

2021-12-17 Thread Rodrigo Siqueira
From: Wenjing Liu [why] Update all accesses to use hpo dp link encoder through link resource only. Reviewed-by: Jun Lei Acked-by: Rodrigo Siqueira Signed-off-by: Wenjing Liu --- drivers/gpu/drm/amd/display/dc/core/dc_link.c | 22 +++--- .../gpu/drm/amd/display/dc/core/dc_link_dp.c |

[PATCH 18/19] drm/amd/display: support dynamic HPO DP link encoder allocation

2021-12-17 Thread Rodrigo Siqueira
From: Wenjing Liu [why] When there are more DP2.0 RXs connected than the number of HPO DP link encoders we have, we need to dynamically allocate an HPO DP link encoder to the port that needs it. [how] Only allocate an HPO DP link encoder when it is needed. Reviewed-by: Jun Lei Acked-by: Rodrigo Siqueir
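The "allocate only when needed" policy above can be pictured as a small ownership pool: a link takes an encoder only if it has a DP2.0 sink, reuses one it already holds, and returns it when done. This is a simplified sketch under that assumption; the pool size, array, and function names are hypothetical and do not reflect the actual link resource code.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_HPO_ENC 2 /* hypothetical: fewer encoders than links */

/* owner[i] = index of the link holding encoder i, or -1 if free.
 * The initializer must list NUM_HPO_ENC entries. */
static int owner[NUM_HPO_ENC] = { -1, -1 };

/* Give an encoder only to a link that needs one (DP2.0 RX).
 * Returns the encoder index, or -1 if none is needed or none is free. */
static int acquire_hpo_enc(int link, bool needs_dp2)
{
	if (!needs_dp2)
		return -1;
	for (int i = 0; i < NUM_HPO_ENC; i++)
		if (owner[i] == link)
			return i; /* link already holds one */
	for (int i = 0; i < NUM_HPO_ENC; i++) {
		if (owner[i] < 0) {
			owner[i] = link;
			return i;
		}
	}
	return -1; /* pool exhausted */
}

/* Return any encoder held by this link to the pool. */
static void release_hpo_enc(int link)
{
	for (int i = 0; i < NUM_HPO_ENC; i++)
		if (owner[i] == link)
			owner[i] = -1;
}
```

Links driving non-DP2.0 sinks never consume an encoder, so a board with more ports than HPO encoders can still serve DP2.0 sinks on whichever ports they are plugged into.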

[PATCH 19/19] drm/amd/display: get and restore link res map

2021-12-17 Thread Rodrigo Siqueira
From: Wenjing Liu [why] On reboot, the link res map should be persisted, so during boot-up the driver will look at the map to determine which link should take priority to use certain link res. This is to ensure that link res remains unshuffled after a reboot. Reviewed-by: Jun Lei Acked-by: Rodr

Re: [PATCH 10/19] drm/amd/display: Changed pipe split policy to allow for multi-display pipe split

2021-12-17 Thread Deucher, Alexander
[AMD Official Use Only] Maybe add Bug links for: https://gitlab.freedesktop.org/drm/amd/-/issues/1522 https://gitlab.freedesktop.org/drm/amd/-/issues/1709 https://gitlab.freedesktop.org/drm/amd/-/issues/1655 https://gitlab.freedesktop.org/drm/amd/-/issues/1403

RE: [PATCH 00/19] DC Patches December 17, 2021

2021-12-17 Thread Wheeler, Daniel
[AMD Official Use Only] Hi all, This week this patchset was tested on the following systems: Lenovo Thinkpad T14s Gen2 with AMD Ryzen 5 5650U, with the following display types: eDP 1080p 60hz, 4k 60hz (via USB-C to DP/HDMI), 1440p 144hz (via USB-C to DP/HDMI), 1680*1050 60hz (via USB-C to D

Re: [PATCH 10/19] drm/amd/display: Changed pipe split policy to allow for multi-display pipe split

2021-12-17 Thread Rodrigo Siqueira Jordao
On 2021-12-17 4:36 p.m., Deucher, Alexander wrote: [AMD Official Use Only] Maybe add Bug links for: https://gitlab.freedesktop.org/drm/amd/-/issues/1522 https://gitlab.freedesktop.org/drm/amd/-/issues/1709

[RFC 0/6] Define and use reset domain for GPU recovery in amdgpu

2021-12-17 Thread Andrey Grodzovsky
This patchset is based on earlier work by Boris[1] that allowed to have an ordered workqueue at the driver level that will be used by the different schedulers to queue their timeout work. On top of that I also serialized any GPU reset we trigger from within amdgpu code to also go through the same o

[RFC 2/6] drm/amdgpu: Move scheduler init to after XGMI is ready

2021-12-17 Thread Andrey Grodzovsky
Before we initialize schedulers we must know which reset domain we are in: for a single device there is a single domain per device and so a single wq per device. For XGMI the reset domain spans the entire XGMI hive and so the reset wq is per hive. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/d

[RFC 4/6] drm/amdgpu: Serialize non TDR gpu recovery with TDRs

2021-12-17 Thread Andrey Grodzovsky
Use the reset domain wq also for non-TDR GPU recovery triggers such as sysfs and RAS. We must serialize all possible GPU recoveries to guarantee no concurrency there. For TDR, call the original recovery function directly since it's already executed from within the wq. For others, just use a wrapper to queue
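A userspace sketch of the idea in this RFC, assuming pthreads: one FIFO and a single worker thread per reset domain, so every queued recovery runs strictly one at a time, plus a "queue and wait" wrapper for non-TDR callers such as sysfs or RAS. This is only an analogy; the kernel series uses an ordered workqueue, and every name below is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

#define QCAP 16 /* hypothetical queue depth */

enum reset_src { RESET_TDR, RESET_SYSFS, RESET_RAS };

/* Hypothetical per-reset-domain state: one FIFO, one worker. */
struct reset_domain {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	enum reset_src q[QCAP];
	int head, tail, count;
	long queued, done;        /* tickets issued / resets finished */
	int running, max_running; /* observed concurrency */
	bool stop;
};

static void reset_domain_init(struct reset_domain *d)
{
	memset(d, 0, sizeof(*d));
	pthread_mutex_init(&d->lock, NULL);
	pthread_cond_init(&d->cond, NULL);
}

/* Single worker: requests from every source run one at a time, in order. */
static void *reset_worker(void *arg)
{
	struct reset_domain *d = arg;

	pthread_mutex_lock(&d->lock);
	for (;;) {
		while (d->count == 0 && !d->stop)
			pthread_cond_wait(&d->cond, &d->lock);
		if (d->count == 0)
			break; /* stop requested and queue drained */
		enum reset_src src = d->q[d->head];
		(void)src; /* a real handler would dispatch on the source */
		d->head = (d->head + 1) % QCAP;
		d->count--;
		if (++d->running > d->max_running)
			d->max_running = d->running;
		pthread_mutex_unlock(&d->lock);
		/* ... GPU recovery would run here, outside the lock ... */
		pthread_mutex_lock(&d->lock);
		d->running--;
		d->done++;
		pthread_cond_broadcast(&d->cond); /* wake waiting submitters */
	}
	pthread_mutex_unlock(&d->lock);
	return NULL;
}

/* Non-TDR path: queue a reset and block until it has run.
 * Returns false if the queue is full. */
static bool queue_reset_and_wait(struct reset_domain *d, enum reset_src src)
{
	long ticket;

	pthread_mutex_lock(&d->lock);
	if (d->count == QCAP) {
		pthread_mutex_unlock(&d->lock);
		return false;
	}
	d->q[d->tail] = src;
	d->tail = (d->tail + 1) % QCAP;
	d->count++;
	ticket = ++d->queued;
	pthread_cond_broadcast(&d->cond); /* wake the worker */
	while (d->done < ticket)          /* FIFO: done >= ticket means ours ran */
		pthread_cond_wait(&d->cond, &d->lock);
	pthread_mutex_unlock(&d->lock);
	return true;
}

static void reset_domain_stop(struct reset_domain *d, pthread_t worker)
{
	pthread_mutex_lock(&d->lock);
	d->stop = true;
	pthread_cond_broadcast(&d->cond);
	pthread_mutex_unlock(&d->lock);
	pthread_join(worker, NULL);
}
```

Because a single consumer drains the queue, no two recoveries can overlap. That is why per-device and per-hive "in reset" flags become redundant once all reset triggers share one domain queue.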

[RFC 3/6] drm/amdgpu: Fix crash on modprobe

2021-12-17 Thread Andrey Grodzovsky
Restrict job resubmission to the suspend case only, since schedulers are not yet initialised on probe. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/g

[RFC 1/6] drm/amdgpu: Init GPU reset single threaded wq

2021-12-17 Thread Andrey Grodzovsky
Do it for both single device and XGMI hive cases. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 7 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 20 +++- drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 9 + drivers/gpu/drm/amd/amd

[RFC 6/6] drm/amdgpu: Drop concurrent GPU reset protection for device

2021-12-17 Thread Andrey Grodzovsky
Since all GPU resets are now serialized, there is no need for this. This patch also reverts 'drm/amdgpu: race issue when jobs on 2 ring timeout' Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 89 ++ 1 file changed, 7 insertions(+), 82 deleti

[RFC 5/6] drm/amdgpu: Drop hive->in_reset

2021-12-17 Thread Andrey Grodzovsky
Since we serialize all resets, there is no need to protect from concurrent resets. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 1 - drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h | 1 - 3 files changed

RE: [PATCH] drm/amdkfd: correct sdma queue number in kfd device init

2021-12-17 Thread Chen, Guchun
[Public] Hi Graham, My general thought is, from what I observed, IP version does not change in a linear variation manner, so moving to switch case may be easier for user to decode this. Also, I want to get the code aligned with the IP parse code in amdgpu_discovery.c. Please correct me if I a

RE: [PATCH 4/4] drm/amdgpu: Access the FRU on Aldebaran

2021-12-17 Thread Chen, Guchun
[Public] Hi Kent, + + if (adev->asic_type == CHIP_ALDEBARAN) + offset = 0; if (!is_fru_eeprom_supported(adev)) I prefer to put 'adev->asic_type == CHIP_ALDEBARAN' after calling is_fru_eeprom_supported to make code logic cleaner. Without FRU support, we should do n
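The ordering the reviewer suggests can be sketched as: return early when the ASIC has no FRU at all, and only then choose the ASIC-specific offset (0 on Aldebaran, per the quoted hunk). The enum subset, the support check, and the default-offset parameter below are hypothetical stand-ins, not the actual amdgpu_fru_eeprom.c code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of ASIC types, for illustration only. */
enum asic { CHIP_VEGA20, CHIP_ALDEBARAN, CHIP_OTHER };

struct dev {
	enum asic asic_type;
};

/* Hypothetical stand-in for is_fru_eeprom_supported(). */
static bool fru_supported(const struct dev *d)
{
	return d->asic_type == CHIP_VEGA20 || d->asic_type == CHIP_ALDEBARAN;
}

/* Suggested order: bail out first without FRU support, then pick the
 * ASIC-specific offset. Returns -1 when the FRU is unsupported. */
static int fru_offset(const struct dev *d, int default_offset)
{
	if (!fru_supported(d))
		return -1;
	if (d->asic_type == CHIP_ALDEBARAN)
		return 0;
	return default_offset;
}
```

Checking support first means no ASIC-specific state is touched on devices that have no FRU.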