Re: [PATCH 1/2] drm/amdgpu/acpi: unify ATCS handling (v3)

2021-05-21 Thread Lijo Lazar
On 5/20/2021 9:26 PM, Alex Deucher wrote: Treat it like ATIF and check both the dGPU and APU for the method. This is required because ATCS may be hung off of the APU in ACPI on A+A systems. v2: add back accidently removed ACPI handle check. v3: Fix incorrect atif check (Colin) Fix unini

Re: [PATCH 2/2] drm/amdgpu/apci: switch ATIF/ATCS probe order

2021-05-21 Thread Lijo Lazar
Reviewed-by: Lijo Lazar On 5/20/2021 9:26 PM, Alex Deucher wrote: Try the handle from ATPX first since this is the most common case. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 14 ++ 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/d
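Together, these two patches look for ATCS on more than one ACPI handle and try the ATPX-derived handle first. That amounts to an ordered search. Below is a minimal userspace sketch of that search. It assumes a hypothetical `struct fake_acpi_handle` whose `has_atcs` flag stands in for a real ACPI method lookup, so it is not the amdgpu_acpi.c code:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an ACPI handle; not the kernel's acpi_handle. */
struct fake_acpi_handle {
	const char *name;
	bool has_atcs; /* does this handle expose the ATCS method? */
};

/*
 * Walk candidate handles in priority order (ATPX-derived handle first)
 * and return the first one that implements ATCS, or NULL if none does.
 */
static const struct fake_acpi_handle *
find_atcs_handle(const struct fake_acpi_handle *const *candidates, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (candidates[i] && candidates[i]->has_atcs)
			return candidates[i];
	}
	return NULL;
}
```

Putting the most common handle first means a typical system finds ATCS on the first lookup. A system where the method hangs off a different device still falls through to the next candidate.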

Re: [PATCH 38/38] drm/amd/amdgpu/smuio_v13_0: Realign 'smuio_v13_0_is_host_gpu_xgmi_supported()' header

2021-05-21 Thread Lee Jones
On Thu, 20 May 2021, Alex Deucher wrote: > Applied. Thanks! Thanks again Alex. > On Thu, May 20, 2021 at 8:03 AM Lee Jones wrote: > > > > Fixes the following W=1 kernel build warning(s): > > > > drivers/gpu/drm/amd/amdgpu/smuio_v13_0.c:99: warning: expecting prototype > > for smuio_v13_0_sup

[PATCH] drm/ttm: Skip swapout if ttm object is not populated

2021-05-21 Thread xinhui pan
Swapping a ttm object which has no backend pages makes no sense. Suggested-by: Christian König Signed-off-by: xinhui pan --- drivers/gpu/drm/ttm/ttm_device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c i
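The idea is an early return: if the object has no backing pages, there is nothing to write out. Below is a minimal sketch using a hypothetical `struct fake_tt` with a `populated` flag. It is illustrative only and is not the ttm_device.c change itself:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a TTM object; field names are illustrative. */
struct fake_tt {
	bool populated;      /* backing pages have been allocated */
	unsigned num_pages;
};

/* Returns the number of pages swapped out; 0 when there is nothing to swap. */
static unsigned fake_swapout(struct fake_tt *tt)
{
	/* No backing pages means swapping makes no sense: skip the object. */
	if (!tt || !tt->populated)
		return 0;
	tt->populated = false; /* pages now live in swap */
	return tt->num_pages;
}
```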

Re: [PATCH v2 1/3] drm/amdgpu: Add new placement for preemptible SG BOs

2021-05-21 Thread Christian König
On 21.05.21 at 04:22, Felix Kuehling wrote: SG BOs such as dmabuf imports and userptr BOs do not consume system resources directly. Instead they point to resources owned elsewhere. They typically get evicted by DMABuf move notifiers or MMU notifiers. If those notifiers don't need to wait for har

Re: [PATCH v2 2/3] drm/amdgpu: Use preemptible placement for KFD

2021-05-21 Thread Christian König
On 21.05.21 at 04:22, Felix Kuehling wrote: KFD userptr BOs and SG BOs used for DMA mappings can be preempted with CWSR. Therefore we can use preemptible placement and avoid unwanted evictions due to GTT accounting. Signed-off-by: Felix Kuehling Acked-by: Christian König --- drivers/gpu

Re: [PATCH v2 3/3] drm/amdgpu: Workaround IOMMU driver bug

2021-05-21 Thread Christian König
On 21.05.21 at 04:22, Felix Kuehling wrote: The intel IOMMU driver causes kernel oopses or internal errors flooding kernel log when mapping larger SG tables. Limiting the size of userptr BOs to 6GB seems to avoid this. Signed-off-by: Felix Kuehling CC whoever is the maintainer of the Intel I
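One way to read "limiting the size of userptr BOs to 6GB" is as a size check at BO creation time. The sketch below shows that check. The macro name `HYPOTHETICAL_USERPTR_MAX_BYTES` is invented, and the snippet does not say whether the real patch rejects or handles oversized requests differently:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cap matching the 6GB figure in the patch description. */
#define HYPOTHETICAL_USERPTR_MAX_BYTES (6ULL << 30) /* 6 GiB */

/* Returns true if a userptr BO of the given size would be accepted. */
static bool userptr_size_ok(uint64_t size)
{
	return size <= HYPOTHETICAL_USERPTR_MAX_BYTES;
}
```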

Re: [PATCH] drm/ttm: Skip swapout if ttm object is not populated

2021-05-21 Thread Christian König
On 21.05.21 at 10:31, xinhui pan wrote: Swapping a ttm object which has no backend pages makes no sense. Suggested-by: Christian König Signed-off-by: xinhui pan Reviewed-by: Christian König Going to add a CC: stable and pushing that to drm-misc-fixes in a minute. --- drivers/gpu/drm/t

RE: [PATCH v5 09/10] drm/amdgpu: Use PSP to program IH_RB_CNTL* registers

2021-05-21 Thread Zhou, Peng Ju
Hi @Zhao, Victor/@Deng, Emily Can you help to answer Alex's question? Because this patch originally came from @Zhao, Victor, it's hard for me to explain the question. Alex's question: > > --- a/drivers/gpu/drm/amd/amdgpu/nv.c > > +++ b/drivers/gp

RE: [PATCH v5 09/10] drm/amdgpu: Use PSP to program IH_RB_CNTL* registers

2021-05-21 Thread Deng, Emily
Hi Pengju, You'd better only switch for sriov. Best wishes Emily Deng >-Original Message- >From: Zhou, Peng Ju >Sent: Friday, May 21, 2021 5:58 PM >To: Alex Deucher ; Zhao, Victor >; Deng, Emily >Cc: amd-gfx list >Subject: RE: [PATCH v5 09/10] drm/amdgpu: Use PSP to program IH_RB_

[PATCH 3/7] drm/amdgpu: use amdgpu_bo_vm for vm code

2021-05-21 Thread Nirmoy Das
Use amdgpu_bo_vm for BO for PT/PD. Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 43 ++ 1 file changed, 24 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 55991f39

[PATCH 2/7] drm/amdgpu: add a new identifier for amdgpu_bo

2021-05-21 Thread Nirmoy Das
Add has_shadow to identify if a BO is shadowed. Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 5 ++--- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/driv

[PATCH 1/7] drm/amdgpu: add amdgpu_bo_vm bo type

2021-05-21 Thread Nirmoy Das
Add new BO subcalss that will be used by amdgpu vm code. Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 32 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 10 +++ 2 files changed, 42 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgp

[PATCH 5/7] drm/amdgpu: switch to amdgpu_bo_vm's shadow

2021-05-21 Thread Nirmoy Das
Use shadow of amdgpu_bo_vm instead of the base class. Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 4 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 27 - drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 16 ++-- 3 files changed, 26 ins

[PATCH 6/7] drm/amdgpu: remove unused code

2021-05-21 Thread Nirmoy Das
Remove unused code related to shadow BO. Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 30 -- drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 4 --- 2 files changed, 34 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers

[PATCH 7/7] drm/amdgpu: do not allocate entries separately

2021-05-21 Thread Nirmoy Das
Allocate PD/PT entries while allocating VM BOs and use that instead of allocating those entries separately. Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 31 ++ 1 file changed, 17 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/amd/am
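Allocating the entries together with the VM BO is commonly done in C with a flexible array member. One allocation then covers both the header and the entries, and one free releases both. Below is a minimal sketch of that technique. `struct fake_vm_bo` and `struct fake_pt_entry` are hypothetical and are not the amdgpu structures:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical page-table entry bookkeeping; not the amdgpu structures. */
struct fake_pt_entry {
	void *bo;
};

/* A VM BO that carries its entries inline, in the same allocation. */
struct fake_vm_bo {
	size_t num_entries;
	struct fake_pt_entry entries[]; /* flexible array member */
};

static struct fake_vm_bo *fake_vm_bo_create(size_t num_entries)
{
	/* Guard against size overflow before computing the total size. */
	if (num_entries > (SIZE_MAX - sizeof(struct fake_vm_bo)) /
			  sizeof(struct fake_pt_entry))
		return NULL;

	/* One calloc covers header and entries; one free() releases both. */
	struct fake_vm_bo *vmbo = calloc(1, sizeof(struct fake_vm_bo) +
				  num_entries * sizeof(struct fake_pt_entry));
	if (vmbo)
		vmbo->num_entries = num_entries;
	return vmbo;
}
```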

[PATCH 4/7] drm/amdgpu: create shadow bo directly

2021-05-21 Thread Nirmoy Das
Shadow BOs are only needed by VM code, so create them directly within the VM code. Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 23 +-- 1 file changed, 21 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/

Re: [PATCH 2/7] drm/amdgpu: add a new identifier for amdgpu_bo

2021-05-21 Thread Christian König
On 21.05.21 at 14:45, Nirmoy Das wrote: Add has_shadow to identify if a BO is shadowed. Ok that is not going into the right direction. Instead of identifying which BOs have a shadow we need to identify if this is a VM BO or not. I think the first thing you need to do is to move the shadow

Re: [PATCH 7/7] drm/amdgpu: do not allocate entries separately

2021-05-21 Thread Christian König
On 21.05.21 at 14:45, Nirmoy Das wrote: Allocate PD/PT entries while allocating VM BOs and use that instead of allocating those entries separately. Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 31 ++ 1 file changed, 17 insertions(+), 14 dele

[PATCH] drm/amdgpu: Fix inconsistent indenting

2021-05-21 Thread Jiapeng Chong
Eliminate the follow smatch warning: drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c:449 sdma_v5_0_ring_emit_mem_sync() warn: inconsistent indenting. Reported-by: Abaci Robot Signed-off-by: Jiapeng Chong --- drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 13 ++--- 1 file changed, 6 insertions(+), 7 d

Re: [PATCH] drm/amdgpu: Fix inconsistent indenting

2021-05-21 Thread Christian König
On 21.05.21 at 11:50, Jiapeng Chong wrote: Eliminate the follow smatch warning: drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c:449 sdma_v5_0_ring_emit_mem_sync() warn: inconsistent indenting. Reported-by: Abaci Robot Signed-off-by: Jiapeng Chong Reviewed-by: Christian König --- drivers/gpu/dr

Re: [PATCH v2 3/3] drm/amdgpu: Workaround IOMMU driver bug

2021-05-21 Thread Felix Kuehling
On 2021-05-21 at 4:41 a.m., Christian König wrote: > On 21.05.21 at 04:22, Felix Kuehling wrote: >> The intel IOMMU driver causes kernel oopses or internal errors flooding >> kernel log when mapping larger SG tables. Limiting the size of >> userptr BOs >> to 6GB seems to avoid this. >> >> Signed-

Re: [PATCH 1/2] drm/amdgpu/acpi: unify ATCS handling (v3)

2021-05-21 Thread Alex Deucher
On Fri, May 21, 2021 at 3:12 AM Lijo Lazar wrote: > > > > On 5/20/2021 9:26 PM, Alex Deucher wrote: > > Treat it like ATIF and check both the dGPU and APU for > > the method. This is required because ATCS may be hung > > off of the APU in ACPI on A+A systems. > > > > v2: add back accidently remov

[PATCH] drm/amdkfd: use resource cursor in svm_migrate_copy_to_vram

2021-05-21 Thread Christian König
Access to the mm_node is now forbidden. So instead of hand-wiring that, use the cursor functionality. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 76 +++- 1 file changed, 9 insertions(+), 67 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkf

Re: [PATCH 2/7] drm/amdgpu: add a new identifier for amdgpu_bo

2021-05-21 Thread Nirmoy
On 5/21/21 2:58 PM, Christian König wrote: On 21.05.21 at 14:45, Nirmoy Das wrote: Add has_shadow to identify if a BO is shadowed. Ok that is not going into the right direction. I was expecting this :) but wasn't sure how to handle it. Instead of identifying which BOs have a shadow we

Re: [PATCH 7/7] drm/amdgpu: do not allocate entries separately

2021-05-21 Thread Nirmoy
On 5/21/21 3:01 PM, Christian König wrote: On 21.05.21 at 14:45, Nirmoy Das wrote: Allocate PD/PT entries while allocating VM BOs and use that instead of allocating those entries separately. Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 31 ++---

Re: [PATCH v2 3/3] drm/amdgpu: Workaround IOMMU driver bug

2021-05-21 Thread Zeng, Oak
Reviewed-by: oak zeng From: amd-gfx on behalf of Felix Kuehling Sent: Friday, May 21, 2021 9:47:17 AM To: Christian König ; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH v2 3/3] drm/amdgpu: Worka

Re: [PATCH 1/2] drm/amdgpu/acpi: unify ATCS handling (v3)

2021-05-21 Thread Lijo Lazar
Thanks for clarifying! Reviewed-by: Lijo Lazar On 5/21/2021 7:17 PM, Alex Deucher wrote: On Fri, May 21, 2021 at 3:12 AM Lijo Lazar wrote: On 5/20/2021 9:26 PM, Alex Deucher wrote: Treat it like ATIF and check both the dGPU and APU for the method. This is required because ATCS may be hu

Re: [PATCH v5 09/10] drm/amdgpu: Use PSP to program IH_RB_CNTL* registers

2021-05-21 Thread Alex Deucher
On Fri, May 21, 2021 at 6:07 AM Deng, Emily wrote: > > Hi Pengju, > You'd better only switch for sriov. Either verify that this doesn't break bare metal, or do something like we do on sienna cichlid. E.g., if (!amdgpu_sriov_vf(adev)) { amdgpu_device_i

RE: [PATCH 2/6] drm/amd/pm: Add arcturus throttler translation

2021-05-21 Thread Kasiviswanathan, Harish
Hi Graham, This patch series looks good. Please add "Signed-off-by" to all the commit messages. One additional comment inline below. -Original Message- From: Sider, Graham Sent: Thursday, May 20, 2021 10:29 AM To: amd-gfx@lists.freedesktop.org Cc: Kasiviswanat

Re: [PATCH 2/6] drm/amd/pm: Add arcturus throttler translation

2021-05-21 Thread Alex Deucher
General comment on the patch series, do you want to bump the metrics table version since the meaning of the throttler status has changed? Alex On Thu, May 20, 2021 at 10:30 AM Graham Sider wrote: > > Perform dependent to independent throttle status translation for > arcturus. > --- > .../gpu/dr

Re: [PATCH 1/7] drm/amdgpu: add amdgpu_bo_vm bo type

2021-05-21 Thread Alex Deucher
On Fri, May 21, 2021 at 8:46 AM Nirmoy Das wrote: > > Add new BO subcalss that will be used by amdgpu vm code. s/subcalss/subclass/ Alex > > Signed-off-by: Nirmoy Das > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 32 ++ > drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |

[PATCH] drm/amd/amdkfd: Drop unnecessary NULL check after container_of

2021-05-21 Thread Guenter Roeck
The first parameter passed to container_of() is the pointer to the work structure passed to the worker and never NULL. The NULL check on the result of container_of() is therefore unnecessary and misleading. Remove it. This change was made automatically with the following Coccinelle script. @@ typ
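The reasoning can be shown with a userspace re-creation of the container_of pattern. The worker receives a pointer to the embedded member and subtracts the member's offset to recover the enclosing object, so a non-NULL input always yields a non-NULL result. A NULL input would not yield NULL either, unless the member sits at offset 0. That is why the check is misleading as well as redundant. `my_container_of`, `struct fake_work` and `struct fake_obj` are hypothetical. The macro is simplified and lacks the kernel macro's type checking:

```c
#include <stddef.h>

/* Simplified userspace version of container_of (no type checking). */
#define my_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct fake_work { int pending; };

/* Hypothetical object embedding its work item, like a work_struct member. */
struct fake_obj {
	int id;
	struct fake_work work;
};

/* Worker callback: receives only the embedded member and recovers the owner. */
static int fake_worker(struct fake_work *work)
{
	struct fake_obj *obj = my_container_of(work, struct fake_obj, work);
	/* work is never NULL here, so obj is never NULL: no check needed. */
	return obj->id;
}
```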

Re: [PATCH 1/7] drm/amdgpu: add amdgpu_bo_vm bo type

2021-05-21 Thread Nirmoy
On 5/21/21 4:54 PM, Alex Deucher wrote: On Fri, May 21, 2021 at 8:46 AM Nirmoy Das wrote: Add new BO subcalss that will be used by amdgpu vm code. s/subcalss/subclass/ Thanks, Alex! Alex Signed-off-by: Nirmoy Das --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 32

Re: [PATCH] drm/amdgpu/display: make backlight setting failure messages debug

2021-05-21 Thread Kazlauskas, Nicholas
On 2021-05-21 12:08 a.m., Alex Deucher wrote: Avoid spamming the log. The backlight controller on DCN chips gets powered down when the display is off, so if you attempt to set the backlight level when the display is off, you'll get this message. This isn't a problem as we cache the requested ba

[PATCH 00/15] DC Patches May 24th, 2021

2021-05-21 Thread Qingqing Zhuo
This DC patchset brings improvements in multiple areas. In summary, we highlight: - DC 3.2.137 - Updates on DP configurations and clock recovery API - Improvements on DSC, link training sequence, etc. - Fixes on memory leak, ODM scaling, etc. --- Alvin Lee (1): drm/amd/display: Imp

[PATCH 01/15] drm/amd/display: Added support for individual control for multiple back-light instances.

2021-05-21 Thread Qingqing Zhuo
From: Jake Wang [Why & How] Added support for individual control for multiple back-light instances. Signed-off-by: Jake Wang Reviewed-by: Anthony Koo Acked-by: Qingqing Zhuo --- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 26 +++ drivers/gpu/drm/amd/display/dc/dc_link.h

[PATCH 03/15] drm/amd/display: Retrieve DSC Branch Decoder Caps

2021-05-21 Thread Qingqing Zhuo
From: Fangzhi Zuo DSC extended branch decoder caps 0xA0 ~ 0xA2 is read from dsc_aux. The dsc_aux is returned from drm dsc determination policy with the right DSC capable MST branch device for decoding. The values are all zero if DSC decoding at a MST BU with virtual DPCD; The values are meaningf

[PATCH 04/15] drm/amd/display: Update DP link configuration.

2021-05-21 Thread Qingqing Zhuo
From: Jimmy Kizito [Why & How] - Update application of training settings for links whose encoders are assigned dynamically. - Add functionality useful for DP link configuration to public interface. Signed-off-by: Jimmy Kizito Reviewed-by: Jun Lei Acked-by: Qingqing Zhuo --- .../gpu/drm/amd/d

[PATCH 08/15] drm/amd/display: Implement INBOX0 usage in driver

2021-05-21 Thread Qingqing Zhuo
From: Alvin Lee [Why] Start using INBOX0 for HW Lock command [How] - Implement initial interface for INBOX0 HW lock message Signed-off-by: Alvin Lee Reviewed-by: Jun Lei Acked-by: Qingqing Zhuo --- drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c | 9 + drivers/gpu/drm/amd/disp

[PATCH 07/15] drm/amd/display: Fix potential memory leak in DMUB hw_init

2021-05-21 Thread Qingqing Zhuo
From: Roman Li [Why] On resume we perform DMUB hw_init which allocates memory: dm_resume->dm_dmub_hw_init->dc_dmub_srv_create->kzalloc That results in a memory leak in suspend/resume scenarios. [How] Allocate memory for the DC wrapper to DMUB only if it was not allocated before. No need to realloc
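The [How] section describes an allocate-once pattern. Because hw_init runs again on every resume, it should reuse an existing wrapper rather than allocate a fresh one and leak the old pointer. Below is a minimal sketch of that pattern. `struct fake_dc`, `struct fake_dmub_srv` and `fake_dmub_hw_init()` are hypothetical, and userspace `calloc` stands in for `kzalloc`:

```c
#include <stdlib.h>

struct fake_dmub_srv { int initialized; };

/* Hypothetical holder for the DC-side DMUB wrapper pointer. */
struct fake_dc { struct fake_dmub_srv *dmub_srv; };

/* Called on every hw_init, including on resume. */
static int fake_dmub_hw_init(struct fake_dc *dc)
{
	/* Only allocate if a previous init has not already done so. */
	if (!dc->dmub_srv) {
		dc->dmub_srv = calloc(1, sizeof(*dc->dmub_srv));
		if (!dc->dmub_srv)
			return -1;
	}
	dc->dmub_srv->initialized = 1;
	return 0;
}
```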

[PATCH 06/15] drm/amd/display: Remove redundant safeguards for dmub-srv destroy()

2021-05-21 Thread Qingqing Zhuo
From: Roman Li [Why] dc_dmub_srv_destroy() has internal null-check and null assignment. No need to duplicate them externally. [How] Remove redundant safeguards. Signed-off-by: Lang Yu Signed-off-by: Roman Li Reviewed-by: Nicholas Kazlauskas Acked-by: Qingqing Zhuo --- drivers/gpu/drm/amd/d
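A destroy helper with an "internal null-check and null assignment" usually takes the address of the caller's pointer. It returns early on NULL, and it clears the pointer after freeing. Callers can then call it unconditionally and need no safeguards of their own. Below is a minimal sketch with a hypothetical `fake_srv_destroy()`. It assumes, without confirmation from the snippet, that the real helper has the same double-pointer shape:

```c
#include <stdlib.h>

struct fake_srv { int dummy; };

/*
 * Destroy helper taking the address of the caller's pointer: it tolerates
 * NULL and clears the pointer itself, so callers need no extra checks.
 */
static void fake_srv_destroy(struct fake_srv **srv)
{
	if (!srv || !*srv)
		return;
	free(*srv);
	*srv = NULL;
}
```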

[PATCH 10/15] drm/amd/display: isolate 8b 10b link training sequence into its own function

2021-05-21 Thread Qingqing Zhuo
From: Wenjing Liu [how] 1. move 8b/10b link training into its own function 2. make the link status check after a successful link part of the dp transition to video idle sequence. Signed-off-by: Wenjing Liu Reviewed-by: Jun Lei Acked-by: Qingqing Zhuo --- .../gpu/drm/amd/display/dc/core/dc_link_d

[PATCH 05/15] drm/amd/display: Expand DP module clock recovery API.

2021-05-21 Thread Qingqing Zhuo
From: Jimmy Kizito [Why & How] Add functionality useful for DP clock recovery phase of link training to public interface. Signed-off-by: Jimmy Kizito Reviewed-by: Jun Lei Acked-by: Qingqing Zhuo --- .../gpu/drm/amd/display/dc/core/dc_link_dp.c | 79 +-- .../gpu/drm/amd/displ

[PATCH 12/15] drm/amd/display: Refactor SST DSC Determination Policy

2021-05-21 Thread Qingqing Zhuo
From: Fangzhi Zuo [Why & How] SST dsc determination policy becomes bigger when more scenarios are introduced. Take it out to make it clean and readable. Signed-off-by: Fangzhi Zuo Reviewed-by: Nicholas Kazlauskas Acked-by: Qingqing Zhuo --- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 105

[PATCH 11/15] drm/amd/display: Add Log for SST DSC Determination Policy

2021-05-21 Thread Qingqing Zhuo
From: Fangzhi Zuo [Why & How] To facilitate DSC debugging. Signed-off-by: Fangzhi Zuo Reviewed-by: Nicholas Kazlauskas Acked-by: Qingqing Zhuo --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/a

[PATCH 09/15] drm/amd/display: add exit training mode and update channel coding in LT

2021-05-21 Thread Qingqing Zhuo
From: Wenjing Liu [why] As recommended by the DP specs, the source needs to make sure the DPRX exits the previous LT mode before configuring new LT params. Notify what channel coding mode we will use for the current link training. Signed-off-by: Wenjing Liu Reviewed-by: Jun Lei Acked-by: Qingqing Zhuo --- .../gp

[PATCH 14/15] drm/amd/display: 3.2.137

2021-05-21 Thread Qingqing Zhuo
From: Aric Cyr DC version 3.2.137 brings improvements in multiple areas. In summary, we highlight: - Updates on DP configurations and clock recovery API - Improvements on DSC, link training sequence, etc. - Fixes on memory leak, ODM scaling, etc. Signed-off-by: Aric Cyr Reviewed-by: Aric Cyr

[PATCH 15/15] Revert "drm/amd/display: Refactor and add visual confirm for HW Flip Queue"

2021-05-21 Thread Qingqing Zhuo
This reverts commit 5791d219561cb661c991332a4f0bca6a8c8db080. Recent visual confirm changes are regressing the driver, causing a black screen on boot in some green sardine configs, or visual confirm is not updated at all. Signed-off-by: Qingqing Zhuo Acked-by: Qingqing Zhuo --- .../amd/display

[PATCH 13/15] drm/amd/display: fix odm scaling

2021-05-21 Thread Qingqing Zhuo
From: Dmytro Laktyushkin There are two issues with scaling calculations: odm recout calculation and matching the viewport to the actual recout. This change fixes both issues: the odm recout calculation via special casing, and the viewport matching issue by reworking the viewport calculation to use scaling ratios

[PATCH 02/15] drm/amd/display: disable desktop VRR when using older flip model

2021-05-21 Thread Qingqing Zhuo
From: hvanzyll [WHY] OS uses older flip model which does not work with desktop VRR causing memory allocations at the wrong IRQ level. [HOW] Checks added to flip model to verify model is 2.2 or greater when doing any of the desktop VRR checks for full updates. This prevents full updates when VRR
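The [HOW] section gates desktop VRR on a flip model version of at least 2.2, which is a lexicographic major/minor comparison. Below is a minimal sketch of that check. The `struct flip_model_ver` encoding is hypothetical, and the snippet does not show how the driver actually stores the version:

```c
#include <stdbool.h>

/* Hypothetical encoding of the OS flip model version as major.minor. */
struct flip_model_ver { unsigned major, minor; };

/* True when the flip model is 2.2 or newer, so desktop VRR checks may apply. */
static bool flip_model_supports_desktop_vrr(struct flip_model_ver v)
{
	return v.major > 2 || (v.major == 2 && v.minor >= 2);
}
```

Comparing the major version first avoids the classic mistake of testing `minor >= 2` alone, which would wrongly reject 3.0.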

RE: [PATCH 2/6] drm/amd/pm: Add arcturus throttler translation

2021-05-21 Thread Sider, Graham
Hi Alex, Are you referring to bumping the gpu_metrics_vX_Y version number? Different ASICs are currently using different version numbers already, so I'm not sure how feasible this might be (e.g. arcturus == gpu_metrics_v1_1, navi1x == gpu_metrics_v1_3, vangogh == gpu_metrics_v2_1). Technicall

Re: [PATCH] drm/amdgpu: Use dma_resv_lock instead in BO release_notify

2021-05-21 Thread Felix Kuehling
On 2021-05-21 at 1:26 a.m., xinhui pan wrote: > The reservation object might be locked again by evict/swap after > individualized. The race is like below. > cpu 0 cpu 1 > BO release BO evict or swap > ttm_bo_individualize_resv {resv = &_r

Re: [PATCH] drm/amdgpu: Use dma_resv_lock instead in BO release_notify

2021-05-21 Thread Christian König
On 21.05.21 at 20:24, Felix Kuehling wrote: On 2021-05-21 at 1:26 a.m., xinhui pan wrote: The reservation object might be locked again by evict/swap after individualized. The race is like below. cpu 0 cpu 1 BO release BO evict or s

Re: [PATCH] drm/amdkfd: use resource cursor in svm_migrate_copy_to_vram

2021-05-21 Thread philip yang
This simplifies the logic; several comments inline. Thanks, Philip On 2021-05-21 9:52 a.m., Christian König wrote: Access to the mm_node is now forbidden. So instead of hand wiring that use the cursor functionality. Signed-off-by: Christian König --

[PATCH] drm/amdgpu: Fix crash when hot unplug in BACO.

2021-05-21 Thread Andrey Grodzovsky
Problem: When device goes into sleep state due to prolonged innactivity (e.g. BACO sleep) and then hot unplugged, PCI core will try to wake up the device as part of unplug process. Since the device is gone all HW programming during rpm resume fails leading to a bad SW state later during pci remove

Re: [PATCH 2/6] drm/amd/pm: Add arcturus throttler translation

2021-05-21 Thread Alex Deucher
On Fri, May 21, 2021 at 1:39 PM Sider, Graham wrote: > > Hi Alex, > > Are you referring to bumping the gpu_metrics_vX_Y version number? Different > ASICs are currently using different version numbers already, so I'm not sure > how feasible this might be (e.g. arcturus == gpu_metrics_v1_1, navi1

Re: [PATCH] drm/amdgpu: Fix crash when hot unplug in BACO.

2021-05-21 Thread Alex Deucher
On Fri, May 21, 2021 at 4:14 PM Andrey Grodzovsky wrote: > > Problem: > When device goes into sleep state due to prolonged > innactivity (e.g. BACO sleep) and then hot unplugged, > PCI core will try to wake up the device as part of > unplug process. Since the device is gone all HW > programming du

Re: [PATCH] drm/amdgpu: Fix crash when hot unplug in BACO.

2021-05-21 Thread Andrey Grodzovsky
Will do. Andrey On 2021-05-21 4:18 p.m., Alex Deucher wrote: On Fri, May 21, 2021 at 4:14 PM Andrey Grodzovsky wrote: Problem: When device goes into sleep state due to prolonged innactivity (e.g. BACO sleep) and then hot unplugged, PCI core will try to wake up the device as part of unplug pro

[PATCH v2 1/2] drm/amdgpu: Rename flag which prevents HW access

2021-05-21 Thread Andrey Grodzovsky
Make its name descriptive of its function rather than of a feature. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++--- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c| 4 ++-- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 2

[PATCH v2 2/2] drm/amdgpu: Fix crash when hot unplug in BACO.

2021-05-21 Thread Andrey Grodzovsky
Problem: When device goes into sleep state due to prolonged innactivity (e.g. BACO sleep) and then hot unplugged, PCI core will try to wake up the device as part of unplug process. Since the device is gone all HW programming during rpm resume fails leading to a bad SW state later during pci remove

Re: [PATCH v2 2/2] drm/amdgpu: Fix crash when hot unplug in BACO.

2021-05-21 Thread Alex Deucher
On Fri, May 21, 2021 at 4:41 PM Andrey Grodzovsky wrote: > > Problem: > When device goes into sleep state due to prolonged s/sleep state/runtime suspend/ > innactivity (e.g. BACO sleep) and then hot unplugged, inactivity > PCI core will try to wake up the device as part of > unplug process. Si

Re: [PATCH] drm/amdgpu: Fix inconsistent indenting

2021-05-21 Thread Alex Deucher
Applied. Thanks! Alex On Fri, May 21, 2021 at 9:35 AM Christian König wrote: > > On 21.05.21 at 11:50, Jiapeng Chong wrote: > > Eliminate the follow smatch warning: > > > > drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c:449 > > sdma_v5_0_ring_emit_mem_sync() warn: inconsistent indenting. > > > > Repor

Re: [PATCH] drm/amd/amdkfd: Drop unnecessary NULL check after container_of

2021-05-21 Thread Alex Deucher
Applied. Thanks! Alex On Fri, May 21, 2021 at 11:02 AM Guenter Roeck wrote: > > The first parameter passed to container_of() is the pointer to the work > structure passed to the worker and never NULL. The NULL check on the > result of container_of() is therefore unnecessary and misleading. > Re

[PATCH 1/3] drm/amdgpu: Don't query CE and UE errors

2021-05-21 Thread Luben Tuikov
On QUERY2 IOCTL don't query counts of correctable and uncorrectable errors, since when RAS is enabled and supported on Vega20 server boards, this takes an insurmountably long time, in O(n^3), which slows the system down to the point of it being unusable when we have GUI up. Fixes: ae363a212b14 ("drm/

[PATCH 3/3] drm/amdgpu: Use delayed work to collect RAS error counters

2021-05-21 Thread Luben Tuikov
On Context Query2 IOCTL return the correctable and uncorrectable errors in O(1) fashion, from cached values, and schedule a delayed work function to calculate and cache them for the next such IOCTL. Cc: Alexander Deucher Cc: Christian König Cc: John Clements Cc: Hawking Zhang Signed-off-by: Lu
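The pattern described here answers the IOCTL from a cache in O(1) and defers the expensive recount to later work, so each query sees the counts from the previous refresh. Below is a minimal single-threaded sketch. A `refresh_pending` flag stands in for a queued delayed work item (in the kernel, something like `schedule_delayed_work()`), and `struct ras_cache`, `struct fake_hw`, `ras_query()` and `ras_refresh_work()` are all hypothetical:

```c
#include <stdbool.h>

/* Hypothetical source of truth that is slow to read on real hardware. */
struct fake_hw { unsigned long ce, ue; };

/* Hypothetical cache of RAS error counts; names are illustrative only. */
struct ras_cache {
	unsigned long ce_count;   /* correctable errors, as of last refresh */
	unsigned long ue_count;   /* uncorrectable errors, as of last refresh */
	bool refresh_pending;     /* stands in for a queued delayed work item */
};

/* IOCTL path: O(1), returns cached values and requests a refresh. */
static void ras_query(struct ras_cache *c, unsigned long *ce, unsigned long *ue)
{
	*ce = c->ce_count;
	*ue = c->ue_count;
	c->refresh_pending = true; /* kernel analogue: schedule delayed work */
}

/* Deferred worker: does the expensive counting and updates the cache. */
static void ras_refresh_work(struct ras_cache *c, const struct fake_hw *hw)
{
	if (!c->refresh_pending)
		return;
	c->ce_count = hw->ce;
	c->ue_count = hw->ue;
	c->refresh_pending = false;
}
```

The trade-off is staleness: a query returns values that are one refresh old. In exchange, the caller never pays for the O(n^3) walk on the IOCTL path.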

[PATCH 2/3] drm/amdgpu: Fix RAS function interface

2021-05-21 Thread Luben Tuikov
The correctable and uncorrectable errors are calculated at each invocation of this function. Therefore, it is highly inefficient to return just one of them based on a Boolean input. If the caller wants both, twice the work would be done. (And this work is O(n^3) on Vega20.) Fix this "interface" to
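The interface change argued for here is to compute both counts in a single pass and return them through out-parameters, instead of doing the full walk and returning one count selected by a Boolean. Below is a minimal sketch of that shape. `struct err_rec` and `count_errors()` are hypothetical, and the real function walks hardware state rather than an array:

```c
#include <stddef.h>

/* Hypothetical per-error record; the real driver walks hardware state. */
struct err_rec { int correctable; }; /* 1 = correctable, 0 = uncorrectable */

/* One pass fills both counts, instead of a bool selecting which to return. */
static void count_errors(const struct err_rec *recs, size_t n,
			 unsigned long *ce, unsigned long *ue)
{
	unsigned long c = 0, u = 0;

	for (size_t i = 0; i < n; i++) {
		if (recs[i].correctable)
			c++;
		else
			u++;
	}
	/* Either out-parameter may be NULL if the caller doesn't need it. */
	if (ce)
		*ce = c;
	if (ue)
		*ue = u;
}
```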

RE: [PATCH 2/6] drm/amd/pm: Add arcturus throttler translation

2021-05-21 Thread Sider, Graham
Would this be referring to tools that may parse /sys/class/.../device/gpu_metrics or the actual gpu_metrics_vX_Y structs? For the latter, if there are tools that parse dependent on version vX_Y, I agree that we would not want to break those. Since most ASICs are using different version currentl

Re: [PATCH 2/6] drm/amd/pm: Add arcturus throttler translation

2021-05-21 Thread Alex Deucher
On Fri, May 21, 2021 at 5:32 PM Sider, Graham wrote: > > Would this be referring to tools that may parse > /sys/class/.../device/gpu_metrics or the actual gpu_metrics_vX_Y structs? For > the latter, if there are tools that parse dependent on version vX_Y, I agree > that we would not want to bre

Re: [PATCH 2/6] drm/amd/pm: Add arcturus throttler translation

2021-05-21 Thread Alex Deucher
On Fri, May 21, 2021 at 5:47 PM Alex Deucher wrote: > > On Fri, May 21, 2021 at 5:32 PM Sider, Graham wrote: > > > > Would this be referring to tools that may parse > > /sys/class/.../device/gpu_metrics or the actual gpu_metrics_vX_Y structs? > > For the latter, if there are tools that parse de

RE: [PATCH 2/6] drm/amd/pm: Add arcturus throttler translation

2021-05-21 Thread Sider, Graham
Right, that all makes sense. I'm fine with either of these options. Thanks for the insights -- I'll give this a bit more thought and get back to you. Best, Graham -Original Message- From: Alex Deucher Sent: Friday, May 21, 2021 5:50 PM To: Sider, Graham Cc: amd-gfx list ; Kasiviswanat

Re: [PATCH] drm/amd/amdkfd: Drop unnecessary NULL check after container_of

2021-05-21 Thread Felix Kuehling
On 2021-05-21 at 11:02 a.m., Guenter Roeck wrote: > The first parameter passed to container_of() is the pointer to the work > structure passed to the worker and never NULL. The NULL check on the > result of container_of() is therefore unnecessary and misleading. > Remove it. > > This change was ma

Re: [PATCH] drm/amdgpu: Use dma_resv_lock instead in BO release_notify

2021-05-21 Thread Pan, Xinhui
Oh, sorry for that. I notice the lockdep warning too. I just think we use trylock elsewhere because we hold the lru_lock mostly. So I think we can do something like below. Let me verify it later. @@ -318,7 +318,9 @@ int amdgpu_amdkfd_remove_fence_on_pt_pd_bos(struct amdgp

[PATCH] drm/amdgpu: Fix a BUG_ON due to resv trylock fails

2021-05-21 Thread xinhui pan
The reservation object might be locked again by evict/swap after individualized. The race is like below. cpu 0 cpu 1 BO release BO evict or swap lock lru_lock ttm_bo_individualize_resv {resv = &_r

Re: [PATCH] drm/amdgpu: Fix a BUG_ON due to resv trylock fails

2021-05-21 Thread Felix Kuehling
When the BO gets individualized, there is an assumption that nobody is accessing it any more. See this comment in ttm_bo_individualize_resv: /* This works because the BO is about to be destroyed and nobody * reference it any more. The only tricky case is the tryloc