[PATCH 8/8] amdgpu/pm: Optimize emit_clock_levels for aldebaran - part 3

2023-04-26 Thread Darren Powell
split switch statement into two and consolidate the common code for printing most of the types of clock speeds Signed-off-by: Darren Powell --- .../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c| 82 ++- 1 file changed, 27 insertions(+), 55 deletions(-) diff --git a/drivers/gp

[PATCH 6/8] amdgpu/pm: Optimize emit_clock_levels for aldebaran - part 1

2023-04-26 Thread Darren Powell
Use variables to remove the multiple nested ternary expressions and improve readability. This will help to optimize the code duplication in the switch statement Also Changed: Modify function aldebaran_get_clk_table to void function as it always returns 0 Use const string "attempt_string

[PATCH 5/8] amdgpu/pm: Replace print_clock_levels with emit_clock_levels for aldebaran

2023-04-26 Thread Darren Powell
Replace print_clock_levels with emit_clock_levels for aldebaran * replace .print_clk_levels with .emit_clk_levels in aldebaran_ppt_funcs * added extra parameter int *offset * removed var size, uses arg *offset instead * removed call to smu_cmn_get_sysfs_buf * errors are returned to caller

[PATCH 7/8] amdgpu/pm: Optimize emit_clock_levels for aldebaran - part 2

2023-04-26 Thread Darren Powell
Use variables to remove ternary expression in print statement and improve readability. This will help to optimize the code duplication in the switch statement Also Changed: replaced single_dpm_table->count as iterator in for loops with safer clocks_num_levels value replaced dpm_table.va

[PATCH 3/8] amdgpu/pm: Optimize emit_clock_levels for arcturus - part 2

2023-04-26 Thread Darren Powell
Use variables to remove ternary expression in print statement and improve readability. This will help to optimize the code duplication in the switch statement Also Changed: replaced single_dpm_table->count as iterator in for loops with safer clocks_num_levels value replaced dpm_table.va

[PATCH 4/8] amdgpu/pm: Optimize emit_clock_levels for arcturus - part 3

2023-04-26 Thread Darren Powell
split switch statement into two and consolidate the common code for printing most of the types of clock speeds Signed-off-by: Darren Powell --- .../gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c | 77 ++- 1 file changed, 24 insertions(+), 53 deletions(-) diff --git a/drivers/

[PATCH 2/8] amdgpu/pm: Optimize emit_clock_levels for arcturus - part 1

2023-04-26 Thread Darren Powell
Use variables to remove the multiple nested ternary expressions and improve readability. This will help to optimize the code duplication in the switch statement Also Changed: Modify function arcturus_get_clk_table to void function as it always returns 0 Use const string "attempt_string"

[PATCH 1/8] amdgpu/pm: Replace print_clock_levels with emit_clock_levels for arcturus

2023-04-26 Thread Darren Powell
Replace print_clock_levels with emit_clock_levels for arcturus * replace .print_clk_levels with .emit_clk_levels in arcturus_ppt_funcs * added extra parameter int *offset * removed var size, uses arg *offset instead * removed call to smu_cmn_get_sysfs_buf * errors are returned to caller

[PATCH 0/8] amdgpu/pm: Implement emit_clock_levels for arcturus, aldebaran

2023-04-26 Thread Darren Powell
amdgpu/pm: Implement emit_clock_levels for arcturus,aldebaran == Description == Scnprintf use within the kernel is not recommended, but simple sysfs_emit replacement has not been successful due to the page alignment requirement of the function. This patch set implements a new api "emit_clock_l
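The offset-based emit pattern this cover letter describes can be sketched in user space roughly as follows. This is only an illustration under stated assumptions: `emit_at`, `emit_levels`, and `BUF_SIZE` are hypothetical names standing in for the kernel helpers and PAGE_SIZE, and the output format is invented for the example rather than taken from the driver.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define BUF_SIZE 4096 /* hypothetical stand-in for the kernel's PAGE_SIZE */

/* Hypothetical user-space helper: format into buf starting at offset `at`,
 * never writing past BUF_SIZE, and return the number of characters added. */
static int emit_at(char *buf, int at, const char *fmt, ...)
{
    va_list args;
    int len;

    assert(buf != NULL);
    if (at < 0 || at >= BUF_SIZE)
        return 0;

    va_start(args, fmt);
    len = vsnprintf(buf + at, (size_t)(BUF_SIZE - at), fmt, args);
    va_end(args);

    if (len < 0)
        return 0;
    if (len >= BUF_SIZE - at)
        len = BUF_SIZE - at - 1; /* output was truncated to fit the buffer */
    return len;
}

/* Hypothetical emit-style function: the caller owns the running offset,
 * and errors are returned instead of being folded into a size value. */
static int emit_levels(const unsigned int *mhz, int count, int current,
                       char *buf, int *offset)
{
    int i;

    if (!mhz || !buf || !offset)
        return -1;

    for (i = 0; i < count; i++)
        *offset += emit_at(buf, *offset, "%d: %uMhz %s\n",
                           i, mhz[i], i == current ? "*" : "");
    return 0;
}
```

Because the caller passes the buffer base plus a separate offset, the base pointer is never advanced, which is the property an alignment-checking emit helper depends on.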

[PATCH] drm/amdgpu: disable SDMA WPTR_POLL_ENABLE for SR-IOV

2023-04-26 Thread Horace Chen
[Why] WPTR_POLL_ENABLE is a hardware contiguous polling mechanism that keeps FCLK and UCLK at a high level. Most of its use cases can be covered by F32_WPTR_POLL_ENABLE, which polls from firmware. So to save power, SR-IOV also needs to disable this bit. Signed-off-by: Horace Chen --- drivers/gpu/

[linux-next:master] BUILD SUCCESS WITH WARNING b7455b10da762f2d447678c88e37cc1eb6cb45ee

2023-04-26 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master branch HEAD: b7455b10da762f2d447678c88e37cc1eb6cb45ee Add linux-next specific files for 20230426 Warning reports: https://lore.kernel.org/oe-kbuild-all/202304210303.nlmi0srq-...@intel.com https

[pull] amdgpu drm-fixes-6.4

2023-04-26 Thread Alex Deucher
Hi Dave, Daniel, Fixes for 6.4. A bit bigger than usual since it's two weeks worth. Mostly display fixes. The following changes since commit e82c98f2ca439356d5595ba8c9cd782f993f6f8c: Merge tag 'amd-drm-next-6.4-2023-04-14' of https://gitlab.freedesktop.org/agd5f/linux into drm-next (2023-04

[PATCH] drm/amdgpu: Recover vram from vmbo->shadow rather than vmbo->bo

2023-04-26 Thread Lin . Cao
vmbo->shadow is used to back up the VRAM BO when VRAM is lost, so we should set shadow to vmbo->shadow in order to recover vmbo->bo. Fix: 'commit e18aaea733da ("drm/amdgpu: move shadow_list to amdgpu_bo_vm")' Signed-off-by: Lin.Cao --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 +++- 1 file change

Re: [PATCH] drm/amdgpu: Ignore KFD eviction fences invalidating preemptible DMABuf imports

2023-04-26 Thread Felix Kuehling
Hi Eric, Can you try if the attached patch fixes the problem without breaking the eviction tests on a multi-GPU PCIe P2P system? Thanks,   Felix On 2023-04-26 13:02, Christian König wrote: Am 26.04.23 um 18:58 schrieb Felix Kuehling: On 2023-04-26 9:03, Christian König wrote: Am 25.04.23

[PATCH] drm/amdgpu: drop redudant sched job cleanup when cs is aborted

2023-04-26 Thread Guchun Chen
Once command submission failed due to userptr invalidation in amdgpu_cs_submit, legacy code will perform cleanup of scheduler job. However, it's not needed at all, as f7d66fb2ea43 has integrated job cleanup stuff into amdgpu_job_free. Otherwise, because of double free, a NULL pointer dereference wi

[PATCH] drm/amdgpu: mark gfx_v9_4_3_disable_gpa_mode() static

2023-04-26 Thread Guchun Chen
This was left global by accident, the corresponding functions for other hardware types are already static: drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:1072:6: error: no previous prototype for function 'gfx_v9_4_3_disable_gpa_mode' [-Werror,-Wmissing-prototypes] Fixes: 86301129698b ("drm/amdgpu: spl

Re: [PATCH v2] drm/amdgpu: add a missing lock for AMDGPU_SCHED

2023-04-26 Thread Chia-I Wu
On Wed, Apr 26, 2023 at 4:05 AM Christian König wrote: > > Am 26.04.23 um 08:17 schrieb Chia-I Wu: > > mgr->ctx_handles should be protected by mgr->lock. > > > > v2: improve commit message > > > > Signed-off-by: Chia-I Wu > > Cc: sta...@vger.kernel.org > > Please don't manually CC sta...@vger.ker

[PATCH v3] drm/amdgpu: add a missing lock for AMDGPU_SCHED

2023-04-26 Thread Chia-I Wu
mgr->ctx_handles should be protected by mgr->lock. v2: improve commit message v3: add a Fixes tag Signed-off-by: Chia-I Wu Reviewed-by: Christian König Fixes: 52c6a62c64fac ("drm/amdgpu: add interface for editing a foreign process's priority v3") --- drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c

[PATCH 12/12] drm/amdgpu: put MQDs in VRAM

2023-04-26 Thread Alex Deucher
Reduces preemption latency. v2: move MES MQDs into VRAM as well (YuBiao) v3: enable on gfx10, 11 only (Alex) Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 4 drivers/gpu/drm/amd/amdgpu/mes_v10_1.c | 1 + drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 1 + 3 files ch

[PATCH 11/12] drm/amdgpu/gfx11: always restore kcq/kgq MQDs

2023-04-26 Thread Alex Deucher
Always restore the MQD not just when we do a reset. This allows us to move the MQD to VRAM if we want. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 29 +- 1 file changed, 15 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgp

[PATCH 10/12] drm/amdgpu/gfx10: always restore kcq/kgq MQDs

2023-04-26 Thread Alex Deucher
Always restore the MQD not just when we do a reset. This allows us to move the MQD to VRAM if we want. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 29 +- 1 file changed, 15 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgp

[PATCH 07/12] drm/amdgpu/gfx11: drop unused variable

2023-04-26 Thread Alex Deucher
Just check the return value directly. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c index d36d365cb582..256014a8c824 10

[PATCH 09/12] drm/amdgpu/gfx9: always restore kcq MQDs

2023-04-26 Thread Alex Deucher
Always restore the MQD not just when we do a reset. This allows us to move the MQD to VRAM if we want. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 14 +++--- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 14 +++--- 2 files changed, 14 insertions(+), 14 d

[PATCH 06/12] drm/amdgpu/gfx10: drop unused variable

2023-04-26 Thread Alex Deucher
Just check the return value directly. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index 24d7134228b0..5c67c91c4297 10

[PATCH 03/12] drm/amdgpu: add [en/dis]able_kgq() functions

2023-04-26 Thread Alex Deucher
To replace the IP specific variants which are largely duplicate. Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 68 + drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 2 + 2 files changed, 70 insertions(+) diff --git a/dri

[PATCH 05/12] drm/amdgpu/gfx11: use generic [en/dis]able_kgq() helpers

2023-04-26 Thread Alex Deucher
And remove the duplicate local variants. Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 49 ++ 1 file changed, 2 insertions(+), 47 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/am

[PATCH 08/12] drm/amdgpu/gfx8: always restore kcq MQDs

2023-04-26 Thread Alex Deucher
Always restore the MQD not just when we do a reset. This allows us to move the MQD to VRAM if we want. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/d

[PATCH 02/12] drm/amdgpu/gfx10: drop old bring up code

2023-04-26 Thread Alex Deucher
No longer used. Remove it. Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 70 ++ 1 file changed, 3 insertions(+), 67 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_

[PATCH 04/12] drm/amdgpu/gfx10: use generic [en/dis]able_kgq() helpers

2023-04-26 Thread Alex Deucher
And remove the duplicate local variants. Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 48 ++ 1 file changed, 2 insertions(+), 46 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/am

[PATCH 01/12] drm/amdgpu/gfx11: drop old bring up code

2023-04-26 Thread Alex Deucher
No longer used. Remove it. Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 70 ++ 1 file changed, 3 insertions(+), 67 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_

Re: [PATCH next] drm/amd/display: Fix possible NULL dereference in dc_dmub_srv_cmd_run_list()

2023-04-26 Thread Hamza Mahfooz
On 4/26/23 15:24, Harshit Mogalapalli wrote: We have a NULL check for 'dc_dmub_srv' in dc_dmub_srv_cmd_run_list() but we are dereferencing it before checking. Fix this by moving the dereference next to the NULL check. This issue was found with Smatch (static analysis tool). Fixes: e97cc04fe0fb ("drm/am

Re: [PATCH] drm/amd/display: return status of dmub_srv_get_fw_boot_status

2023-04-26 Thread Hamza Mahfooz
On 4/20/23 09:59, Tom Rix wrote: gcc with W=1 reports drivers/gpu/drm/amd/amdgpu/../display/dc/dc_dmub_srv.c: In function ‘dc_dmub_srv_optimized_init_done’: drivers/gpu/drm/amd/amdgpu/../display/dc/dc_dmub_srv.c:184:26: error: variable ‘dmub’ set but not used [-Werror=unused-but-set-variabl

[PATCH next] drm/amd/display: Fix possible NULL dereference in dc_dmub_srv_cmd_run_list()

2023-04-26 Thread Harshit Mogalapalli
We have a NULL check for 'dc_dmub_srv' in dc_dmub_srv_cmd_run_list() but we are dereferencing it before checking. Fix this by moving the dereference next to the NULL check. This issue was found with Smatch (static analysis tool). Fixes: e97cc04fe0fb ("drm/amd/display: refactor dmub commands into single
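The ordering problem described in this patch can be illustrated with a small stand-alone sketch. The struct and function names below (`srv`, `ctx`, `run_list`) are hypothetical and are not the driver's actual types; the point is only that the pointer is read after the NULL check rather than in the declaration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures. */
struct ctx { int id; };
struct srv { struct ctx *ctx; };

/* The dereference happens only after srv has been checked.
 * Writing `struct ctx *ctx = srv->ctx;` in the declaration instead
 * would dereference srv before the check below could reject NULL. */
static bool run_list(struct srv *srv)
{
    struct ctx *ctx;

    if (!srv)
        return false;

    ctx = srv->ctx; /* safe: srv is known to be non-NULL here */
    return ctx != NULL;
}
```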

Re: [PATCH] drm/amdgpu: Ignore KFD eviction fences invalidating preemptible DMABuf imports

2023-04-26 Thread Christian König
Am 26.04.23 um 18:58 schrieb Felix Kuehling: On 2023-04-26 9:03, Christian König wrote: Am 25.04.23 um 16:11 schrieb Eric Huang: Hi Christian, What do you think about Felix's explanation? That's unfortunately not something we can do here. Regards, Eric On 2023-04-13 09:28, Felix Kuehlin

Re: [PATCH] drm/amd/display: Correctly initialize some memory in get_available_dsc_slices()

2023-04-26 Thread Hamza Mahfooz
On 4/25/23 03:53, Christophe JAILLET wrote: The intent here is to clear the 'available_slices' buffer before setting some values in it. This is an array of int, so in order to fully initialize it, we must clear MIN_AVAILABLE_SLICES_SIZE * sizeof(int) bytes. Compute the right length of the buffe
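The length issue described here is a general C pitfall: memset() counts bytes, not elements. A minimal sketch, assuming a hypothetical array length `SLICES_LEN` (the real value of MIN_AVAILABLE_SLICES_SIZE is not shown in this thread) and hypothetical function names:

```c
#include <assert.h>
#include <string.h>

#define SLICES_LEN 4 /* hypothetical element count, for illustration only */

/* Clears the whole array: element count multiplied by element size. */
static void clear_slices(int *slices)
{
    assert(slices != NULL);
    memset(slices, 0, SLICES_LEN * sizeof(int));
}

/* The short-length pattern, shown for comparison: only SLICES_LEN bytes
 * are cleared, so the trailing elements keep their old values. */
static void clear_slices_short(int *slices)
{
    assert(slices != NULL);
    memset(slices, 0, SLICES_LEN);
}
```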

Re: [PATCH] drm/amd/display: set variable custom_backlight_curve0 storage-class-specifier to static

2023-04-26 Thread Hamza Mahfooz
On 4/26/23 07:18, Tom Rix wrote: smatch reports drivers/gpu/drm/amd/amdgpu/../display/modules/power/power_helpers.c:119:31: warning: symbol 'custom_backlight_curve0' was not declared. Should it be static? This variable is only used in its defining file, so it should be static Signed-off-by:

Re: [PATCH] drm/amdgpu: Ignore KFD eviction fences invalidating preemptible DMABuf imports

2023-04-26 Thread Felix Kuehling
On 2023-04-26 9:03, Christian König wrote: Am 25.04.23 um 16:11 schrieb Eric Huang: Hi Christian, What do you think about Felix's explanation? That's unfortunately not something we can do here. Regards, Eric On 2023-04-13 09:28, Felix Kuehling wrote: Am 2023-04-13 um 07:35 schrieb Chri

[PATCH 8/8] drm/amd/display: 3.2.234

2023-04-26 Thread Alan Liu
From: Aric Cyr This version brings along following fixes: - FW Release 0.0.165.0 - Add w/a to disable DP dual mode on certain ports - Revert "Update scaler recout data for visual confirm" - Filter out invalid bits in pipe_fuses - Adding debug option to override Z8 watermark values - Change defaul

[PATCH 7/8] drm/amd/display: [FW Promotion] Release 0.0.165.0

2023-04-26 Thread Alan Liu
From: Anthony Koo - Add dmub boot options to disable ips states on init Acked-by: Alan Liu Signed-off-by: Anthony Koo --- drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd

[PATCH 6/8] drm/amd/display: Add w/a to disable DP dual mode on certain ports

2023-04-26 Thread Alan Liu
From: George Shen [Why] Certain ports on DCN3.2 configs do not properly populate the BIOS info table flag to indicate DP dual mode is unsupported. [How] Add a workaround to disable DP dual mode on the ports with the missing BIOS info table flag. Reviewed-by: Michael Strauss Acked-by: Alan Liu

[PATCH 5/8] drm/amd/display: revert "Update scaler recout data for visual confirm"

2023-04-26 Thread Alan Liu
From: Leo Ma This reverts commit 8552024d1e2a008b6df544845d09120cfea9508b. A regression is found on this change, so revert it for the time being and resubmit when issue is fixed. Reviewed-by: Martin Leung Acked-by: Alan Liu Signed-off-by: Leo Ma --- .../gpu/drm/amd/display/dc/core/dc_resour

[PATCH 4/8] drm/amd/display: filter out invalid bits in pipe_fuses

2023-04-26 Thread Alan Liu
From: Samson Tam [Why] Reading pipe_fuses from register may have invalid bits set, which may affect the num_pipes erroneously. [How] Add read_pipes_fuses() call and filter bits based on expected number of pipes. Reviewed-by: Alvin Lee Acked-by: Alan Liu Signed-off-by: Samson Tam --- drive
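One way to picture the filtering step is a simple bit mask over the raw register value. This is a sketch only: `filter_pipe_fuses` and its parameters are hypothetical, and the assumption that valid fuse bits occupy the low `expected_pipes` bits is made for illustration, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: keep only the low `expected_pipes` bits of the raw
 * fuse value so that stray high bits cannot influence the pipe count. */
static uint32_t filter_pipe_fuses(uint32_t raw, unsigned int expected_pipes)
{
    assert(expected_pipes > 0 && expected_pipes < 32);
    return raw & ((1u << expected_pipes) - 1u);
}
```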

[PATCH 3/8] drm/amd/display: Adding debug option to override Z8 watermark values

2023-04-26 Thread Alan Liu
From: Leo Chen [Why & How] Adding debug options to override Z8 watermark values for testing purposes. Reviewed-by: Nicholas Kazlauskas Acked-by: Alan Liu Signed-off-by: Leo Chen --- drivers/gpu/drm/amd/display/dc/dc.h | 4 drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20

[PATCH 2/8] drm/amd/display: Change default Z8 watermark values

2023-04-26 Thread Alan Liu
From: Leo Chen [Why & How] Previous Z8 watermark values were causing flickering and OTC underflow. Updating Z8 watermark values based on the measurement. Reviewed-by: Nicholas Kazlauskas Cc: Mario Limonciello Cc: Alex Deucher Cc: sta...@vger.kernel.org Acked-by: Alan Liu Signed-off-by: Leo C

[PATCH 1/8] drm/amd/display: Workaround wrong HDR colorimetry with some receivers

2023-04-26 Thread Alan Liu
From: Ilya Bakoulin [Why] Some scalers do not pick up color space updates unless the DP link is disabled/re-enabled which can result in incorrect/washed out HDR colors in some cases. [How] Call set_dpms_on to disable the link, re-train and re-enable with the updated output color space. Reviewed

[PATCH 0/8] DC Patches April 26, 2023

2023-04-26 Thread Alan Liu
This DC patchset brings improvements in multiple areas. In summary, we highlight: - FW Release 0.0.165.0 - Add w/a to disable DP dual mode on certain ports - Revert "Update scaler recout data for visual confirm" - Filter out invalid bits in pipe_fuses - Adding debug option to override Z8 watermark

[PATCHv2 2/3] drm/amdgpu: Set GTT size equal to TTM mem limit

2023-04-26 Thread Mukul Joshi
Use the helper function in TTM to get TTM mem limit and set GTT size to be equal to TTM mem limit. Signed-off-by: Mukul Joshi Reviewed-by: Christian König --- v1->v2: - Remove AMDGPU_DEFAULT_GTT_SIZE_MB as well as it is unused. drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 - drivers/gpu/drm/

[PATCH] drm/amdgpu: improve wait logic at fence polling

2023-04-26 Thread Alex Sierra
Accomplish this by reading the seq number right away instead of sleeping for 5us. There are certain cases where the fence is ready almost immediately. Sleep granularity was also reduced, as the majority of the kiq tlb flushes take between 2us and 6us. Signed-off-by: Alex Sierra --- drivers/gpu/

RE: [PATCH 2/3] drm/amdgpu: Set GTT size equal to TTM mem limit

2023-04-26 Thread Joshi, Mukul
[AMD Official Use Only - General] > -Original Message- > From: Chen, Guchun > Sent: Wednesday, April 26, 2023 2:00 AM > To: Joshi, Mukul ; amd-gfx@lists.freedesktop.org; > dri-de...@lists.freedesktop.org > Cc: Joshi, Mukul ; Kuehling, Felix > ; Koenig, Christian > Subject: RE: [PATCH 2

Re: [PATCH] drm/amdgpu: Mark contexts guilty for any reset type

2023-04-26 Thread Marek Olšák
Perhaps I should clarify this. There are GL and Vulkan features that if any app uses them and its shaders are killed, the next IB will hang. One of them is Draw Indirect - if a shader is killed before storing the vertex count and instance count in memory, the next draw will hang with a high probabi

Keyword Review - Re: BUG: KASAN: null-ptr-deref in drm_sched_job_cleanup+0x96/0x290 [gpu_sched]

2023-04-26 Thread Christian König
WTF? I owe you a beer! I've fixed exactly that problem during the review process of the cleanup patch and because of this didn't consider that the code is still there. It also explains why we don't see that in our testing. @Mikhail can you test that patch with drm-misc-next? Thanks, Christ

Re: [PATCH] drm/amdgpu: Ignore KFD eviction fences invalidating preemptible DMABuf imports

2023-04-26 Thread Christian König
Am 25.04.23 um 16:11 schrieb Eric Huang: Hi Christian, What do you think about Felix's explanation? That's unfortunately not something we can do here. Regards, Eric On 2023-04-13 09:28, Felix Kuehling wrote: Am 2023-04-13 um 07:35 schrieb Christian König: Am 13.04.23 um 03:01 schrieb Fel

Re: [PATCH] powerpc/64: Always build with 128-bit long double

2023-04-26 Thread Michael Ellerman
On Tue, 04 Apr 2023 20:28:47 +1000, Michael Ellerman wrote: > The amdgpu driver builds some of its code with hard-float enabled, > whereas the rest of the kernel is built with soft-float. > > When building with 64-bit long double, if soft-float and hard-float > objects are linked together, the bui

[PATCH] drm/amd/display: set variable custom_backlight_curve0 storage-class-specifier to static

2023-04-26 Thread Tom Rix
smatch reports drivers/gpu/drm/amd/amdgpu/../display/modules/power/power_helpers.c:119:31: warning: symbol 'custom_backlight_curve0' was not declared. Should it be static? This variable is only used in its defining file, so it should be static Signed-off-by: Tom Rix --- drivers/gpu/drm/amd/di

Re: BUG: KASAN: null-ptr-deref in drm_sched_job_cleanup+0x96/0x290 [gpu_sched]

2023-04-26 Thread Christian König
Sending that once more from my mailing list address since AMD internal servers are blocking the mail. Regards, Christian. Am 26.04.23 um 13:48 schrieb Christian König: WTF? I owe you a beer! I've fixed exactly that problem during the review process of the cleanup patch and because of this di

Re: [PATCH] drm/amdgpu: fix a build warning by a typo in amdgpu_gfx.c

2023-04-26 Thread Christian König
Am 26.04.23 um 05:11 schrieb Guchun Chen: This should be a typo when introducing multi-xx support. Reported-by: kernel test robot Signed-off-by: Guchun Chen Cc: Le Ma Acked-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 4 ++-- 1 file changed, 2 insertions(+), 2 del

Re: [PATCH v2] drm/amdgpu: add a missing lock for AMDGPU_SCHED

2023-04-26 Thread Christian König
Am 26.04.23 um 08:17 schrieb Chia-I Wu: mgr->ctx_handles should be protected by mgr->lock. v2: improve commit message Signed-off-by: Chia-I Wu Cc: sta...@vger.kernel.org Please don't manually CC sta...@vger.kernel.org while sending patches out, let us maintainers push that upstream with the

RE: [PATCH v3] drm/amdgpu: drop gfx_v11_0_cp_ecc_error_irq_funcs

2023-04-26 Thread Zhang, Hawking
[AMD Official Use Only - General] amdgpu_gfx_ras_late_init(struct amdgpu_device *adev, struct ras_common_if *r if (r) return r; - r = amdgpu_irq_get(adev, &adev->gfx.cp_ecc_error_irq, 0); - if (r) - goto lat

Re: [PATCH 1/3] drm/ttm: Helper function to get TTM mem limit

2023-04-26 Thread Christian König
Am 26.04.23 um 03:52 schrieb Mukul Joshi: Add a helper function to get TTM memory limit. This is needed by KFD to set its own internal memory limits. Signed-off-by: Mukul Joshi Reviewed-by: Christian König for the series. --- drivers/gpu/drm/ttm/ttm_tt.c | 6 ++ include/drm/ttm/ttm_

Re: [PATCH] drm/amdgpu: add a missing lock for AMDGPU_SCHED

2023-04-26 Thread Christian König
Am 26.04.23 um 02:48 schrieb Chia-I Wu: Good catch, but you need some commit message here. Something like "Need to hold the lock while iterating the idr to make sure no context is destroyed." should be sufficient. Apart from that looks good to me. Regards, Christian. Signed-off-by: Chia-I

Re: [PATCH] drm/amdgpu: Mark contexts guilty for any reset type

2023-04-26 Thread Michel Dänzer
On 4/25/23 21:11, Marek Olšák wrote: > The last 3 comments in this thread contain arguments that are false and were > specifically pointed out as false 6 comments ago: Soft resets are just as > fatal as hard resets. There is nothing better about soft resets. If the VRAM > is lost completely, tha

RE: [PATCH v3] drm/amdgpu: drop gfx_v11_0_cp_ecc_error_irq_funcs

2023-04-26 Thread Xu, Feifei
[AMD Official Use Only - General] Reviewed-by: Feifei Xu -Original Message- From: Horatio Zhang Sent: Wednesday, April 26, 2023 4:41 PM To: Zhang, Hawking ; Koenig, Christian ; Chen, Guchun ; amd-gfx@lists.freedesktop.org Cc: Xu, Feifei ; Yao, Longlong ; Zhang, Horatio ; Zhang, Ha

[PATCH v3] drm/amdgpu: drop gfx_v11_0_cp_ecc_error_irq_funcs

2023-04-26 Thread Horatio Zhang
The gfx.cp_ecc_error_irq is retired in gfx11. However, gfx_v11_0_hw_fini still uses amdgpu_irq_put to disable this interrupt, which causes the call trace in this function. [ 102.873958] Call Trace: [ 102.873959] [ 102.873961] gfx_v11_0_hw_fini+0x23/0x1e0 [amdgpu] [ 102.874019] gfx_v11_0_suspend+0

Re: [PATCH] drm/amdgpu: add a missing lock for AMDGPU_SCHED

2023-04-26 Thread Greg KH
On Tue, Apr 25, 2023 at 05:48:27PM -0700, Chia-I Wu wrote: > Signed-off-by: Chia-I Wu > Cc: sta...@vger.kernel.org I know I can not take patches without any changelog text at all, maybe the DRM developers are more lax, but it's not a good idea at all. thanks, greg k-h

Re: [PATCH v2] drm/amdgpu: add a missing lock for AMDGPU_SCHED

2023-04-26 Thread Greg KH
On Tue, Apr 25, 2023 at 11:17:14PM -0700, Chia-I Wu wrote: > mgr->ctx_handles should be protected by mgr->lock. > > v2: improve commit message > > Signed-off-by: Chia-I Wu > Cc: sta...@vger.kernel.org What commit id does this fix? How far back in stable kernels should this go? thanks, greg k