Re: [PATCH] drm/radeon: remove load callback

2024-06-07 Thread Christian König
Hoi Pok Wu wrote: I did it because it is part of the TODO list, where the task is to remove the load/unload callbacks. There are only 2 drm_drivers that still use them, which is why I thought my amdgpu could test radeonsi, but no. I still sent it anyway. Regards, Wu On Fri, Jun 7, 2024 at 3:51 AM Christian

Re: [PATCH] drm/ttm: Add cgroup memory accounting for GTT memory

2024-06-07 Thread Christian König
On 07.06.24 at 16:43, Joshi, Mukul wrote: -Original Message- From: Koenig, Christian Sent: Friday, June 7, 2024 3:26 AM To: Joshi, Mukul ; amd-gfx@lists.freedesktop.org Cc: Kuehling, Felix ; Bhardwaj, Rajneesh ; Yang, Philip

Re: [PATCH 2/2] drm/amdgpu: fix dereferencing null pointer warning in ring_test_ib()

2024-06-07 Thread Christian König
On 07.06.24 at 10:33, Bob Zhou wrote: To avoid a null pointer dereference, check the return value and do error handling. That doesn't make much sense. At this point amdgpu_mes_ctx_get_offs_cpu_addr() shouldn't be able to return NULL in the first place. Regards, Christian.

Re: [PATCH 1/2] drm/amdgpu: fix overflowed constant warning in mmhub_set_clockgating()

2024-06-07 Thread Christian König
On 07.06.24 at 10:33, Bob Zhou wrote: To fix a potential overflowed constant warning, modify the variables to u32 for getting the return value of RREG32_SOC15(). Signed-off-by: Bob Zhou Acked-by: Christian König --- drivers/gpu/drm/amd/amdgpu/mmhub_v1_7.c | 2 +- drivers/gpu/drm/amd

Re: [PATCH v2] drm/amdgpu: Fix the BO release clear memory warning

2024-06-07 Thread Christian König
nto drm-misc-fixes to be sure the patch makes it into 6.10. Feel free to add Reviewed-by: Christian König. Regards, Christian. Fixes: a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality") Signed-off-by: Arunpravin Paneer Selvam Suggested-by: Christian König --- driv

Re: [PATCH] drm/radeon: remove load callback

2024-06-07 Thread Christian König
On 07.06.24 at 03:14, wu hoi pok wrote: this patch is to remove the load callback from the kms_driver, following closely to amdgpu, radeon_driver_load_kms and devm_drm_dev_alloc are used, most of the changes here are rdev->ddev to rdev_to_drm, which maps to adev_to_drm in amdgpu. however this

Re: [PATCH] drm/ttm: Add cgroup memory accounting for GTT memory

2024-06-07 Thread Christian König
On 06.06.24 at 21:22, Mukul Joshi wrote: Make sure we do not overflow the memory limits set for a cgroup when doing GTT memory allocations. NAK, that's intentionally not done like that. Please see the cgroup discussion on memory management on the public mailing list. Regards, Christian.

Re: [PATCH] drm/amdgpu: Move SR-IOV check into amdgpu_gfx_sysfs_compute_init

2024-06-07 Thread Christian König
. See the partitioning mode is something which is fundamentally incompatible with SRIOV. So this is not IP version specific at all. Regards, Christian. Thanks, Lijo Cc: Alex Deucher Cc: Christian König Suggested-by: Christian König Signed-off-by: Srinivasan Shanmugam --- drivers/gpu/drm

[PATCH] drm/amdgpu: revert "take runtime pm reference when we attach a buffer" v2

2024-06-06 Thread Christian König
make it mandatory to keep the runtime pm reference would be if we pin the buffer into VRAM, and that's not something we currently do. v2: improve the commit message Signed-off-by: Christian König Reviewed-by: Alex Deucher CC: sta...@vger.kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c

Re: [PATCH] drm/amdgpu: Move SR-IOV check into amdgpu_gfx_sysfs_compute_init

2024-06-06 Thread Christian König
the code organization and maintainability. If in the future the conditions for creating the compute partition sysfs entries change, we would only need to update the amdgpu_gfx_sysfs_compute_init function. Cc: Alex Deucher Cc: Christian König Suggested-by: Christian König Signed-off-by: Srinivasan

[PATCH 3/3] drm/amdgpu: nuke the VM PD/PT shadow handling

2024-06-06 Thread Christian König
not recoverable in any way when VRAM is lost. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 4 - drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 87 + drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 67 +--- drivers/gpu/drm/amd/amdgpu

[PATCH 2/3] drm/amdgpu: remove amdgpu_pin_restricted()

2024-06-06 Thread Christian König
We haven't used the functionality to pin BOs in a certain range at all while the driver existed. Just nuke it. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 56 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 2 - 2 files changed, 5

[PATCH 1/3] drm/amdgpu: explicitely set the AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS flag

2024-06-06 Thread Christian König
Instead of having that in the amdgpu_bo_pin() function applied for all pinned BOs. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 1 - drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c| 1 + drivers/gpu/drm/amd

Re: [PATCH 01/18] drm/amdgpu: enhance amdgpu_ucode_request() function flexibility

2024-06-06 Thread Christian König
On 03.06.24 at 03:41, Yang Wang wrote: Adding a formatting string feature to improve function flexibility. Signed-off-by: Yang Wang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 30 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h | 3 ++- 2 files changed, 24

Re: [PATCH 2/2] amdgpu: don't dereference a NULL resource in sysfs code

2024-06-06 Thread Christian König
ing. With that done the patch is Reviewed-by: Christian König Regards, Christian. --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 63 +++--- 1 file changed, 33 insertions(+), 30 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amd

Re: [PATCH] drm/amdgpu: revert "take runtime pm reference when we attach a buffer"

2024-06-06 Thread Christian König
On 05.06.24 at 15:20, Alex Deucher wrote: On Wed, Jun 5, 2024 at 8:32 AM Christian König wrote: This reverts commit b8c415e3bf989be1b749409951debe6b36f5c78c and commit 425285d39afddaf4a9dab36045b816af0cc3e400. Taking a runtime pm reference for DMA-buf is actually completely unnecessary

[PATCH] drm/amdgpu: revert "take runtime pm reference when we attach a buffer"

2024-06-05 Thread Christian König
it is in VRAM the buffer gets migrated to GTT before powering down. The only use case which would make it mandatory to keep the runtime pm reference would be if we pin the buffer into VRAM, and that's not something we currently do. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu

Re: [PATCH 00/18] Enhance amdgpu_firmware_request() to improve function flexibility

2024-06-05 Thread Christian König
You haven't addressed any of my comments on patch #1. Regards, Christian. On 05.06.24 at 11:33, Wang, Yang(Kevin) wrote: Ping... Best Regards, Kevin -Original Message- From: amd-gfx On Behalf Of Yang Wang Sent: Monday, June

Page fault storms and IH ring overflow

2024-06-05 Thread Christian König
Hi guys, just FYI: Alex published yesterday a bunch of new firmware files: https://gitlab.freedesktop.org/drm/firmware/-/commits/amd-staging One major issue which should be fixed by those is that page faults can no longer overflow the IH ring buffer on APUs and older dGPUs. Newer dGPU with

Re: [PATCH 1/2][RFC] amdgpu: fix a race in kfd_mem_export_dmabuf()

2024-06-05 Thread Christian König
On 04.06.24 at 20:08, Felix Kuehling wrote: On 2024-06-03 22:13, Al Viro wrote: Using drm_gem_prime_handle_to_fd() to set dmabuf up and insert it into descriptor table, only to have it looked up by file descriptor and remove it from descriptor table is not just too convoluted - it's racy;

Re: [PATCH] drm/amdgpu: add reset source in various cases

2024-06-05 Thread Christian König
On 04.06.24 at 17:58, Eric Huang wrote: To fulfill the reset event description. Suggested-by: Lijo Lazar Signed-off-by: Eric Huang Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 1 + drivers/gpu/drm/amd

Re: [PATCH] drm/amd/display: use pre-allocated temp structure for bounding box

2024-06-05 Thread Christian König
ng...@amd.com Cc: rodrigo.sique...@amd.com Acked-by: Christian König --- drivers/gpu/drm/amd/display/dc/dc.h | 1 + .../drm/amd/display/dc/resource/dcn32/dcn32_resource.c| 8 +++- .../drm/amd/display/dc/resource/dcn321/dcn321_resource.c | 8 +++- 3 fil

Re: [PATCH] drm/amd/display: use GFP_ATOMIC for bounding box

2024-06-05 Thread Christian König
On 04.06.24 at 16:57, Arnd Bergmann wrote: On Tue, Jun 4, 2024, at 16:22, Christian König wrote: On 04.06.24 at 15:50, Alex Deucher wrote: This can be called in atomic context. Should fix: BUG: sleeping function called from invalid context at include/linux/sched/mm.h:306 in_atomic(): 1

Re: [PATCH v4 6/9] drm/amdgpu: call flush_gpu_tlb directly in gfxhub enable

2024-06-05 Thread Christian König
investigation. With that done the patch is Reviewed-by: Christian König Regards, Christian. --- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 4 +--- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 2 +- drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 2 +- 3 files changed, 3 insertions(+), 5 deletions(-) diff

[PATCH 5/6] drm/amdgpu: always enable move threshold for BOs

2024-06-04 Thread Christian König
This should prevent buffer moves when the threshold is reached during CS. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 36 -- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 22 + 2 files changed, 29 insertions(+), 29 deletions

[PATCH 2/6] drm/ttm: add TTM_PL_FLAG_TRESHOLD

2024-06-04 Thread Christian König
This adds support to enable a placement only when a certain threshold of moved bytes is reached. It's a context flag which will be handled together with TTM_PL_FLAG_DESIRED and TTM_PL_FLAG_FALLBACK. Signed-off-by: Christian König --- drivers/gpu/drm/ttm/ttm_bo.c | 5 ++--- drivers/gpu/drm
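Below is a minimal user-space sketch of how a threshold-gated entry could interact with an ordered placement list. It assumes a list of {GTT with threshold flag, VRAM, GTT}. The names `PL_FLAG_THRESHOLD`, `struct place`, `struct op_ctx` and `pick_placement()` are hypothetical. The sketch does not reproduce TTM's structures or the semantics of the real `TTM_PL_FLAG_TRESHOLD`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model only: not TTM's real structures or flag values. */
enum mem_type { MEM_VRAM, MEM_GTT };

#define PL_FLAG_THRESHOLD (1u << 0) /* usable only once the move budget is spent */

struct place {
	enum mem_type mem_type;
	uint32_t flags;
};

struct op_ctx {
	uint64_t bytes_moved;    /* bytes migrated so far in this submission */
	uint64_t move_threshold; /* migration budget for this submission */
};

/* Return the index of the first usable placement, or -1 if none is usable. */
static int pick_placement(const struct place *places, size_t n,
			  const struct op_ctx *ctx)
{
	for (size_t i = 0; i < n; i++) {
		/* A threshold-gated entry is skipped while budget remains. */
		if ((places[i].flags & PL_FLAG_THRESHOLD) &&
		    ctx->bytes_moved < ctx->move_threshold)
			continue;
		return (int)i;
	}
	return -1;
}
```

While under budget, the gated GTT entry is skipped and the BO goes to VRAM (index 1). Once the budget is spent, the GTT entry at index 0 is picked first, so no further move to VRAM is triggered.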

[PATCH 6/6] drm/amdgpu: Re-validate evicted buffers v2

2024-06-04 Thread Christian König
something is in its preferred placement or not and also disable the handling on APUs. Signed-off-by: Tvrtko Ursulin Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/

[PATCH 4/6] drm/amdgpu: re-order AMDGPU_GEM_DOMAIN_DOORBELL handling

2024-06-04 Thread Christian König
That should probably come last. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index

[PATCH 3/6] drm/amdgpu: enable GTT fallback handling for dGPUs only

2024-06-04 Thread Christian König
That is just a waste of time on APUs. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 8d8c39be6129

[PATCH 1/6] drm/amdgpu: cleanup MES command submission

2024-06-04 Thread Christian König
compile tested. While at it cleanup the coding style. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 76 -- 1 file changed, 48 insertions(+), 28 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu

Rate limit improvements for TTM

2024-06-04 Thread Christian König
Hi guys, as already discussed on the mailing list, Tvrtko and Friedrich stumbled over a bunch of problems with the memory management. In particular, the move rate limit didn't seem to work for VRAM|GTT BOs and caused a bunch of additional and unnecessary overhead during CS. This (not well tested)

Re: [PATCH] drm/amd/display: use GFP_ATOMIC for bounding box

2024-06-04 Thread Christian König
On 04.06.24 at 15:50, Alex Deucher wrote: This can be called in atomic context. Should fix: BUG: sleeping function called from invalid context at include/linux/sched/mm.h:306 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 449, name: kworker/u64:8 preempt_count: 2, expected: 0 RCU

Re: [PATCH v2] drm/amdgpu: fix the overflowed constant warning for RREG32_SOC15()

2024-06-04 Thread Christian König
On 04.06.24 at 09:08, Bob Zhou wrote: To fix a potential overflowed constant warning reported by Coverity, modify the variables to uint32_t. Signed-off-by: Bob Zhou Acked-by: Christian König --- drivers/gpu/drm/amd/amdgpu/imu_v12_0.c | 7 --- 1 file changed, 4 insertions(+), 3
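This is a minimal sketch of why the register value is kept in an unsigned 32-bit variable. `fake_rreg32()` is a hypothetical stand-in for RREG32_SOC15(), which is kernel-only; the thread states that its return value is unsigned. `read_field()` is also illustrative, not code from imu_v12_0.c.

```c
#include <stdint.h>

/* Hypothetical stand-in for RREG32_SOC15(): models a raw 32-bit
 * register read whose value has bit 31 set. */
static uint32_t fake_rreg32(void)
{
	return 0x80000001u;
}

/* Holding the value in uint32_t keeps masking and shifting well defined
 * for every bit pattern. Stored in an int, 0x80000001 does not fit and
 * the conversion result would be implementation-defined. */
static uint32_t read_field(uint32_t mask, unsigned int shift)
{
	uint32_t data = fake_rreg32();

	return (data & mask) >> shift;
}
```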

Re: [PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started

2024-06-04 Thread Christian König
To: Liu, Shaoyun ; Christian König ; Li, Yunxiang (Teddy) ; amd-gfx@lists.freedesktop.org; Deucher, Alexander ; Xiao, Hua Subject: Re: [PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started Hi Shaoyun, yes my thinking goes into the same direction. The basic problem here is that we

Re: [PATCH] drm/amdgpu: use local xcc write to flush tlb

2024-06-03 Thread Christian König
On 03.06.24 at 13:46, Yiqing Yao wrote: When flushing the GPU TLB using the KIQ from gfxhub, the KIQ ring is always local, as the xcc instance is selected for it. Thus use the lower 18 bits to access mmregs inside the local xcc instead of the full address used when accessing regs outside of the local xcc. Remove redundant
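The address selection described above can be sketched as follows. `LOCAL_XCC_REG_MASK` and `kiq_reg_addr()` are hypothetical names used only for illustration. The 18-bit width comes from the patch description.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: lower 18 bits (0x3FFFF) address a register inside the local XCC. */
#define LOCAL_XCC_REG_MASK ((UINT32_C(1) << 18) - 1)

/* Keep only the local offset for local-XCC accesses; registers outside the
 * local XCC still use the full address. */
static uint32_t kiq_reg_addr(uint32_t reg, bool local_xcc)
{
	return local_xcc ? (reg & LOCAL_XCC_REG_MASK) : reg;
}
```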

Re: [PATCH 1/2] amdgpu: add the amdgpu_vm ptr in the vm_bo_map/unmap events

2024-06-03 Thread Christian König
On 03.06.24 at 13:52, Pierre-Eric Pelloux-Prayer wrote: Hi Christian, On 03/06/2024 at 11:58, Christian König wrote: On 03.06.24 at 10:46, Pierre-Eric Pelloux-Prayer wrote: These 2 trace events are tied to a specific VM so in order for them to be useful for a tool we need to trace

Re: [PATCH] drm/amdgpu: replace int with unsigned int for imu_v12_0.c

2024-06-03 Thread Christian König
On 03.06.24 at 10:53, Zhou, Bob wrote: Hi Christian, It fixes a potential Overflowed constant (INTEGER_OVERFLOW) warning reported by Coverity. You need to mention that in the commit message. And I haven't checked the hardware docs,

Re: [PATCH 01/18] drm/amdgpu: enhance amdgpu_ucode_request() function flexibility

2024-06-03 Thread Christian König
On 31.05.24 at 08:52, Yang Wang wrote: Adding a formatting string feature to improve function flexibility. Signed-off-by: Yang Wang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 30 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h | 3 ++- 2 files changed, 24

Re: [PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started

2024-06-03 Thread Christian König
with the seq number . In this case driver can get the failure of the submission to MES in time and make its own decision for what to do next , What do you think about this ? Regards Shaoyun.liu -Original Message- From: amd-gfx On Behalf Of Christian König Sent: Wednesday, May 29

Re: [PATCH 1/2] amdgpu: add the amdgpu_vm ptr in the vm_bo_map/unmap events

2024-06-03 Thread Christian König
On 03.06.24 at 10:46, Pierre-Eric Pelloux-Prayer wrote: These 2 trace events are tied to a specific VM so in order for them to be useful for a tool we need to trace the amdgpu_vm as well. The bo_va already contains the VM pointer the map/unmap operation belongs to. Signed-off-by:

Re: [PATCH] drm/amdgpu: replace int with unsigned int for imu_v12_0.c

2024-06-03 Thread Christian König
On 03.06.24 at 07:59, Bob Zhou wrote: The return value of RREG32_SOC15 is unsigned int, so modify the variable to unsigned. And why is that an improvement? Regards, Christian. Signed-off-by: Bob Zhou --- drivers/gpu/drm/amd/amdgpu/imu_v12_0.c | 6 +++--- 1 file changed, 3 insertions(+),

Re: [PATCH] drm/amdgpu: Skip coredump during resets for debug

2024-06-03 Thread Christian König
On 31.05.24 at 14:34, Lijo Lazar wrote: Skip scheduling coredump when gpu reset is intentionally triggered through debugfs. Signed-off-by: Lijo Lazar Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers

Re: [RFC v2 0/2] Discussion around eviction improvements

2024-05-31 Thread Christian König
20784781 10.00 37.00 89.67 22.00 12.33 patched 4227688 13.67 37.00 81.33 23.33 15.00 The disclaimer I have is that more runs would be needed to be more confident about the results. And more games. And APU versus discrete. Cc: Christian König Cc: Fried

Re: [PATCH v2 09/10] drm/amdgpu: fix missing reset domain locks

2024-05-31 Thread Christian König
On 31.05.24 at 00:02, Felix Kuehling wrote: On 2024-05-28 13:23, Yunxiang Li wrote: These functions are missing the lock for reset domain. Signed-off-by: Yunxiang Li --- drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 4 +++- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c

Re: [PATCH v3 8/8] drm/amdgpu: fix missing reset domain locks

2024-05-31 Thread Christian König
On 30.05.24 at 23:48, Yunxiang Li wrote: These functions are missing the lock for reset domain. Please separate the GART changes from the KFD changes. Apart from that looks good to me. Thanks, Christian. Signed-off-by: Yunxiang Li --- v3: only bracket amdgpu_device_flush_hdp with the

Re: [PATCH 7/8] drm/amdkfd: Comment out the unused variable use_static in pm_map_queues_v9

2024-05-30 Thread Christian König
On 30.05.24 at 05:50, Jesse Zhang wrote: To fix the warning about unused value, comment out the variable use_static. Commenting out variables with // will just get you another warning from checkpatch. Christian. Signed-off-by: Jesse Zhang ---

Re: [PATCH 4/8] amd/amdkfd:fix overflowed constant in the function svm_migrate_copy_to_ram

2024-05-30 Thread Christian König
On 30.05.24 at 05:48, Jesse Zhang wrote: If the svm migration GART memory copy or the DMA page mapping fails on the first iteration, the variable i is still 0, and executing i-- will overflow. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 3 ++- 1 file
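Below is a hedged sketch of an unwind loop that never decrements an unsigned index below zero. `map_page()`, `unmap_page()`, `map_pages()`, `fail_at` and `mapped[]` are hypothetical stand-ins, not the kfd_migrate.c code. The sketch does not claim this is how the patch fixes it.

```c
#include <stddef.h>

/* Hypothetical stand-ins for per-page setup in a migration path. */
#define MAX_PAGES 16

static long fail_at = -1;     /* index at which map_page() fails; -1 = never */
static int mapped[MAX_PAGES]; /* 1 while a page is mapped */

static int map_page(size_t i)
{
	if ((long)i == fail_at)
		return -1;
	mapped[i] = 1;
	return 0;
}

static void unmap_page(size_t i)
{
	mapped[i] = 0;
}

/* Map npages pages (npages <= MAX_PAGES); on failure undo only the pages
 * already mapped. Checking i > 0 before decrementing means i never wraps.
 * The shorter "while (i--)" is correct at runtime but leaves i wrapped to
 * SIZE_MAX, which static analyzers may flag. */
static int map_pages(size_t npages)
{
	size_t i;

	for (i = 0; i < npages; i++) {
		if (map_page(i))
			goto unwind;
	}
	return 0;

unwind:
	while (i > 0) {
		i--;
		unmap_page(i);
	}
	return -1;
}
```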

Re: [PATCH 1/8] drm/amdgu: fix Unintentional integer overflow for mall size

2024-05-30 Thread Christian König
ybe better change the type of the local variable instead? On the other hand feel free to add Reviewed-by: Christian König to this one as well. Regards, Christian. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 2 +- 1 file changed, 1 insertion(+), 1 delet
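The snippet above is truncated, so the exact expression in amdgpu_discovery.c is not shown. The sketch below only illustrates the general Coverity pattern: a 32-bit product that wraps before being widened, versus the suggestion of widening through a 64-bit local variable. The function names and parameters are hypothetical, and the sketch assumes a platform with 32-bit int.

```c
#include <stdint.h>

/* Hypothetical parameters; not the real amdgpu discovery fields. */
static uint64_t mall_size_narrow(uint32_t size_per_umc, uint32_t num_umc)
{
	/* Both operands are 32-bit, so the product is computed (and wraps)
	 * in 32 bits before it is widened to the 64-bit return type. */
	return size_per_umc * num_umc;
}

static uint64_t mall_size_wide(uint32_t size_per_umc, uint32_t num_umc)
{
	/* Widening through a 64-bit local variable makes the multiply 64-bit. */
	uint64_t size = size_per_umc;

	return size * num_umc;
}
```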

Re: [PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started

2024-05-29 Thread Christian König
On 29.05.24 at 16:48, Li, Yunxiang (Teddy) wrote: Yeah, I know. That's one of the reasons I've pointed out on the patch adding that that this behavior is actually completely broken. If you run into issues with the MES because of this

Re: [PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started

2024-05-29 Thread Christian König
On 29.05.24 at 16:31, Li, Yunxiang (Teddy) wrote: The problem is that we don't force complete the non scheduler rings, e.g. MES, KIQ etc... Try to remove this check here from the loop in amdgpu_device_pre_asic_reset(): if (!amdgpu_ring_sched_ready(ring))

Re: [PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started

2024-05-29 Thread Christian König
On 29.05.24 at 15:44, Li, Yunxiang (Teddy) wrote: I don't think trying to add some reset handling here makes sense in the first place. Part of the reset/recovery procedure is to signal all fences and that includes the one we are

Re: [PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started

2024-05-29 Thread Christian König
On 29.05.24 at 15:22, Li, Yunxiang (Teddy) wrote: It's perfectly possible that the reset has already started before we enter the function. Yeah, this could and does happen, but it just means we are back to the old behavior. I guess I could use "can I take the read side lock?" to

Re: [PATCH] drm/amdgpu: drop MES 10.1 support

2024-05-29 Thread Christian König
On 02.05.24 at 23:41, Alex Deucher wrote: It was an enablement vehicle for MES 11 and was never productized. Remove it. Signed-off-by: Alex Deucher Acked-by: Christian König --- drivers/gpu/drm/amd/amdgpu/Makefile |1 - drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c

Re: [PATCH 01/14] drm/amdgpu: add nbio set_reg_remap helper

2024-05-29 Thread Christian König
Acked-by: Christian König for the whole series. On 06.05.24 at 20:45, Alex Deucher wrote: Will be used to consolidate reg remap settings and fix HDP flushes on systems with non-4K pages. Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.h

Re: [bug report] drm/amdgpu: amdgpu crash on playing videos, linux 6.10-rc

2024-05-29 Thread Christian König
Hi, when the issue is easy to reproduce I suggest to bisect the changes between 6.9 and 6.10-rc1. On the other hand it's not unlikely that we have a known bug in -rc1 which will be fixed by -rc2. Anyway added Leo to the mail thread since he is the one responsible for the video decoding

Re: [PATCH v2 09/10] drm/amdgpu: fix missing reset domain locks

2024-05-29 Thread Christian König
On 28.05.24 at 19:23, Yunxiang Li wrote: These functions are missing the lock for reset domain. Signed-off-by: Yunxiang Li --- drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 4 +++- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 8 ++--

Re: [PATCH v2 08/10] drm/amdgpu: fix locking scope when flushing tlb

2024-05-29 Thread Christian König
On 28.05.24 at 19:23, Yunxiang Li wrote: Which method is used to flush the TLB does not depend on whether a reset is in progress or not. We should skip the flush altogether if the GPU will get reset. So put both paths under the reset_domain read lock. Signed-off-by: Yunxiang Li Reviewed-by: Christian

Re: [PATCH v2 07/10] drm/amdgpu: use helper in amdgpu_gart_unbind

2024-05-29 Thread Christian König
On 28.05.24 at 19:23, Yunxiang Li wrote: When the amdgpu_gart_invalidate_tlb helper was introduced, this part was left out of the conversion. Avoid the code duplication here. Signed-off-by: Yunxiang Li Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 5 + 1

Re: [PATCH v2 06/10] drm/amdgpu: remove tlb flush in amdgpu_gtt_mgr_recover

2024-05-29 Thread Christian König
On 28.05.24 at 19:23, Yunxiang Li wrote: At this point the GART is not set up, so there's no point in invalidating the TLB here and it could even be harmful. Signed-off-by: Yunxiang Li Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 2 -- 1 file changed, 2

Re: [PATCH v2 05/10] drm/amd/amdgpu: remove unnecessary flush when enable gart

2024-05-29 Thread Christian König
Signed-off-by: Yunxiang Li With the commit message improved the patch is Reviewed-by: Christian König . Regards, Christian. --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 3 --- drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 3 --- drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 3 --- drivers/gpu/drm/

Re: [PATCH v2 04/10] drm/amdgpu/kfd: remove is_hws_hang and is_resetting

2024-05-29 Thread Christian König
On 28.05.24 at 19:23, Yunxiang Li wrote: is_hws_hang and is_resetting serve pretty much the same purpose and both duplicate the work of the reset_domain lock, so just check that directly instead. This also eliminates a few bugs listed below and gets rid of dqm->ops.pre_reset. kfd_hws_hang did

Re: [PATCH v2 02/10] drm/amdgpu: fix sriov host flr handler

2024-05-29 Thread Christian König
a nice cleanup to me, but that is absolutely not my field of expertise. But feel free to add an Acked-by: Christian König Regards, Christian. --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 14 drivers/gpu/drm/amd/amdgpu

Re: [PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started

2024-05-29 Thread Christian König
On 28.05.24 at 19:23, Yunxiang Li wrote: If a reset is triggered, there's no point in waiting for the fence back anymore, it just makes the reset code wait for a long time for the reset_domain read lock to be dropped. This also makes our reply to host FLR fast enough so the host doesn't

Re: [PATCH v2 01/10] drm/amdgpu: add skip_hw_access checks for sriov

2024-05-29 Thread Christian König
is duplicated. On the other hand an extra check doesn't really hurt us. So either way the patch is Reviewed-by: Christian König Regards, Christian. reg_access_ctrl = &adev->gfx.rlc.reg_access_ctrl[xcc_id]; scratch_reg0 = (void __iomem *)adev->rmmio + 4 * reg_access_ctrl->scr

Re: [PATCH] drm/amdgpu: Add lock around VF RLCG interface

2024-05-28 Thread Christian König
On 27.05.24 at 22:19, Victor Skvortsov wrote: flush_gpu_tlb may be called from another thread while device_gpu_recover is running. No, that would be illegal. Where do you see that? Regards, Christian. Both of these threads access registers through the VF RLCG interface during VF Full

Re: [PATCH 1/3] drm/amdgpu/gfx11: select HDP ref/mask according to gfx ring pipe

2024-05-28 Thread Christian König
Reviewed-by: Christian König for the entire series. Regards, Christian. On 13.05.24 at 22:25, Alex Deucher wrote: Use the correct ref/mask for different gfx ring pipes. Ported from ZhenGuo's patch for gfx10. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 2 +- 1

Re: [PATCH] drm/amdgpu: drop some kernel messages in VCN code

2024-05-28 Thread Christian König
Acked-by: Christian König Thanks, Christian. On 23.05.24 at 19:11, Jiang, Sonny wrote: The patch is Reviewed-by: Sonny Jiang Thanks, Sonny *From:* Dong

Re: [PATCH] drm/amdgpu: Add flags to distinguish vf/pf/pt mode

2024-05-28 Thread Christian König
On 27.05.24 at 18:28, Asad Kamal wrote: Add extra flag definition for ids_flag field to distinguish between vf/pf/pt modes v2: Updated kms driver minor version & removed pf check as default is 0 Signed-off-by: Asad Kamal Reviewed-by: Lijo Lazar Acked-by: Christian König --- dri

Re: [PATCH] drm/amdgpu: drop MES 10.1 support v3

2024-05-27 Thread Christian König
On 23.05.24 at 21:48, Alex Deucher wrote: It was an enablement vehicle for MES 11 and was never productized. Remove it. v2: drop additional checks in the GFX10 code. v3: drop mes_api_def.h Signed-off-by: Alex Deucher Acked-by: Christian König --- drivers/gpu/drm/amd/amdgpu/Makefile

Re: [PATCH] drm/radeon/r100: enhance error handling in r100_cp_init_microcode

2024-05-27 Thread Christian König
On 27.05.24 at 03:20, Zhouyi Zhou wrote: In r100_cp_init_microcode, if rdev->family doesn't match any of the if statements, fw_name will be NULL, which will cause gcc (11.4.0 powerpc64le-linux-gnu) to complain: In function ‘r100_cp_init_microcode’, inlined from ‘r100_cp_init’ at
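Below is a minimal sketch of one way to handle the "no branch matched" case, so that NULL is never passed on as a firmware name. The chip families, firmware paths and `select_fw_name()` are hypothetical, and the `-EINVAL` return is an assumption. The real r100 code and the eventual fix may handle it differently.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical chip families and firmware paths, for illustration only. */
enum chip_family { CHIP_FAMILY_A, CHIP_FAMILY_B, CHIP_FAMILY_OTHER };

static int select_fw_name(enum chip_family family, const char **fw_name)
{
	*fw_name = NULL;

	if (family == CHIP_FAMILY_A)
		*fw_name = "example/family_a_cp.bin";
	else if (family == CHIP_FAMILY_B)
		*fw_name = "example/family_b_cp.bin";

	/* No branch matched: report an error instead of passing NULL on
	 * to the firmware loader. */
	if (!*fw_name)
		return -EINVAL;
	return 0;
}
```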

Re: [RFC PATCH] drm/amdgpu: Refactor sysfs attr functions in AMDGPU for reusability

2024-05-24 Thread Christian König
and maintainability of the code. It also increases the reusability of the attribute management functions, allowing them to be used by multiple modules. Cc: Lijo Lazar Cc: Alex Deucher Cc: Christian König Suggested-by: Alex Deucher Signed-off-by: Srinivasan Shanmugam While at it you could

Re: [PATCH 4/4] drm/amdgpu: prevent gpu access during reset recovery

2024-05-24 Thread Christian König
On 24.05.24 at 15:35, Li, Yunxiang (Teddy) wrote: If that is true you could in theory lower the locked area of the existing lock, but adding a new one is a strict no-go from my side. I'll try this, right now I see two places where this

Re: [PATCH] drm/amdgpu: silence UBSAN warning

2024-05-24 Thread Christian König
On 16.05.24 at 15:55, Alex Deucher wrote: Convert a variable sized array from [1] to []. Signed-off-by: Alex Deucher Reviewed-by: Christian König --- drivers/gpu/drm/amd/include/atomfirmware.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd
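The [1] to [] conversion replaces the old one-element-array idiom with a C99 flexible array member. Bounds checkers such as UBSAN may treat the old form as a real one-element array, which is the kind of warning this patch silences. The struct names below are hypothetical and are not taken from atomfirmware.h. In the kernel, the struct_size() helper computes the allocation size with overflow checking.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table layout, not a structure from atomfirmware.h. */
struct table_old {
	uint16_t count;
	uint32_t entries[1]; /* old idiom: sanitizers may see exactly 1 element */
};

struct table_new {
	uint16_t count;
	uint32_t entries[];  /* C99 flexible array member: length set at runtime */
};

/* Allocate a table with room for count entries. */
static struct table_new *table_alloc(uint16_t count)
{
	struct table_new *t = malloc(sizeof(*t) + (size_t)count * sizeof(t->entries[0]));

	if (t)
		t->count = count;
	return t;
}
```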

Re: [PATCH] drm/amdgpu: Adjust logic in amdgpu_device_partner_bandwidth()

2024-05-24 Thread Christian König
On 16.05.24 at 17:05, Alex Deucher wrote: Use current speed/width on devices which don't support dynamic PCIe switching. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3289 Signed-off-by: Alex Deucher Acked-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

Re: [PATCH 4/4] drm/amdgpu: prevent gpu access during reset recovery

2024-05-24 Thread Christian König
On 23.05.24 at 17:35, Li, Yunxiang (Teddy) wrote: Here is taking a different lock than the reset_domain->sem. It is a separate reset_domain->gpu_sem that is only locked when we will actually do reset, it is not taken in the skip_hw_reset path. Exactly that is what you should *not*

Re: [PATCH 4/4] drm/amdgpu: prevent gpu access during reset recovery

2024-05-23 Thread Christian König
On 23.05.24 at 13:36, Li, Yunxiang (Teddy) wrote: +void amdgpu_lock_hw_access(struct amdgpu_device *adev); void +amdgpu_unlock_hw_access(struct amdgpu_device *adev); int +amdgpu_begin_hw_access(struct amdgpu_device *adev); void

Re: [PATCH V2] drm/amdgpu: fix dereference null return value for the function amdgpu_vm_pt_parent

2024-05-23 Thread Christian König
On 23.05.24 at 11:16, Jesse Zhang wrote: The pointer parent may be NULLed by the function amdgpu_vm_pt_parent. To make the code more robust, check the pointer parent. Signed-off-by: Jesse Zhang Suggested-by: Christian König Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu

Re: [PATCH V2] drm/amdgpu: fix dereference null return value for the function amdgpu_vm_pt_parent

2024-05-23 Thread Christian König
On 23.05.24 at 10:07, Jesse Zhang wrote: The pointer parent may be NULLed by the function amdgpu_vm_pt_parent. To make the code more robust, check the pointer parent. V2: When parent is NULL here we should probably call BUG() instead. (Christian) Signed-off-by: Jesse Zhang ---

Re: [PATCH] drm/amdgpu: fix dereference null return value for the function amdgpu_vm_pt_parent

2024-05-23 Thread Christian König
On 23.05.24 at 08:13, Jesse Zhang wrote: The pointer parent may be NULLed by the function amdgpu_vm_pt_parent. To make the code more robust, check the pointer parent. No, that doesn't make any sense. When parent is NULL here we should probably call BUG() instead. Regards, Christian.

Re: [PATCH 4/4] drm/amdgpu: prevent gpu access during reset recovery

2024-05-23 Thread Christian König
On 22.05.24 at 19:27, Yunxiang Li wrote: Random accesses to the GPU while it is not re-initialized can lead to a bad time. So add a rwsem to prevent such accesses. Normal accesses will now take the read lock for shared GPU access, reset takes the write lock for exclusive GPU access. Care need

Re: [PATCH v2] drm/amd/display: Add pixel encoding info to debugfs

2024-05-21 Thread Christian König
On 21.05.24 at 07:11, Rino Andre Johnsen wrote: [Why] For debugging and testing purposes. [How] Create amdgpu_current_pixelencoding debugfs entry. Usage: cat /sys/kernel/debug/dri/1/crtc-0/amdgpu_current_pixelencoding Why isn't that available as a standard DRM CRTC property in either sysfs or

Re: [PATCH] drm/amdgpu: Fix amdgpu_vm_is_bo_always_valid kerneldoc

2024-05-21 Thread Christian König
On 20.05.24 at 10:18, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Align kerneldoc with the function argument name. Signed-off-by: Tvrtko Ursulin Reported-by: Stephen Rothwell Fixes: 26e20235ce00 ("drm/amdgpu: Add amdgpu_bo_is_vm_bo helper") Cc: Christian König Cc: Alex Deucher

Re: [PATCH] drm/amdgpu: Remove GC HW IP 9.3.0 from noretry=1

2024-05-21 Thread Christian König
On 17.05.24 at 17:46, Alex Deucher wrote: On Fri, May 17, 2024 at 2:35 AM Christian König wrote: On 16.05.24 at 19:57, Tim Van Patten wrote: From: Tim Van Patten The following commit updated gmc->noretry from 0 to 1 for GC HW IP 9.3.0: commit 5f3854f1f4e2 ("drm/amdgpu:

Re: [PATCH 1/3] drm/amdgpu: Add amdgpu_bo_is_vm_bo helper

2024-05-17 Thread Christian König
Am 16.05.24 um 14:21 schrieb Tvrtko Ursulin: Hi Christian, On 08/05/2024 09:26, Tvrtko Ursulin wrote: On 08/05/2024 06:42, Christian König wrote: Am 06.05.24 um 18:26 schrieb Tvrtko Ursulin: On 03/05/2024 10:14, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Help code readability

Re: [PATCH] drm/amdgpu: Remove GC HW IP 9.3.0 from noretry=1

2024-05-17 Thread Christian König
Am 16.05.24 um 19:57 schrieb Tim Van Patten: From: Tim Van Patten The following commit updated gmc->noretry from 0 to 1 for GC HW IP 9.3.0: commit 5f3854f1f4e2 ("drm/amdgpu: add more cases to noretry=1") This causes the device to hang when a page fault occurs, until the device is

Re: [RFC 2/5] drm/amdgpu: Actually respect buffer migration budget

2024-05-15 Thread Christian König
Am 15.05.24 um 12:59 schrieb Tvrtko Ursulin: On 15/05/2024 08:20, Christian König wrote: Am 08.05.24 um 20:09 schrieb Tvrtko Ursulin: From: Tvrtko Ursulin Current code appears to live in a misconception that playing with buffer allowed and preferred placements can control the decision

Re: [RFC 2/5] drm/amdgpu: Actually respect buffer migration budget

2024-05-15 Thread Christian König
-by: Tvrtko Ursulin Cc: Christian König Cc: Friedrich Vock --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index 22708954ae68..d07a1dd7c880

Re: [RFC 1/5] drm/amdgpu: Fix migration rate limiting accounting

2024-05-15 Thread Christian König
budget spent. Fix it by looking at the before and after buffer object backing store and only account if there was a change. FIXME: I think this needs a better solution to account for migrations between VRAM visible and non-visible portions. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc

Re: [RFC 0/5] Discussion around eviction improvements

2024-05-14 Thread Christian König
ably rough but should be good enough for discussion. I am curious to hear if I identified at least something correctly as a real problem. It would also be good to hear what are the suggested games to check and see whether there is any improvement. Cc: Christian König Cc: Friedrich Vock Tvrt

Re: [PATCH] drm/amdgpu: Use the slab allocator to reduce job allocation fragmentation

2024-05-14 Thread Christian König
Am 14.05.24 um 10:13 schrieb Liang, Prike: [AMD Official Use Only - AMD Internal Distribution Only] From: Koenig, Christian Sent: Friday, May 10, 2024 5:31 PM To: Liang, Prike ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander Subject: Re: [PATCH] drm/amdgpu: Use the slab allocator to

Re: [PATCH] drm/amdgpu: Check if NBIO funcs are NULL in amdgpu_device_baco_exit

2024-05-14 Thread Christian König
ked-by: Christian König Fixes: 1bece222eab ("drm/amdgpu: Clear doorbell interrupt status for Sienna Cichlid") Cc: Alex Deucher Cc: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/d

Re: [PATCH v2] drm/amdgpu: Add Ring Hang Events

2024-05-13 Thread Christian König
Am 09.05.24 um 22:41 schrieb Ori Messinger: This patch adds 'ring hang' events to the driver. This is done by adding a 'reset_ring_hang' bool variable to the struct 'amdgpu_reset_context' in the amdgpu_reset.h file. The purpose for this 'reset_ring_hang' variable is whenever a GPU reset is

Re: [PATCH] drm/amdgpu/vcn: remove irq disabling in vcn 5 suspend

2024-05-13 Thread Christian König
Am 13.05.24 um 19:41 schrieb David Wu: On 2024-05-13 13:11, Christian König wrote: Am 09.05.24 um 20:40 schrieb David (Ming Qiang) Wu: We do not directly enable/disable VCN IRQ in vcn 5.0.0. And we do not handle the IRQ state as well. So the calls to disable IRQ and set state are removed

Re: [PATCH] drm/amdgpu/vcn: remove irq disabling in vcn 5 suspend

2024-05-13 Thread Christian König
Am 09.05.24 um 20:40 schrieb David (Ming Qiang) Wu: We do not directly enable/disable VCN IRQ in vcn 5.0.0. And we do not handle the IRQ state as well. So the calls to disable IRQ and set state are removed. This effectively gets rid of the warning of "WARN_ON(!amdgpu_irq_enabled(adev,

Re: [PATCH v10 6/6] drm/amdgpu: Enable userq fence interrupt support

2024-05-13 Thread Christian König
Am 10.05.24 um 10:50 schrieb Arunpravin Paneer Selvam: Add support to handle the userqueue protected fence signal hardware interrupt. Create a xarray which maps the doorbell index to the fence driver address. This would help to retrieve the fence driver information when an userq fence interrupt

Re: [PATCH v10 5/6] drm/amdgpu: Remove the MES self test

2024-05-13 Thread Christian König
Am 10.05.24 um 10:50 schrieb Arunpravin Paneer Selvam: Remove MES self test as this conflicts the userqueue fence interrupts. Please also completely remove the amdgpu_mes_self_test() function and any now unused code. Regards, Christian. Signed-off-by: Arunpravin Paneer Selvam ---

Re: [PATCH v3] drm/amdgpu: Add Ring Hang Events

2024-05-13 Thread Christian König
Am 13.05.24 um 06:14 schrieb Ori Messinger: This patch adds 'ring hang' events to the driver. This is done by adding a 'reset_ring_hang' bool variable to the struct 'amdgpu_reset_context' in the amdgpu_reset.h file. The purpose for this 'reset_ring_hang' variable is whenever a GPU reset is

Re: [PATCH v10 4/6] drm/amdgpu: Implement userqueue signal/wait IOCTL

2024-05-13 Thread Christian König
without holding a lock. Signed-off-by: Arunpravin Paneer Selvam Suggested-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 + .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 431 +- .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.h | 6 + drivers/gp

Re: [PATCH 4/5] drm/amdgpu: Fix null pointer dereference to bo

2024-05-13 Thread Christian König
Am 13.05.24 um 10:56 schrieb Ma Jun: Check bo before using it Signed-off-by: Ma Jun Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c | 16 +++- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c b

Re: [PATCH 02/20] drm/amdgpu: don't trample pdev drvdata

2024-05-13 Thread Christian König
. The driver core will already nuke the pointer for us when the pci device is removed, so should be safe to simply drop. Alternative would be to move to the driver pci remove callback. Signed-off-by: Matthew Auld Cc: Christian König Cc: Daniel Vetter Cc: amd-gfx@lists.freedesktop.org Oh! Very good
