Re: [PATCH 12/13] drm/scheduler: rework entity flush, kill and fini

2022-09-30 Thread Andrey Grodzovsky
On 2022-09-30 07:51, Christian König wrote: Am 29.09.22 um 21:20 schrieb Andrey Grodzovsky: On 2022-09-29 09:21, Christian König wrote: This was buggy because when we had to wait for entities which were killed as well we would just deadlock. Instead move all the dependency handling

Re: [PATCH v8] drm/sched: Add FIFO sched policy to run queue

2022-09-30 Thread Andrey Grodzovsky
Thanks for helping with review and good improvement ideas. Pushed to drm-misc-next. Andrey On 2022-09-30 00:12, Luben Tuikov wrote: From: Andrey Grodzovsky When many entities are competing for the same run queue on the same scheduler, we observe unusually long wait times and some jobs

Re: [PATCH 12/13] drm/scheduler: rework entity flush, kill and fini

2022-09-29 Thread Andrey Grodzovsky
On 2022-09-29 09:21, Christian König wrote: This was buggy because when we had to wait for entities which were killed as well we would just deadlock. Instead move all the dependency handling into the callbacks so that will all happen asynchronously. Signed-off-by: Christian König ---

Re: [PATCH 1/2] drm/scheduler: fix fence ref counting

2022-09-29 Thread Andrey Grodzovsky
Series is Reviewed-by: Andrey Grodzovsky Andrey On 2022-09-29 14:01, Christian König wrote: We leaked dependency fences when processes were being killed. In addition to that, grab a reference to the last scheduled fence. Signed-off-by: Christian König --- drivers/gpu/drm/scheduler

[PATCH v5] drm/sched: Add FIFO sched policy to run queue

2022-09-28 Thread Andrey Grodzovsky
fixes and minor refactoring of fifo update function. (Luben) v4: Switch drm_sched_rq_select_entity_fifo to in-order search (Luben) v5: Fix up drm_sched_rq_select_entity_fifo loop Signed-off-by: Andrey Grodzovsky Tested-by: Li Yunxiang (Teddy) --- drivers/gpu/drm/scheduler/sched_entity.c

Re: [PATCH v4] drm/sched: Add FIFO sched policy to run queue v3

2022-09-27 Thread Andrey Grodzovsky
Hey, I have problems with my git-send today so I just attached V5 as a patch here. Andrey On 2022-09-27 19:56, Luben Tuikov wrote: Inlined: On 2022-09-22 12:15, Andrey Grodzovsky wrote: On 2022-09-22 11:03, Luben Tuikov wrote: The title of this patch has "v3", but "v4"

Re: [PATCH v4] drm/sched: Add FIFO sched policy to run queue v3

2022-09-23 Thread Andrey Grodzovsky
Ping Andrey On 2022-09-22 12:15, Andrey Grodzovsky wrote: On 2022-09-22 11:03, Luben Tuikov wrote: The title of this patch has "v3", but "v4" in the title prefix. If you're using "-v" to git-format-patch, please remove the "v3" from the title.

Re: [PATCH v4] drm/sched: Add FIFO sched policy to run queue v3

2022-09-22 Thread Andrey Grodzovsky
On 2022-09-22 11:03, Luben Tuikov wrote: The title of this patch has "v3", but "v4" in the title prefix. If you're using "-v" to git-format-patch, please remove the "v3" from the title. Inlined: On 2022-09-21 14:28, Andrey Grodzovsky wrote: When ma

[PATCH v4] drm/sched: Add FIFO sched policy to run queue v3

2022-09-21 Thread Andrey Grodzovsky
in module control parameter. v3: Various cosmetic fixes and minor refactoring of fifo update function. (Luben) v4: Switch drm_sched_rq_select_entity_fifo to in-order search (Luben) Signed-off-by: Andrey Grodzovsky Tested-by: Li Yunxiang (Teddy) --- drivers/gpu/drm/scheduler/sched_entity.c | 26

Re: [PATCH v3] drm/sched: Add FIFO sched policy to run queue v3

2022-09-20 Thread Andrey Grodzovsky
On 2022-09-19 23:11, Luben Tuikov wrote: Please run this patch through checkpatch.pl, as it shows 12 warnings with it. Use these command line options: "--strict --show-types". Inlined: On 2022-09-13 16:40, Andrey Grodzovsky wrote: Given many entities competing for same run queue o

Re: [PATCH v2] A simple doc fix

2022-09-20 Thread Andrey Grodzovsky
After rebasing to latest drm-misc-next I actually see someone else already fixed this and other kerneldoc warnings so we can skip this patch. Andrey On 2022-09-20 02:46, Anup K Parikh wrote: Fix two warnings during doc build which also results in corresponding additions in

Re: [PATCH v2] A simple doc fix

2022-09-20 Thread Andrey Grodzovsky
Reviewed-by: Andrey Grodzovsky Will push it to drm-misc-next Thanks, Andrey On 2022-09-20 02:46, Anup K Parikh wrote: Fix two warnings during doc build which also results in corresponding additions in generated docs Warnings Fixed: 1. include/drm/gpu_scheduler.h:462: warning: Function

Re: [PATCH] A simple doc fix

2022-09-19 Thread Andrey Grodzovsky
On 2022-09-14 15:26, Anup K Parikh wrote: On Wed, Sep 14, 2022 at 10:24:36AM -0400, Andrey Grodzovsky wrote: On 2022-09-14 06:36, Anup K Parikh wrote: Fix two warnings during doc build which also results in corresponding additions in generated docs Warnings Fixed: 1. include/drm

Re: [PATCH 1/3] drm/scheduler: track GPU active time per entity

2022-09-16 Thread Andrey Grodzovsky
On 2022-09-16 05:12, Lucas Stach wrote: Am Donnerstag, dem 08.09.2022 um 14:33 -0400 schrieb Andrey Grodzovsky: On 2022-09-08 14:10, Lucas Stach wrote: Track the accumulated time that jobs from this entity were active on the GPU. This allows drivers using the scheduler to trivially implement

Re: [PATCH] A simple doc fix

2022-09-14 Thread Andrey Grodzovsky
On 2022-09-14 06:36, Anup K Parikh wrote: Fix two warnings during doc build which also results in corresponding additions in generated docs Warnings Fixed: 1. include/drm/gpu_scheduler.h:462: warning: Function parameter or member 'dev' not described in 'drm_gpu_scheduler' 2.

[PATCH v3] drm/sched: Add FIFO sched policy to run queue v3

2022-09-13 Thread Andrey Grodzovsky
. v3: Various cosmetic fixes and minor refactoring of fifo update function. Signed-off-by: Andrey Grodzovsky Tested-by: Li Yunxiang (Teddy) --- drivers/gpu/drm/scheduler/sched_entity.c | 26 - drivers/gpu/drm/scheduler/sched_main.c | 132 ++- include/drm

Re: [PATCH v3 5/6] drm/sched: Use parent fence instead of finished

2022-09-09 Thread Andrey Grodzovsky
Got it. Reviewed-by: Andrey Grodzovsky Andrey On 2022-09-09 16:30, Yadav, Arvind wrote: On 9/9/2022 11:02 PM, Andrey Grodzovsky wrote: What exactly is the scenario which this patch fixes in more detail please? GPU reset issue started after adding [PATCH 6/6]. Root cause

Re: [PATCH v3 5/6] drm/sched: Use parent fence instead of finished

2022-09-09 Thread Andrey Grodzovsky
What exactly is the scenario which this patch fixes in more detail please? Andrey On 2022-09-09 13:08, Arvind Yadav wrote: Using the parent fence instead of the finished fence to get the job status. This change is to avoid GPU scheduler timeout error which can cause GPU reset.

Re: [PATCH 1/3] drm/scheduler: track GPU active time per entity

2022-09-08 Thread Andrey Grodzovsky
On 2022-09-08 14:10, Lucas Stach wrote: Track the accumulated time that jobs from this entity were active on the GPU. This allows drivers using the scheduler to trivially implement the DRM fdinfo when the hardware doesn't provide more specific information than signalling job completion

Re: [PATCH 1/4] drm/sched: returns struct drm_gpu_scheduler ** for drm_sched_pick_best

2022-09-08 Thread Andrey Grodzovsky
Please send everything together because otherwise it's not clear why we need this. Andrey On 2022-09-08 11:09, James Zhu wrote: Yes, it is for NPI design. I will send out patches for review soon. Thanks! James On 2022-09-08 11:05 a.m., Andrey Grodzovsky wrote: So this is the real need

Re: [PATCH 1/4] drm/sched: returns struct drm_gpu_scheduler ** for drm_sched_pick_best

2022-09-08 Thread Andrey Grodzovsky
ing which is used in this ctx in amdgpu_ctx_fini_entity Best Regards! James On 2022-09-08 10:38 a.m., Andrey Grodzovsky wrote: I guess it's an option but I don't really see what's the added value? You saved a few lines in this patch but added a few lines in another. In total seems to

Re: [PATCH 1/4] drm/sched: returns struct drm_gpu_scheduler ** for drm_sched_pick_best

2022-09-08 Thread Andrey Grodzovsky
patch [3/4]: entity->sched_list = num_sched_list > 1 ? sched_list : NULL; I think no special reason to treat single and multiple schedule list here. Best Regards! James On 2022-09-08 10:08 a.m., Andrey Grodzovsky wrote: What's the reason for this entire patch set ? Andrey On 2022-09

Re: [PATCH 1/4] drm/sched: returns struct drm_gpu_scheduler ** for drm_sched_pick_best

2022-09-08 Thread Andrey Grodzovsky
What's the reason for this entire patch set ? Andrey On 2022-09-07 16:57, James Zhu wrote: drm_sched_pick_best returns struct drm_gpu_scheduler ** instead of struct drm_gpu_scheduler * Signed-off-by: James Zhu --- include/drm/gpu_scheduler.h | 2 +- 1 file changed, 1 insertion(+), 1

Re: [PATCH v2 1/4] drm/sched: Enable signaling for finished fence

2022-09-07 Thread Andrey Grodzovsky
On 2022-09-07 02:37, Christian König wrote: Am 06.09.22 um 21:55 schrieb Andrey Grodzovsky: On 2022-09-06 02:34, Christian König wrote: Am 05.09.22 um 18:34 schrieb Arvind Yadav: Here's enabling software signaling for finished fence. Signed-off-by: Arvind Yadav --- Changes in v1 : 1

Re: [PATCH v2] drm/sced: Add FIFO sched policy to rq

2022-09-07 Thread Andrey Grodzovsky
Luben, just a ping, whenever you have time. Andrey On 2022-09-05 01:57, Christian König wrote: Am 03.09.22 um 04:48 schrieb Andrey Grodzovsky: Problem: Given many entities competing for same rq on same scheduler an unacceptably long wait time for some jobs waiting stuck in rq before being

Re: [PATCH v2] drm/scheduler: quieten kernel-doc warnings

2022-09-06 Thread Andrey Grodzovsky
Pushed to drm-misc-next Andrey On 2022-09-06 13:57, Alex Deucher wrote: On Tue, Sep 6, 2022 at 1:38 PM Andrey Grodzovsky wrote: I RBed, see below. Can you push the patch to drm-misc? Alex Andrey On 2022-08-31 14:34, Randy Dunlap wrote: ping? On 4/4/22 14:58, Andrey Grodzovsky wrote

Re: [PATCH v2 1/4] drm/sched: Enable signaling for finished fence

2022-09-06 Thread Andrey Grodzovsky
On 2022-09-06 02:34, Christian König wrote: Am 05.09.22 um 18:34 schrieb Arvind Yadav: Here's enabling software signaling for finished fence. Signed-off-by: Arvind Yadav --- Changes in v1 : 1- Addressing Christian's comment to remove CONFIG_DEBUG_FS check from this patch. 2- The version of

Re: [PATCH v2] drm/scheduler: quieten kernel-doc warnings

2022-09-06 Thread Andrey Grodzovsky
I RBed, see below. Andrey On 2022-08-31 14:34, Randy Dunlap wrote: ping? On 4/4/22 14:58, Andrey Grodzovsky wrote: Reviewed-by: Andrey Grodzovsky Andrey On 2022-04-04 17:30, Randy Dunlap wrote: Fix kernel-doc warnings in gpu_scheduler.h and sched_main.c. Quashes these warnings: include

[PATCH v2] drm/sced: Add FIFO sched policy to rq

2022-09-02 Thread Andrey Grodzovsky
for entities based on TS of oldest job waiting in job queue of entity. Improves next entity extraction to O(1). Entity TS update O(log(number of entities in rq)) Drop default option in module control parameter. Signed-off-by: Andrey Grodzovsky Tested-by: Li Yunxiang (Teddy) --- drivers/gpu/drm

Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq

2022-08-25 Thread Andrey Grodzovsky
On 2022-08-24 22:29, Luben Tuikov wrote: Inlined: On 2022-08-24 12:21, Andrey Grodzovsky wrote: On 2022-08-23 17:37, Luben Tuikov wrote: On 2022-08-23 14:57, Andrey Grodzovsky wrote: On 2022-08-23 14:30, Luben Tuikov wrote: On 2022-08-23 14:13, Andrey Grodzovsky wrote: On 2022-08-23 12

Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq

2022-08-25 Thread Andrey Grodzovsky
On 2022-08-24 22:29, Luben Tuikov wrote: Inlined: On 2022-08-24 12:21, Andrey Grodzovsky wrote: On 2022-08-23 17:37, Luben Tuikov wrote: On 2022-08-23 14:57, Andrey Grodzovsky wrote: On 2022-08-23 14:30, Luben Tuikov wrote: On 2022-08-23 14:13, Andrey Grodzovsky wrote: On 2022-08-23 12

Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq

2022-08-25 Thread Andrey Grodzovsky
On 2022-08-23 17:37, Luben Tuikov wrote: On 2022-08-23 14:57, Andrey Grodzovsky wrote: On 2022-08-23 14:30, Luben Tuikov wrote: On 2022-08-23 14:13, Andrey Grodzovsky wrote: On 2022-08-23 12:58, Luben Tuikov wrote: Inlined: On 2022-08-22 16:09, Andrey Grodzovsky wrote: Problem: Given

Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq

2022-08-24 Thread Andrey Grodzovsky
On 2022-08-24 04:29, Michel Dänzer wrote: On 2022-08-22 22:09, Andrey Grodzovsky wrote: Problem: Given many entities competing for same rq on same scheduler an unacceptably long wait time for some jobs waiting stuck in rq before being picked up are observed (seen using GPUVis). The issue

Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq

2022-08-23 Thread Andrey Grodzovsky
On 2022-08-23 14:30, Luben Tuikov wrote: On 2022-08-23 14:13, Andrey Grodzovsky wrote: On 2022-08-23 12:58, Luben Tuikov wrote: Inlined: On 2022-08-22 16:09, Andrey Grodzovsky wrote: Poblem: Given many entities competing for same rq on ^Problem same scheduler an uncceptabliy long wait

Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq

2022-08-23 Thread Andrey Grodzovsky
On 2022-08-23 12:58, Luben Tuikov wrote: Inlined: On 2022-08-22 16:09, Andrey Grodzovsky wrote: Poblem: Given many entities competing for same rq on ^Problem same scheduler an uncceptabliy long wait time for some ^unacceptably jobs waiting stuck in rq before being picked up

Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq

2022-08-23 Thread Andrey Grodzovsky
On 2022-08-23 08:15, Christian König wrote: Am 22.08.22 um 22:09 schrieb Andrey Grodzovsky: Problem: Given many entities competing for same rq on same scheduler an unacceptably long wait time for some jobs waiting stuck in rq before being picked up are observed (seen using GPUVis). The issue

[PATCH] drm/sced: Add FIFO policy for scheduler rq

2022-08-22 Thread Andrey Grodzovsky
in the long queue. Fix: Add FIFO selection policy to entities in RQ, choose next entity on rq in such order that if job on one entity arrived earlier than job on another entity the first job will start executing earlier regardless of the length of the entity's job queue. Signed-off-by: Andrey Grodzovsky

Re: [PATCH] drm/amdgpu: remove useless condition in amdgpu_job_stop_all_jobs_on_sched()

2022-07-25 Thread Andrey Grodzovsky
Reviewed-by: Andrey Grodzovsky Andrey On 2022-07-19 06:39, Andrey Strachuk wrote: Local variable 'rq' is initialized by an address of field of drm_sched_job, so it does not make sense to compare 'rq' with NULL. Found by Linux Verification Center (linuxtesting.org) with SVACE. Signed-off

Re: [PATCH v1] drm/scheduler: Don't kill jobs in interrupt context

2022-07-16 Thread Andrey Grodzovsky
On 2022-07-14 17:16, Alex Deucher wrote: On Thu, Jul 14, 2022 at 1:58 PM Andrey Grodzovsky wrote: On 2022-07-14 12:22, Alex Deucher wrote: On Thu, Jul 14, 2022 at 10:14 AM Andrey Grodzovsky wrote: On 2022-07-14 05:57, Dmitry Osipenko wrote: On 7/12/22 11:56, Dmitry Osipenko wrote: On 7

Re: [PATCH 01/10] drm/sched: move calling drm_sched_entity_select_rq

2022-07-14 Thread Andrey Grodzovsky
Found the new use case from the 5/10 of reordering CS ioctl. Reviewed-by: Andrey Grodzovsky Andrey On 2022-07-14 12:26, Christian König wrote: We need this for limiting codecs like AV1 to the first instance for VCN3. Essentially the idea is that we first initialize the job with entity, id

Re: [PATCH v1] drm/scheduler: Don't kill jobs in interrupt context

2022-07-14 Thread Andrey Grodzovsky
On 2022-07-14 12:22, Alex Deucher wrote: On Thu, Jul 14, 2022 at 10:14 AM Andrey Grodzovsky wrote: On 2022-07-14 05:57, Dmitry Osipenko wrote: On 7/12/22 11:56, Dmitry Osipenko wrote: On 7/6/22 18:46, Alex Deucher wrote: On Wed, Jul 6, 2022 at 9:49 AM Andrey Grodzovsky wrote: On 2022-07

Re: [PATCH 01/10] drm/sched: move calling drm_sched_entity_select_rq

2022-07-14 Thread Andrey Grodzovsky
-by: Christian König CC: Andrey Grodzovsky CC: dri-devel@lists.freedesktop.org --- drivers/gpu/drm/scheduler/sched_main.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 68317d3a7a27

Re: [PATCH v1] drm/scheduler: Don't kill jobs in interrupt context

2022-07-14 Thread Andrey Grodzovsky
On 2022-07-14 05:57, Dmitry Osipenko wrote: On 7/12/22 11:56, Dmitry Osipenko wrote: On 7/6/22 18:46, Alex Deucher wrote: On Wed, Jul 6, 2022 at 9:49 AM Andrey Grodzovsky wrote: On 2022-07-06 03:07, Dmitry Osipenko wrote: Hello Andrey, On 5/17/22 17:48, Dmitry Osipenko wrote: On 5/17

Re: [PATCH v1] drm/scheduler: Don't kill jobs in interrupt context

2022-07-06 Thread Andrey Grodzovsky
On 2022-07-06 03:07, Dmitry Osipenko wrote: Hello Andrey, On 5/17/22 17:48, Dmitry Osipenko wrote: On 5/17/22 17:13, Andrey Grodzovsky wrote: Done. Andrey Awesome, thank you! Given that this drm-scheduler issue needs to be fixed in the 5.19-RC and earlier, shouldn't it be in the drm

[PATCH v2 4/4] drm/amdgpu: Follow up change to previous drm scheduler change.

2022-06-24 Thread Andrey Grodzovsky
we resumed setting s_fence->parent to NULL in drm_sched_stop switch to directly checking if job->hw_fence is signaled to short circuit reset if already signaled. Signed-off-by: Andrey Grodzovsky Tested-by: Yiqing Yao --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 ++ drivers/gpu/d

[PATCH v2 2/4] drm/amdgpu: Prevent race between late signaled fences and GPU reset.

2022-06-24 Thread Andrey Grodzovsky
interrupt. Fix: Before accessing fence array in GPU disable EOP interrupt and flush all pending interrupt handlers for amdgpu device's interrupt line. v2: Switch from irq_get/put to full enable/disable_irq for amdgpu Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

[PATCH v2 3/4] drm/sched: Partial revert of 'drm/sched: Keep s_fence->parent pointer'

2022-06-24 Thread Andrey Grodzovsky
ext patch). [1] - https://lore.kernel.org/all/731b7ff1-3cc9-e314-df2a-7c51b76d4...@amd.com/t/#r00c728fcc069b1276642c325bfa9d82bf8fa21a3 Signed-off-by: Andrey Grodzovsky Tested-by: Yiqing Yao --- drivers/gpu/drm/scheduler/sched_main.c | 13 ++--- 1 file changed, 10 insertions(+), 3

[PATCH v2 1/4] drm/amdgpu: Add put fence in amdgpu_fence_driver_clear_job_fences

2022-06-24 Thread Andrey Grodzovsky
This function should drop the fence refcount when it extracts the fence from the fence array, just as it's done in amdgpu_fence_process. Signed-off-by: Andrey Grodzovsky Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 4 +++- 1 file changed, 3 insertions(+), 1

[PATCH v2 0/4] Rework amdgpu HW fence refocunt and update scheduler parent fence refcount.

2022-06-24 Thread Andrey Grodzovsky
file/d/1yEoeW6OQC9WnwmzFW6NBLhFP_jD0xcHm/view?usp=sharing Andrey Grodzovsky (4): drm/amdgpu: Add put fence in amdgpu_fence_driver_clear_job_fences drm/amdgpu: Prevent race between late signaled fences and GPU reset. drm/sched: Partial revert of 'drm/sched: Keep s_fence->parent pointer' drm/amdgpu: Follow up change to

Re: [PATCH 1/5] drm/amdgpu: Fix possible refcount leak for release of external_hw_fence

2022-06-23 Thread Andrey Grodzovsky
On 2022-06-22 11:04, Christian König wrote: Am 22.06.22 um 17:01 schrieb Andrey Grodzovsky: On 2022-06-22 05:00, Christian König wrote: Am 21.06.22 um 21:34 schrieb Andrey Grodzovsky: On 2022-06-21 03:19, Christian König wrote: Am 21.06.22 um 00:02 schrieb Andrey Grodzovsky: Problem

Re: [PATCH 5/5] drm/amdgpu: Follow up change to previous drm scheduler change.

2022-06-23 Thread Andrey Grodzovsky
On 2022-06-23 01:52, Christian König wrote: Am 22.06.22 um 19:19 schrieb Andrey Grodzovsky: On 2022-06-22 03:17, Christian König wrote: Am 21.06.22 um 22:00 schrieb Andrey Grodzovsky: On 2022-06-21 03:28, Christian König wrote: Am 21.06.22 um 00:03 schrieb Andrey Grodzovsky: Align

Re: [PATCH 3/5] drm/amdgpu: Prevent race between late signaled fences and GPU reset.

2022-06-22 Thread Andrey Grodzovsky
Just a ping Andrey On 2022-06-21 15:45, Andrey Grodzovsky wrote: On 2022-06-21 03:25, Christian König wrote: Am 21.06.22 um 00:03 schrieb Andrey Grodzovsky: Problem: After we start handling timed out jobs we assume their fences won't be signaled but we cannot be sure and sometimes they fire

Re: [PATCH 5/5] drm/amdgpu: Follow up change to previous drm scheduler change.

2022-06-22 Thread Andrey Grodzovsky
On 2022-06-22 03:17, Christian König wrote: Am 21.06.22 um 22:00 schrieb Andrey Grodzovsky: On 2022-06-21 03:28, Christian König wrote: Am 21.06.22 um 00:03 schrieb Andrey Grodzovsky: Align refcount behaviour for amdgpu_job embedded HW fence with classic pointer style HW fences

Re: [PATCH 1/5] drm/amdgpu: Fix possible refcount leak for release of external_hw_fence

2022-06-22 Thread Andrey Grodzovsky
On 2022-06-22 05:00, Christian König wrote: Am 21.06.22 um 21:34 schrieb Andrey Grodzovsky: On 2022-06-21 03:19, Christian König wrote: Am 21.06.22 um 00:02 schrieb Andrey Grodzovsky: Problem: In amdgpu_job_submit_direct - The refcount should drop by 2 but it drops only by 1

Re: [PATCH 3/5] drm/amdgpu: Prevent race between late signaled fences and GPU reset.

2022-06-21 Thread Andrey Grodzovsky
, VURDIGERENATARAJ, CHANDAN wrote: Hi, Is this a preventive fix or you found errors/oops/hangs? If you had found errors/oops/hangs, can you please share the details? BR, Chandan V N On 2022-06-21 03:25, Christian König wrote: Am 21.06.22 um 00:03 schrieb Andrey Grodzovsky: Problem: After we

Re: [PATCH 5/5] drm/amdgpu: Follow up change to previous drm scheduler change.

2022-06-21 Thread Andrey Grodzovsky
On 2022-06-21 03:28, Christian König wrote: Am 21.06.22 um 00:03 schrieb Andrey Grodzovsky: Align refcount behaviour for amdgpu_job embedded HW fence with classic pointer style HW fences by increasing refcount each time emit is called so amdgpu code doesn't need to make workarounds using

Re: [PATCH 3/5] drm/amdgpu: Prevent race between late signaled fences and GPU reset.

2022-06-21 Thread Andrey Grodzovsky
On 2022-06-21 03:25, Christian König wrote: Am 21.06.22 um 00:03 schrieb Andrey Grodzovsky: Problem: After we start handling timed out jobs we assume their fences won't be signaled but we cannot be sure and sometimes they fire late. We need to prevent concurrent accesses to fence array from

Re: [PATCH 1/5] drm/amdgpu: Fix possible refcount leak for release of external_hw_fence

2022-06-21 Thread Andrey Grodzovsky
On 2022-06-21 03:19, Christian König wrote: Am 21.06.22 um 00:02 schrieb Andrey Grodzovsky: Problem: In amdgpu_job_submit_direct - The refcount should drop by 2 but it drops only by 1. amdgpu_ib_sched->emit -> refcount 1 from first fence init dma_fence_get -> refcount 2 dme_

[PATCH 5/5] drm/amdgpu: Follow up change to previous drm scheduler change.

2022-06-20 Thread Andrey Grodzovsky
we resumed setting s_fence->parent to NULL in drm_sched_stop switch to directly checking if job->hw_fence is signaled to short circuit reset if already signaled. Signed-off-by: Andrey Grodzovsky Tested-by: Yiqing Yao --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 ++ drivers/gpu/d

[PATCH 3/5] drm/amdgpu: Prevent race between late signaled fences and GPU reset.

2022-06-20 Thread Andrey Grodzovsky
interrupt. Fix: Before accessing fence array in GPU disable EOP interrupt and flush all pending interrupt handlers for amdgpu device's interrupt line. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 26

[PATCH 4/5] drm/sched: Partial revert of 'drm/sched: Keep s_fence->parent pointer'

2022-06-20 Thread Andrey Grodzovsky
ext patch). [1] - https://lore.kernel.org/all/731b7ff1-3cc9-e314-df2a-7c51b76d4...@amd.com/t/#r00c728fcc069b1276642c325bfa9d82bf8fa21a3 Signed-off-by: Andrey Grodzovsky Tested-by: Yiqing Yao --- drivers/gpu/drm/scheduler/sched_main.c | 16 +--- 1 file changed, 13 insertions(+), 3

[PATCH 2/5] drm/amdgpu: Add put fence in amdgpu_fence_driver_clear_job_fences

2022-06-20 Thread Andrey Grodzovsky
This function should drop the fence refcount when it extracts the fence from the fence array, just as it's done in amdgpu_fence_process. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers

[PATCH 1/5] drm/amdgpu: Fix possible refcount leak for release of external_hw_fence

2022-06-20 Thread Andrey Grodzovsky
Problem: In amdgpu_job_submit_direct - The refcount should drop by 2 but it drops only by 1. amdgpu_ib_sched->emit -> refcount 1 from first fence init dma_fence_get -> refcount 2 dma_fence_put -> refcount 1 Fix: Add put for external_hw_fence in amdgpu_job_free/free_cb Signed-of

[PATCH 0/5] Rework amdgpu HW fence refocunt and update scheduler parent fence refcount.

2022-06-20 Thread Andrey Grodzovsky
/1yEoeW6OQC9WnwmzFW6NBLhFP_jD0xcHm/view?usp=sharing Andrey Grodzovsky (5): drm/amdgpu: Fix possible refcount leak for release of external_hw_fence drm/amdgpu: Add put fence in amdgpu_fence_driver_clear_job_fences drm/amdgpu: Prevent race between late signaled fences and GPU reset. drm/sched: Partial revert

Re: [PATCH v1] drm/scheduler: Don't kill jobs in interrupt context

2022-05-17 Thread Andrey Grodzovsky
Done. Andrey On 2022-05-17 10:03, Andrey Grodzovsky wrote: Let me push it into drm-misc-next. Andrey On 2022-05-17 05:03, Dmitry Osipenko wrote: On 5/17/22 10:40, Erico Nunes wrote: On Wed, Apr 13, 2022 at 12:05 PM Steven Price wrote: On 11/04/2022 23:15, Dmitry Osipenko wrote

Re: [PATCH v1] drm/scheduler: Don't kill jobs in interrupt context

2022-05-17 Thread Andrey Grodzovsky
Let me push it into drm-misc-next. Andrey On 2022-05-17 05:03, Dmitry Osipenko wrote: On 5/17/22 10:40, Erico Nunes wrote: On Wed, Apr 13, 2022 at 12:05 PM Steven Price wrote: On 11/04/2022 23:15, Dmitry Osipenko wrote: Interrupt context can't sleep. Drivers like Panfrost and MSM are

Re: [Bug 215958] New: thunderbolt3 egpu cannot disconnect cleanly

2022-05-10 Thread Andrey Grodzovsky
On 2022-05-09 14:03, Deucher, Alexander wrote: [Public] -Original Message- From: Bjorn Helgaas Sent: Monday, May 9, 2022 12:23 PM To: Linux PCI Cc: r087...@yahoo.it; Deucher, Alexander ; Koenig, Christian ; Pan, Xinhui ; amd-gfx mailing list ; dri-devel Subject: Re: [Bug 215958]

Re: [PATCH] gpu: drm: remove redundant dma_fence_put() when drm_sched_job_add_dependency() fails

2022-04-29 Thread Andrey Grodzovsky
ing this if possible. Patch is Reviewed-by: Andrey Grodzovsky Andrey On 2022-04-28 23:03, Hangyu Hua wrote: If fence is released in drm_sched_job_add_implicit_dependencies(), a dangling pointer will be in obj->resv. specific scenario: refcount = 1 init, obj->resv->fence_excl = fence recoun

Re: [PATCH] gpu: drm: remove redundant dma_fence_put() when drm_sched_job_add_dependency() fails

2022-04-28 Thread Andrey Grodzovsky
On 2022-04-28 04:56, Hangyu Hua wrote: On 2022/4/27 22:43, Andrey Grodzovsky wrote: On 2022-04-26 22:31, Hangyu Hua wrote: On 2022/4/26 22:55, Andrey Grodzovsky wrote: On 2022-04-25 22:54, Hangyu Hua wrote: On 2022/4/25 23:42, Andrey Grodzovsky wrote: On 2022-04-25 04:36, Hangyu Hua

Re: [PATCH] gpu: drm: remove redundant dma_fence_put() when drm_sched_job_add_dependency() fails

2022-04-27 Thread Andrey Grodzovsky
On 2022-04-26 22:31, Hangyu Hua wrote: On 2022/4/26 22:55, Andrey Grodzovsky wrote: On 2022-04-25 22:54, Hangyu Hua wrote: On 2022/4/25 23:42, Andrey Grodzovsky wrote: On 2022-04-25 04:36, Hangyu Hua wrote: When drm_sched_job_add_dependency() fails, dma_fence_put() will be called

Re: [PATCH] gpu: drm: remove redundant dma_fence_put() when drm_sched_job_add_dependency() fails

2022-04-26 Thread Andrey Grodzovsky
On 2022-04-25 22:54, Hangyu Hua wrote: On 2022/4/25 23:42, Andrey Grodzovsky wrote: On 2022-04-25 04:36, Hangyu Hua wrote: When drm_sched_job_add_dependency() fails, dma_fence_put() will be called internally. Calling it again after drm_sched_job_add_dependency() finishes may result

Re: [PATCH v2 1/2] drm/sched: use DECLARE_EVENT_CLASS

2022-04-26 Thread Andrey Grodzovsky
Done Andrey On 2022-04-26 14:52, Chia-I Wu wrote: That would be great. I don't have push permission. On Tue, Apr 26, 2022 at 11:25 AM Andrey Grodzovsky wrote: It's ok to land but it wasn't, do you have push permissions to drm-misc-next? If not, I will do it for you. Andrey On 2022-04-26

Re: [PATCH v2 1/2] drm/sched: use DECLARE_EVENT_CLASS

2022-04-26 Thread Andrey Grodzovsky
drm_sched_job_entity to drm_sched_job (Andrey) Signed-off-by: Chia-I Wu Cc: Rob Clark Reviewed-by: Andrey Grodzovsky This series has been reviewed. Is it ok to land (if it hasn't)?

Re: [PATCH] gpu: drm: remove redundant dma_fence_put() when drm_sched_job_add_dependency() fails

2022-04-26 Thread Andrey Grodzovsky
On 2022-04-25 22:54, Hangyu Hua wrote: On 2022/4/25 23:42, Andrey Grodzovsky wrote: On 2022-04-25 04:36, Hangyu Hua wrote: When drm_sched_job_add_dependency() fails, dma_fence_put() will be called internally. Calling it again after drm_sched_job_add_dependency() finishes may result

Re: [PATCH] gpu: drm: remove redundant dma_fence_put() when drm_sched_job_add_dependency() fails

2022-04-25 Thread Andrey Grodzovsky
On 2022-04-25 04:36, Hangyu Hua wrote: When drm_sched_job_add_dependency() fails, dma_fence_put() will be called internally. Calling it again after drm_sched_job_add_dependency() finishes may result in a dangling pointer. Fix this by removing redundant dma_fence_put(). Signed-off-by: Hangyu

Re: [PATCH v1] drm/scheduler: Don't kill jobs in interrupt context

2022-04-12 Thread Andrey Grodzovsky
On 2022-04-12 14:20, Dmitry Osipenko wrote: On 4/12/22 19:51, Andrey Grodzovsky wrote: On 2022-04-11 18:15, Dmitry Osipenko wrote: Interrupt context can't sleep. Drivers like Panfrost and MSM are taking mutex when job is released, and thus, that code can sleep. This results into "

Re: [PATCH v1] drm/scheduler: Don't kill jobs in interrupt context

2022-04-12 Thread Andrey Grodzovsky
On 2022-04-11 18:15, Dmitry Osipenko wrote: Interrupt context can't sleep. Drivers like Panfrost and MSM are taking mutex when job is released, and thus, that code can sleep. This results in "BUG: scheduling while atomic" if locks are contended while job is freed. There is no good reason for

Re: [PATCH 1/2] drm/sched: use DECLARE_EVENT_CLASS

2022-04-11 Thread Andrey Grodzovsky
TRACE_INCLUDE_FILE gpu_scheduler_trace -TRACE_EVENT(drm_sched_job, +DECLARE_EVENT_CLASS(drm_sched_job_entity, I would just call it drm_sched_job since that's what it is. With that the series is Reviewed-by: Andrey Grodzovsky Andrey TP_PROTO(struct drm_sched_job *sched_job, struct

Re: [PATCH v2] drm/scheduler: quieten kernel-doc warnings

2022-04-04 Thread Andrey Grodzovsky
Reviewed-by: Andrey Grodzovsky Andrey On 2022-04-04 17:30, Randy Dunlap wrote: Fix kernel-doc warnings in gpu_scheduler.h and sched_main.c. Quashes these warnings: include/drm/gpu_scheduler.h:332: warning: missing initial short description on line: * struct drm_sched_backend_ops include

Re: [PATCH] drm/scheduler: quieten kernel-doc warnings

2022-04-04 Thread Andrey Grodzovsky
Seems to me better this way to avoid merge conflicts ? Andrey On 2022-04-04 11:33, Randy Dunlap wrote: On 4/4/22 07:34, Andrey Grodzovsky wrote: On 2022-04-04 00:25, Randy Dunlap wrote: Fix kernel-doc warnings in gpu_scheduler.h and sched_main.c. Quashes these warnings: include/drm

Re: [PATCH] drm/scheduler: quieten kernel-doc warnings

2022-04-04 Thread Andrey Grodzovsky
a processes") Signed-off-by: Randy Dunlap Cc: David Airlie Cc: Daniel Vetter Cc: Andrey Grodzovsky Cc: Nayan Deshmukh Cc: Alex Deucher Cc: Christian König Cc: Jiawei Gu Cc: dri-devel@lists.freedesktop.org --- Feel free to make changes or suggest changes... drivers/gpu/drm/scheduler/sched_m

Re: linux-next: build warning after merge of the drm-misc tree

2022-03-28 Thread Andrey Grodzovsky
On 2022-03-27 19:56, Stephen Rothwell wrote: Hi Andrey, On Tue, 1 Mar 2022 22:26:12 -0500 Andrey Grodzovsky wrote: On 2022-03-01 20:31, Stephen Rothwell wrote: Hi all, On Thu, 20 Jan 2022 14:26:39 +1100 Stephen Rothwell wrote: On Wed, 17 Nov 2021 13:49:26 +1100 Stephen Rothwell

[PATCH] drm/sched: Fix htmldoc warning.

2022-03-28 Thread Andrey Grodzovsky
Fixes the warning. Signed-off-by: Andrey Grodzovsky --- include/drm/gpu_scheduler.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index 944f83ef9f2e..0fca8f38bee4 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm

Re: [PATCH 2/3] drm/msm/gpu: Park scheduler threads for system suspend

2022-03-18 Thread Andrey Grodzovsky
On 2022-03-18 13:22, Rob Clark wrote: On Fri, Mar 18, 2022 at 9:27 AM Andrey Grodzovsky wrote: On 2022-03-18 12:20, Rob Clark wrote: On Fri, Mar 18, 2022 at 9:04 AM Andrey Grodzovsky wrote: On 2022-03-17 16:35, Rob Clark wrote: On Thu, Mar 17, 2022 at 12:50 PM Andrey Grodzovsky wrote

Re: [PATCH 2/3] drm/msm/gpu: Park scheduler threads for system suspend

2022-03-18 Thread Andrey Grodzovsky
On 2022-03-18 12:20, Rob Clark wrote: On Fri, Mar 18, 2022 at 9:04 AM Andrey Grodzovsky wrote: On 2022-03-17 16:35, Rob Clark wrote: On Thu, Mar 17, 2022 at 12:50 PM Andrey Grodzovsky wrote: On 2022-03-17 14:25, Rob Clark wrote: On Thu, Mar 17, 2022 at 11:10 AM Andrey Grodzovsky wrote

Re: [PATCH 2/3] drm/msm/gpu: Park scheduler threads for system suspend

2022-03-18 Thread Andrey Grodzovsky
On 2022-03-17 16:35, Rob Clark wrote: On Thu, Mar 17, 2022 at 12:50 PM Andrey Grodzovsky wrote: On 2022-03-17 14:25, Rob Clark wrote: On Thu, Mar 17, 2022 at 11:10 AM Andrey Grodzovsky wrote: On 2022-03-17 13:35, Rob Clark wrote: On Thu, Mar 17, 2022 at 9:45 AM Christian König wrote

Re: [PATCH 2/3] drm/msm/gpu: Park scheduler threads for system suspend

2022-03-17 Thread Andrey Grodzovsky
On 2022-03-17 14:25, Rob Clark wrote: On Thu, Mar 17, 2022 at 11:10 AM Andrey Grodzovsky wrote: On 2022-03-17 13:35, Rob Clark wrote: On Thu, Mar 17, 2022 at 9:45 AM Christian König wrote: On 17.03.22 at 17:18, Rob Clark wrote: On Thu, Mar 17, 2022 at 9:04 AM Christian König wrote

Re: [PATCH 2/3] drm/msm/gpu: Park scheduler threads for system suspend

2022-03-17 Thread Andrey Grodzovsky
On 2022-03-17 13:35, Rob Clark wrote: On Thu, Mar 17, 2022 at 9:45 AM Christian König wrote: On 17.03.22 at 17:18, Rob Clark wrote: On Thu, Mar 17, 2022 at 9:04 AM Christian König wrote: On 17.03.22 at 16:10, Rob Clark wrote: [SNIP] userspace frozen != kthread frozen .. that is what

Re: [PATCH 2/3] drm/msm/gpu: Park scheduler threads for system suspend

2022-03-17 Thread Andrey Grodzovsky
On 2022-03-17 12:04, Christian König wrote: On 17.03.22 at 16:10, Rob Clark wrote: [SNIP] userspace frozen != kthread frozen .. that is what this patch is trying to address, so we aren't racing between shutting down the hw and the scheduler shoveling more jobs at us. Well exactly that's

Re: [PATCH v2 1/2] drm: Add GPU reset sysfs event

2022-03-10 Thread Andrey Grodzovsky
On 2022-03-10 11:21, Sharma, Shashank wrote: On 3/10/2022 4:24 PM, Rob Clark wrote: On Thu, Mar 10, 2022 at 1:55 AM Christian König wrote: On 09.03.22 at 19:12, Rob Clark wrote: On Tue, Mar 8, 2022 at 11:40 PM Shashank Sharma wrote: From: Shashank Sharma This patch adds a new

Re: linux-next: build warning after merge of the drm-misc tree

2022-03-01 Thread Andrey Grodzovsky
Please check you have commit c7703ce38c1e Andrey Grodzovsky   3 weeks ago    drm/amdgpu: Fix htmldoc warning Andrey On 2022-03-01 20:31, Stephen Rothwell wrote: Hi all, On Thu, 20 Jan 2022 14:26:39 +1100 Stephen Rothwell wrote: On Wed, 17 Nov 2021 13:49:26 +1100 Stephen Rothwell wrote

Re: [PATCH] drm/v3d: centralize error handling when init scheduler fails

2022-02-28 Thread Andrey Grodzovsky
Acked-by: Andrey Grodzovsky Andrey On 2022-02-28 13:16, Melissa Wen wrote: Remove redundant error message (since now it is very similar to what we do in drm_sched_init) and centralize all error handling in a unique place, as we follow the same steps in any case of failure. Signed-off

Re: [RFC v4 02/11] drm/amdgpu: Move scheduler init to after XGMI is ready

2022-02-24 Thread Andrey Grodzovsky
] If it applies cleanly, feel free to drop it in.  I'll drop those patches for drm-next since they are already in drm-misc. Alex *From:* amd-gfx on behalf of Andrey Grodzovsky *Sent:* Thursday, February 24, 2022 11:24 AM

Re: [RFC v4 02/11] drm/amdgpu: Move scheduler init to after XGMI is ready

2022-02-24 Thread Andrey Grodzovsky
Grodzovsky wrote: All comments are fixed and code pushed. Thanks to everyone who helped with reviewing. Andrey On 2022-02-09 02:53, Christian König wrote: On 09.02.22 at 01:23, Andrey Grodzovsky wrote: Before we initialize schedulers we must know which reset domain we are in - for single device

Re: [PATCH] drm/sched: Add device pointer to drm_gpu_scheduler

2022-02-20 Thread Andrey Grodzovsky
On 2022-02-20 22:32, Gu, JiaWei (Will) wrote: [AMD Official Use Only] Pinging. -Original Message- From: Jiawei Gu Sent: Thursday, February 17, 2022 6:44 PM To: dri-devel@lists.freedesktop.org; amd-...@lists.freedesktop.org; Koenig, Christian ; Grodzovsky, Andrey ; Liu, Monk ; Deng,

[PATCH] drm/amdgpu: Fix htmldoc warning

2022-02-11 Thread Andrey Grodzovsky
Update function name. Signed-off-by: Andrey Grodzovsky Reported-by: kernel test robot --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

Re: [PATCH] drm/amdgpu: Fix compile error.

2022-02-10 Thread Andrey Grodzovsky
On 2022-02-10 02:06, Christian König wrote: On 10.02.22 at 04:17, Andrey Grodzovsky wrote: Seems I forgot to add this to the relevant commit when submitting. Rebase/merge issue? Looks like it. It looks more like I forgot to add the header file change to the commit after updating

[PATCH] drm/amdgpu: Fix compile error.

2022-02-09 Thread Andrey Grodzovsky
Seems I forgot to add this to the relevant commit when submitting. Signed-off-by: Andrey Grodzovsky Reported-by: kernel test robot --- drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h b

Re: [RFC v4 02/11] drm/amdgpu: Move scheduler init to after XGMI is ready

2022-02-09 Thread Andrey Grodzovsky
All comments are fixed and code pushed. Thanks to everyone who helped with reviewing. Andrey On 2022-02-09 02:53, Christian König wrote: On 09.02.22 at 01:23, Andrey Grodzovsky wrote: Before we initialize schedulers we must know which reset domain we are in - for single device there is a single

Re: [RFC v3 00/12] Define and use reset domain for GPU recovery in amdgpu

2022-02-09 Thread Andrey Grodzovsky
Thanks a lot! Andrey On 2022-02-09 01:06, JingWen Chen wrote: Hi Andrey, I have been testing your patch and it seems fine so far. Best Regards, Jingwen Chen On 2022/2/3 2:57 AM, Andrey Grodzovsky wrote: Just another ping, with Shyun's help I was able to do some smoke testing on XGMI
