Re: [PATCH 1/2] drm/amdgpu: drop volatile from ring buffer

2024-10-10 Thread Tvrtko Ursulin
On 09/10/2024 13:07, Christian König wrote: On 09.10.24 at 09:41 Tvrtko Ursulin wrote: On 08/10/2024 19:11, Christian König wrote: Volatile only prevents the compiler from re-ordering reads and writes. Since we always only modify the ring buffer from one CPU thread and have an explicit
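
For illustration, a minimal sketch of the ordering argument, not the actual amdgpu code: with a single CPU writer it is the explicit barrier before the doorbell write, not volatile, that makes the ring contents visible to the hardware in order:

/* Sketch: plain (non-volatile) stores fill the ring; wmb() orders them
 * before the doorbell write the hardware acts on.
 */
static void ring_submit_sketch(u32 *ring, u32 idx, u32 cmd,
                               u32 new_wptr, void __iomem *doorbell)
{
        ring[idx] = cmd;        /* plain store into the ring buffer */

        wmb();                  /* ring contents ordered before the doorbell */

        writel(new_wptr, doorbell);     /* hardware may fetch from here on */
}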

Re: [PATCH 0/4] Ring padding CPU optimisation and some RFC bits

2024-10-09 Thread Tvrtko Ursulin
On 08/10/2024 19:10, Christian König wrote: On 08.10.24 at 17:05 Tvrtko Ursulin wrote: From: Tvrtko Ursulin I've noticed the hardware ring padding optimisations have landed so I decided to respin the CPU side optimisations. First two patches are simply adding ring fill helpers

Re: [PATCH 2/2] drm/amdgpu: stop masking the wptr all the time

2024-10-09 Thread Tvrtko Ursulin
On 08/10/2024 19:11, Christian König wrote: Stop masking the wptr and decrementing the count_dw while writing into the ring buffer. We can do that all at once while pushing the changes to the HW. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 11 +--
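
A hedged sketch of the approach (the struct amdgpu_ring fields wptr, wptr_old, buf_mask and count_dw are real; the helper bodies are illustrative): the write pointer stays unmasked while emitting, and the masking plus count_dw accounting happen once, when pushing to the HW:

static inline void ring_write_sketch(struct amdgpu_ring *ring, u32 v)
{
        /* Mask only the array index; wptr itself stays unmasked and
         * count_dw is no longer decremented per dword.
         */
        ring->ring[ring->wptr++ & ring->buf_mask] = v;
}

static void ring_commit_sketch(struct amdgpu_ring *ring)
{
        /* Account for everything written since the last commit in one go. */
        ring->count_dw -= ring->wptr - ring->wptr_old;
        ring->wptr_old = ring->wptr;
        amdgpu_ring_set_wptr(ring);     /* HW sees the masked value here */
}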

Re: [PATCH 1/2] drm/amdgpu: drop volatile from ring buffer

2024-10-09 Thread Tvrtko Ursulin
On 08/10/2024 19:11, Christian König wrote: Volatile only prevents the compiler from re-ordering reads and writes. Since we always only modify the ring buffer from one CPU thread and have an explicit barrier before signaling the HW this should have no effect at all and just prevents compiler op

[RFC 3/4] drm/amdgpu: Add and use amdgpu_ring_write_addr() helper

2024-10-08 Thread Tvrtko Ursulin
From: Tvrtko Ursulin I've noticed there are really a lot of places which write addresses into the ring as two writes of lower_32_bits() followed by upper_32_bits(). Is it worth adding a helper to do those in one go? It shrinks the source and binary a bit but is the readability better, or
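
A sketch of the helper the RFC asks about (the name is from the patch title; amdgpu_ring_write() and the lower/upper_32_bits() macros are the existing interfaces it would wrap):

static inline void amdgpu_ring_write_addr(struct amdgpu_ring *ring, u64 addr)
{
        /* Replaces the lower-then-upper two-write pattern open-coded
         * across the ring backends.
         */
        amdgpu_ring_write(ring, lower_32_bits(addr));
        amdgpu_ring_write(ring, upper_32_bits(addr));
}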

[PATCH 2/4] drm/amdgpu: More more efficient ring padding

2024-10-08 Thread Tvrtko Ursulin
From: Tvrtko Ursulin As in the previous patch, we add a new amdgpu_ring_fill2x32() helper which can write out the nops more efficiently using memset64(). This should have a lesser effect than the previous patch, given how the affected rings have at most 64 dword alignment restriction
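
A hedged sketch of what such a helper could look like, an assumed shape rather than the posted code, presuming an 8-byte-aligned destination, an even dword count and a little-endian layout:

static void ring_fill_2x32_sketch(u32 *dst, u32 lo, u32 hi, unsigned int ndw)
{
        /* Pack the two-dword nop pattern into a u64 and emit it with one
         * memset64() store per pair instead of per-dword writes.
         */
        u64 pattern = (u64)hi << 32 | lo;

        memset64((u64 *)dst, pattern, ndw / 2);
}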

[RFC 4/4] drm/amdgpu: Document the magic big endian bit

2024-10-08 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Similar to the previous patch but with the addition of a magic bit1 set on big endian platforms. No idea what it is but maybe adding a helper and giving both it and the magic bit a proper name would be worth it. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Sunil

[PATCH 1/4] drm/amdgpu: More efficient ring padding

2024-10-08 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Having noticed that typically 200+ nops per submission are written into the ring, using a rather verbose one-nop-at-a-time-plus-ring-buffer-arithmetic as done in amdgpu_ring_write(), the obvious idea was to improve it by filling those nops in blocks. This patch therefore
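
A sketch of the block-fill idea (illustrative; it uses the real struct amdgpu_ring fields and assumes the ring contents may be written as plain u32):

static void ring_fill_nops_sketch(struct amdgpu_ring *ring, u32 nop,
                                  unsigned int ndw)
{
        u32 head = ring->wptr & ring->buf_mask;
        u32 until_wrap = ring->buf_mask + 1 - head;
        u32 first = min_t(u32, ndw, until_wrap);

        /* At most two memset32() spans replace hundreds of individual
         * amdgpu_ring_write() calls; the second span handles a wrap.
         */
        memset32((u32 *)&ring->ring[head], nop, first);
        if (ndw > first)
                memset32((u32 *)&ring->ring[0], nop, ndw - first);

        ring->wptr += ndw;
}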

[PATCH 0/4] Ring padding CPU optimisation and some RFC bits

2024-10-08 Thread Tvrtko Ursulin
From: Tvrtko Ursulin I've noticed the hardware ring padding optimisations have landed so I decided to respin the CPU side optimisations. First two patches are simply adding ring fill helpers which deal with reducing the CPU cost of emitting hundreds of nops from the for-amdgpu_ring_write

Re: [PATCH 3/3] drm/amdgpu: Remove the while loop from amdgpu_job_prepare_job

2024-10-08 Thread Tvrtko Ursulin
On 07/10/2024 15:39, Alex Deucher wrote: On Mon, Oct 7, 2024 at 8:52 AM Tvrtko Ursulin wrote: On 04/10/2024 15:15, Alex Deucher wrote: Applied. Thanks! Thanks Alex! Could you perhaps also merge https://lore.kernel.org/amd-gfx/20240813135712.82611-1-tursu...@igalia.com/ via your tree

Re: [PATCH 3/3] drm/amdgpu: Remove the while loop from amdgpu_job_prepare_job

2024-10-07 Thread Tvrtko Ursulin
On 04/10/2024 15:15, Alex Deucher wrote: Applied. Thanks! Thanks Alex! Could you perhaps also merge https://lore.kernel.org/amd-gfx/20240813135712.82611-1-tursu...@igalia.com/ via your tree? If it still applies that is. Regards, Tvrtko On Fri, Oct 4, 2024 at 3:28 AM Tvrtko Ursulin

Re: [PATCH 3/3] drm/amdgpu: Remove the while loop from amdgpu_job_prepare_job

2024-10-04 Thread Tvrtko Ursulin
On 24/09/2024 13:06, Christian König wrote: On 24.09.24 at 11:51 Tvrtko Ursulin wrote: From: Tvrtko Ursulin The while loop makes it sound like amdgpu_vmid_grab() potentially needs to be called multiple times to produce a fence, while in reality all code paths either return an error, assign a

Re: [PATCH 3/8] drm/sched: Always increment correct scheduler score

2024-09-30 Thread Tvrtko Ursulin
On 30/09/2024 14:07, Christian König wrote: On 30.09.24 at 15:01 Tvrtko Ursulin wrote: On 13/09/2024 17:05, Tvrtko Ursulin wrote: From: Tvrtko Ursulin An entity's run queue can change during drm_sched_entity_push_job() so make sure to update the score consistently. Signed-off-by: Tvrtko

Re: [PATCH 3/8] drm/sched: Always increment correct scheduler score

2024-09-30 Thread Tvrtko Ursulin
On 13/09/2024 17:05, Tvrtko Ursulin wrote: From: Tvrtko Ursulin An entity's run queue can change during drm_sched_entity_push_job() so make sure to update the score consistently. Signed-off-by: Tvrtko Ursulin Fixes: d41a39dda140 ("drm/scheduler: improve job distribution with multiple q

Re: [PATCH v4 6/6] drm/amdgpu: use drm_file::name in task_info::process_desc

2024-09-30 Thread Tvrtko Ursulin
On 27/09/2024 09:48, Pierre-Eric Pelloux-Prayer wrote: If a drm_file name is set, append it to the process name. This information is useful with the virtio/native-context driver: this allows the guest application's identifier to be visible in amdgpu's output. The output in amdgpu_vm_info/amdgpu_ge

Re: [PATCH v4 1/6] drm: add DRM_SET_CLIENT_NAME ioctl

2024-09-30 Thread Tvrtko Ursulin
sg, fdinfo, etc), -EINVAL is returned. A 0-length string is a valid use, and clears the existing name. Reviewed-by: Tvrtko Ursulin Signed-off-by: Pierre-Eric Pelloux-Prayer --- drivers/gpu/drm/drm_debugfs.c | 14 ++--- drivers/gpu/drm/drm_file.c| 5 drivers/gpu/drm/drm

Re: [PATCH v2] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-27 Thread Tvrtko Ursulin
On 26/09/2024 09:15, Philipp Stanner wrote: On Mon, 2024-09-23 at 15:35 +0100, Tvrtko Ursulin wrote: Ping Christian and Philipp - reasonably happy with v2? I think it's the only unreviewed patch from the series. Howdy, sorry for the delay, I had been traveling. I have a few nits

Re: [PATCH 2/8] drm/sched: Always wake up correct scheduler in drm_sched_entity_push_job

2024-09-25 Thread Tvrtko Ursulin
they also go to drm-misc-fixes? I am not too familiar with the drm-misc flow. Or does the series now need to wait for some backmerge? Regards, Tvrtko On 24.09.24 at 12:19 Tvrtko Ursulin wrote: From: Tvrtko Ursulin Since drm_sched_entity_modify_sched() can modify the entity's run queue, let's ma

Re: [PATCH 2/8] drm/sched: Always wake up correct scheduler in drm_sched_entity_push_job

2024-09-25 Thread Tvrtko Ursulin
On 24/09/2024 15:20, Christian König wrote: On 24.09.24 at 16:12 Tvrtko Ursulin wrote: On 24/09/2024 14:55, Christian König wrote: I've pushed the first to drm-misc-next, but that one here fails to apply cleanly. This appears due to 440d52b370b0 ("drm/sched: Fix dynamic job-flo

Re: [PATCH v2] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-24 Thread Tvrtko Ursulin
On 24/09/2024 10:45, Tvrtko Ursulin wrote: On 24/09/2024 09:20, Christian König wrote: On 16.09.24 at 19:30 Tvrtko Ursulin wrote: From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", wit

Re: [PATCH v2] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-24 Thread Tvrtko Ursulin
On 24/09/2024 09:20, Christian König wrote: On 16.09.24 at 19:30 Tvrtko Ursulin wrote: From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit larger refactoring we ca

Re: [PATCH v3 3/6] drm/amdgpu: delay the use of amdgpu_vm_set_task_info

2024-09-24 Thread Tvrtko Ursulin
On 24/09/2024 09:23, Christian König wrote: On 23.09.24 at 12:25 Tvrtko Ursulin wrote: On 20/09/2024 10:06, Pierre-Eric Pelloux-Prayer wrote: At this point the vm is locked so we can safely modify it without risk of concurrent access. Which particular lock is this referring to, and does

Re: [PATCH v3 1/6] drm: add DRM_SET_NAME ioctl

2024-09-24 Thread Tvrtko Ursulin
On 24/09/2024 09:22, Pierre-Eric Pelloux-Prayer wrote: On 23/09/2024 at 12:06, Tvrtko Ursulin wrote: On 20/09/2024 10:06, Pierre-Eric Pelloux-Prayer wrote: Giving the opportunity to userspace to associate a free-form name with a drm_file struct is helpful for tracking and debugging

Re: [PATCH v3 1/6] drm: add DRM_SET_NAME ioctl

2024-09-24 Thread Tvrtko Ursulin
#define DRM_IOCTL_MODE_CLOSEFB DRM_IOWR(0xD0, struct drm_mode_closefb) +/** + * DRM_IOCTL_SET_NAME - Attach a name to a drm_file + * + * This ioctl is similar to DMA_BUF_SET_NAME - it allows for easier tracking + * and debugging. + * The length of the name must be <= DRM_NAME_MA
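
A hypothetical userspace sketch of how such an ioctl would be used; the struct layout and the ioctl number are assumptions, not copied from the patch (the interface was renamed DRM_SET_CLIENT_NAME in a later revision):

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Assumed layout: a length plus a user pointer, the usual shape for
 * variable-length DRM ioctl arguments.
 */
struct drm_set_name_arg {
        uint64_t name_len;      /* length of the string, excluding the NUL */
        uint64_t name;          /* userspace pointer to the string */
};

#define DRM_IOCTL_SET_NAME _IOWR('d', 0xD1, struct drm_set_name_arg)

static int drm_file_set_name(int drm_fd, const char *name)
{
        struct drm_set_name_arg arg = {
                .name_len = strlen(name),
                .name = (uint64_t)(uintptr_t)name,
        };

        return ioctl(drm_fd, DRM_IOCTL_SET_NAME, &arg);
}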

Re: [PATCH v2] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-24 Thread Tvrtko Ursulin
Ping Christian and Philipp - reasonably happy with v2? I think it's the only unreviewed patch from the series. Regards, Tvrtko On 16/09/2024 18:30, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sc

Re: [PATCH v3 4/6] drm/amdgpu: alloc and init vm::task_info from first submit

2024-09-24 Thread Tvrtko Ursulin
On 20/09/2024 10:06, Pierre-Eric Pelloux-Prayer wrote: This will allow using a flexible array to store the process name and other information. This also means that the process name will be determined once and for all, instead of at each submit. But the pid and others can still change? By design?

Re: [PATCH v3 3/6] drm/amdgpu: delay the use of amdgpu_vm_set_task_info

2024-09-24 Thread Tvrtko Ursulin
On 20/09/2024 10:06, Pierre-Eric Pelloux-Prayer wrote: At this point the vm is locked so we can safely modify it without risk of concurrent access. Which particular lock is this referring to, and does this imply the previous placement was unsafe? Regards, Tvrtko Signed-off-by: Pierre-Eric Pel

[PATCH 5/8] drm/sched: Stop setting current entity in FIFO mode

2024-09-24 Thread Tvrtko Ursulin
From: Tvrtko Ursulin It does not seem there is a need to set the current entity in FIFO mode since it only serves as a "cursor" in round-robin mode. Even if scheduling mode is changed at runtime the change in behaviour is simply to restart from the first entity, instead of contin

[PATCH 6/8] drm/sched: Re-order struct drm_sched_rq members for clarity

2024-09-24 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Current kerneldoc for struct drm_sched_rq incompletely documents what fields are protected by the lock. This is not good because it is misleading. Let's fix it by listing all the elements which are protected by the lock. While at it, let's also re-order the members so all

[PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-24 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit larger refactoring we can do the same optimisation on the rq->lock. (Currently both drm_sched_rq_

[PATCH 2/8] drm/sched: Always wake up correct scheduler in drm_sched_entity_push_job

2024-09-24 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Since drm_sched_entity_modify_sched() can modify the entity's run queue, let's make sure to only dereference the pointer once so both adding and waking up are guaranteed to be consistent. The alternative of moving the spin_unlock to after the wake up would for now be more
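
A sketch of the fix's shape (lock and function names approximate the scheduler code of the period; drm_sched_wakeup()'s exact signature has varied across kernel versions): the run queue pointer is read once under the entity lock and that same value is used for both the add and the wake-up:

static void push_job_tail_sketch(struct drm_sched_entity *entity)
{
        struct drm_sched_rq *rq;
        struct drm_gpu_scheduler *sched;

        spin_lock(&entity->rq_lock);
        rq = entity->rq;                /* dereference exactly once */
        sched = rq->sched;
        drm_sched_rq_add_entity(rq, entity);
        spin_unlock(&entity->rq_lock);

        /* Wakes the same scheduler the entity was queued on, even if a
         * concurrent drm_sched_entity_modify_sched() changes entity->rq.
         */
        drm_sched_wakeup(sched);
}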

[PATCH 3/8] drm/sched: Always increment correct scheduler score

2024-09-24 Thread Tvrtko Ursulin
From: Tvrtko Ursulin An entity's run queue can change during drm_sched_entity_push_job() so make sure to update the score consistently. Signed-off-by: Tvrtko Ursulin Fixes: d41a39dda140 ("drm/scheduler: improve job distribution with multiple queues") Cc: Nirmoy Das Cc: Christian

[PATCH 1/8] drm/sched: Add locking to drm_sched_entity_modify_sched

2024-09-24 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Without the locking amdgpu currently can race between amdgpu_ctx_set_entity_priority() (via drm_sched_entity_modify_sched()) and drm_sched_job_arm(), leading to the latter accessing a potentially inconsistent entity->sched_list and entity->num_sched_list pair. v2: * I
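
A sketch consistent with the description (this mirrors the shape of the eventual fix; the lock name follows the scheduler code of the time):

void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
                                   struct drm_gpu_scheduler **sched_list,
                                   unsigned int num_sched_list)
{
        WARN_ON(!num_sched_list || !sched_list);

        /* Publish the pair atomically with respect to readers which also
         * take the entity lock, such as drm_sched_job_arm().
         */
        spin_lock(&entity->rq_lock);
        entity->sched_list = sched_list;
        entity->num_sched_list = num_sched_list;
        spin_unlock(&entity->rq_lock);
}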

[PATCH 7/8] drm/sched: Re-group and rename the entity run-queue lock

2024-09-24 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Christian suggested renaming the lock and improving the documentation of what it protects, and also re-ordering the structure members so all those protected by the lock sit together in a block. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov

[PATCH 4/8] drm/sched: Optimise drm_sched_entity_push_job

2024-09-24 Thread Tvrtko Ursulin
From: Tvrtko Ursulin In FIFO mode we can avoid dropping the lock only to immediately re-acquire it by adding a new drm_sched_rq_update_fifo_locked() helper. v2: * Remove drm_sched_rq_update_fifo() altogether. (Christian) Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc
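
The shape of the optimisation as a before/after fragment (illustrative): the caller keeps the entity lock it already holds instead of dropping it only for the helper to re-take it:

/* before: the helper re-acquired the lock internally */
spin_lock(&entity->rq_lock);
/* ... update entity state ... */
spin_unlock(&entity->rq_lock);
drm_sched_rq_update_fifo(entity, ts);

/* after: a _locked variant runs under the already-held lock; per v2 the
 * unlocked wrapper is removed entirely
 */
spin_lock(&entity->rq_lock);
/* ... update entity state ... */
drm_sched_rq_update_fifo_locked(entity, ts);
spin_unlock(&entity->rq_lock);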

[PATCH v3 0/8] DRM scheduler fixes and improvements

2024-09-24 Thread Tvrtko Ursulin
From: Tvrtko Ursulin All reviewed now, re-sending after rebasing on latest drm-tip so it is in a mergeable state. Tvrtko Ursulin (8): drm/sched: Add locking to drm_sched_entity_modify_sched drm/sched: Always wake up correct scheduler in drm_sched_entity_push_job drm/sched: Always

[PATCH 3/3] drm/amdgpu: Remove the while loop from amdgpu_job_prepare_job

2024-09-24 Thread Tvrtko Ursulin
From: Tvrtko Ursulin The while loop makes it sound like amdgpu_vmid_grab() potentially needs to be called multiple times to produce a fence, while in reality all code paths either return an error, assign a valid job->vmid or assign a vmid which will be valid once the returned fence sign
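
The gist as a before/after fragment (conditions approximate the amdgpu code; error handling elided):

/* before: reads as if several grab attempts may be needed */
while (!fence && job->vm && !job->vmid)
        r = amdgpu_vmid_grab(vm, ring, job, &fence);

/* after: one call always suffices - it returns an error, assigns a
 * valid job->vmid, or returns a fence after which the vmid is valid
 */
if (job->vm && !job->vmid)
        r = amdgpu_vmid_grab(vm, ring, job, &fence);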

[PATCH 2/3] drm/amdgpu: Drop impossible condition from amdgpu_job_prepare_job

2024-09-24 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Fence has been initialised to NULL so no need to test it. Signed-off-by: Tvrtko Ursulin Cc: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers

[PATCH 1/3] drm/amdgpu: Drop unused fence argument from amdgpu_vmid_grab_used

2024-09-24 Thread Tvrtko Ursulin
From: Tvrtko Ursulin The fence argument is unused so let's drop it. Signed-off-by: Tvrtko Ursulin Cc: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm

Re: [PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-17 Thread Tvrtko Ursulin
On 16/09/2024 13:20, Tvrtko Ursulin wrote: On 16/09/2024 13:11, Christian König wrote: On 13.09.24 at 18:05 Tvrtko Ursulin wrote: From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", wit

[PATCH v2] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-16 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit larger refactoring we can do the same optimisation on the rq->lock. (Currently both drm_sched_rq_
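
A hedged sketch of the resulting push-job tail (one plausible shape; lock and helper names per the series): both locks are taken once each, with the FIFO update done under the already-held rq->lock:

spin_lock(&entity->lock);
rq = entity->rq;
sched = rq->sched;

spin_lock(&rq->lock);
drm_sched_rq_add_entity(rq, entity);
if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
        drm_sched_rq_update_fifo_locked(entity, submit_ts);
spin_unlock(&rq->lock);         /* taken once, never dropped and re-taken */
spin_unlock(&entity->lock);

drm_sched_wakeup(sched);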

Re: [PATCH 6/8] drm/sched: Re-order struct drm_sched_rq members for clarity

2024-09-16 Thread Tvrtko Ursulin
On 16/09/2024 09:16, Philipp Stanner wrote: On Fri, 2024-09-13 at 17:05 +0100, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Current kerneldoc for struct drm_sched_rq incompletely documents what fields are protected by the lock. This is not good because it is misleading. Let's fix it by

Re: [PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-16 Thread Tvrtko Ursulin
On 16/09/2024 13:11, Christian König wrote: On 13.09.24 at 18:05 Tvrtko Ursulin wrote: From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit larger refactoring we ca

Re: [PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-14 Thread Tvrtko Ursulin
On 10/09/2024 16:03, Christian König wrote: On 10.09.24 at 11:46 Tvrtko Ursulin wrote: On 10/09/2024 10:08, Christian König wrote: On 09.09.24 at 19:19 Tvrtko Ursulin wrote: From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "d

Re: [PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-14 Thread Tvrtko Ursulin
On 13/09/2024 13:19, Philipp Stanner wrote: On Wed, 2024-09-11 at 13:22 +0100, Tvrtko Ursulin wrote: On 10/09/2024 11:25, Philipp Stanner wrote: On Mon, 2024-09-09 at 18:19 +0100, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a pa

[PATCH 3/8] drm/sched: Always increment correct scheduler score

2024-09-13 Thread Tvrtko Ursulin
From: Tvrtko Ursulin An entity's run queue can change during drm_sched_entity_push_job() so make sure to update the score consistently. Signed-off-by: Tvrtko Ursulin Fixes: d41a39dda140 ("drm/scheduler: improve job distribution with multiple queues") Cc: Nirmoy Das Cc: Christian

[PATCH 1/8] drm/sched: Add locking to drm_sched_entity_modify_sched

2024-09-13 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Without the locking amdgpu currently can race between amdgpu_ctx_set_entity_priority() (via drm_sched_entity_modify_sched()) and drm_sched_job_arm(), leading to the latter accessing a potentially inconsistent entity->sched_list and entity->num_sched_list pair. v2: * I

[PATCH 7/8] drm/sched: Re-group and rename the entity run-queue lock

2024-09-13 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Christian suggested renaming the lock and improving the documentation of what it protects, and also re-ordering the structure members so all those protected by the lock sit together in a block. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov

[PATCH 6/8] drm/sched: Re-order struct drm_sched_rq members for clarity

2024-09-13 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Current kerneldoc for struct drm_sched_rq incompletely documents what fields are protected by the lock. This is not good because it is misleading. Let's fix it by listing all the elements which are protected by the lock. While at it, let's also re-order the members so all

[PATCH 2/8] drm/sched: Always wake up correct scheduler in drm_sched_entity_push_job

2024-09-13 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Since drm_sched_entity_modify_sched() can modify the entity's run queue, let's make sure to only dereference the pointer once so both adding and waking up are guaranteed to be consistent. The alternative of moving the spin_unlock to after the wake up would for now be more

[PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-13 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit larger refactoring we can do the same optimisation on the rq->lock. (Currently both drm_sched_rq_

[PATCH 4/8] drm/sched: Optimise drm_sched_entity_push_job

2024-09-13 Thread Tvrtko Ursulin
From: Tvrtko Ursulin In FIFO mode we can avoid dropping the lock only to immediately re-acquire it by adding a new drm_sched_rq_update_fifo_locked() helper. v2: * Remove drm_sched_rq_update_fifo() altogether. (Christian) Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc

[PATCH 5/8] drm/sched: Stop setting current entity in FIFO mode

2024-09-13 Thread Tvrtko Ursulin
From: Tvrtko Ursulin It does not seem there is a need to set the current entity in FIFO mode since it only serves as a "cursor" in round-robin mode. Even if scheduling mode is changed at runtime the change in behaviour is simply to restart from the first entity, instead of contin

[PATCH v3 0/8] DRM scheduler fixes and improvements

2024-09-13 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Re-spin of the series from last week. Changelog is in individual patches. Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner Tvrtko Ursulin (8): drm/sched: Add locking to drm_sched_entity_modify_sched drm/sched: Always wake

Re: [PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-12 Thread Tvrtko Ursulin
On 10/09/2024 11:25, Philipp Stanner wrote: On Mon, 2024-09-09 at 18:19 +0100, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit larger refactor

Re: [PATCH 6/8] drm/sched: Re-order struct drm_sched_rq members for clarity

2024-09-11 Thread Tvrtko Ursulin
On 10/09/2024 11:05, Philipp Stanner wrote: On Mon, 2024-09-09 at 18:19 +0100, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Let's re-order the members to make it clear which are protected by the lock and at the same time document it via kerneldoc. I'd prefer if commit messages follo

Re: [PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-11 Thread Tvrtko Ursulin
On 10/09/2024 10:08, Christian König wrote: On 09.09.24 at 19:19 Tvrtko Ursulin wrote: From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit larger refactoring we ca

Re: [RFC 1/4] drm/sched: Add locking to drm_sched_entity_modify_sched

2024-09-10 Thread Tvrtko Ursulin
On 09/09/2024 13:46, Philipp Stanner wrote: On Mon, 2024-09-09 at 13:37 +0100, Tvrtko Ursulin wrote: On 09/09/2024 13:18, Christian König wrote: On 09.09.24 at 14:13 Philipp Stanner wrote: On Mon, 2024-09-09 at 13:29 +0200, Christian König wrote: On 09.09.24 at 11:44 Philipp

[PATCH 7/8] drm/sched: Re-group and rename the entity run-queue lock

2024-09-09 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Christian suggested renaming the lock and improving the documentation of what it protects, and also re-ordering the structure members so all those protected by the lock sit together in a block. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov

[PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-09 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit larger refactoring we can do the same optimisation on the rq->lock. (Currently both drm_sched_rq_

[PATCH 5/8] drm/sched: Stop setting current entity in FIFO mode

2024-09-09 Thread Tvrtko Ursulin
From: Tvrtko Ursulin It does not seem there is a need to set the current entity in FIFO mode since it only serves as a "cursor" in round-robin mode. Even if scheduling mode is changed at runtime the change in behaviour is simply to restart from the first entity, instead of contin

[PATCH 3/8] drm/sched: Always increment correct scheduler score

2024-09-09 Thread Tvrtko Ursulin
From: Tvrtko Ursulin An entity's run queue can change during drm_sched_entity_push_job() so make sure to update the score consistently. Signed-off-by: Tvrtko Ursulin Fixes: d41a39dda140 ("drm/scheduler: improve job distribution with multiple queues") Cc: Nirmoy Das Cc: Christian

[PATCH 1/8] drm/sched: Add locking to drm_sched_entity_modify_sched

2024-09-09 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Without the locking amdgpu currently can race between amdgpu_ctx_set_entity_priority() (via drm_sched_entity_modify_sched()) and drm_sched_job_arm(), leading to the latter accessing a potentially inconsistent entity->sched_list and entity->num_sched_list pair. v2: * I

[PATCH 6/8] drm/sched: Re-order struct drm_sched_rq members for clarity

2024-09-09 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Let's re-order the members to make it clear which are protected by the lock and at the same time document it via kerneldoc. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner --- include/drm

[PATCH 4/8] drm/sched: Optimise drm_sched_entity_push_job

2024-09-09 Thread Tvrtko Ursulin
From: Tvrtko Ursulin In FIFO mode we can avoid dropping the lock only to immediately re-acquire it by adding a new drm_sched_rq_update_fifo_locked() helper. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner --- drivers

[PATCH 2/8] drm/sched: Always wake up correct scheduler in drm_sched_entity_push_job

2024-09-09 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Since drm_sched_entity_modify_sched() can modify the entity's run queue, let's make sure to only dereference the pointer once so both adding and waking up are guaranteed to be consistent. The alternative of moving the spin_unlock to after the wake up would for now be more

[PATCH v2 0/8] DRM scheduler fixes, or not, or incorrect kind

2024-09-09 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Re-spin of the series from two days ago with review feedback addressed and some new patches added. Changelog is in individual patches but essentially new patches are renames and struct members re-ordering as discussed in v1, plus one more optimisation when I noticed we can

Re: [PATCH 1/2] Documentation/gpu: Document the situation with unqualified drm-memory-

2024-09-09 Thread Tvrtko Ursulin
On 06/09/2024 19:12, Alex Deucher wrote: On Wed, Sep 4, 2024 at 4:36 AM Tvrtko Ursulin wrote: On 21/08/2024 21:47, Alex Deucher wrote: On Tue, Aug 13, 2024 at 9:57 AM Tvrtko Ursulin wrote: From: Tvrtko Ursulin Currently it is not well defined what drm-memory- is compared to other

Re: [RFC 1/4] drm/sched: Add locking to drm_sched_entity_modify_sched

2024-09-09 Thread Tvrtko Ursulin
On 09/09/2024 13:18, Christian König wrote: On 09.09.24 at 14:13 Philipp Stanner wrote: On Mon, 2024-09-09 at 13:29 +0200, Christian König wrote: On 09.09.24 at 11:44 Philipp Stanner wrote: On Fri, 2024-09-06 at 19:06 +0100, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Without the

Re: [RFC 0/4] DRM scheduler fixes, or not, or incorrect kind

2024-09-09 Thread Tvrtko Ursulin
On 09/09/2024 09:47, Philipp Stanner wrote: Hi, On Fri, 2024-09-06 at 19:06 +0100, Tvrtko Ursulin wrote: From: Tvrtko Ursulin In a recent conversation with Christian there was a thought that drm_sched_entity_modify_sched() should start using the entity->rq_lock to be safe against job

[RFC 2/2] drm/sched: Remove drm_sched_entity_set_priority

2024-09-06 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Now that no callers exist, let's remove the whole misleading helper. Misleading because runtime changes do not reliably work due to drm_sched_entity_select_rq() only acting on idle entities. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov

[RFC 1/2] drm/amdgpu: Remove dynamic DRM scheduling priority override

2024-09-06 Thread Tvrtko Ursulin
From: Tvrtko Ursulin According to Christian the dynamic DRM priority override was only interesting before the hardware priority (done via drm_sched_entity_modify_sched()) existed. Furthermore, both overrides also only work somewhat on paper while in reality they are only effective if the entity

[RFC 0/2] drm/amdgpu: No need for dynamic DRM priority?

2024-09-06 Thread Tvrtko Ursulin
From: Tvrtko Ursulin In a recent conversation with Christian there was a thought that dynamic DRM scheduling priority changes are not required, or even not desired (actively prevented?!), and can be ripped out. For more context, the starting point for that conversation was me observing that they

[RFC 1/4] drm/sched: Add locking to drm_sched_entity_modify_sched

2024-09-06 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Without the locking amdgpu currently can race amdgpu_ctx_set_entity_priority() and drm_sched_job_arm(), leading to the latter accessing a potentially inconsistent entity->sched_list and entity->num_sched_list pair. The comment on drm_sched_entity_modify_sched() howeve

[RFC 0/4] DRM scheduler fixes, or not, or incorrect kind

2024-09-06 Thread Tvrtko Ursulin
From: Tvrtko Ursulin In a recent conversation with Christian there was a thought that drm_sched_entity_modify_sched() should start using the entity->rq_lock to be safe against job submission and simultaneous priority changes. The kerneldoc accompanying that function however is a bit unclear

[RFC 4/4] drm/sched: Optimise drm_sched_entity_push_job

2024-09-06 Thread Tvrtko Ursulin
From: Tvrtko Ursulin In FIFO mode we can avoid dropping the lock only to immediately re-acquire it by adding a new drm_sched_rq_update_fifo_locked() helper. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost --- drivers/gpu/drm/scheduler

[RFC 2/4] drm/sched: Always wake up correct scheduler in drm_sched_entity_push_job

2024-09-06 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Since drm_sched_entity_modify_sched() can modify the entity's run queue, let's make sure to only dereference the pointer once so both adding and waking up are guaranteed to be consistent. Signed-off-by: Tvrtko Ursulin Fixes: b37aced31eb0 ("drm/scheduler: implement a fun

[RFC 3/4] drm/sched: Always increment correct scheduler score

2024-09-06 Thread Tvrtko Ursulin
From: Tvrtko Ursulin An entity's run queue can change during drm_sched_entity_push_job() so make sure to update the score consistently. Signed-off-by: Tvrtko Ursulin Fixes: d41a39dda140 ("drm/scheduler: improve job distribution with multiple queues") Cc: Nirmoy Das Cc: Christian

Re: [PATCH 1/2] Documentation/gpu: Document the situation with unqualified drm-memory-

2024-09-05 Thread Tvrtko Ursulin
On 21/08/2024 21:47, Alex Deucher wrote: On Tue, Aug 13, 2024 at 9:57 AM Tvrtko Ursulin wrote: From: Tvrtko Ursulin Currently it is not well defined what drm-memory- is compared to other categories. In practice the only driver which emits these keys is amdgpu and through them exposes the

Re: [PATCH 1/2] Documentation/gpu: Document the situation with unqualified drm-memory-

2024-08-15 Thread Tvrtko Ursulin
On 13/08/2024 19:47, Rob Clark wrote: On Tue, Aug 13, 2024 at 6:57 AM Tvrtko Ursulin wrote: From: Tvrtko Ursulin Currently it is not well defined what drm-memory- is compared to other categories. In practice the only driver which emits these keys is amdgpu and through them exposes the current

Re: [PATCH] drm/amdgpu: Remove hidden double memset from amdgpu_vm_pt_clear()

2024-08-14 Thread Tvrtko Ursulin
On 13/08/2024 15:08, Tvrtko Ursulin wrote: From: Tvrtko Ursulin When CONFIG_INIT_STACK_ALL_ZERO is set and so the -ftrivial-auto-var-init=zero compiler option is active, the compiler fails to notice that later in amdgpu_vm_pt_clear() there is a second memset to clear the same on-stack struct

Re: [PATCH 6/6] drm/amdgpu: Re-validate evicted buffers v2

2024-08-13 Thread Tvrtko Ursulin
I was waiting for some replies elsewhere on this thread. Anyway... for the below, because I don't understand how an important fix like this is not garnering more attention: On 04/06/2024 17:05, Christian König wrote: From: Tvrtko Ursulin Since you pretty much changed my logi

[PATCH] drm/amdgpu: Remove hidden double memset from amdgpu_vm_pt_clear()

2024-08-13 Thread Tvrtko Ursulin
From: Tvrtko Ursulin When CONFIG_INIT_STACK_ALL_ZERO is set and so the -ftrivial-auto-var-init=zero compiler option is active, the compiler fails to notice that later in amdgpu_vm_pt_clear() there is a second memset to clear the same on-stack struct amdgpu_vm_update_params. If we replace this memset with
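
An illustration of the effect described above, using a hypothetical struct (the patch concerns struct amdgpu_vm_update_params): with -ftrivial-auto-var-init=zero the compiler already zero-initialises the local, so an explicit memset() is a second clear of the same bytes, while an empty initialiser lets the compiler prove one clear is enough:

#include <string.h>

struct params {
        unsigned long addr;
        unsigned int flags;
};

void before(void)
{
        struct params p;                /* auto-var-init zeroes p here... */

        memset(&p, 0, sizeof(p));       /* ...and this zeroes it again */
        /* ... p is then filled in and used ... */
}

void after(void)
{
        struct params p = {};           /* one clear the compiler understands */
        /* ... p is then filled in and used ... */
}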

[PATCH] drm/amdgpu: Remove hidden double memset from amdgpu_cs_ioctl()

2024-08-13 Thread Tvrtko Ursulin
From: Tvrtko Ursulin When CONFIG_INIT_STACK_ALL_ZERO is set and so the -ftrivial-auto-var-init=zero compiler option is active, the compiler fails to notice that inside amdgpu_cs_parser_init() there is a second memset to clear the same on-stack struct amdgpu_cs_parser. If we pull this memset one level out

[PATCH 0/2] DRM fdinfo legacy drm-memory- clarification and amdgpu update

2024-08-13 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Re-sending these two since they garnered little attention last time round. First patch clarifies what drm-memory- is, and that it is legacy, and second patch updates amdgpu to start emitting new keys together with the legacy (by using the common DRM helper). With that

[PATCH 2/2] drm/amdgpu: Use drm_print_memory_stats helper from fdinfo

2024-08-13 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Convert fdinfo memory stats to use the common drm_print_memory_stats helper. This achieves alignment with the common keys as documented in drm-usage-stats.rst, specifically adding the drm-total- key the driver was missing until now. Additionally I made the code stop skipping
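
A hedged sketch of the conversion (drm_print_memory_stats() and struct drm_memory_stats are the real common interfaces from include/drm/drm_file.h; the per-BO accounting loop is elided):

static void fdinfo_sketch(struct drm_printer *p, struct drm_file *file)
{
        struct drm_memory_stats stats = {};

        /* ... sum this client's BO sizes into stats.resident,
         * stats.shared and stats.purgeable for the region ...
         */

        /* Emits drm-total-vram, drm-resident-vram and friends. */
        drm_print_memory_stats(p, &stats,
                               DRM_GEM_OBJECT_RESIDENT |
                               DRM_GEM_OBJECT_PURGEABLE,
                               "vram");
}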

[PATCH 1/2] Documentation/gpu: Document the situation with unqualified drm-memory-

2024-08-13 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Currently it is not well defined what drm-memory- is compared to other categories. In practice the only driver which emits these keys is amdgpu and through them exposes the current resident buffer object memory (including shared). To prevent any confusion, document that drm
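
For context, the keys in question, with made-up values: the unqualified legacy key only amdgpu emits, next to the drm-usage-stats.rst equivalents for the same region:

drm-memory-vram:        512344 KiB     (legacy; resident memory, incl. shared)
drm-resident-vram:      512344 KiB
drm-shared-vram:        12288 KiB
drm-total-vram:         524288 KiB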

Re: [PATCH] drm/amdgpu: optimize the padding with hw optimization

2024-08-07 Thread Tvrtko Ursulin
On 04/08/2024 19:11, Marek Olšák wrote: On Thu, Aug 1, 2024 at 2:55 PM Marek Olšák wrote: On Thu, Aug 1, 2024, 03:37 Christian König wrote: On 01.08.24 at 08:53 Marek Olšák wrote: On Thu, Aug 1, 2024, 00:28 Khatri, Sunil wrote: On 8/1/2024 8:49 AM, Marek Olšák wrote: + /* He

Re: [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()

2024-07-27 Thread Tvrtko Ursulin
On 24/07/2024 12:16, Christian König wrote: On 24.07.24 at 10:16 Tvrtko Ursulin wrote: [SNIP] Absolutely. Absolutely good and absolutely me, or absolutely you? :) You, I don't even have time to finish all the stuff I already started :/ Okay, I think I can squeeze it in. Thes

Re: [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()

2024-07-24 Thread Tvrtko Ursulin
On 22/07/2024 16:13, Christian König wrote: On 22.07.24 at 16:43 Tvrtko Ursulin wrote: On 22/07/2024 15:06, Christian König wrote: On 22.07.24 at 15:52 Tvrtko Ursulin wrote: On 19/07/2024 16:18, Christian König wrote: On 19.07.24 at 15:02 Christian König wrote: On 19.07.24 at 11

Re: [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()

2024-07-23 Thread Tvrtko Ursulin
On 22/07/2024 15:06, Christian König wrote: On 22.07.24 at 15:52 Tvrtko Ursulin wrote: On 19/07/2024 16:18, Christian König wrote: On 19.07.24 at 15:02 Christian König wrote: On 19.07.24 at 11:47 Tvrtko Ursulin wrote: From: Tvrtko Ursulin A long time ago, in commit b3ac17667f11 ("

Re: [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()

2024-07-23 Thread Tvrtko Ursulin
On 19/07/2024 16:18, Christian König wrote: On 19.07.24 at 15:02 Christian König wrote: On 19.07.24 at 11:47 Tvrtko Ursulin wrote: From: Tvrtko Ursulin A long time ago, in commit b3ac17667f11 ("drm/scheduler: rework entity creation") a change was made which prevented priority c

[PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()

2024-07-19 Thread Tvrtko Ursulin
From: Tvrtko Ursulin A long time ago, in commit b3ac17667f11 ("drm/scheduler: rework entity creation") a change was made which prevented priority changes for entities with only one assigned scheduler. The commit reduced drm_sched_entity_set_priority() to simply update the entity's pri
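
A sketch of the mechanics being referred to (shapes approximate the scheduler code of that era): set_priority() only stores the value, and its only consumer, drm_sched_entity_select_rq(), returns early for single-scheduler entities, so the stored priority never takes effect for them:

void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
                                   enum drm_sched_priority priority)
{
        spin_lock(&entity->rq_lock);
        entity->priority = priority;    /* stored, but not yet applied */
        spin_unlock(&entity->rq_lock);
}

void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
{
        /* Entities created with a single scheduler keep no sched_list
         * after init, so re-selection - and with it the new priority -
         * never happens for them.
         */
        if (!entity->sched_list)
                return;

        /* ... pick a run queue from sched_list using entity->priority ... */
}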

[PATCH v4 2/3] drm/amdgpu: More efficient ring padding

2024-07-15 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Having noticed that typically 200+ nops per submission are written into the ring, using a rather verbose one-nop-at-a-time-plus-ring-buffer-arithmetic as done in amdgpu_ring_write(), the obvious idea was to improve it by filling those nops in blocks. This patch therefore

Re: [PATCH 2/3] drm/amdgpu: More efficient ring padding

2024-07-13 Thread Tvrtko Ursulin
On 12/07/2024 16:28, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Having noticed that typically 200+ nops per submission are written into the ring, using a rather verbose one-nop-at-a-time-plus-ring-buffer-arithmetic as done in amdgpu_ring_write(), the obvious idea was to improve it by

Re: [RFC] drm/amdgpu: More efficient ring padding

2024-07-13 Thread Tvrtko Ursulin
On 12/07/2024 14:04, Christian König wrote: On 12.07.24 at 11:14 Tvrtko Ursulin wrote: On 12/07/2024 08:33, Christian König wrote: On 11.07.24 at 20:17 Tvrtko Ursulin wrote: From: Tvrtko Ursulin From the department of questionable optimisations today we have a minor improvement to

Re: [RFC] drm/amdgpu: More efficient ring padding

2024-07-13 Thread Tvrtko Ursulin
On 12/07/2024 08:33, Christian König wrote: On 11.07.24 at 20:17 Tvrtko Ursulin wrote: From: Tvrtko Ursulin From the department of questionable optimisations today we have a minor improvement to how padding / filling the rings with nops is done. Having noticed that typically 200+ nops

[PATCH v3 2/3] drm/amdgpu: More efficient ring padding

2024-07-12 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Having noticed that typically 200+ nops per submission are written into the ring, using a rather verbose one-nop-at-a-time-plus-ring-buffer-arithmetic as done in amdgpu_ring_write(), the obvious idea was to improve it by filling those nops in blocks. This patch therefore

[PATCH 3/3] drm/amdgpu: More more efficient ring padding

2024-07-12 Thread Tvrtko Ursulin
From: Tvrtko Ursulin As in the previous patch, we add a new amdgpu_ring_fill64() helper which can write out the nops more efficiently using memset64(). This should have a lesser effect than the previous patch, given how the affected rings have at most 64 dword alignment restriction

[PATCH 0/3] Ring commit and padding micro-optimisations

2024-07-12 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Three patches to streamline the ring nop padding process which happens on every submission. I smoke tested graphics and video decode on the Steam Deck but cannot do much more testing than that. Therefore no guarantees I did not break something. Cc: Christian König
