On 03.05.23 at 21:14, André Almeida wrote:
On 03/05/2023 14:43, Timur Kristóf wrote:
Hi Felix,
On Wed, 2023-05-03 at 11:08 -0400, Felix Kuehling wrote:
That's the worst-case scenario where you're debugging HW or FW
issues.
Those should be pretty rare post-bringup. But are there hangs cause
[AMD Official Use Only - General]
Hi Hawking,
Thank you for your review.
I will change the check to if (adev->gfx.cp_error_irq.funcs) and
submit this patch to amd-staging-drm-next.
Regards,
Horatio
-Original Message-
From: Zhang, Hawking
Sent: Friday, April 28, 2023 1:
Hi Dave, Daniel,
Fixes for 6.4.
The following changes since commit d893f39320e1248d1c97fde0d6e51e5ea008a76b:
drm/amd/display: Lowering min Z8 residency time (2023-04-26 22:53:58 -0400)
are available in the Git repository at:
https://gitlab.freedesktop.org/agd5f/linux.git
tags/amd-drm-fixe
smatch warning -
1) drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:3615 gfx_v9_0_kiq_resume()
warn: inconsistent returns 'ring->mqd_obj->tbo.base.resv'.
2) drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:6901 gfx_v10_0_kiq_resume()
warn: inconsistent returns 'ring->mqd_obj->tbo.base.resv'.
Signed-off-by: Sukrut Be
On Wed, May 3, 2023, 14:53 André Almeida wrote:
> On 03/05/2023 14:08, Marek Olšák wrote:
> > GPU hangs are pretty common post-bringup. They are not common per user,
> > but if we gather all hangs from all users, we can have lots and lots of
> > them.
> >
> > GPU hangs are indeed not very debu
On 03/05/2023 14:43, Timur Kristóf wrote:
Hi Felix,
On Wed, 2023-05-03 at 11:08 -0400, Felix Kuehling wrote:
That's the worst-case scenario where you're debugging HW or FW
issues.
Those should be pretty rare post-bringup. But are there hangs caused by
user mode driver or application bugs tha
On 03/05/2023 14:08, Marek Olšák wrote:
GPU hangs are pretty common post-bringup. They are not common per user,
but if we gather all hangs from all users, we can have lots and lots of
them.
GPU hangs are indeed not very debuggable. There are however some things
we can do:
- Identify the h
On 5/2/2023 11:51 AM, Hamza Mahfooz wrote:
As mentioned in commit 9128e6babf10 ("drm/amdgpu: fix
amdgpu_irq_put call trace in gmc_v10_0_hw_fini") and commit c094b8923bdd
("drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v11_0_hw_fini"), it
is meaningless to call amdgpu_irq_put() for gmc
Hi Felix,
On Wed, 2023-05-03 at 11:08 -0400, Felix Kuehling wrote:
> That's the worst-case scenario where you're debugging HW or FW
> issues.
> Those should be pretty rare post-bringup. But are there hangs caused by
> user mode driver or application bugs that are easier to debug and
> probabl
WRITE_DATA with ENGINE=PFP will execute the packet on the frontend engine,
while ENGINE=ME will execute the packet on the backend engine.
Marek
On Wed, May 3, 2023 at 1:08 PM Marek Olšák wrote:
> GPU hangs are pretty common post-bringup. They are not common per user,
> but if we gather all hang
GPU hangs are pretty common post-bringup. They are not common per user, but
if we gather all hangs from all users, we can have lots and lots of them.
GPU hangs are indeed not very debuggable. There are however some things we
can do:
- Identify the hanging IB by its VA (the kernel should know it)
-
[AMD Official Use Only - General]
One minor comment inline.
-Original Message-
From: amd-gfx On Behalf Of Sreekant
Somasekharan
Sent: Friday, April 28, 2023 3:12 PM
To: amd-gfx@lists.freedesktop.org
Cc: Somasekharan, Sreekant
Subject: [PATCH v2] drm/amdkfd: Expose proc sysfs folder con
Applied. Thanks!
Alex
On Wed, May 3, 2023 at 11:29 AM Dan Carpenter wrote:
>
> Smatch complains that we need to drop this lock before returning.
>
> drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:1838 gfx_v9_4_3_kiq_resume()
> warn: inconsistent returns 'ring->mqd_obj->tbo.base.resv'.
>
> Fixe
Applied. Thanks!
On Wed, May 3, 2023 at 11:29 AM Dan Carpenter wrote:
>
> We changed which lock we are supposed to take but this error path
> was accidentally overlooked, so it still drops the old lock.
>
> Fixes: def799c6596d ("drm/amdgpu: add multi-xcc support to amdgpu_gfx
> interfaces (v4)"
On 2023-05-03 17:31, Tvrtko Ursulin wrote:
On 03/05/2023 09:34, Maarten Lankhorst wrote:
Based roughly on the rdma and misc cgroup controllers, with a lot of
the accounting code borrowed from rdma.
The interface is simple:
- populate drmcgroup_device->regions[..] name and size for each activ
On 03/05/2023 09:34, Maarten Lankhorst wrote:
Based roughly on the rdma and misc cgroup controllers, with a lot of
the accounting code borrowed from rdma.
The interface is simple:
- populate drmcgroup_device->regions[..] name and size for each active
region.
- Call drm(m)cg_register_device(
Smatch complains that we need to drop this lock before returning.
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:1838 gfx_v9_4_3_kiq_resume()
warn: inconsistent returns 'ring->mqd_obj->tbo.base.resv'.
Fixes: 86301129698b ("drm/amdgpu: split gc v9_4_3 functionality from gc v9_0")
Signed-off-by: D
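The "inconsistent returns" warning above describes a function that takes a lock and then returns on an error path without releasing it. What follows is only a minimal user-space sketch of the usual fix, not the actual patch: struct demo_lock, struct demo_ring, demo_map_mqd() and demo_kiq_resume() are hypothetical stand-ins invented for illustration, and the toy lock merely records whether it is held.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical toy lock: only tracks whether it is currently held. */
struct demo_lock {
	int held;
};

static void demo_lock_acquire(struct demo_lock *lock)
{
	assert(!lock->held);
	lock->held = 1;
}

static void demo_lock_release(struct demo_lock *lock)
{
	assert(lock->held);
	lock->held = 0;
}

/* Hypothetical stand-in for a ring whose MQD buffer is guarded by a lock. */
struct demo_ring {
	struct demo_lock resv;	/* plays the role of the reservation lock */
	void *mqd_ptr;		/* mapping of the MQD, NULL while unmapped */
	int initialized;
};

/* Hypothetical mapping step that can fail. */
static int demo_map_mqd(struct demo_ring *ring, void *backing)
{
	if (!backing)
		return -ENOMEM;
	ring->mqd_ptr = backing;
	return 0;
}

/* Every exit goes through out_unlock, so the lock state is the same
 * on the success path and on the error path. */
static int demo_kiq_resume(struct demo_ring *ring, void *backing)
{
	int r;

	demo_lock_acquire(&ring->resv);

	r = demo_map_mqd(ring, backing);
	if (r)
		goto out_unlock;	/* error path still drops the lock */

	ring->initialized = 1;
	ring->mqd_ptr = NULL;		/* unmap again */

out_unlock:
	demo_lock_release(&ring->resv);
	return r;
}
```

Routing the error path through a single unlock label is a common way to keep lock handling symmetric; whether the real fix uses a label or an explicit unlock before the return is not visible in the truncated message.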
We changed which lock we are supposed to take but this error path
was accidentally overlooked, so it still drops the old lock.
Fixes: def799c6596d ("drm/amdgpu: add multi-xcc support to amdgpu_gfx
interfaces (v4)")
Signed-off-by: Dan Carpenter
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +-
On 03.05.23 at 17:24, Alex Deucher wrote:
On Wed, May 3, 2023 at 11:20 AM Christian König wrote:
Reviewed-by: Christian König for this one.
Can't say much about the first one. That was just the hack because some
bit in the IP version was re-used on SRIOV, wasn't it?
Yes, the high 2 bits of
On Wed, May 3, 2023 at 11:20 AM Christian König wrote:
>
> Reviewed-by: Christian König for this one.
>
> Can't say much about the first one. That was just the hack because some
> bit in the IP version was re-used on SRIOV, wasn't it?
Yes, the high 2 bits of the revision number were reused for a
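Reusing the high bits of a version field means a reader has to mask them off before comparing revisions. The following is only a sketch under loud assumptions: the message does not state the field width, so an 8-bit revision is assumed here, and every name (demo_rev_pack(), demo_rev_flags(), demo_rev_number(), the DEMO_* macros) is hypothetical rather than taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: bits 7..6 carry the reused flags, bits 5..0 the revision. */
#define DEMO_REV_FLAG_SHIFT 6
#define DEMO_REV_FLAG_MASK  0xC0u
#define DEMO_REV_NUM_MASK   0x3Fu

/* Build a raw byte from a 2-bit flag field and a 6-bit revision number. */
static uint8_t demo_rev_pack(uint8_t flags, uint8_t number)
{
	assert(flags <= 0x3u);
	assert(number <= DEMO_REV_NUM_MASK);
	return (uint8_t)((flags << DEMO_REV_FLAG_SHIFT) | number);
}

/* Extract the two reused high bits. */
static uint8_t demo_rev_flags(uint8_t raw)
{
	return (uint8_t)((raw & DEMO_REV_FLAG_MASK) >> DEMO_REV_FLAG_SHIFT);
}

/* Extract the revision number with the reused bits masked off. */
static uint8_t demo_rev_number(uint8_t raw)
{
	return (uint8_t)(raw & DEMO_REV_NUM_MASK);
}
```

With this layout, comparing the raw byte against an expected revision would fail whenever a flag bit is set, which is why the masked accessor is the one to compare against.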
On 03.05.23 at 17:08, Felix Kuehling wrote:
On 2023-05-03 at 03:59, Christian König wrote:
On 02.05.23 at 20:41, Alex Deucher wrote:
On Tue, May 2, 2023 at 11:22 AM Timur Kristóf wrote:
[SNIP]
In my opinion, the correct solution to those problems would be if
the kernel could give userspac
Reviewed-by: Christian König for this one.
Can't say much about the first one. That was just the hack because some
bit in the IP version was re-used on SRIOV, wasn't it?
Christian.
On 03.05.23 at 17:02, Alex Deucher wrote:
Ping?
On Thu, Apr 27, 2023 at 2:34 PM Alex Deucher wrote:
amdgpu
I suppose we have this information elsewhere.
Series is:
Reviewed-by: Luben Tuikov
Regards,
Luben
On 2023-05-03 11:02, Alex Deucher wrote:
> Ping?
>
> On Thu, Apr 27, 2023 at 2:34 PM Alex Deucher wrote:
>>
>> amdgpu_discovery_get_ip_version() has not been used since
>> commit c40bdfb2ffa4
On 2023-05-03 at 03:59, Christian König wrote:
On 02.05.23 at 20:41, Alex Deucher wrote:
On Tue, May 2, 2023 at 11:22 AM Timur Kristóf wrote:
[SNIP]
In my opinion, the correct solution to those problems would be if
the kernel could give userspace the necessary information about a
GPU hang b
Ping?
On Thu, Apr 27, 2023 at 2:34 PM Alex Deucher wrote:
>
> amdgpu_discovery_get_ip_version() has not been used since
> commit c40bdfb2ffa4 ("drm/amdgpu: fix incorrect VCN revision in SRIOV")
> so drop it.
>
> Signed-off-by: Alex Deucher
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c |
Ping?
On Thu, Apr 27, 2023 at 2:34 PM Alex Deucher wrote:
>
> This was already fixed and dropped in:
> commit baf3f8f37406 ("drm/amdgpu: handle SRIOV VCN revision parsing")
> commit c40bdfb2ffa4 ("drm/amdgpu: fix incorrect VCN revision in SRIOV")
> But it seems to have accidentally been left arou
Reviewed-by: Luben Tuikov
Regards,
Luben
On 2023-05-01 10:55, Alex Deucher wrote:
> Ping?
>
> Alex
>
> On Fri, Apr 28, 2023 at 11:57 AM Alex Deucher wrote:
>>
>> Reduces preemption latency.
>> Only enable this for gfx10 and 11 for now
>> to avoid changing behavior on gfx 8 and 9.
>>
>> v2:
Hi, Maarten
On 5/3/23 10:34, Maarten Lankhorst wrote:
This allows the drm cgroup controller to report that no space is available.
XXX: This is a hopeless simplification that changes behavior, and
returns -ENOSPC even if we could evict ourselves from the current
cgroup.
Ideally, the eviction code b
If command submission fails due to userptr invalidation in
amdgpu_cs_submit, the legacy code cleans up the scheduler
job. However, this is not needed, as an earlier commit has integrated
job cleanup into amdgpu_job_free. Otherwise, because of the double
free, a NULL pointer dereference w
On 2023-05-03 11:11, Thomas Hellström wrote:
Hi, Maarten
On 5/3/23 10:34, Maarten Lankhorst wrote:
This allows the drm cgroup controller to report that no space is available.
XXX: This is a hopeless simplification that changes behavior, and
returns -ENOSPC even if we could evict ourselves from th
On 03.05.23 at 11:00, Srinivasan Shanmugam wrote:
The following checkpatch errors and warnings are removed:
ERROR: else should follow close brace '}'
ERROR: trailing statements should be on next line
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: Possible repeated word: 'Fences'
The following checkpatch warning is removed.
WARNING: Possible unnecessary 'out of memory' message
Cc: Christian König
Cc: Alex Deucher
Signed-off-by: Srinivasan Shanmugam
---
drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drive
The following checkpatch errors and warnings are removed:
ERROR: else should follow close brace '}'
ERROR: trailing statements should be on next line
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: Possible repeated word: 'Fences'
WARNING: Missing a blank line after declarations
WARN
On 03.05.23 at 10:46, Srinivasan Shanmugam wrote:
The following checkpatch errors and warnings are removed:
ERROR: else should follow close brace '}'
ERROR: trailing statements should be on next line
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: Possible repeated word: 'Fences'
The following checkpatch errors and warnings are removed:
ERROR: else should follow close brace '}'
ERROR: trailing statements should be on next line
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: Possible repeated word: 'Fences'
WARNING: Missing a blank line after declarations
WARN
Based roughly on the rdma and misc cgroup controllers, with a lot of
the accounting code borrowed from rdma.
The interface is simple:
- populate drmcgroup_device->regions[..] name and size for each active
region.
- Call drm(m)cg_register_device()
- Use drmcg_try_charge to check if you can alloca
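The try-charge step in the interface above can be pictured as hierarchical accounting: an allocation is admitted only if every group from the requester up to the root still has room. This is only a user-space sketch in the spirit of that description, not the proposed controller code. struct demo_group, demo_try_charge() and demo_uncharge() are hypothetical names, a single region is assumed, and locking is left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-group accounting for a single memory region. */
struct demo_group {
	struct demo_group *parent;	/* NULL for the root group */
	uint64_t limit;			/* maximum bytes this group may hold */
	uint64_t usage;			/* bytes currently charged */
};

/* Charge size bytes to grp and all of its ancestors, or charge nothing.
 * Returns 0 on success and -ENOSPC if any level would exceed its limit. */
static int demo_try_charge(struct demo_group *grp, uint64_t size)
{
	struct demo_group *g;

	/* First pass: check every level so a failure leaves no partial charge. */
	for (g = grp; g; g = g->parent) {
		assert(g->usage <= g->limit);
		if (size > g->limit - g->usage)
			return -ENOSPC;
	}

	/* Second pass: commit the charge at every level. */
	for (g = grp; g; g = g->parent)
		g->usage += size;

	return 0;
}

/* Undo an earlier successful charge of size bytes. */
static void demo_uncharge(struct demo_group *grp, uint64_t size)
{
	for (; grp; grp = grp->parent) {
		assert(grp->usage >= size);
		grp->usage -= size;
	}
}
```

Checking all levels before committing keeps a failed charge from leaving partial usage behind, at the cost of walking the hierarchy twice; a concurrent implementation would need locking or atomic counters, which this sketch omits.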
Add some code to implement basic support for the vram0, vram1 and stolen
memory regions.
I fear the try_charge code should probably be done inside TTM. This
code should interact with the shrinker, but for a simple RFC it's good
enough.
Signed-off-by: Maarten Lankhorst
---
drivers/gpu/drm/xe/xe_
This allows the drm cgroup controller to report that no space is available.
XXX: This is a hopeless simplification that changes behavior, and
returns -ENOSPC even if we could evict ourselves from the current
cgroup.
Ideally, the eviction code becomes cgroup aware, and will force eviction
from the cur
From: Tvrtko Ursulin
Skeleton controller without any functionality.
Signed-off-by: Tvrtko Ursulin
Signed-off-by: Maarten Lankhorst
---
include/linux/cgroup_drm.h| 9 ++
include/linux/cgroup_subsys.h | 4 +++
init/Kconfig | 7
kernel/cgroup/Makefile| 1
RFC as I'm looking for comments.
For long running compute, it can be beneficial to partition the GPU memory
between cgroups, so each cgroup can use its maximum amount of memory without
interfering with other scheduled jobs. Done properly, this can alleviate the
need for eviction, which might resul
On 02.05.23 at 20:41, Alex Deucher wrote:
On Tue, May 2, 2023 at 11:22 AM Timur Kristóf wrote:
[SNIP]
In my opinion, the correct solution to those problems would be if
the kernel could give userspace the necessary information about a
GPU hang before a GPU reset.
The fundamental problem he