Found a way to reproduce the issue consistently. After applying David's patch
"0001-drm-amdgpu-fix-signaled-fence-isn-t-handled" with a minor change
-static struct dma_fence *drm_syncobj_get_stub_fence(void)
+struct dma_fence *drm_syncobj_get_stub_fence(void)
was able to avoid kernel panic due to NUL
On Wed, Nov 28, 2018 at 07:46:06PM +, Ho, Kenny wrote:
>
> On Wed, Nov 28, 2018 at 4:14 AM Joonas Lahtinen
> wrote:
> > So we can only choose the lowest common denominator, right?
> >
> > Any core count out of total core count should translate nicely into a
> > fraction, so what would be the
XGMI hive has some resources allocated on device init which
need to be deallocated when the device is unregistered.
v2: Remove creation of dedicated wq for XGMI hive reset.
v3: Use the gmc.xgmi.supported flag
Signed-off-by: Andrey Grodzovsky
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +
No point in using mdelay unless running from interrupt context (which we are not).
This is a busy wait which will block the CPU for the entirety of the wait time.
Also, reduce the wait time to 500ms, as is done in the reference code, because
1s might cause PSP FW timeout issues during XGMI hive reset.
Signed-off-by
Use per hive wq to concurrently send reset commands to all nodes
in the hive.
v2:
Switch to system_highpri_wq after dropping dedicated queue.
Fix non XGMI code path KASAN error.
Stop the per-node hive reset loop if there
is a reset failure on any of the nodes.
Signed-off-by: Andrey Grodzovs
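The "stop on first failure" behaviour described in the v2 notes can be sketched in plain userspace C. This is only an illustration under stated assumptions: `hive_reset_all`, `node_reset_fn` and `fail_on_node_2` are hypothetical names that do not come from the patch, and the concurrent dispatch through a workqueue is left out so that only the early-exit loop is shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: resets one node, returns 0 on success. */
typedef int (*node_reset_fn)(int node_id);

/* Hypothetical sample callback: node 2 fails with -5, all others succeed. */
static int fail_on_node_2(int node_id)
{
    return node_id == 2 ? -5 : 0;
}

/*
 * Walk the nodes of a hive and stop at the first reset failure.
 * *attempted reports how many nodes were tried, including the failing one.
 * Returns 0 if every node was reset, else the first error code.
 */
static int hive_reset_all(const int *nodes, size_t count,
                          node_reset_fn reset, size_t *attempted)
{
    int r = 0;

    assert(nodes != NULL && reset != NULL && attempted != NULL);

    *attempted = 0;
    for (size_t i = 0; i < count; i++) {
        (*attempted)++;
        r = reset(nodes[i]);
        if (r)
            break; /* do not touch the remaining nodes */
    }
    return r;
}
```

With four nodes and `fail_on_node_2` as the callback, the loop returns the error after the third node and never attempts the fourth.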
The credit was used to limit the number of VM (retry) faults processed in each VM. If
this is removed, you may get flooded by an interrupt storm.
Even though the commit message claims that printk_ratelimit is a
better solution, I didn't see it implemented in this patch. Are you pla
See comment [Oak]
Thanks,
Oak
-----Original Message-----
From: amd-gfx On Behalf Of Christian
König
Sent: Friday, November 30, 2018 7:36 AM
To: amd-gfx@lists.freedesktop.org
Subject: [PATCH 02/11] drm/amdgpu: send IVs to the KFD only after processing
them v2
This allows us to filter out VM fa
Reviewed-by: Andrey Grodzovsky
Andrey
On 11/30/2018 03:36 PM, Alex Deucher wrote:
> On Fri, Nov 30, 2018 at 3:34 PM Grodzovsky, Andrey
> wrote:
>>
>>
>> On 11/30/2018 03:30 PM, Alex Deucher wrote:
>>> Use this to track whether an asic supports xgmi rather than
>>> checking the asic type everyw
On Fri, Nov 30, 2018 at 3:34 PM Grodzovsky, Andrey
wrote:
>
>
>
> On 11/30/2018 03:30 PM, Alex Deucher wrote:
> > Use this to track whether an asic supports xgmi rather than
> > checking the asic type everywhere.
> >
> > Signed-off-by: Alex Deucher
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_
On 11/30/2018 03:30 PM, Alex Deucher wrote:
> Use this to track whether an asic supports xgmi rather than
> checking the asic type everywhere.
>
> Signed-off-by: Alex Deucher
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 4 ++--
> driver
Use this to track whether an asic supports xgmi rather than
checking the asic type everywhere.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c| 2 +-
drivers/gpu/drm/amd/a
On Fri, Nov 30, 2018 at 3:12 PM Grodzovsky, Andrey
wrote:
>
>
> On 11/30/2018 03:08 PM, Alex Deucher wrote:
> > On Fri, Nov 30, 2018 at 3:06 PM Grodzovsky, Andrey
> > wrote:
> >>
> >>
> >> On 11/30/2018 02:49 PM, Alex Deucher wrote:
> >>> On Fri, Nov 30, 2018 at 1:17 PM Andrey Grodzovsky
> >>> w
On 11/30/2018 03:08 PM, Alex Deucher wrote:
> On Fri, Nov 30, 2018 at 3:06 PM Grodzovsky, Andrey
> wrote:
>>
>>
>> On 11/30/2018 02:49 PM, Alex Deucher wrote:
>>> On Fri, Nov 30, 2018 at 1:17 PM Andrey Grodzovsky
>>> wrote:
XGMI hive has some resources allocated on device init which
nee
On Fri, Nov 30, 2018 at 3:06 PM Grodzovsky, Andrey
wrote:
>
>
>
> On 11/30/2018 02:49 PM, Alex Deucher wrote:
> > On Fri, Nov 30, 2018 at 1:17 PM Andrey Grodzovsky
> > wrote:
> >> XGMI hive has some resources allocated on device init which
> >> need to be deallocated when the device is unregister
On 11/30/2018 02:49 PM, Alex Deucher wrote:
> On Fri, Nov 30, 2018 at 1:17 PM Andrey Grodzovsky
> wrote:
>> XGMI hive has some resources allocated on device init which
>> need to be deallocated when the device is unregistered.
>>
>> v2: Remove creation of dedicated wq for XGMI hive reset.
>>
>>
On Fri, Nov 30, 2018 at 1:17 PM Andrey Grodzovsky
wrote:
>
> XGMI hive has some resources allocated on device init which
> need to be deallocated when the device is unregistered.
>
> v2: Remove creation of dedicated wq for XGMI hive reset.
>
> Signed-off-by: Andrey Grodzovsky
> ---
> drivers/gpu
Hi Dave,
More new features for 4.21:
amdgpu and amdkfd:
- Freesync support
- ABM support in DC
- KFD support for vega12 and polaris12
- Add sdma paging queue support for vega
- Use ACPI to query backlight range on supported platforms
- Clean up doorbell handling
- KFD fix for pasid handling under
Use per hive wq to concurrently send reset commands to all nodes
in the hive.
v2:
Switch to system_highpri_wq after dropping dedicated queue.
Fix non XGMI code path KASAN error.
Stop the per-node hive reset loop if there
is a reset failure on any of the nodes.
Signed-off-by: Andrey Grodzovs
No point in using mdelay unless running from interrupt context (which we are not).
This is a busy wait which will block the CPU for the entirety of the wait time.
Also, reduce the wait time to 500ms, as is done in the reference code, because
it might cause PSP FW timeout issues during XGMI hive reset.
Signed-o
XGMI hive has some resources allocated on device init which
need to be deallocated when the device is unregistered.
v2: Remove creation of dedicated wq for XGMI hive reset.
Signed-off-by: Andrey Grodzovsky
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++
drivers/gpu/drm/amd/amdgpu/amdgp
Won't this break VM fault handling in KFD? I don't see a way with the current
code that you can leave some VM faults for KFD to process. If we could consider
VM faults with VMIDs 8-15 as not handled in amdgpu and leave them for KFD to
process, then this could work.
As far as I can tell, the onl
On 11/30/2018 10:53 AM, Koenig, Christian wrote:
> On 30.11.18 at 16:14, Grodzovsky, Andrey wrote:
>> On 11/30/2018 04:03 AM, Christian König wrote:
>>> On 29.11.18 at 21:36, Andrey Grodzovsky wrote:
XGMI hive has some resources allocated on device init which
need to be deallocated wh
On Fri, Nov 30, 2018 at 7:36 AM Christian König
wrote:
>
> That should add back pressure on the client.
>
> Signed-off-by: Christian König
Acked-by: Alex Deucher
> ---
> drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 4
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amd
On Fri, Nov 30, 2018 at 7:36 AM Christian König
wrote:
>
> This finally enables processing of ring 1 & 2.
>
> Signed-off-by: Christian König
Reviewed-by: Alex Deucher
> ---
> drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 68 --
> 1 file changed, 63 insertions(+), 5 deletion
On Fri, Nov 30, 2018 at 7:36 AM Christian König
wrote:
>
> Previously we only added the ring buffer memory, now add the handling as
> well.
>
> Signed-off-by: Christian König
Acked-by: Alex Deucher
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 33 +
> drivers/gpu/d
On Fri, Nov 30, 2018 at 7:36 AM Christian König
wrote:
>
> Let's start to support multiple rings.
>
> v2: decode IV is needed as well
>
> Signed-off-by: Christian König
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c | 6 +--
> drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h | 13 +++---
> drivers/gpu/
On Fri, Nov 30, 2018 at 7:36 AM Christian König
wrote:
>
> The GMC/VM subsystem is causing the faults, so move the handling here as
> well.
>
> Signed-off-by: Christian König
Acked-by: Alex Deucher
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h | 2 -
> drivers/gpu/drm/amd/amdgpu/amdgpu_irq
On 30.11.18 at 17:01, Alex Deucher wrote:
On Fri, Nov 30, 2018 at 7:36 AM Christian König
wrote:
The entries are ignored for now, but it at least stops crashing the
hardware when somebody tries to push something to the other IH rings.
v2: limit ring size, add TODO comment
Signed-off-by: Chri
On Fri, Nov 30, 2018 at 7:36 AM Christian König
wrote:
>
> printk_ratelimit() is much better suited to limit the number of reported
> VM faults.
>
> Signed-off-by: Christian König
Acked-by: Alex Deucher
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 37 -
> drivers/
On Fri, Nov 30, 2018 at 7:36 AM Christian König
wrote:
>
> This allows us to filter out VM faults in the GMC code.
>
> v2: don't filter out all faults
>
> Signed-off-by: Christian König
Acked-by: Alex Deucher
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 29 +++--
> 1
On Fri, Nov 30, 2018 at 7:36 AM Christian König
wrote:
>
> To distinguish on which IH ring an IV was found.
>
> Signed-off-by: Christian König
Reviewed-by: Alex Deucher
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 4 ++--
> drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 11 +++
> 2
On Fri, Nov 30, 2018 at 7:36 AM Christian König
wrote:
>
> The entries are ignored for now, but it at least stops crashing the
> hardware when somebody tries to push something to the other IH rings.
>
> v2: limit ring size, add TODO comment
>
> Signed-off-by: Christian König
We may want to guard
On 30.11.18 at 16:14, Grodzovsky, Andrey wrote:
>
> On 11/30/2018 04:03 AM, Christian König wrote:
>> On 29.11.18 at 21:36, Andrey Grodzovsky wrote:
>>> XGMI hive has some resources allocated on device init which
>>> need to be deallocated when the device is unregistered.
>>>
>>> Add per hive wq
On Fri, Nov 30, 2018 at 7:36 AM Christian König
wrote:
>
> Calculate all the addresses and pointers in amdgpu_ih.c
>
> Signed-off-by: Christian König
Reviewed-by: Alex Deucher
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c | 34 +++
> drivers/gpu/drm/amd/amdgpu/amdgpu_ih.
On Fri, Nov 30, 2018 at 7:36 AM Christian König
wrote:
>
> We ignored the return code here.
>
> Signed-off-by: Christian König
Reviewed-by: Alex Deucher
> ---
> drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_
Reviewed-by: Alex Deucher
From: amd-gfx on behalf of Oak Zeng
Sent: Friday, November 30, 2018 10:39:21 AM
To: amd-gfx@lists.freedesktop.org
Cc: Yang, Philip; Zeng, Oak; Yin, Tianci (Rico)
Subject: [PATCH] drm/amdgpu: Fix num_doorbell calculation issue
When pag
When the paging queue is enabled, it uses the second page of doorbells.
The AMDGPU_DOORBELL64_MAX_ASSIGNMENT definition assumes all the
kernel doorbells are in the first page. So with paging queue enabled,
the total kernel doorbell range should be original num_doorbell plus
one page (0x400 in dword), not
On 2018-11-30 10:09 a.m., Nicholas Kazlauskas wrote:
> [Why]
> With scaling, underscan and abm changes we can end up calling
> commit_planes_to_stream in commit_tail. This call uses dm_state->context
> which can be NULL if the commit was a fast update.
>
> [How]
> Use dc_state instead since that c
On 2018-11-30 10:13 a.m., Deucher, Alexander wrote:
> Acked-by: Alex Deucher
Reviewed-by: Leo Li
>
>
> *From:* amd-gfx on behalf of
> Nicholas Kazlauskas
> *Sent:* Friday, November 30, 2018 10:09:28 AM
> *To:* amd-gf
On 11/30/2018 04:03 AM, Christian König wrote:
> On 29.11.18 at 21:36, Andrey Grodzovsky wrote:
>> XGMI hive has some resources allocated on device init which
>> need to be deallocated when the device is unregistered.
>>
>> Add per hive wq to allow all the nodes in hive to run resets
>> concuren
Acked-by: Alex Deucher
From: amd-gfx on behalf of Nicholas
Kazlauskas
Sent: Friday, November 30, 2018 10:09:28 AM
To: amd-gfx@lists.freedesktop.org
Cc: Li, Sun peng (Leo); Wentland, Harry; Kazlauskas, Nicholas
Subject: [PATCH] drm/amd/display: Fix NULL ptr dere
[Why]
With scaling, underscan and abm changes we can end up calling
commit_planes_to_stream in commit_tail. This call uses dm_state->context
which can be NULL if the commit was a fast update.
[How]
Use dc_state instead since that can't be NULL unless the system ran
out of memory.
Bugzilla: https:
Reviewed-by: Alex Deucher
From: amd-gfx on behalf of Christian
König
Sent: Friday, November 30, 2018 7:45:17 AM
To: amd-gfx@lists.freedesktop.org
Subject: [PATCH] drm/amdgpu: remove amdgpu_bo_backup_to_shadow
It is unused.
Signed-off-by: Christian König
---
[Why]
Tracing is a useful and cheap debug functionality
[How]
This creates a new trace system amdgpu_dm, currently with
three trace events
amdgpu_dc_rreg and amdgpu_dc_wreg report the address and value
of any dc register reads and writes
amdgpu_dc_performance requires at least one of those two t
It is unused.
Signed-off-by: Christian König
---
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 47 --
drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 5 ---
2 files changed, 52 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
b/drivers/gpu/drm/amd/amdgpu/am
Let's start to support multiple rings.
v2: decode IV is needed as well
Signed-off-by: Christian König
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c | 6 +--
drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h | 13 +++---
drivers/gpu/drm/amd/amdgpu/cik_ih.c | 29 +++--
drivers/gpu/drm/amd/amdgpu
Previously we only added the ring buffer memory, now add the handling as
well.
Signed-off-by: Christian König
---
drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 33 +
drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h | 4 ++-
2 files changed, 36 insertions(+), 1 deletion(-)
diff --git
Calculate all the addresses and pointers in amdgpu_ih.c
Signed-off-by: Christian König
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c | 34 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h | 23 +---
drivers/gpu/drm/amd/amdgpu/cik_ih.c | 9 +++
drivers/gpu/drm/am
That should add back pressure on the client.
Signed-off-by: Christian König
---
drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 4
1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
index f5c5ea628fdf..dd7f52f08fd7 100644
The entries are ignored for now, but it at least stops crashing the
hardware when somebody tries to push something to the other IH rings.
v2: limit ring size, add TODO comment
Signed-off-by: Christian König
---
drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h | 4 +-
drivers/gpu/drm/amd/amdgpu/vega10_
This allows us to filter out VM faults in the GMC code.
v2: don't filter out all faults
Signed-off-by: Christian König
---
drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 29 +++--
1 file changed, 17 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ir
This finally enables processing of ring 1 & 2.
Signed-off-by: Christian König
---
drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 68 --
1 file changed, 63 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
b/drivers/gpu/drm/amd/amdgpu/vega10_ih.
We ignored the return code here.
Signed-off-by: Christian König
---
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 3a4e5d8d5162..e329a23e1f99 100644
--- a/drivers/gp
The GMC/VM subsystem is causing the faults, so move the handling here as
well.
Signed-off-by: Christian König
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h | 2 -
drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 4 --
drivers/gpu/drm/amd/amdgpu/cik_ih.c | 13
drivers/gpu/drm/amd/amdgpu/cz_ih.c
printk_ratelimit() is much better suited to limit the number of reported
VM faults.
Signed-off-by: Christian König
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 37 -
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 5
drivers/gpu/drm/amd/amdgpu/cik_ih.c | 18 +
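As a rough illustration of why a rate limiter suits fault reporting, here is a minimal userspace sketch of a windowed limiter: at most `burst` messages are let through per `interval`, and the rest are dropped. It is not the kernel's `printk_ratelimit()` implementation, and the names `struct ratelimit` and `ratelimit_allow` as well as the tick-based clock are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limiter state; times are in arbitrary ticks. */
struct ratelimit {
    long interval;      /* length of one window */
    int burst;          /* messages allowed per window */
    long window_start;  /* tick at which the current window began */
    int printed;        /* messages already allowed in this window */
};

/* Returns true if a message arriving at tick 'now' may be reported. */
static bool ratelimit_allow(struct ratelimit *rl, long now)
{
    assert(rl != NULL);

    /* Open a fresh window once the current one has expired. */
    if (now - rl->window_start >= rl->interval) {
        rl->window_start = now;
        rl->printed = 0;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return true;
    }
    return false; /* over budget: drop this report */
}
```

A caller would wrap each fault report in `if (ratelimit_allow(&rl, now))`, so a storm of identical faults costs only a counter check per event instead of a log line.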
To distinguish on which IH ring an IV was found.
Signed-off-by: Christian König
---
drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 11 +++
2 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
On 30.11.18 at 10:48, wentalou wrote:
SWDEV-171843: KIQ in VF’s init delayed by another VF’s reset.
late_init occasionally failed if it overlapped with another VF’s reset.
Enlarging MAX_KIQ_REG_TRY from 20 to 80 fixes this issue.
Change-Id: I841774bdd9ebf125c5aa2046b1dcebd65e07
Signed-off-b
SWDEV-171843: KIQ in VF’s init delayed by another VF’s reset.
late_init occasionally failed if it overlapped with another VF’s reset.
Enlarging MAX_KIQ_REG_TRY from 20 to 80 fixes this issue.
Change-Id: I841774bdd9ebf125c5aa2046b1dcebd65e07
Signed-off-by: wentalou
---
drivers/gpu/drm/amd/amd
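The effect of raising the retry budget from 20 to 80 can be sketched with a small userspace model. Everything here is hypothetical and only illustrates the idea: `struct fake_kiq` models an access that starts succeeding after a fixed number of polls (standing in for the other VF's reset finishing), and `kiq_access_with_retry` is not a function from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: the access succeeds from poll number
 * 'polls_until_ready' onwards. */
struct fake_kiq {
    int polls_until_ready;
    int polls_seen;
};

static bool fake_kiq_poll(struct fake_kiq *kiq)
{
    kiq->polls_seen++;
    return kiq->polls_seen >= kiq->polls_until_ready;
}

/* Returns 0 on success, -1 if the retry budget is exhausted first. */
static int kiq_access_with_retry(struct fake_kiq *kiq, int max_tries)
{
    assert(kiq != NULL);

    for (int i = 0; i < max_tries; i++) {
        if (fake_kiq_poll(kiq))
            return 0;
    }
    return -1;
}
```

In this model an access that becomes ready on the 50th poll fails with a budget of 20 tries and succeeds with a budget of 80, which is the kind of overlap the commit message describes.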
> -----Original Message-----
> From: Christian König
> Sent: Friday, November 30, 2018 5:15 PM
> To: Zhou, David(ChunMing) ; dri-
> de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH libdrm 4/5] wrap syncobj timeline query/wait APIs for
> amdgpu v3
>
[snip]
> >> +d
On 30.11.18 at 08:35, zhoucm1 wrote:
On 2018-11-28 22:50, Christian König wrote:
From: Chunming Zhou
v2: symbols are stored in lexical order.
v3: drop export/import and extra query indirection
Signed-off-by: Chunming Zhou
Signed-off-by: Christian König
---
amdgpu/amdgpu-symbol-check |
On 30.11.18 at 03:00, Alex Deucher wrote:
Looks like it was missed when setting support was added.
Signed-off-by: Alex Deucher
Reviewed-by: Christian König
---
This is a legit bug fix. The rest of this series needs more work.
drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 1 +
1 file chang
On 29.11.18 at 21:36, Andrey Grodzovsky wrote:
XGMI hive has some resources allocated on device init which
need to be deallocated when the device is unregistered.
Add per hive wq to allow all the nodes in hive to run resets
concurrently - this should speed up the total reset time to avoid
breach