Prefix RAS message printing in gfx/mmhub with PCI device info,
which assists debugging in the multi-GPU case.
Change-Id: Iceba7cafd5aac7d0251d9f871503745cc617fba2
Signed-off-by: Dennis Li
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4.c
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4.c
old mode 100644
n
[AMD Official Use Only - Internal Distribution Only]
Reviewed-by: Dennis Li
-----Original Message-----
From: amd-gfx On Behalf Of Yong Zhao
Sent: Saturday, April 18, 2020 5:46 AM
To: amd-gfx@lists.freedesktop.org
Cc: Zhao, Yong
Subject: [PATCH] drm/amdgpu: Print CU information by default durin
[AMD Public Use]
Now I understand what you mean by stack overflow. Thank you for the link. I
didn't know about the kernel stack size of a thread. Learn something again
today :)
Regards,
Amber
-----Original Message-----
From: Kuehling, Felix
Sent: Friday, April 17, 2020 10:19 PM
To: Lin, Ambe
Am 2020-04-17 um 9:48 p.m. schrieb Amber Lin:
>
>
> On 2020-04-17 6:31 p.m., Felix Kuehling wrote:
>> Am 2020-04-17 um 4:07 p.m. schrieb Amber Lin:
>>> When the compute is malfunctioning or performance drops, the system
>>> admin
>>> will use SMI (System Management Interface) tool to
>>> monitor/di
On 2020-04-17 6:31 p.m., Felix Kuehling wrote:
Am 2020-04-17 um 4:07 p.m. schrieb Amber Lin:
When the compute is malfunctioning or performance drops, the system admin
will use the SMI (System Management Interface) tool to monitor/diagnose what
went wrong. This patch provides an event watch inter
Hi,
This is needed for displayable DCC on gfx10. Mesa will add the first flag
soon or after DAL starts using it on gfx10.
Please review.
Thanks,
Marek
From b0896b2dac65ce08ee8bfa3161b28cfc813b3a1f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marek=20Ol=C5=A1=C3=A1k?=
Date: Fri, 17 Apr 2020 20:50:30
Am 2020-04-17 um 6:54 p.m. schrieb Joseph Greathouse:
> In order to surface the ASIC revision to user level, we want
> to put it into the HSA topology. This can be because different
> ASIC revisions may require user-level software to do different
> things (e.g. patch code for things that are change
In order to surface the ASIC revision to user level, we want
to put it into the HSA topology. This can be because different
ASIC revisions may require user-level software to do different
things (e.g. patch code for things that are changed in later
hardware revisions).
The ASIC revision from the ha
Am 2020-04-17 um 4:07 p.m. schrieb Amber Lin:
> When the compute is malfunctioning or performance drops, the system admin
> will use the SMI (System Management Interface) tool to monitor/diagnose what
> went wrong. This patch provides an event watch interface for the user
> space to register devices
Reviewed-by: Lyude Paul
In the future btw, you should use the DRM maintainer tools to add a Fixes
tag, like this:
Fixes: cd82d82cbc04 ("drm/dp_mst: Add branch bandwidth validation to MST
atomic check")
Also so it gets cc'd to stable, I'll fixup the patch and push it. Thanks!
On Tue, 2020-0
This is convenient for multiple teams to obtain the information. Also,
add device info by using dev_info().
Signed-off-by: Yong Zhao
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/
Hi,
Wyatt made the below patch for fixing this issue. I can apply it on top
of this patchset if you all agree.
[Why]
Current code does not guarantee the correct endianness of memory being
copied to fw, specifically in the case where cpu isn't little endian.
[How]
Windows and Diags are always lit
On Fri, Apr 17, 2020 at 4:45 PM Yong Zhao wrote:
>
> This is convenient for multiple teams to obtain the information.
>
> Signed-off-by: Yong Zhao
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgp
Patches 1, 2 are:
Reviewed-by: Alex Deucher
On Fri, Apr 17, 2020 at 4:45 PM Yong Zhao wrote:
>
> Delete two print statements that are not very useful, and change one from
> pr_info() to pr_debug().
>
> Signed-off-by: Yong Zhao
> ---
> drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 2 +-
> drivers/gpu/d
This is convenient for multiple teams to obtain the information.
Signed-off-by: Yong Zhao
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
in
Delete two print statements that are not very useful, and change one from
pr_info() to pr_debug().
Signed-off-by: Yong Zhao
---
drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 2 +-
drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 2 --
2 files changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/gpu/
Add more detail while turning off the printing by default, because it
is very useful.
Signed-off-by: Yong Zhao
---
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 2 +-
drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdg
On 04/09, Peter Zijlstra wrote:
> On Thu, Apr 09, 2020 at 08:15:57PM +0200, Christian König wrote:
> > Am 09.04.20 um 19:09 schrieb Peter Zijlstra:
> > > On Thu, Apr 09, 2020 at 05:59:56PM +0200, Peter Zijlstra wrote:
> > > [SNIP]
> > > > I'll need another approach, let me consider.
> > > Christian
When the compute is malfunctioning or performance drops, the system admin
will use the SMI (System Management Interface) tool to monitor/diagnose what
went wrong. This patch provides an event watch interface for the user
space to register devices and subscribe to events they are interested in. After
regist
+ Joseph
Hi Joseph,
Would you like to help me review this change? This was a follow-up on the
discussion we had earlier this year.
Thanks,
Zhan
> -----Original Message-----
> From: Liu, Zhan
> Sent: 2020/April/16, Thursday 3:24 PM
> To: amd-gfx@lists.freedesktop.org; Liu, Zhan
> Subject:
On 2020-04-17 8:09 a.m., Christian König wrote:
> Am 17.04.20 um 12:43 schrieb Michel Dänzer:
>> On 2020-04-17 11:22 a.m., Christian König wrote:
>>> Agreed, just wanted to reply as well since I think something is not
>>> correctly understood here.
>>>
>>> The cpu_to_be16() and be16_to_cpu() functi
Am 2020-04-17 um 2:53 a.m. schrieb Yintian Tao:
> According to the current KIQ register read method,
> there will be a race condition when using KIQ to read
> a register if multiple clients want to read at the same time,
> as in the example below:
> 1. client-A starts to read REG-0 through KIQ
> 2. clie
On 4/15/20 11:31 PM, Christoph Hellwig wrote:
> Hi all,
>
> this series improves the use_mm / unuse_mm interface by better
> documenting the assumptions, and by taking the set_fs manipulations
> spread over the callers into the core API.
>
> Changes since v1:
> - drop a few patches
> - fix a co
On Fri, Apr 17, 2020 at 9:16 AM YueHaibing wrote:
>
> drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_clock_source.c:1017:50:
> warning: ‘video_optimized_pixel_rates’ defined but not used
> [-Wunused-const-variable=]
> static const struct pixel_rate_range_table_entry
> video_optimized_pixel_r
On Fri, Apr 17, 2020 at 9:16 AM Jason Yan wrote:
>
> Fix the following gcc warning:
>
> drivers/gpu/drm/amd/amdgpu/../powerplay/hwmgr/vega10_powertune.c:710:46:
> warning: ‘PSMGCEDCThresholdConfig_vega10’ defined but not used
> [-Wunused-const-variable=]
> static const struct vega10_didt_config_r
Fix the following gcc warning:
drivers/gpu/drm/amd/amdgpu/../powerplay/hwmgr/vega10_powertune.c:710:46:
warning: ‘PSMGCEDCThresholdConfig_vega10’ defined but not used
[-Wunused-const-variable=]
static const struct vega10_didt_config_reg
PSMGCEDCThresholdConfig_vega10[] =
^
drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_clock_source.c:1017:50:
warning: ‘video_optimized_pixel_rates’ defined but not used
[-Wunused-const-variable=]
static const struct pixel_rate_range_table_entry video_optimized_pixel_rates[]
= {
^~
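For context, a minimal sketch of how this class of warning arises, assuming a standalone C file built with gcc and -Wunused-const-variable. The struct and table names below are hypothetical and are not the ones from vega10_powertune.c or dce_clock_source.c. The actual patches may resolve the warnings differently (for instance by simply deleting the unused tables).

```c
#include <stddef.h>

/* Hypothetical entry type; not the driver's real struct layout. */
struct example_rate_range {
	unsigned int lo_khz;
	unsigned int hi_khz;
};

/* Referenced by example_rate_in_table() below, so gcc stays quiet. */
static const struct example_rate_range example_used_rates[] = {
	{ 25170, 25180 },
	{ 59340, 59350 },
};

/*
 * Nothing references this table. Without the attribute, gcc with
 * -Wunused-const-variable reports "defined but not used" for it.
 * Deleting the table is the other common way to silence the warning.
 */
static const struct example_rate_range example_spare_rates[]
	__attribute__((unused)) = {
	{ 74170, 74180 },
};

/* Return 1 if khz falls inside any range of the used table, else 0. */
int example_rate_in_table(unsigned int khz)
{
	size_t i;
	size_t n = sizeof(example_used_rates) / sizeof(example_used_rates[0]);

	for (i = 0; i < n; i++) {
		if (khz >= example_used_rates[i].lo_khz &&
		    khz <= example_used_rates[i].hi_khz)
			return 1;
	}
	return 0;
}
```

Calling example_rate_in_table(25175) from a small main() exercises the used table; the spare table only exists to show where the warning would point.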
Am 17.04.20 um 12:43 schrieb Michel Dänzer:
On 2020-04-17 11:22 a.m., Christian König wrote:
Agreed, just wanted to reply as well since I think something is not
correctly understood here.
The cpu_to_be16() and be16_to_cpu() functions work differently depending
on which architecture/endianness your
Hi Christian
Can you give more details about how this SPM trace works?
After reviewing the gfx_v9_0_update_spm_vmid function, I find it somewhat confusing.
For example:
Assume there are two gfx jobs that can be submitted to the gfx ring.
When the second gfx job is submitted, the vmid of
that breaks the device list in gpu recovery.
From: Pan, Xinhui
Sent: Friday, April 17, 2020 7:11:40 PM
To: Chen, Guchun ; amd-gfx@lists.freedesktop.org
; Zhang, Hawking ; Li,
Dennis ; Clements, John ; Koenig,
Christian
Subject: Re: [PATCH] drm/amdgpu: fix kerne
[AMD Official Use Only - Internal Distribution Only]
This patch should fix the panic,
but I would like you NOT to add the adev xgmi head to the local device list if a RAS
UE occurs while the GPU is already in GPU recovery.
From: amd-gfx on behalf of Christian
König
On 2020-04-17 11:22 a.m., Christian König wrote:
> Agreed, just wanted to reply as well since I think something is not
> correctly understood here.
>
> The cpu_to_be16() and be16_to_cpu() functions work differently depending
> on which architecture/endianness you are on.
>
> So they should be a NO-OP
Hi Christian
mmRLC_SPM_MC_CNTL
This register is an RLC register. As I understand it, it is shared between PF and VF,
and an experiment I ran confirmed this:
1) write abc to it in PF
2) read it from VF; it shows abc
3) write ff to it in VF and read it back; it is still abc
So this register with current policy (L1)
Agreed, just wanted to reply as well since I think something is not
correctly understood here.
The cpu_to_be16() and be16_to_cpu() functions work differently depending
on which architecture/endianness you are on.
So they should be a NO-OP on x86 if everything is done right.
Christian.
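
To illustrate the architecture dependence mentioned above, here is a minimal user-space sketch. The example_* helpers are hypothetical stand-ins written for this illustration; they are not the kernel's cpu_to_be16()/be16_to_cpu() implementations, and they only assume standard C with <stdint.h> and <string.h>. The point shown is that the big-endian byte layout in memory is the same on every host, while the numeric value of the converted word depends on the host's endianness, and that the round trip always returns the original value.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in: store v so that its bytes in memory are big endian. */
uint16_t example_cpu_to_be16(uint16_t v)
{
	uint8_t bytes[2] = { (uint8_t)(v >> 8), (uint8_t)(v & 0xff) };
	uint16_t out;

	memcpy(&out, bytes, sizeof(out));
	return out;
}

/* Hypothetical stand-in: read a big-endian word back into host order. */
uint16_t example_be16_to_cpu(uint16_t v)
{
	uint8_t bytes[2];

	memcpy(bytes, &v, sizeof(bytes));
	return (uint16_t)((bytes[0] << 8) | bytes[1]);
}

/* Return 1 on a little-endian host (e.g. x86), 0 on a big-endian one. */
int example_host_is_little_endian(void)
{
	const uint16_t probe = 1;
	uint8_t first;

	memcpy(&first, &probe, 1);
	return first == 1;
}
```

On a little-endian host example_cpu_to_be16(0x1234) yields the value 0x3412, while on a big-endian host it yields 0x1234 unchanged; in both cases the first byte in memory is 0x12.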
Am 17.04.2
Am 16.04.20 um 17:47 schrieb Guchun Chen:
When running RAS uncorrectable error injection and triggering a GPU
reset on sGPU, the issue below is observed. It is caused by the list
being uninitialized when it is accessed.
[ 80.047227] BUG: unable to handle page fault for address: c0f4f750
[ 80.047300] #PF:
Dynamic alloc each time we do a KIQ reg read is overkill to me
Yeah, that is a rather good argument.
Now we do KIQ read and write *every time* we do amdgpu_vm_flush (omg...
what's this ??)
That is updating the VMID used for the SPM trace. And yes this
read/modify/write is most likely not
Christian
>>
See, we wanted to map the ring buffers read only and USWC for some time.
That would result in either a non-working driver or rather crappy performance.
<<
For KIQ the ring buffer wouldn't be read only ... should be cacheable type
Dynamic alloc each time doing KIQ reg read is a overki
Looks like a rather important bug fix to me, but I'm not sure if writing
the value into the ring buffer is a good idea.
See, we wanted to map the ring buffers read only and USWC for some time.
That would result in either a non-working driver or rather crappy performance.
Can't we just call amdgp
The change looks good to me; you can add my RB to your patch.
Since this patch impacts the general logic (not SRIOV only), I would like you to wait
a little longer for @Kuehling, Felix and @Deucher, Alexander and @Koenig,
Christian @Zhang, Hawking
If any of them gives you an RB I think we can go t
On Thu, Apr 16, 2020 at 08:17:44PM -0700, Matthew Wilcox wrote:
> On Thu, Apr 16, 2020 at 07:31:55AM +0200, Christoph Hellwig wrote:
> > this series improves the use_mm / unuse_mm interface by better
> > documenting the assumptions, and by taking the set_fs manipulations
> > spread over the callers