Looks like you can drop the macro 'AMDGPU_DEFAULT_GTT_SIZE_MB' as well.
Regards,
Guchun
> -Original Message-
> From: amd-gfx On Behalf Of
> Mukul Joshi
> Sent: Wednesday, April 26, 2023 9:53 AM
> To: amd-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org
> Cc: Joshi, Mukul ; Kuehling,
[AMD Official Use Only - General]
Reviewed-by: Le Ma
> -Original Message-
> From: Chen, Guchun
> Sent: Wednesday, April 26, 2023 11:12 AM
> To: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> ; Zhang, Hawking ;
> Ma, Le
> Cc: Chen, Guchun ; kernel test robot
> Subject: [PATCH]
Instead of the default one, check the right mqd_backup object.
Signed-off-by: Guchun Chen
Cc: Le Ma
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 9 +
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
This looks like a typo made when introducing multi-xx support.
Reported-by: kernel test robot
Signed-off-by: Guchun Chen
Cc: Le Ma
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
On Tue, Apr 25, 2023 at 7:27 PM Chen, Guchun wrote:
>
> From coding style's perspective, this lock/unlock handling should be put into
> amdgpu_ctx_priority_override.
The locking is to protect mgr->ctx_handles.
>
> Regards,
> Guchun
>
> > -Original Message-
> > From: amd-gfx On Behalf Of
>From coding style's perspective, this lock/unlock handling should be put into
>amdgpu_ctx_priority_override.
Regards,
Guchun
> -Original Message-
> From: amd-gfx On Behalf Of Chia-
> I Wu
> Sent: Wednesday, April 26, 2023 8:48 AM
> To: dri-de...@lists.freedesktop.org
> Cc: Pan, Xinhui
tree/branch:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: f600e0bbde8562a06bee31b3eb1b69d49acac4c5 Add linux-next specific
files for 20230425
Error/Warning reports:
https://lore.kernel.org/oe-kbuild-all/202304102354.q4voxgte-...@intel.com
https
After reviewing this whole history, the attached patch may be able to fix your
problem. Could you give it a try, please?
Regards,
Guchun
> -Original Message-
> From: amd-gfx On Behalf Of
> Mikhail Gavrilov
> Sent: Tuesday, April 25, 2023 9:20 PM
> To: Koenig, Christian
> Cc: Daniel Vetter ;
Use the helper function in TTM to get TTM memory
limit and set KFD's internal mem limit. This ensures
that KFD's TTM mem limit and actual TTM mem limit are
exactly the same.
Signed-off-by: Mukul Joshi
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 3 ++-
Use the helper function in TTM to get TTM mem limit and
set GTT size to be equal to TTM mem limit.
Signed-off-by: Mukul Joshi
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 25 ++---
1 file changed, 6 insertions(+), 19 deletions(-)
diff --git
Add a helper function to get TTM memory limit. This is
needed by KFD to set its own internal memory limits.
Signed-off-by: Mukul Joshi
---
drivers/gpu/drm/ttm/ttm_tt.c | 6 ++
include/drm/ttm/ttm_tt.h | 2 +-
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git
Signed-off-by: Chia-I Wu
Cc: sta...@vger.kernel.org
---
drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c | 6 +-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
index e9b45089a28a6..863b2a34b2d64
[Public]
> CSA, GDS backup, and shadow are allocated by userspace now.
Noted Alex, thanks. I will update the patch series and userspace accordingly.
Regards
Shashank
-Original Message-
From: Deucher, Alexander
Sent: 25 April 2023 19:38
To: Sharma, Shashank ; Koenig, Christian
;
On 24/04/2023 21:56, Felix Kuehling wrote:
On 2023-04-22 2:39, Shashank Sharma wrote:
- KFD process level doorbells: doorbell pages which are allocated by
kernel but mapped and written by userspace processes, saved in struct
pdd->qpd->doorbells
size = kfd_doorbell_process_slice.
We
The last 3 comments in this thread contain arguments that are false and
were specifically pointed out as false 6 comments ago: Soft resets are just
as fatal as hard resets. There is nothing better about soft resets. If the
VRAM is lost completely, that's a different story, and if the hard reset is
On Tue, Apr 25, 2023 at 2:52 PM Srinivasan Shanmugam
wrote:
>
> Fix following checkpatch errors in amdgpu_display.c
>
> ERROR: spaces required around that '=' (ctx:VxW)
> ERROR: that open brace { should be on the previous line
> ERROR: else should follow close brace '}'
>
> Cc: Christian König
>
Fix following checkpatch errors in amdgpu_display.c
ERROR: spaces required around that '=' (ctx:VxW)
ERROR: that open brace { should be on the previous line
ERROR: else should follow close brace '}'
Cc: Christian König
Cc: Alex Deucher
Signed-off-by: Srinivasan Shanmugam
---
[Public]
> -Original Message-
> From: Sharma, Shashank
> Sent: Tuesday, April 25, 2023 9:13 AM
> To: Koenig, Christian ; amd-
> g...@lists.freedesktop.org
> Cc: Yadav, Arvind ; Pelloux-Prayer, Pierre-Eric
> ;
> contactshashanksha...@gmail.com; Deucher, Alexander
>
> Subject: Re: [PATCH
On 25/04/2023 15:45, Christian König wrote:
Am 25.04.23 um 15:10 schrieb Shashank Sharma:
On 25/04/2023 14:27, Christian König wrote:
Am 24.04.23 um 19:38 schrieb Shashank Sharma:
A Memory queue descriptor (MQD) of a userqueue defines it in
the hw's context. As MQD format can vary between
On 25/04/2023 17:33, Christian König wrote:
Am 25.04.23 um 15:31 schrieb Shashank Sharma:
On 25/04/2023 14:34, Christian König wrote:
Am 24.04.23 um 19:38 schrieb Shashank Sharma:
This patch adds new functions to map/unmap a usermode queue into
the FW, using the MES ring. As soon as this
Am 25.04.23 um 16:54 schrieb Shane Xiao:
[Why]
The selfring doorbell aperture changes when the FB BAR is
resized successfully during gmc sw init, so we should reorder
the sequence of enabling the doorbell selfring aperture.
[How]
Move enable_doorbell_selfring_aperture from *_common_hw_init
to
Am 25.04.23 um 15:31 schrieb Shashank Sharma:
On 25/04/2023 14:34, Christian König wrote:
Am 24.04.23 um 19:38 schrieb Shashank Sharma:
This patch adds new functions to map/unmap a usermode queue into
the FW, using the MES ring. As soon as this mapping is done, the
queue would be considered
Comparisons of 'table' and 'vddc_sclk_table' with NULL are useless
since 'table' and 'vddc_sclk_table' are initialized with addresses
and cannot be NULL.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Igor Artemiev
---
drivers/gpu/drm/radeon/kv_dpm.c | 22
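To illustrate the reasoning in the patch description above, here is a minimal sketch of why such a NULL check is dead code. The struct names and layout below are hypothetical stand-ins, not the actual radeon definitions; the only point is that a pointer taken from the address of an embedded member cannot be NULL when the enclosing object is valid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's structures (assumption). */
struct clk_table {
	int count;
};

struct power_info {
	struct clk_table vddc_sclk_table; /* embedded member, not a pointer */
};

/* Returns the number of entries in the embedded table. */
static int table_entries(struct power_info *pi)
{
	assert(pi != NULL);

	/* 'table' is the address of a member inside *pi, so it is
	 * non-NULL whenever 'pi' itself is valid. */
	struct clk_table *table = &pi->vddc_sclk_table;

	/* An "if (table == NULL)" test here could never be true. */
	return table->count;
}
```

A caller would pass a pointer to a valid struct power_info; the only pointer worth validating is 'pi' itself.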
[Why]
The selfring doorbell aperture changes when the FB BAR is
resized successfully during gmc sw init, so we should reorder
the sequence of enabling the doorbell selfring aperture.
[How]
Move enable_doorbell_selfring_aperture from *_common_hw_init
to *_common_late_init.
This fixes the potential issue
Hi Christian,
What do you think about Felix's explanation?
Regards,
Eric
On 2023-04-13 09:28, Felix Kuehling wrote:
Am 2023-04-13 um 07:35 schrieb Christian König:
Am 13.04.23 um 03:01 schrieb Felix Kuehling:
Am 2023-04-12 um 18:25 schrieb Eric Huang:
It is to avoid redundant eviction for
Am 25.04.23 um 15:10 schrieb Shashank Sharma:
On 25/04/2023 14:27, Christian König wrote:
Am 24.04.23 um 19:38 schrieb Shashank Sharma:
A Memory queue descriptor (MQD) of a userqueue defines it in
the hw's context. As MQD format can vary between different
graphics IPs, we need gfx GEN
On 25/04/2023 14:40, Christian König wrote:
Am 24.04.23 um 19:38 schrieb Shashank Sharma:
This patch adds code to clean up any leftover userqueues that
a user might have failed to destroy due to a crash or any other
programming error.
Cc: Alex Deucher
Cc: Christian Koenig
Suggested-by:
On 25/04/2023 14:38, Christian König wrote:
Am 24.04.23 um 19:38 schrieb Shashank Sharma:
The userspace sends us the doorbell object and the doorbell index
to be used for the usermode queue, but the FW expects the absolute
doorbell index on the PCI BAR in the MQD. This patch adds a function
to
On 25/04/2023 14:36, Christian König wrote:
Am 24.04.23 um 19:38 schrieb Shashank Sharma:
To support oversubscription, MES FW expects WPTR BOs to
be mapped into GART, before they are submitted to usermode
queues. This patch adds a function for the same.
V4: fix the wptr value before mapping
On 25/04/2023 14:34, Christian König wrote:
Am 24.04.23 um 19:38 schrieb Shashank Sharma:
This patch adds new functions to map/unmap a usermode queue into
the FW, using the MES ring. As soon as this mapping is done, the
queue would be considered ready to accept the workload.
V1: Addressed
On 25/04/2023 14:32, Christian König wrote:
Am 24.04.23 um 19:38 schrieb Shashank Sharma:
From: Arvind Yadav
This patch sets new parameters defined in v11_struct for
usermode queue with the respective allocated gpu context
space addresses.
V4: Split the patch into two parts, keep only the
On Thu, Apr 20, 2023 at 3:32 PM Mikhail Gavrilov
wrote:
>
> Important don't give up.
> https://youtu.be/25zhHBGIHJ8 [40 min]
> https://youtu.be/utnDR26eYBY [50 min]
> https://youtu.be/DJQ_tiimW6g [12 min]
> https://youtu.be/Y6AH1oJKivA [6 min]
> Yes the issue is everything reproducible, but time
[AMD Official Use Only - General]
> -Original Message-
> From: Koenig, Christian
> Sent: Tuesday, April 25, 2023 7:24 PM
> To: Xiao, Shane ; amd-gfx@lists.freedesktop.org;
> Deucher, Alexander ; Kuehling, Felix
> ; Zhang, Hawking
> Cc: Liu, Aaron ; Hou, Xiaomeng (Matthew)
>
>
On 25/04/2023 14:30, Christian König wrote:
Am 24.04.23 um 19:38 schrieb Shashank Sharma:
The FW expects us to allocate at least one page as context
space to process gang, process, GDS and FW related work.
This patch creates a joint object for the same, and calculates
GPU space offsets for
On 25/04/2023 14:27, Christian König wrote:
Am 24.04.23 um 19:38 schrieb Shashank Sharma:
A Memory queue descriptor (MQD) of a userqueue defines it in
the hw's context. As MQD format can vary between different
graphics IPs, we need gfx GEN specific handlers to create MQDs.
This patch:
-
Am 25.04.23 um 14:14 schrieb Michel Dänzer:
On 4/25/23 14:08, Christian König wrote:
Well signaling that something happened is not the question. We do this for both
soft as well as hard resets.
The question is if errors result in blocking further submissions with the same
context or not.
In
Am 24.04.23 um 19:38 schrieb Shashank Sharma:
This patch adds code to clean up any leftover userqueues that
a user might have failed to destroy due to a crash or any other
programming error.
Cc: Alex Deucher
Cc: Christian Koenig
Suggested-by: Bas Nieuwenhuizen
Signed-off-by: Bas
Am 24.04.23 um 19:38 schrieb Shashank Sharma:
The userspace sends us the doorbell object and the doorbell index
to be used for the usermode queue, but the FW expects the absolute
doorbell index on the PCI BAR in the MQD. This patch adds a function
to convert this relative doorbell index to the
Am 24.04.23 um 19:38 schrieb Shashank Sharma:
To support oversubscription, MES FW expects WPTR BOs to
be mapped into GART, before they are submitted to usermode
queues. This patch adds a function for the same.
V4: fix the wptr value before mapping lookup (Bas, Christian).
Cc: Alex Deucher
Cc:
Am 24.04.23 um 19:38 schrieb Shashank Sharma:
This patch adds new functions to map/unmap a usermode queue into
the FW, using the MES ring. As soon as this mapping is done, the
queue would be considered ready to accept the workload.
V1: Addressed review comments from Alex on the RFC patch
Am 24.04.23 um 19:38 schrieb Shashank Sharma:
From: Arvind Yadav
This patch sets new parameters defined in v11_struct for
usermode queue with the respective allocated gpu context
space addresses.
V4: Split the patch into two parts, keep only the set parameter
part in this series (Alex).
Am 24.04.23 um 19:38 schrieb Shashank Sharma:
The FW expects us to allocate at least one page as context
space to process gang, process, GDS and FW related work.
This patch creates a joint object for the same, and calculates
GPU space offsets for each of these spaces.
V1: Addressed review
Am 24.04.23 um 19:38 schrieb Shashank Sharma:
A Memory queue descriptor (MQD) of a userqueue defines it in
the hw's context. As MQD format can vary between different
graphics IPs, we need gfx GEN specific handlers to create MQDs.
This patch:
- Introduces MQD handler functions for the usermode
On 25/04/2023 14:14, Christian König wrote:
Am 24.04.23 um 19:38 schrieb Shashank Sharma:
This patch adds:
- A new IOCTL function to create and destroy
- A new structure to keep all the user queue data in one place.
- A function to generate unique index for the queue.
V1: Worked on review
Thanks for the review, Christian.
Inline,
On 25/04/2023 14:03, Christian König wrote:
Am 24.04.23 um 19:38 schrieb Shashank Sharma:
From: Shashank Sharma
This patch adds skeleton code for amdgpu usermode queue. It contains:
- A new file with init functions of usermode queues.
- A queue
On 4/25/23 14:08, Christian König wrote:
> Well signaling that something happened is not the question. We do this for
> both soft as well as hard resets.
>
> The question is if errors result in blocking further submissions with the
> same context or not.
>
> In case of a hard reset and
Am 24.04.23 um 19:38 schrieb Shashank Sharma:
This patch adds:
- A new IOCTL function to create and destroy
- A new structure to keep all the user queue data in one place.
- A function to generate unique index for the queue.
V1: Worked on review comments from RFC patch series:
- Alex: Keep a
Well signaling that something happened is not the question. We do this
for both soft as well as hard resets.
The question is if errors result in blocking further submissions with
the same context or not.
In case of a hard reset and potential loss of state we have to kill the
context,
Am 24.04.23 um 19:38 schrieb Shashank Sharma:
From: Shashank Sharma
This patch adds skeleton code for amdgpu usermode queue. It contains:
- A new file with init functions of usermode queues.
- A queue context manager in driver private data.
V1: Worked on design review comments from RFC
Am 25.04.23 um 12:16 schrieb Shane Xiao:
[Why]
The selfring doorbell aperture changes when the FB BAR is
resized successfully during gmc sw init, so we should reorder
the sequence of enabling the doorbell selfring aperture.
[How]
Move enable_doorbell_selfring_aperture from *_common_hw_init
to
That supposedly depends on the compositor. There may be compositors for
very specific cases (e.g. Steam Deck) that handle resets very well, and
those would like to be properly notified of all resets because that's how
they get the best outcome, e.g. no corruption. A soft reset that is
unhandled by
On 4/24/23 18:45, Marek Olšák wrote:
> Soft resets are fatal just as hard resets, but no reset is "always fatal".
> There are cases when apps keep working depending on which features are being
> used. It's still unsafe.
Agreed, in theory.
In practice, from a user PoV, right now there's pretty
[Why]
The selfring doorbell aperture changes when the FB BAR is
resized successfully during gmc sw init, so we should reorder
the sequence of enabling the doorbell selfring aperture.
[How]
Move enable_doorbell_selfring_aperture from *_common_hw_init
to *_common_late_init.
This fixes the potential issue
The intent here is to clear the 'available_slices' buffer before setting
some values in it.
This is an array of int, so in order to fully initialize it, we must clear
MIN_AVAILABLE_SLICES_SIZE * sizeof(int) bytes.
Compute the right length of the buffer when calling memset().
Fixes: 97bda0322b8a
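The length argument described above can be sketched as follows. This is a minimal illustration, not the driver code: the value given to MIN_AVAILABLE_SLICES_SIZE and both function names are assumptions made for the example. memset() takes a byte count, so the element count has to be scaled by sizeof(int).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical value; the real constant is defined in the driver. */
#define MIN_AVAILABLE_SLICES_SIZE 4

/* Clears every element of an int array with
 * MIN_AVAILABLE_SLICES_SIZE entries. */
static void clear_slices(int *available_slices)
{
	assert(available_slices != NULL);
	/* Byte count = element count * element size. */
	memset(available_slices, 0, MIN_AVAILABLE_SLICES_SIZE * sizeof(int));
}

/* Buggy variant for contrast: passes the element count as the
 * byte count, so only the first MIN_AVAILABLE_SLICES_SIZE bytes
 * are cleared. */
static void clear_slices_short(int *available_slices)
{
	assert(available_slices != NULL);
	memset(available_slices, 0, MIN_AVAILABLE_SLICES_SIZE);
}
```

When the array is declared in the same scope, sizeof(available_slices) would give the full byte count directly; that form does not work once the array has decayed to a pointer, which is why the sketch multiplies explicitly.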
[AMD Official Use Only - General]
> -Original Message-
> From: Christian König
> Sent: Tuesday, April 25, 2023 4:54 PM
> To: Xiao, Shane ; Alex Deucher
> ; Koenig, Christian
> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> ; Zhang, Hawking
> ; Kuehling, Felix ; Hou,
>
I guess it would be simpler to update the subject to "drm/amdgpu: drop
gfx_v11_0_cp_ecc_error_irq_funcs".
With this improved, the series is:
Reviewed-by: Guchun Chen
Regards,
Guchun
> -Original Message-
> From: Horatio Zhang
> Sent: Tuesday, April 25, 2023 2:39 PM
> To: Zhang, Hawking
Am 25.04.23 um 08:38 schrieb Horatio Zhang:
The gfx.cp_ecc_error_irq is retired in gfx11. gfx_v11_0_hw_fini still
uses amdgpu_irq_put to disable this interrupt, which causes the call trace
in this function.
[ 102.873958] Call Trace:
[ 102.873959]
[ 102.873961]
Am 25.04.23 um 05:29 schrieb Xiao, Shane:
[AMD Official Use Only - General]
-Original Message-
From: Liu, Aaron
Sent: Tuesday, April 25, 2023 9:14 AM
To: Alex Deucher ; Koenig, Christian
Cc: Xiao, Shane ; Christian König
; amd-gfx@lists.freedesktop.org;
Deucher, Alexander ;
[AMD Official Use Only - General]
Series is
Reviewed-by: Hawking Zhang
Regards,
Hawking
From: Horatio Zhang
Date: Tuesday, April 25, 2023 at 14:38
To: Zhang, Hawking , Koenig, Christian
, amd-gfx@lists.freedesktop.org
Cc: Chen, Guchun , Xu, Feifei , Yao,
Longlong , Zhang, Horatio
[AMD Official Use Only - General]
Hi Hawking,
Thank you for such a quick review, I think your suggestion is a better choice
for solving this issue.
I have updated the fix patch about this issue, could you please help me review
it
[PATCH v2 1/3] drm/amdgpu: gfx_v11_0_cp_ecc_error_irq_funcs is
The gmc.ecc_irq is enabled by firmware per IFWI setting,
and the host driver is not privileged to enable/disable
the interrupt. So, it is meaningless to use the amdgpu_irq_put
function in gmc_v10_0_hw_fini, which also leads to the call
trace.
[ 82.340264] Call Trace:
[ 82.340265]
[
The gmc.ecc_irq is enabled by firmware per IFWI setting,
and the host driver is not privileged to enable/disable
the interrupt. So, it is meaningless to use the amdgpu_irq_put
function in gmc_v11_0_hw_fini, which also leads to the call
trace.
[ 102.980303] Call Trace:
[ 102.980303]
[
The gfx.cp_ecc_error_irq is retired in gfx11. gfx_v11_0_hw_fini still
uses amdgpu_irq_put to disable this interrupt, which causes the call trace
in this function.
[ 102.873958] Call Trace:
[ 102.873959]
[ 102.873961] gfx_v11_0_hw_fini+0x23/0x1e0 [amdgpu]
[ 102.874019]
[AMD Official Use Only - General]
Hi Christian,
Thank you for such a quick review.
It seems that the gfx_v11_0_cp_ecc_error_irq_funcs and amdgpu_irq_put(adev,
&adev->gmc.ecc_irq, 0) in gmc_v11_0_hw_fini are not needed anymore.
I will update this fix in the next version of the patch.
Thanks,
Horatio
tree/branch:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: 3b85b9b39960c08f29fa91b8d984d055dde6017e Add linux-next specific
files for 20230424
Error/Warning reports:
https://lore.kernel.org/oe-kbuild-all/202304102354.q4voxgte-...@intel.com