Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

2021-04-14 Thread Andrey Grodzovsky
On 2021-04-14 10:58 a.m., Christian König wrote: On 14.04.21 at 16:36, Andrey Grodzovsky wrote: [SNIP] We are racing here once more and need to handle that. But why? I wrote above that we first stop all the schedulers, and only then call drm_sched_entity_kill_jobs. The schedulers consumin

[PATCH] drm/amdgpu: use ratelimited print in sdma4 interrupt

2021-04-14 Thread Feifei Xu
dev_*_ratelimited printing will avoid dmesg flush. Signed-off-by: Feifei Xu --- drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c index 06247ad4

RE: [PATCH] drm/amdgpu: Copy MEC FW version to MEC2 if we skipped loading MEC2

2021-04-14 Thread Zhang, Hawking
[AMD Public Use] Reviewed-by: Hawking Zhang Regards, Hawking -Original Message- From: amd-gfx On Behalf Of Joseph Greathouse Sent: Thursday, April 15, 2021 07:40 To: amd-gfx@lists.freedesktop.org Cc: Greathouse, Joseph Subject: [PATCH] drm/amdgpu: Copy MEC FW version to MEC2 if we ski

[PATCH v2 1/3] drm/amdkfd: Use drm_priv to pass VM from KFD to amdgpu

2021-04-14 Thread Felix Kuehling
amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu needs the drm_priv to allow mmap to access the BO through the corresponding file descriptor. The VM can also be extracted from drm_priv, so drm_priv can replace the vm parameter in the kfd2kgd interface. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/am

[PATCH v2 2/3] drm/amdkfd: Allow access for mmapping KFD BOs

2021-04-14 Thread Felix Kuehling
DRM render node file handles are used for CPU mapping of BOs using mmap by the Thunk. It uses the DRM render node of the GPU where the BO was allocated. DRM allows mmap access automatically when it creates a GEM handle for a BO. KFD BOs don't have GEM handles, so KFD needs to manage access manuall

[PATCH v2 3/3] drm/amdgpu: Remove verify_access shortcut for KFD BOs

2021-04-14 Thread Felix Kuehling
This shortcut is no longer needed with access managed properly by KFD. Signed-off-by: Felix Kuehling Reviewed-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 7 --- 1 file changed, 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdg

[PATCH 34/34] drm/amdkfd: Add CONFIG_HSA_AMD_SVM

2021-04-14 Thread Felix Kuehling
Control whether to build SVM support into amdgpu with a Kconfig option. This makes it easier to disable it in production kernels if this new feature causes problems in production environments. Use "depends on" instead of "select" for DEVICE_PRIVATE, as is recommended for visible options. Reviewed

[PATCH 31/34] drm/amdkfd: add svm range validate timestamp

2021-04-14 Thread Felix Kuehling
With xnack on, add a validate timestamp in order to handle GPU vm faults from multiple GPUs. If a GPU retry fault needs to migrate the range to the best restore location, use the range validate timestamp to record the system timestamp after the range is restored to update the GPU page table. Because multiple pages of sam

[PATCH 33/34] drm/amdkfd: Add SVM API support capability bits

2021-04-14 Thread Felix Kuehling
From: Philip Yang SVMAPISupported property added to HSA_CAPABILITY; the value matches the HSA_CAPABILITY defined in the Thunk spec: SVMAPISupported: it will not be supported on older kernels that don't have HMM or on systems with GFXv8 or older GPUs without support for 48-bit virtual addresses. CoherentH

[PATCH 30/34] drm/amdkfd: refine migration policy with xnack on

2021-04-14 Thread Felix Kuehling
With xnack on, the GPU vm fault handler decides the best restore location, then migrates the range to the best restore location and updates the GPU mapping to recover from the GPU vm fault. Signed-off-by: Philip Yang Signed-off-by: Alex Sierra Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers

[PATCH 32/34] drm/amdkfd: multiple gpu migrate vram to vram

2021-04-14 Thread Felix Kuehling
If prefetch range to gpu with actual location is another gpu, or GPU retry fault restore pages to migrate the range with actual location is gpu, then migrate from one gpu to another gpu. Use system memory as a bridge because the sdma engine may not be able to access another gpu's vram, use sdma of source gpu

[PATCH 25/34] drm/amdkfd: add svm_bo reference for eviction fence

2021-04-14 Thread Felix Kuehling
From: Alex Sierra [why] As part of the SVM functionality, the eviction mechanism used for SVM_BOs is different. This mechanism uses one eviction fence per prange, instead of one fence per kfd_process. [how] Add a svm_bo reference to amdgpu_amdkfd_fence to allow differentiating between SVM_BO or regula

[PATCH 23/34] drm/amdkfd: page table restore through svm API

2021-04-14 Thread Felix Kuehling
Page table restore implementation in SVM API. This is called from the fault handler at amdgpu_vm to update page tables through the page fault retry IH. Signed-off-by: Alex Sierra Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdk

[PATCH 19/34] drm/amdkfd: HMM migrate ram to vram

2021-04-14 Thread Felix Kuehling
Register svm range with same address and size but preferred_location is changed from CPU to GPU or from GPU to CPU, trigger migration of the svm range from ram to vram or from vram to ram. If svm range prefetch location is GPU with flags KFD_IOCTL_SVM_FLAG_HOST_ACCESS, validate the svm range on ram f

[PATCH 29/34] drm/amdgpu: add svm_bo eviction to enable_signal cb

2021-04-14 Thread Felix Kuehling
From: Alex Sierra Add to amdgpu_amdkfd_fence.enable_signal callback, support for svm_bo fence eviction. Signed-off-by: Alex Sierra Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 11 --- 1 file changed, 8 insertions(+),

[PATCH 27/34] drm/amdkfd: add svm_bo eviction mechanism support

2021-04-14 Thread Felix Kuehling
svm_bo eviction mechanism is different from regular BOs. Every SVM_BO created contains one eviction fence and one worker item for eviction process. SVM_BOs can be attached to one or more pranges. For SVM_BO eviction mechanism, TTM will start to call enable_signal callback for every SVM_BO until VRA

[PATCH 09/34] drm/amdgpu: export vm update mapping interface

2021-04-14 Thread Felix Kuehling
From: Philip Yang It will be used by kfd to map svm range to GPU. Because svm range does not have amdgpu_bo and bo_va, it cannot use the amdgpu_bo_update interface, so use the amdgpu vm update interface directly. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- driv

[PATCH 16/34] drm/amdkfd: validate vram svm range from TTM

2021-04-14 Thread Felix Kuehling
If svm range prefetch location is not zero, use TTM to alloc amdgpu_bo vram nodes to validate svm range, then map vram nodes to GPUs. Use offset to sub allocate from the same amdgpu_bo to handle overlap vram range while adding new range or unmapping range. svm_bo has ref count to trace the shared

[PATCH 28/34] drm/amdgpu: svm bo enable_signal call condition

2021-04-14 Thread Felix Kuehling
From: Alex Sierra [why] To support svm bo eviction mechanism. [how] If the BO created has AMDGPU_AMDKFD_CREATE_SVM_BO flag set, enable_signal callback will be called inside amdgpu_evict_flags. This also causes gutting of the BO by removing all placements, so that TTM won't actually do an eviction

[PATCH 26/34] drm/amdgpu: add param bit flag to create SVM BOs

2021-04-14 Thread Felix Kuehling
From: Alex Sierra Add CREATE_SVM_BO define bit for SVM BOs. Another define flag was moved to concentrate these KFD type flags in one include file. Signed-off-by: Alex Sierra Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 7 ++-

[PATCH 17/34] drm/amdkfd: support xgmi same hive mapping

2021-04-14 Thread Felix Kuehling
From: Philip Yang amdgpu_gmc_get_vm_pte use bo_va->is_xgmi same hive information to set pte flags to update GPU mapping. Add local structure variable bo_va, and update bo_va.is_xgmi, pass it to mapping->bo_va while mapping to GPU. Assuming xgmi pstate is hi after boot. Signed-off-by: Philip Yan

[PATCH 24/34] drm/amdkfd: SVM API call to restore page tables

2021-04-14 Thread Felix Kuehling
From: Alex Sierra Use SVM API to restore page tables when retry fault and compute context are enabled. Signed-off-by: Alex Sierra Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 20 +++- 1 file changed, 15 insertions(+),

[PATCH 10/34] drm/amdkfd: map svm range to GPUs

2021-04-14 Thread Felix Kuehling
Use amdgpu_vm_bo_update_mapping to update GPU page table to map or unmap svm range system memory pages address to GPUs. Signed-off-by: Philip Yang Signed-off-by: Alex Sierra Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 473 ++

[PATCH 21/34] drm/amdkfd: invalidate tables on page retry fault

2021-04-14 Thread Felix Kuehling
GPU page tables are invalidated by unmapping prange directly at the mmu notifier, when page fault retry is enabled through amdgpu_noretry global parameter. The restore page table is performed at the page fault handler. If xnack is on, we update GPU mappings after migration to avoid unnecessary GPU

[PATCH 22/34] drm/amdgpu: enable 48-bit IH timestamp counter

2021-04-14 Thread Felix Kuehling
From: Alex Sierra By default this timestamp is a 32-bit counter. It overflows in around 10 minutes. Signed-off-by: Alex Sierra Reviewed-by: Felix Kuehling Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu

[PATCH 20/34] drm/amdkfd: HMM migrate vram to ram

2021-04-14 Thread Felix Kuehling
If a CPU page fault happens, the HMM pgmap_ops callback migrate_to_ram starts migrating memory from vram to ram in steps: 1. migrate_vma_pages gets the vram pages and notifies HMM to invalidate the pages; the HMM interval notifier callback evicts process queues 2. Allocate system memory pages 3. Use svm copy memory t

[PATCH 18/34] drm/amdkfd: copy memory through gart table

2021-04-14 Thread Felix Kuehling
From: Philip Yang Use sdma linear copy to migrate data between ram and vram. The sdma linear copy command uses kernel buffer function queue to access system memory through gart table. Use reserved gart table window 0 to map system page address, and vram page address is direct mapping. Use the sa

[PATCH 12/34] drm/amdgpu: Enable retry faults unconditionally on Aldebaran

2021-04-14 Thread Felix Kuehling
This is needed to allow per-process XNACK mode selection in the SQ when booting with XNACK off by default. Signed-off-by: Felix Kuehling Reviewed-by: Philip Yang Tested-by: Alex Sierra --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++- drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 8 ++--

[PATCH 05/34] drm/amdgpu: add common HMM get pages function

2021-04-14 Thread Felix Kuehling
From: Philip Yang Move the HMM get pages function from amdgpu_ttm to amdgpu_mn. This common function will be used by new svm APIs. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 83 +++

[PATCH 15/34] drm/amdkfd: register HMM device private zone

2021-04-14 Thread Felix Kuehling
From: Philip Yang Register vram memory as MEMORY_DEVICE_PRIVATE type resource, to allocate vram backing pages for page migration. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/Kconfig | 2 + drivers/gpu/drm/amd/amd

[PATCH 14/34] drm/amdkfd: add ioctl to configure and query xnack retries

2021-04-14 Thread Felix Kuehling
From: Alex Sierra Xnack retries are used for page fault recovery. Some AMD chip families support continuous retry while page table entries are invalid. The driver must handle the page fault interrupt and fill in a valid entry for the GPU to continue. This ioctl allows enabling/disabling XNACK r

[PATCH 13/34] drm/amdkfd: add xnack enabled flag to kfd_process

2021-04-14 Thread Felix Kuehling
From: Alex Sierra XNACK mode controls the SQ RETRY_DISABLE setting that determines whether recoverable page faults can be supported on GFXv9 hardware. Only on Aldebaran can we support different processes running with different XNACK modes. On older chips all processes must use the same RETRY_DIS

[PATCH 11/34] drm/amdkfd: svm range eviction and restore

2021-04-14 Thread Felix Kuehling
The HMM interval notifier callback notifies that the CPU page table will be updated; stop process queues if the updated address belongs to a svm range registered in the process svms objects tree. Schedule restore work to update the GPU page table using the new page addresses in the updated svm range. The restore worker flushe

[PATCH 08/34] drm/amdkfd: deregister svm range

2021-04-14 Thread Felix Kuehling
From: Philip Yang When the application explicitly calls unmap, or unmap is called from mmput when the application exits, the driver will receive an MMU_NOTIFY_UNMAP event to remove the svm range from the process svms object tree and list first, then unmap from GPUs (in the following patch). Split the svm ranges to handle partial unmapp

[PATCH 07/34] drm/amdkfd: validate svm range system memory

2021-04-14 Thread Felix Kuehling
From: Philip Yang Use HMM to get system memory pages address, which will be used to map to GPUs or migrate to vram. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 116 ++- drivers/gpu/drm

[PATCH 03/34] drm/amdkfd: register svm range

2021-04-14 Thread Felix Kuehling
From: Philip Yang svm range structure stores the range start address, size, attributes, flags, prefetch location and gpu bitmap which indicates which GPU this range maps to. Same virtual address is shared by CPU and GPUs. Process has svm range list which uses both interval tree and list to store

[PATCH 04/34] drm/amdkfd: add svm ioctl GET_ATTR op

2021-04-14 Thread Felix Kuehling
From: Philip Yang Get the intersection of attributes over all memory in the given range Signed-off-by: Philip Yang Signed-off-by: Alex Sierra Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 164 +++ 1 file changed,
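One way to picture "the intersection of attributes over all memory in the given range" is to AND together a flag bitmask across every registered range that overlaps the query. The sketch below assumes exactly that reading; `struct sketch_range` and `sketch_intersect_flags` are hypothetical names for illustration and are not taken from kfd_svm.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range record: [start, last] is inclusive, flags is a bitmask. */
struct sketch_range {
	uint64_t start;
	uint64_t last;
	uint32_t flags;
};

/*
 * AND together the flags of every range overlapping [start, last].
 * Returns false when no range overlaps; *flags is left untouched then.
 */
static bool sketch_intersect_flags(const struct sketch_range *ranges, size_t n,
				   uint64_t start, uint64_t last,
				   uint32_t *flags)
{
	uint32_t acc = UINT32_MAX;
	bool found = false;
	size_t i;

	assert(ranges != NULL && flags != NULL && start <= last);

	for (i = 0; i < n; i++) {
		if (ranges[i].last < start || ranges[i].start > last)
			continue;	/* no overlap with the query */
		acc &= ranges[i].flags;
		found = true;
	}
	if (found)
		*flags = acc;
	return found;
}
```

A flag survives only if every overlapping range has it set, which is the usual meaning of an intersection for bit flags. Non-flag attributes would need their own merge rule, which this sketch does not attempt.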

[PATCH 00/34] Add HMM-based SVM memory manager to KFD v5

2021-04-14 Thread Felix Kuehling
Updates since v4: - Rebased on upstream. - Added SPDX license headers and updated copyright on added files - Disabled XNACK on GFXv10 and later GPUs that don't support shader preemption on fault - Updated PTE flags for Aldebaran This series and the corresponding ROCm Thunk and KFDTest changes ar

[PATCH 06/34] drm/amdkfd: support larger svm range allocation

2021-04-14 Thread Felix Kuehling
From: Philip Yang For larger range allocation, if hmm_range_fault returns -EBUSY, set the retry timeout based on 1 second for every 512MB; this is a safe timeout value. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c |
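The "1 second for every 512MB" rule can be sketched as a small size-to-timeout helper. This is only an illustration under stated assumptions: the names are hypothetical, the unit (milliseconds) is chosen for readability, and rounding a partial 512MB chunk up to a full second is an assumption the patch summary does not confirm.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants; the names are illustrative, not from the patch. */
#define SKETCH_TIMEOUT_MS_PER_CHUNK 1000ULL		/* 1 second */
#define SKETCH_CHUNK_BYTES	    (512ULL << 20)	/* 512 MB */

/*
 * Return a retry timeout in milliseconds that grows by one second for
 * every started 512 MB of the range (partial chunks are rounded up,
 * which is an assumption of this sketch).
 */
static uint64_t sketch_retry_timeout_ms(uint64_t range_bytes)
{
	uint64_t chunks;

	assert(range_bytes > 0);

	chunks = range_bytes / SKETCH_CHUNK_BYTES;
	if (range_bytes % SKETCH_CHUNK_BYTES)
		chunks++;
	return chunks * SKETCH_TIMEOUT_MS_PER_CHUNK;
}
```

With this scaling a 4GB range would get an 8 second budget, while anything up to 512MB keeps a 1 second budget.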

[PATCH 02/34] drm/amdkfd: add svm ioctl API

2021-04-14 Thread Felix Kuehling
From: Philip Yang Add svm (shared virtual memory) ioctl data structure and API definition. The svm ioctl API is designed to be extensible in the future. All operations are provided by a single IOCTL to preserve ioctl number space. The arguments structure ends with a variable size array of attrib
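An arguments structure that "ends with a variable size array" is commonly expressed in C with a flexible array member, so one ioctl can carry any number of attributes. The layout below is a hypothetical sketch of that pattern; the struct and field names are assumptions and do not reproduce the actual uapi definition from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute: a type tag plus a value. */
struct sketch_svm_attr {
	uint32_t type;
	uint32_t value;
};

/* Hypothetical ioctl arguments ending in a flexible array member. */
struct sketch_svm_args {
	uint64_t start_addr;
	uint64_t size;
	uint32_t op;		/* which operation the single ioctl performs */
	uint32_t nattr;		/* number of entries in attrs[] */
	struct sketch_svm_attr attrs[];
};

/* Bytes the caller must allocate and pass for nattr attributes. */
static size_t sketch_svm_args_size(uint32_t nattr)
{
	assert(nattr <= (SIZE_MAX - sizeof(struct sketch_svm_args)) /
			sizeof(struct sketch_svm_attr));

	return sizeof(struct sketch_svm_args) +
	       (size_t)nattr * sizeof(struct sketch_svm_attr);
}
```

New attribute types can then be added later without changing the structure size or consuming another ioctl number, which seems to be the extensibility the summary refers to.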

[PATCH 01/34] drm/amdkfd: helper to convert gpu id and idx

2021-04-14 Thread Felix Kuehling
From: Alex Sierra svm range uses a gpu bitmap to store which GPUs the svm range maps to. The application passes the driver a gpu id to specify a GPU, so a helper is needed to convert the gpu id to a gpu bitmap idx. Access through kfd_process_device pointers array from kfd_process. Signed-off-by: Alex Sierra Reviewed-by:

Re: [PATCH] drm/amdgpu: Copy MEC FW version to MEC2 if we skipped loading MEC2

2021-04-14 Thread Felix Kuehling
On 2021-04-14 at 7:39 p.m., Joseph Greathouse wrote: > If we skipped loading MEC2 firmware separately from MEC, then > MEC2 will be running the same firmware image. Copy the MEC version > and feature numbers into MEC2 version and feature numbers. This > is needed for things like GWS support, where

[PATCH] drm/amdgpu: Copy MEC FW version to MEC2 if we skipped loading MEC2

2021-04-14 Thread Joseph Greathouse
If we skipped loading MEC2 firmware separately from MEC, then MEC2 will be running the same firmware image. Copy the MEC version and feature numbers into MEC2 version and feature numbers. This is needed for things like GWS support, where we rely on knowing what version of firmware is running on MEC

[PATCH 1/2] drm/amd/display: Update modifier list for gfx10_3

2021-04-14 Thread Qingqing Zhuo
[Why] Current list only includes modifiers where DCC_MAX_COMPRESSED_BLOCK is set to AMD_FMT_MOD_DCC_BLOCK_128B, while AMD_FMT_MOD_DCC_BLOCK_64B is also supported and used by userspace. [How] Add AMD_FMT_MOD_DCC_BLOCK_64B to modifiers with DCC supported. Signed-off-by: Qingqing Zhuo --- .../gpu/

[PATCH 2/2] Revert "Revert "drm/amdgpu: Ensure that the modifier requested is supported by plane.""

2021-04-14 Thread Qingqing Zhuo
This reverts commit bc3e72b3c3f20ab1583a8464e64f1a68169a28c5. The regression caused by the original patch has been cleared, so reintroduce the change. Signed-off-by: Qingqing Zhuo --- drivers/gpu/drm/amd/amdgpu/amdgpu_display.c| 13 + .../gpu/drm/amd/display/amdgpu_dm/amdgp

Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K

2021-04-14 Thread Mikhail Gavrilov
On Wed, 14 Apr 2021 at 03:22, Leo Liu wrote: > > This is decode command line, are you seeing issue with encode or > decode? I meant that the kernel panic described above happens only when OBS records or streams video with the VAAPI encoder. Grabbing and encoding video with ffmpeg (given command exa

Re: [PATCH 03/30] drm/tegra: Don't register DP AUX channels before connectors

2021-04-14 Thread Lyude Paul
On Wed, 2021-04-14 at 18:49 +0200, Thierry Reding wrote: > On Fri, Feb 19, 2021 at 04:52:59PM -0500, Lyude Paul wrote: > > As pointed out by the documentation for drm_dp_aux_register(), > > drm_dp_aux_init() should be used in situations where the AUX channel for a > > display driver can potentially

Re: [PATCH] drm/amdgpu: Add double-sscanf but invert

2021-04-14 Thread Luben Tuikov
Yeah, %i would do, and certainly more elegant, except for the signed input. Not sure if that's a problem, either, thus the two sscanf... Regards, Luben On 2021-04-14 11:58 a.m., Felix Kuehling wrote: > From the sscanf manpage: > > i Matches an optionally signed integer; the next poi

Re: [PATCH 03/30] drm/tegra: Don't register DP AUX channels before connectors

2021-04-14 Thread Thierry Reding
On Fri, Feb 19, 2021 at 04:52:59PM -0500, Lyude Paul wrote: > As pointed out by the documentation for drm_dp_aux_register(), > drm_dp_aux_init() should be used in situations where the AUX channel for a > display driver can potentially be registered before its respective DRM > driver. This is the c

Re: [PATCH] drm/amdgpu: Add double-sscanf but invert

2021-04-14 Thread Felix Kuehling
From the sscanf manpage: i Matches an optionally signed integer; the next pointer must be a pointer to int. The integer is read in base 16 if it begins with 0x or 0X, in base 8 if it begins with 0, and in base 10 otherwise. Only characte
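The base selection described in the manpage excerpt can be seen with a few inputs. The wrapper below is a hypothetical helper written for illustration, using only the standard `sscanf` with `%i`.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse an optionally signed integer with %i, which picks base 16 for a
 * 0x/0X prefix, base 8 for a leading 0, and base 10 otherwise.
 * Returns 0 on success, -1 if nothing could be converted.
 */
static int sketch_parse_auto_base(const char *str, int *value)
{
	assert(str != NULL && value != NULL);

	return sscanf(str, "%i", value) == 1 ? 0 : -1;
}
```

Note that the target is a signed int, which is the "signed input" concern raised in the thread: an address wider than int would not fit this conversion.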

Re: [PATCH 2/4] drm/amdkfd: Use drm_priv to pass VM from KFD to amdgpu

2021-04-14 Thread Felix Kuehling
On 2021-04-14 at 11:21 a.m., philip yang wrote: > > > On 2021-04-07 7:12 p.m., Felix Kuehling wrote: >> amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu needs the drm_priv to allow mmap >> to access the BO through the corresponding file descriptor. >> >> Signed-off-by: Felix Kuehling >> --- >> drivers/gp

Re: [PATCH 3/4] drm/amdkfd: Allow access for mmapping KFD BOs

2021-04-14 Thread Felix Kuehling
On 2021-04-14 at 11:37 a.m., philip yang wrote: > > > On 2021-04-07 7:12 p.m., Felix Kuehling wrote: >> DRM allows access automatically when it creates a GEM handle for a BO. >> KFD BOs don't have GEM handles, so KFD needs to manage access manually. > > After reading drm vma manager, I understand

Re: [PATCH 4/4] drm/amdgpu: Remove verify_access shortcut for KFD BOs

2021-04-14 Thread philip yang
On 2021-04-07 7:12 p.m., Felix Kuehling wrote: This shortcut is no longer needed with access managed properly by KFD. Reviewed-by: Philip Yang Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 7 --- 1 fi

Re: [PATCH 3/4] drm/amdkfd: Allow access for mmapping KFD BOs

2021-04-14 Thread philip yang
On 2021-04-07 7:12 p.m., Felix Kuehling wrote: DRM allows access automatically when it creates a GEM handle for a BO. KFD BOs don't have GEM handles, so KFD needs to manage access manually. After reading drm vma manager, I understand it uses rbtree

[PATCH] drm/amdgpu: Add double-sscanf but invert

2021-04-14 Thread Luben Tuikov
Add back the double-sscanf so that both decimal and hexadecimal values could be read in, but this time invert the scan so that hexadecimal format with a leading 0x is tried first, and if that fails, then try decimal format. Also use a logical-AND instead of nesting double if-conditional. See comm
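The hex-first, decimal-second scan joined by a logical AND can be sketched as follows. `sketch_parse_address` is a hypothetical helper, and the format strings are simplified assumptions rather than the ones used in the patch. A plausible reason for the ordering is that a decimal conversion tried first would accept the leading "0" of "0x1f" and stop there.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse an address given either as hexadecimal with a leading "0x" or as
 * plain decimal. Returns 0 on success, -1 if neither format matches.
 */
static int sketch_parse_address(const char *str, unsigned long long *address)
{
	assert(str != NULL && address != NULL);

	/* Try "0x..." first; fall back to decimal only if that fails. */
	if (sscanf(str, "0x%llx", address) != 1 &&
	    sscanf(str, "%llu", address) != 1)
		return -1;
	return 0;
}
```

Because `&&` short-circuits, the decimal scan runs only when the hexadecimal scan did not convert a value, which replaces the nested if-conditional mentioned in the summary.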

Re: [PATCH 2/4] drm/amdkfd: Use drm_priv to pass VM from KFD to amdgpu

2021-04-14 Thread philip yang
On 2021-04-07 7:12 p.m., Felix Kuehling wrote: amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu needs the drm_priv to allow mmap to access the BO through the corresponding file descriptor. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h

Re: [PATCH 1/4] drm/amdkfd: Remove legacy code not acquiring VMs

2021-04-14 Thread philip yang
On 2021-04-07 7:12 p.m., Felix Kuehling wrote: ROCm user mode has acquired VMs from DRM file descriptors for as long as it supported the upstream KFD. Legacy code to support older versions of ROCm is not needed any more. Reviewed-by: Philip Yang

Re: [PATCH 9/9] drm/amdgpu: Lock the attached dmabuf in unpopulate

2021-04-14 Thread Felix Kuehling
On 2021-04-14 at 3:33 a.m., Christian König wrote: > On 14.04.21 at 08:46, Felix Kuehling wrote: >> amdgpu_ttm_tt_unpopulate can be called during bo_destroy. The >> dmabuf->resv >> must not be held by the caller or dma_buf_detach will deadlock. This is >> probably not the right fix. I get a recur

Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

2021-04-14 Thread Christian König
On 14.04.21 at 16:36, Andrey Grodzovsky wrote: [SNIP] We are racing here once more and need to handle that. But why? I wrote above that we first stop all the schedulers, and only then call drm_sched_entity_kill_jobs. The schedulers consuming jobs is not the problem, we already handle that

Re: [PATCH 8/9] drm/ttm: Don't count pages in SG BOs against pages_limit

2021-04-14 Thread Felix Kuehling
On 2021-04-14 at 8:25 a.m., Daniel Vetter wrote: >> Sorry I thought that this would be obvious :) >> >> I've already pushed the patch in the morning, but going to keep that in >> mind for the next time. > I'll keep reminding you to pls elaborate more in commit messages, it's > coming up every once

Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

2021-04-14 Thread Andrey Grodzovsky
On 2021-04-14 3:01 a.m., Christian König wrote: On 13.04.21 at 20:30, Andrey Grodzovsky wrote: On 2021-04-13 2:25 p.m., Christian König wrote: On 13.04.21 at 20:18, Andrey Grodzovsky wrote: On 2021-04-13 2:03 p.m., Christian König wrote: On 13.04.21 at 17:12, Andrey Grodzovsky wrote:

Re: [PATCH 2/2] drm/amdgpu: Add show_fdinfo() interface

2021-04-14 Thread Christian König
On 14.04.21 at 15:59, Roy Sun wrote: Tracking devices, process info and fence info using /proc/pid/fdinfo Signed-off-by: David M Nieto Signed-off-by: Roy Sun --- drivers/gpu/drm/amd/amdgpu/Makefile| 2 + drivers/gpu/drm/amd/amdgpu/amdgpu.h| 1 + drivers/gpu/drm/amd/amdgp

Re: [PATCH] drm/amdgpu: Fix checking return result of retire page

2021-04-14 Thread Luben Tuikov
You also seem to be missing a leading parenthesis. Regards, Luben On 2021-04-14 9:58 a.m., Luben Tuikov wrote: > I'll take a look. > > For the time being, you don't need parentheses around != conditional as && > has lower > priority, i.e. the parentheses are superfluous. > > Regards, > Luben >

[PATCH 2/2] drm/amdgpu: Add show_fdinfo() interface

2021-04-14 Thread Roy Sun
Tracking devices, process info and fence info using /proc/pid/fdinfo Signed-off-by: David M Nieto Signed-off-by: Roy Sun --- drivers/gpu/drm/amd/amdgpu/Makefile| 2 + drivers/gpu/drm/amd/amdgpu/amdgpu.h| 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c| 61 +++ dri

[PATCH 1/2] drm/scheduler: Change scheduled fence track

2021-04-14 Thread Roy Sun
Update the timestamp of scheduled fence on HW completion of the previous fences. This allows more accurate tracking of the fence execution in HW. Signed-off-by: David M Nieto Signed-off-by: Roy Sun --- drivers/gpu/drm/scheduler/sched_main.c | 11 +-- 1 file changed, 9 insertions(+), 2 del

Re: [PATCH] drm/amdgpu: Fix checking return result of retire page

2021-04-14 Thread Luben Tuikov
I'll take a look. For the time being, you don't need parentheses around != conditional as && has lower priority, i.e. the parentheses are superfluous. Regards, Luben On 2021-04-14 4:19 a.m., Clements, John wrote: > [AMD Official Use Only - Internal Distribution Only] > > Thank you Luben for re
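The precedence point can be checked directly: in C, `!=` binds tighter than `&&`, so the two forms below are the same expression. The function names are hypothetical and exist only for this comparison.

```c
#include <assert.h>

/* Without parentheses: parsed as (r1 != 1) && (r2 != 1). */
static int sketch_both_failed_plain(int r1, int r2)
{
	return r1 != 1 && r2 != 1;
}

/* With explicit, redundant parentheses around each comparison. */
static int sketch_both_failed_parens(int r1, int r2)
{
	return (r1 != 1) && (r2 != 1);
}
```

The parentheses are therefore a readability choice only; what the whole condition still needs is one enclosing pair after `if`.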

Re: [PATCH v2] drm/amd/display: Fix two cursor duplication when using overlay

2021-04-14 Thread Harry Wentland
On 2021-04-13 8:06 p.m., Rodrigo Siqueira wrote: Our driver supports overlay planes, and as expected, some userspace compositors take advantage of these features. If the userspace is not enabling the cursor, they can use multiple planes as they please. Nevertheless, we start to have constraint

Re: [PATCH] drm/amdgpu: Add vbios info ioctl interface

2021-04-14 Thread Alex Deucher
On Wed, Apr 14, 2021 at 5:09 AM Jiawei Gu wrote: > > Add AMDGPU_INFO_VBIOS_INFO subquery id for detailed vbios info. Please provide a link to patches for an open source tool which uses this new query. > > Signed-off-by: Jiawei Gu > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 19 +++ >

Re: [PATCH 8/9] drm/ttm: Don't count pages in SG BOs against pages_limit

2021-04-14 Thread Christian König
On 14.04.21 at 14:47, Daniel Vetter wrote: On Wed, Apr 14, 2021 at 2:43 PM Christian König wrote: On 14.04.21 at 14:25, Daniel Vetter wrote: On Wed, Apr 14, 2021 at 12:49 PM Christian König wrote: On 14.04.21 at 12:26, Daniel Vetter wrote: On Wed, Apr 14, 2021 at 11:19:41AM +0200, Christ

Re: [PATCH 8/9] drm/ttm: Don't count pages in SG BOs against pages_limit

2021-04-14 Thread Daniel Vetter
On Wed, Apr 14, 2021 at 2:43 PM Christian König wrote: > > On 14.04.21 at 14:25, Daniel Vetter wrote: > > On Wed, Apr 14, 2021 at 12:49 PM Christian König > > wrote: > >> On 14.04.21 at 12:26, Daniel Vetter wrote: > >>> On Wed, Apr 14, 2021 at 11:19:41AM +0200, Christian König wrote: > Am

Re: [PATCH 8/9] drm/ttm: Don't count pages in SG BOs against pages_limit

2021-04-14 Thread Christian König
On 14.04.21 at 14:25, Daniel Vetter wrote: On Wed, Apr 14, 2021 at 12:49 PM Christian König wrote: On 14.04.21 at 12:26, Daniel Vetter wrote: On Wed, Apr 14, 2021 at 11:19:41AM +0200, Christian König wrote: On 14.04.21 at 11:15, Daniel Vetter wrote: On Wed, Apr 14, 2021 at 08:51:51AM +020

Re: [PATCH 8/9] drm/ttm: Don't count pages in SG BOs against pages_limit

2021-04-14 Thread Daniel Vetter
On Wed, Apr 14, 2021 at 12:49 PM Christian König wrote: > > On 14.04.21 at 12:26, Daniel Vetter wrote: > > On Wed, Apr 14, 2021 at 11:19:41AM +0200, Christian König wrote: > >> On 14.04.21 at 11:15, Daniel Vetter wrote: > >>> On Wed, Apr 14, 2021 at 08:51:51AM +0200, Christian König wrote: > >>>

Re: [PATCH 8/9] drm/ttm: Don't count pages in SG BOs against pages_limit

2021-04-14 Thread Christian König
On 14.04.21 at 12:26, Daniel Vetter wrote: On Wed, Apr 14, 2021 at 11:19:41AM +0200, Christian König wrote: On 14.04.21 at 11:15, Daniel Vetter wrote: On Wed, Apr 14, 2021 at 08:51:51AM +0200, Christian König wrote: On 14.04.21 at 08:48, Felix Kuehling wrote: Pages in SG BOs were not alloc

Re: [PATCH 8/9] drm/ttm: Don't count pages in SG BOs against pages_limit

2021-04-14 Thread Daniel Vetter
On Wed, Apr 14, 2021 at 11:19:41AM +0200, Christian König wrote: > On 14.04.21 at 11:15, Daniel Vetter wrote: > > On Wed, Apr 14, 2021 at 08:51:51AM +0200, Christian König wrote: > > > On 14.04.21 at 08:48, Felix Kuehling wrote: > > > > Pages in SG BOs were not allocated by TTM. So don't count th

Re: [PATCH 8/9] drm/ttm: Don't count pages in SG BOs against pages_limit

2021-04-14 Thread Christian König
On 14.04.21 at 11:15, Daniel Vetter wrote: On Wed, Apr 14, 2021 at 08:51:51AM +0200, Christian König wrote: On 14.04.21 at 08:48, Felix Kuehling wrote: Pages in SG BOs were not allocated by TTM. So don't count them against TTM's pages limit. Signed-off-by: Felix Kuehling Reviewed-by: Chris

Recall: [PATCH] drm/amd/sriov no need to config GECC for sriov

2021-04-14 Thread Yang, Stanley
Yang, Stanley would like to recall the message, "[PATCH] drm/amd/sriov no need to config GECC for sriov".

RE: [PATCH] drm/amd/sriov no need to config GECC for sriov

2021-04-14 Thread Yang, Stanley
[AMD Official Use Only - Internal Distribution Only] Stanley.Yang Regards, Stanley > -Original Message- > From: Jack Zhang > Sent: Wednesday, April 14, 2021 5:04 PM > To: amd-gfx@lists.freedesktop.org; Yang, Stanley ; > Clements, John ; Zhang, Hawking > > Cc: Zhang, Jack (Jian) > Subj

Re: [PATCH 8/9] drm/ttm: Don't count pages in SG BOs against pages_limit

2021-04-14 Thread Daniel Vetter
On Wed, Apr 14, 2021 at 08:51:51AM +0200, Christian König wrote: > On 14.04.21 at 08:48, Felix Kuehling wrote: > > Pages in SG BOs were not allocated by TTM. So don't count them against > > TTM's pages limit. > > > > Signed-off-by: Felix Kuehling > > Reviewed-by: Christian König > > Going to

RE: [PATCH] drm/amd/sriov no need to config GECC for sriov

2021-04-14 Thread Yang, Stanley
[AMD Official Use Only - Internal Distribution Only] Reviewed-by: Hawking Zhang Regards, Stanley > -Original Message- > From: Jack Zhang > Sent: Wednesday, April 14, 2021 5:04 PM > To: amd-gfx@lists.freedesktop.org; Yang, Stanley ; > Clements, John ; Zhang, Hawking > > Cc: Zhang, Jack

RE: [PATCH] drm/amd/sriov no need to config GECC for sriov

2021-04-14 Thread Zhang, Hawking
[AMD Official Use Only - Internal Distribution Only] Reviewed-by: Hawking Zhang Regards, Hawking -Original Message- From: Jack Zhang Sent: Wednesday, April 14, 2021 17:04 To: amd-gfx@lists.freedesktop.org; Yang, Stanley ; Clements, John ; Zhang, Hawking Cc: Zhang, Jack (Jian) Subjec

[PATCH] drm/amdgpu: Add vbios info ioctl interface

2021-04-14 Thread Jiawei Gu
Add AMDGPU_INFO_VBIOS_INFO subquery id for detailed vbios info. Signed-off-by: Jiawei Gu --- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 19 +++ drivers/gpu/drm/amd/amdgpu/atom.c | 158 + drivers/gpu/drm/amd/amdgpu/atom.h | 11 ++ drivers/gpu/drm/amd/incl

[PATCH] drm/amd/sriov no need to config GECC for sriov

2021-04-14 Thread Jack Zhang
No need to config GECC feature here for sriov. Leave the host driver to do the configuration job. Signed-off-by: Jack Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/am

Re: [PATCH 3/8] drm/amdgpu: Implement mmap as GEM object function

2021-04-14 Thread Felix Kuehling
On 2021-04-14 at 3:44 a.m., Thomas Zimmermann wrote: > Hi > > On 07.04.21 at 21:49, Felix Kuehling wrote: >> On 2021-04-07 3:34 p.m., Felix Kuehling wrote: >>> On 2021-04-07 7:25 a.m., Christian König wrote: +    /* + * Don't verify access for KFD BOs. They > don't have a G

RE: [PATCH] drm/amdgpu: Fix checking return result of retire page

2021-04-14 Thread Clements, John
[AMD Official Use Only - Internal Distribution Only] Thank you Luben for re-organizing this source file and fixing those bugs. Please add back support for decimal input parameter values, maybe something like this: + if (sscanf(str, "%*s 0x%llx", &address) != 1) && (sscanf

Re: [PATCH 1/2] drm/amdgpu: use pre-calculated bo size

2021-04-14 Thread Nirmoy
On 4/14/21 8:52 AM, Christian König wrote: On 13.04.21 at 22:57, Nirmoy Das wrote: Use bo->tbo.base.size instead of calculating it from num_pages. Those don't clash with the two I've sent out yesterday, do they? It does, I didn't check dri-devel's [PATCH 1/7] drm/nouveau: use bo->base

Re: [PATCH 3/8] drm/amdgpu: Implement mmap as GEM object function

2021-04-14 Thread Thomas Zimmermann
Hi On 07.04.21 at 21:49, Felix Kuehling wrote: On 2021-04-07 3:34 p.m., Felix Kuehling wrote: On 2021-04-07 7:25 a.m., Christian König wrote: +    /* + * Don't verify access for KFD BOs. They don't have a GEM + * object associated with them. + */ +    if (bo->kfd_bo) +    g

[PATCH] drm/amd/pm: revise two names of sensor values for vangogh

2021-04-14 Thread Xiaojian Du
This patch revises two names of sensor values for vangogh. The new smu metrics table is supported by new pmfw (from version 4.63.36.00); it includes two parts: one part is the current smu metrics table data and the other part is the average smu metrics table data. The hwmon will read the current

Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

2021-04-14 Thread Christian König
On 13.04.21 at 20:30, Andrey Grodzovsky wrote: On 2021-04-13 2:25 p.m., Christian König wrote: On 13.04.21 at 20:18, Andrey Grodzovsky wrote: On 2021-04-13 2:03 p.m., Christian König wrote: On 13.04.21 at 17:12, Andrey Grodzovsky wrote: On 2021-04-13 3:10 a.m., Christian König wrote: