Re: [PATCH 2/2] drm/ttm: Double check mem_type of BO while eviction

2021-11-09 Thread Christian König
On 10.11.21 at 05:31, xinhui pan wrote: BO might sit in a wrong lru list as there is a small period of memory moving and lru list updating. Let's skip eviction if we hit such a mismatch. Suggested-by: Christian König Signed-off-by: xinhui pan Reviewed-by: Christian König for the series.

[RFC 1/2] ACPI: platform_profile: Add support for notification chains

2021-11-09 Thread Mario Limonciello
Allow other drivers to initialize relative to the current active profile and react to platform profile changes. Drivers wishing to utilize this should register for notification at module load and unregister when unloading. Notifications will come in the form of a notifier call. Signed-off-by: Mario
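The register / notify / unregister life cycle described in this snippet can be illustrated with a plain userspace stand-in. This is only a sketch: the kernel patch itself would presumably build on the notifier API from linux/notifier.h, and every name below (sketch_chain, sketch_register, sketch_unregister, sketch_notify, sketch_store_profile) is hypothetical and does not come from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_LISTENERS 8

/* Hypothetical callback type: receives the new profile and the listener's private data. */
typedef void (*sketch_profile_cb)(int new_profile, void *data);

/* Hypothetical fixed-size chain of listeners (not the kernel's notifier_block list). */
struct sketch_chain {
    sketch_profile_cb cb[SKETCH_MAX_LISTENERS];
    void *data[SKETCH_MAX_LISTENERS];
    size_t count;
};

/* Called by a driver at load time; returns 0 on success, -1 if the chain is full. */
static int sketch_register(struct sketch_chain *c, sketch_profile_cb cb, void *data)
{
    assert(cb != NULL);
    if (c->count == SKETCH_MAX_LISTENERS)
        return -1;
    c->cb[c->count] = cb;
    c->data[c->count] = data;
    c->count++;
    return 0;
}

/* Called by a driver at unload time; returns 0 if the callback was found and removed. */
static int sketch_unregister(struct sketch_chain *c, sketch_profile_cb cb)
{
    for (size_t i = 0; i < c->count; i++) {
        if (c->cb[i] != cb)
            continue;
        /* Close the gap while keeping the remaining listeners in order. */
        for (size_t j = i + 1; j < c->count; j++) {
            c->cb[j - 1] = c->cb[j];
            c->data[j - 1] = c->data[j];
        }
        c->count--;
        return 0;
    }
    return -1;
}

/* Called by the profile provider whenever the active profile changes. */
static void sketch_notify(const struct sketch_chain *c, int new_profile)
{
    for (size_t i = 0; i < c->count; i++)
        c->cb[i](new_profile, c->data[i]);
}

/* Example listener: remembers the last profile it was told about. */
static void sketch_store_profile(int new_profile, void *data)
{
    *(int *)data = new_profile;
}
```

A listener registered this way sees every profile change until it unregisters; after that, further notifications no longer reach it.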

[RFC 2/2] drm/amd/pm: Add support for reacting to platform profile notification

2021-11-09 Thread Mario Limonciello
Various drivers provide platform profile support to let users set a hint in their GUI whether they want to run in a high performance, low battery life or balanced configuration. Drivers that provide this typically work with the firmware on their system to configure hardware. In the case of

[RFC 0/2] Let amdgpu react to platform profile changes

2021-11-09 Thread Mario Limonciello
Many OEM platforms provide a platform profile knob that can be used to make firmware tunings to the system to allow operating in a higher or lower performance mode, trading off power consumption. Software like power-profiles-daemon exposes this knob to the UI. As we know the user's intent to go

Re: [PATCH] drm/amdgpu: add missed support for UVD IP_VERSION(3, 0, 64)

2021-11-09 Thread Alex Deucher
Reviewed-by: Alex Deucher On Wed, Nov 10, 2021 at 12:16 AM Guchun Chen wrote: > > Fixes: 5b30f206dbd1("drm/amdgpu/amdgpu_vcn: convert to IP version checking") > Signed-off-by: Flora Cui > Signed-off-by: Guchun Chen > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 1 + >

[PATCH] drm/amdgpu: add missed support for UVD IP_VERSION(3, 0, 64)

2021-11-09 Thread Guchun Chen
Fixes: 5b30f206dbd1("drm/amdgpu/amdgpu_vcn: convert to IP version checking") Signed-off-by: Flora Cui Signed-off-by: Guchun Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 1 + drivers/gpu/drm/amd/amdgpu/nv.c | 1 + 3

Re: [PATCH 5/5] drm/amdkfd: svm deferred work pin mm

2021-11-09 Thread Felix Kuehling
On 2021-11-09 6:04 p.m., Philip Yang wrote: Make sure mm is not removed while prange deferred work inserts the mmu range notifier, to avoid WARNING: WARNING: CPU: 6 PID: 1787 at mm/mmu_notifier.c:932 __mmu_interval_notifier_insert+0xdd/0xf0 Workqueue: events svm_range_deferred_list_work [amdgpu]

[PATCH 1/2] drm/ttm: Put BO in its memory manager's lru list

2021-11-09 Thread xinhui pan
After we move a BO to a new memory region, we should put it on the new memory manager's lru list regardless of whether we unlock the resv or not. Cc: sta...@vger.kernel.org Reviewed-by: Christian König Signed-off-by: xinhui pan --- drivers/gpu/drm/ttm/ttm_bo.c | 2 ++ 1 file changed, 2 insertions(+) diff

[PATCH 2/2] drm/ttm: Double check mem_type of BO while eviction

2021-11-09 Thread xinhui pan
BO might sit in a wrong lru list as there is a small period of memory moving and lru list updating. Let's skip eviction if we hit such a mismatch. Suggested-by: Christian König Signed-off-by: xinhui pan --- drivers/gpu/drm/ttm/ttm_bo.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
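The idea of skipping a BO whose mem_type no longer matches the lru list it was found on can be sketched in plain C. This is not the TTM code from the patch: sketch_bo, sketch_pick_victim, and the array-based lru are hypothetical simplifications used only to show the double check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer object; only the field the check needs. */
struct sketch_bo {
    int mem_type; /* memory type the BO currently lives in */
};

/*
 * Walk an lru list (modelled here as an array) that belongs to the memory
 * manager for lru_mem_type and return the first BO that really is of that
 * type. A BO whose mem_type differs is assumed to be in the middle of a move
 * and is skipped rather than evicted. Returns NULL if nothing qualifies.
 */
static struct sketch_bo *sketch_pick_victim(struct sketch_bo **lru, size_t n,
                                            int lru_mem_type)
{
    assert(lru != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (lru[i]->mem_type != lru_mem_type)
            continue; /* mismatch: list not updated yet, do not evict */
        return lru[i];
    }
    return NULL;
}
```

The check costs one comparison per candidate and makes the eviction path tolerate the short window in which the lru list lags behind the move.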

Re: [PATCH 4/5] drm/amdkfd: restore pages race with process termination

2021-11-09 Thread Felix Kuehling
On 2021-11-09 6:04 p.m., Philip Yang wrote: The restore pages work cannot find the kfd process or mm struct if the process is destroyed before the drain retry fault work is scheduled to run. This is not a failure; return 0 to avoid dumping a GPU vm fault to the kernel log. I wonder if this could also be solved by draining page

Re: [PATCH 3/5] drm/amdkfd: restore pages race with vma remove

2021-11-09 Thread Felix Kuehling
On 2021-11-09 6:04 p.m., Philip Yang wrote: Before restore pages takes the mmap read or write lock, the vma may be removed. Check if the vma exists before creating an unregistered range or verifying range access permission, and return 0 if the vma is removed to avoid restore pages returning a failure that reports a GPU vm

Re: [PATCH 2/5] drm/amdkfd: check child range to drain retry fault

2021-11-09 Thread Felix Kuehling
On 2021-11-09 6:04 p.m., Philip Yang wrote: When unmapping a partial range, the parent prange list op is update notifier and the child range list op is unmap range, so the child range needs to be checked to set the drain retry fault flag. Signed-off-by: Philip Yang I think this could be simplified by simply setting

Re: [PATCH] drm/amdgpu: Pin MMIO/DOORBELL BO's in GTT domain

2021-11-09 Thread Felix Kuehling
On 2021-11-09 2:12 p.m., Ramesh Errabolu wrote: MMIO/DOORBELL BOs encode control data and should be pinned in GTT domain before enabling PCIe-connected peer devices to access them. Signed-off-by: Ramesh Errabolu Reviewed-by: Felix Kuehling ---

Re: [PATCH] drm/amdgpu: drop jpeg IP initialization in SRIOV case

2021-11-09 Thread Alex Deucher
On Tue, Nov 9, 2021 at 9:14 PM Guchun Chen wrote: > > Fixes: 67a765c6352d("drm/amdgpu: clean up set IP function") > > Signed-off-by: Guchun Chen Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff

[PATCH v2 1/1] drm/amdgpu: Fix MMIO HDP flush on SRIOV

2021-11-09 Thread Felix Kuehling
Disable HDP register remapping on SRIOV and set rmmio_remap.reg_offset to the fixed address of the VF register for hdp_v*_flush_hdp. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c | 4 drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c | 4

[PATCH] drm/amdgpu: drop jpeg IP initialization in SRIOV case

2021-11-09 Thread Guchun Chen
Fixes: 67a765c6352d("drm/amdgpu: clean up set IP function") Signed-off-by: Guchun Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c

[PATCH 1/5] drm/amdgpu: handle IH ring1 overflow

2021-11-09 Thread Philip Yang
IH ring1 is used to process GPU retry faults. Overflow is enabled to drain retry faults before unmapping the range, and wptr may pass rptr. amdgpu_ih_process should check that rptr equals the latest wptr before exiting; otherwise it will continue to recover outdated retry faults after the drain retry fault is done,
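The exit condition mentioned here, stopping only when rptr equals the latest wptr, can be sketched with a small userspace ring. This is an illustration under assumptions, not the amdgpu_ih_process implementation: sketch_ih_ring and sketch_ih_process are hypothetical, entries are one slot wide, and the ring size is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ring state; ptr_mask is ring size minus one (size is a power of two). */
struct sketch_ih_ring {
    uint32_t rptr;          /* next entry the consumer will read */
    volatile uint32_t wptr; /* next entry the producer will write */
    uint32_t ptr_mask;
};

/*
 * Consume entries until rptr catches up with the most recent wptr.
 * wptr is re-read after every entry, so entries written while the loop
 * is running are handled in the same pass. Returns the number handled.
 */
static unsigned int sketch_ih_process(struct sketch_ih_ring *ih)
{
    unsigned int handled = 0;
    uint32_t wptr;

    assert((ih->ptr_mask & (ih->ptr_mask + 1)) == 0);

    wptr = ih->wptr & ih->ptr_mask;
    while (ih->rptr != wptr) {
        /* A real handler would decode and dispatch the entry at rptr here. */
        ih->rptr = (ih->rptr + 1) & ih->ptr_mask;
        handled++;
        wptr = ih->wptr & ih->ptr_mask; /* pick up the latest write pointer */
    }
    return handled;
}
```

Comparing for equality with a masked pointer, rather than testing rptr < wptr, keeps the loop correct when the pointers wrap around the end of the ring.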

[PATCH 4/5] drm/amdkfd: restore pages race with process termination

2021-11-09 Thread Philip Yang
The restore pages work cannot find the kfd process or mm struct if the process is destroyed before the drain retry fault work is scheduled to run. This is not a failure; return 0 to avoid dumping a GPU vm fault to the kernel log. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 4 ++-- 1 file changed, 2

[PATCH 5/5] drm/amdkfd: svm deferred work pin mm

2021-11-09 Thread Philip Yang
Make sure mm is not removed while prange deferred work inserts the mmu range notifier, to avoid WARNING: WARNING: CPU: 6 PID: 1787 at mm/mmu_notifier.c:932 __mmu_interval_notifier_insert+0xdd/0xf0 Workqueue: events svm_range_deferred_list_work [amdgpu] RIP:

[PATCH 3/5] drm/amdkfd: restore pages race with vma remove

2021-11-09 Thread Philip Yang
Before restore pages takes the mmap read or write lock, the vma may be removed. Check if the vma exists before creating an unregistered range or verifying range access permission, and return 0 if the vma is removed to avoid restore pages returning a failure that reports a GPU vm fault to the application. Signed-off-by: Philip

[PATCH 2/5] drm/amdkfd: check child range to drain retry fault

2021-11-09 Thread Philip Yang
When unmapping a partial range, the parent prange list op is update notifier and the child range list op is unmap range, so the child range needs to be checked to set the drain retry fault flag. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 13 - 1 file changed, 12 insertions(+), 1

[PATCH v2 3/3] drm/amdkfd: convert misc checks to IP version checking

2021-11-09 Thread Graham Sider
Switch to IP version checking instead of asic_type on various KFD version checks. Signed-off-by: Graham Sider --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 27 ++-

[PATCH v2 2/3] drm/amdkfd: convert switches to IP version checking

2021-11-09 Thread Graham Sider
Converts KFD switch statements to use IP version checking instead of asic_type. Signed-off-by: Graham Sider --- drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 124 +- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 8 +- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 33

[PATCH v2 1/3] drm/amdkfd: convert KFD_IS_SOC to IP version checking

2021-11-09 Thread Graham Sider
Defined as GC HWIP >= IP_VERSION(9, 0, 1). Also defines KFD_GC_VERSION to return GC HWIP version. Signed-off-by: Graham Sider --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 4 ++-- drivers/gpu/drm/amd/amdkfd/kfd_priv.h
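A version check of the form "GC HWIP >= IP_VERSION(9, 0, 1)" works because major, minor, and revision are packed into one integer that compares numerically. The sketch below assumes the packing (major << 16 | minor << 8 | revision); the struct and helper names (sketch_dev, sketch_gc_version, sketch_is_soc15) are hypothetical stand-ins and are not the macros added by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed packing: one byte each for minor and revision, major above them. */
#define IP_VERSION(mj, mn, rv) (((mj) << 16) | ((mn) << 8) | (rv))

/* Hypothetical device record holding the discovered GC IP version. */
struct sketch_dev {
    uint32_t gc_version;
};

/* Hypothetical accessor, playing the role the snippet gives to KFD_GC_VERSION. */
static inline uint32_t sketch_gc_version(const struct sketch_dev *dev)
{
    assert(dev != NULL);
    return dev->gc_version;
}

/* True when the GC IP is at least 9.0.1, the threshold stated in the snippet. */
static inline bool sketch_is_soc15(const struct sketch_dev *dev)
{
    return sketch_gc_version(dev) >= IP_VERSION(9, 0, 1);
}
```

With this packing a single comparison replaces a switch over ASIC names, and newer IP versions satisfy the check without the code being touched again.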

RE: [PATCH] drm/amdgpu: Pin MMIO/DOORBELL BO's in GTT domain

2021-11-09 Thread Errabolu, Ramesh
[AMD Official Use Only] Based on my experiments I am able to conclude that I can avoid validating the BO prior to pinning it. I don't have the code history that led me to validating the BO in the first place. In any case I posted an updated patch to the DRM-NEXT branch in addition to a standalone

[PATCH] drm/amdgpu: Pin MMIO/DOORBELL BO's in GTT domain

2021-11-09 Thread Ramesh Errabolu
MMIO/DOORBELL BOs encode control data and should be pinned in GTT domain before enabling PCIe-connected peer devices to access them. Signed-off-by: Ramesh Errabolu --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 70 +++ 1 file changed, 70 insertions(+) diff --git

Re: [PATCH 2/2] drm/sched: serialize job_timeout and scheduler

2021-11-09 Thread Rob Clark
On Tue, Nov 9, 2021 at 1:07 AM Daniel Vetter wrote: > > On Mon, Nov 08, 2021 at 03:39:17PM -0800, Rob Clark wrote: > > I stumbled across this thread when I ran into the same issue, while > > working out how to move drm/msm to use scheduler's retire + > > timeout/recovery (and get rid of our own

Re: Re: Re: Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list

2021-11-09 Thread Christian König
In general the correct idea, but the wrong place to check that. Calling amdgpu_ttm_bo_eviction_valuable() is only optional, but that check must be mandatory for correct operation. This needs to be inside ttm_bo_evict_swapout_allowable(). Christian. On 09.11.21 at 14:41, Pan, Xinhui wrote:

Re: [PATCH 5/8] drm: Implement method to free unused pages

2021-11-09 Thread Arunpravin
On 04/11/21 12:46 am, Matthew Auld wrote: > On 25/10/2021 14:00, Arunpravin wrote: >> On contiguous allocation, we round up the size >> to the *next* power of 2, implement a function >> to free the unused pages after the newly allocated block. >> >> Signed-off-by: Arunpravin > > Ideally this
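The quoted description implies some simple arithmetic: a contiguous request is rounded up to the next power of two, and the difference is the tail that could be handed back. The sketch below only illustrates that arithmetic; sketch_roundup_pow2 and sketch_unused_pages are hypothetical helpers, not functions from the patch, and sizes are counted in pages.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Smallest power of two that is >= n (returns 1 for n == 0).
 * Assumes n <= 2^63 so the result fits in 64 bits.
 */
static uint64_t sketch_roundup_pow2(uint64_t n)
{
    uint64_t p = 1;

    assert(n <= (UINT64_C(1) << 63));
    while (p < n)
        p <<= 1;
    return p;
}

/* Pages allocated beyond the request, i.e. the tail that could be freed again. */
static uint64_t sketch_unused_pages(uint64_t requested_pages)
{
    return sketch_roundup_pow2(requested_pages) - requested_pages;
}
```

For example, a request for 5 pages is rounded up to 8, leaving 3 unused pages after the requested range, while a request that is already a power of two leaves none.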

Re: [PATCH] drm/amd/display: log amdgpu_dm_atomic_check() failure cause

2021-11-09 Thread Harry Wentland
On 2021-11-09 00:14, Shirish S wrote: > update developers with next level of info about unsupported > display configuration query that led to atomic check failure. > > Signed-off-by: Shirish S Reviewed-by: Harry Wentland Harry > --- > .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 69

Re: Re: Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list

2021-11-09 Thread Pan, Xinhui
[AMD Official Use Only] Yes, a double check is needed. How about the change below? As long as we detect such a mismatch, it indicates another eviction is ongoing. Returning false here is reasonable. @@ -1335,6 +1336,8 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,

Re: Re: Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list

2021-11-09 Thread Christian König
That's exactly the reason why we should have the double check in TTM that I mentioned in the other mail. Christian. On 09.11.21 at 14:16, Pan, Xinhui wrote: [AMD Official Use Only] Actually this patch does not totally fix the mismatch of lru list with mem_type as mem_type is changed in

Re: Re: Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list

2021-11-09 Thread Christian König
Yeah, but that should never happen in the first place. Even when the BO is on the wrong LRU TTM should check that beforehand. In other words when we pick a BO from the LRU we should still double check bo->resource->mem_type to make sure it is what we are searching for. Christian. Am

Re: Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list

2021-11-09 Thread Pan, Xinhui
[AMD Official Use Only] Actually this patch does not totally fix the mismatch of lru list with mem_type as mem_type is changed in ->move() and lru list is changed after that. During this small period, another eviction could still happen and evict this mismatched BO from sMem (say, its lru list

Re: Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list

2021-11-09 Thread Pan, Xinhui
[AMD Official Use Only] Yes, a stable tag is needed. Vulkan guys say 5.14 hit this issue too. I think that amdgpu_bo_move() does support copy from sysMem to sysMem correctly. Maybe something like below is needed. diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c

Re: Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list

2021-11-09 Thread Christian König
Mhm, I'm not sure what the rationale behind that is. Not moving the BO would make things less efficient, but should never cause a crash. Maybe we should add a CC: stable tag and push it to -fixes instead? Christian. On 09.11.21 at 13:28, Pan, Xinhui wrote: [AMD Official Use Only] I hit

Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list

2021-11-09 Thread Pan, Xinhui
[AMD Official Use Only] I hit a vulkan cts test hang with navi23. dmesg says gmc page fault with address 0x0, 0x1000, 0x2000. And some debug log also says amdgpu copies one BO from system Domain to system Domain, which is really weird. From: Koenig,

Re: [PATCH] drm/ttm: Put BO in its memory manager's lru list

2021-11-09 Thread Christian König
On 09.11.21 at 12:19, xinhui pan wrote: After we move a BO to a new memory region, we should put it on the new memory manager's lru list regardless of whether we unlock the resv or not. Signed-off-by: xinhui pan Interesting find, did you trigger that somehow or did you just stumble over it by reading

[PATCH] drm/ttm: Put BO in its memory manager's lru list

2021-11-09 Thread xinhui pan
After we move a BO to a new memory region, we should put it on the new memory manager's lru list regardless of whether we unlock the resv or not. Signed-off-by: xinhui pan --- drivers/gpu/drm/ttm/ttm_bo.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c

Re: [PATCH 2/2] drm/sched: serialize job_timeout and scheduler

2021-11-09 Thread Daniel Vetter
On Mon, Nov 08, 2021 at 03:39:17PM -0800, Rob Clark wrote: > I stumbled across this thread when I ran into the same issue, while > working out how to move drm/msm to use scheduler's retire + > timeout/recovery (and get rid of our own mirror list of in-flight > jobs). We already have hw error

RE: [PATCH] drm/amd/pm: avoid duplicate powergate/ungate setting

2021-11-09 Thread Quan, Evan
[AMD Official Use Only] > -Original Message- > From: Lazar, Lijo > Sent: Tuesday, November 9, 2021 3:29 PM > To: Koenig, Christian ; Borislav Petkov > ; Paul Menzel ; Liu, Leo > > Cc: Deucher, Alexander ; Quan, Evan > ; amd-gfx@lists.freedesktop.org > Subject: Re: [PATCH] drm/amd/pm:

RE: [PATCH] drm/amd/pm: avoid duplicate powergate/ungate setting

2021-11-09 Thread Quan, Evan
[AMD Official Use Only] > -Original Message- > From: Lazar, Lijo > Sent: Tuesday, November 9, 2021 12:15 PM > To: Quan, Evan ; amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander ; Borislav Petkov > > Subject: Re: [PATCH] drm/amd/pm: avoid duplicate powergate/ungate > setting > >