Re: [PATCH 2/2] drm/amdgpu: check the GART table before invalidating TLB

2022-02-06 Thread Christian König
On 07.02.22 at 07:30, Huang Rui wrote: On Mon, Feb 07, 2022 at 10:41:55AM +0800, Liu, Aaron wrote: Bypass group programming (utcl2_harvest) aims to prevent UTCL2 from sending invalidation commands to harvested SE/SA. Once an invalidation command reaches a harvested SE/SA, the SE/SA does not respond and

[PATCH 2/2] drm/amdkfd: use unmap all queues for poison consumption

2022-02-06 Thread Tao Zhou
Replace the queue reset for a specific PASID with unmapping all queues, since the queue reset could break the CP scheduler. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c

[PATCH 1/2] drm/amdkfd: rename kfd_process_vm_fault to kfd_dqm_evict_pasid

2022-02-06 Thread Tao Zhou
As the function is used in more different cases, use a more general name. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 2 +-

Re: [PATCH] drm/amdgpu: move lockdep assert to the right place.

2022-02-06 Thread Christian König
Ah, yes of course! When the VM is freed we currently don't lock anything either because nobody should have a reference to that object any more. Going to fix this as well. Thanks, Christian. On 04.02.22 at 20:15, Bhardwaj, Rajneesh wrote: On 2/4/2022 1:50 PM, Christian König wrote: Am

Re: [PATCH 1/2] drm/amdgpu: add utcl2_harvest to gc 10.3.1

2022-02-06 Thread Huang Rui
On Mon, Feb 07, 2022 at 10:41:54AM +0800, Liu, Aaron wrote: > Confirmed with hardware team, there is harvesting for gc 10.3.1. > > Signed-off-by: Aaron Liu Reviewed-by: Huang Rui > --- > drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c | 7 ++- > 1 file changed, 6 insertions(+), 1 deletion(-) >

[PATCH 8/8] fsdax: depend on ZONE_DEVICE || FS_DAX_LIMITED

2022-02-06 Thread Christoph Hellwig
Add a depends on ZONE_DEVICE support or the s390-specific limited DAX support, as one of the two is required at runtime for fsdax code to actually work. Signed-off-by: Christoph Hellwig --- fs/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/Kconfig b/fs/Kconfig index

[PATCH 7/8] mm: remove the extra ZONE_DEVICE struct page refcount

2022-02-06 Thread Christoph Hellwig
ZONE_DEVICE struct pages have an extra reference count that complicates the code for put_page() and several places in the kernel that need to check the reference count to see that a page is not being used (gup, compaction, migration, etc.). Clean up the code so the reference count doesn't need to

[PATCH 6/8] mm: don't include <linux/memremap.h> in <linux/mm.h>

2022-02-06 Thread Christoph Hellwig
Move the check for the actual pgmap types that need the free at refcount one behavior into the out of line helper, and thus avoid the need to pull memremap.h into mm.h. Signed-off-by: Christoph Hellwig --- arch/arm64/mm/mmu.c| 1 + drivers/gpu/drm/amd/amdkfd/kfd_priv.h |

[PATCH 5/8] mm: simplify freeing of devmap managed pages

2022-02-06 Thread Christoph Hellwig
Make put_devmap_managed_page return if it took charge of the page or not and remove the separate page_is_devmap_managed helper. Signed-off-by: Christoph Hellwig --- include/linux/mm.h | 34 ++ mm/memremap.c | 20 +--- mm/swap.c | 10
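
The pattern this patch describes, a single helper that both tests whether a page is devmap managed and, if so, takes over the release, can be sketched in user-space C. This is only an illustrative model under stated assumptions: `struct model_page`, `model_put_devmap_managed_page` and `model_put_page` are hypothetical names, not the kernel's definitions. The real code operates on `struct page` with atomic refcounts, and the free threshold is simplified to zero here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; not the kernel's layout. */
struct model_page {
    int refcount;            /* plain int here; the kernel uses atomics */
    bool devmap_managed;     /* models "owned by a devmap-managed pgmap" */
    bool freed_to_driver;    /* set when the owning driver gets the page back */
    bool freed_to_allocator; /* set when an ordinary page is freed */
};

/*
 * Returns true if the page is devmap managed. In that case the helper has
 * dropped the reference itself and the caller must not drop it again.
 * Returns false for ordinary pages and leaves the reference untouched.
 */
static bool model_put_devmap_managed_page(struct model_page *page)
{
    if (!page->devmap_managed)
        return false;
    assert(page->refcount > 0); /* guard against underflow in the model */
    if (--page->refcount == 0)
        page->freed_to_driver = true;
    return true;
}

/* Caller: one call replaces a separate "is it managed?" check plus a put. */
static void model_put_page(struct model_page *page)
{
    if (model_put_devmap_managed_page(page))
        return;
    assert(page->refcount > 0);
    if (--page->refcount == 0)
        page->freed_to_allocator = true;
}
```

Folding the predicate into the return value means the caller needs no second helper to decide which release path applies.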

[PATCH 4/8] mm: move free_devmap_managed_page to memremap.c

2022-02-06 Thread Christoph Hellwig
free_devmap_managed_page has nothing to do with the code in swap.c, move it to live with the rest of the code for devmap handling. Signed-off-by: Christoph Hellwig --- include/linux/mm.h | 1 - mm/memremap.c | 21 + mm/swap.c | 23 --- 3

[PATCH 3/8] mm: remove pointless includes from <linux/hmm.h>

2022-02-06 Thread Christoph Hellwig
hmm.h pulls in the world for no good reason at all. Remove the includes and push a few ones into the users instead. Signed-off-by: Christoph Hellwig --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 1 + drivers/gpu/drm/nouveau/nouveau_dmem.c | 1 + include/linux/hmm.h | 9

[PATCH 2/8] mm: remove the __KERNEL__ guard from <linux/mm.h>

2022-02-06 Thread Christoph Hellwig
__KERNEL__ ifdefs don't make sense outside of include/uapi/. Signed-off-by: Christoph Hellwig --- include/linux/mm.h | 4 1 file changed, 4 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 213cc569b19223..7b46174989b086 100644 --- a/include/linux/mm.h +++

start sorting out the ZONE_DEVICE refcount mess

2022-02-06 Thread Christoph Hellwig
Hi all, this series removes the offset by one refcount for ZONE_DEVICE pages that are freed back to the driver owning them, which is just device private ones for now, but also the planned device coherent pages and the enhanced p2p ones pending. It does not address the fsdax pages yet, which will
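
The "offset by one" convention this cover letter refers to can be illustrated with a small user-space model. This is a hedged sketch, not kernel code: both function names are hypothetical, and plain `int` counters stand in for the kernel's atomic page refcounts.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Offset-by-one convention: an idle page already holds a count of 1, so a
 * put that brings the count down to 1 means "free back to the owner".
 * Returns true when the page became free.
 */
static bool model_put_offset_by_one(int *refcount)
{
    assert(*refcount > 1); /* a count of 1 already means idle */
    return --*refcount == 1;
}

/*
 * Conventional scheme: the page is free when the count reaches 0.
 * Returns true when the page became free.
 */
static bool model_put_zero_based(int *refcount)
{
    assert(*refcount > 0);
    return --*refcount == 0;
}
```

With the zero-based scheme, callers that check whether a page is unused can compare against 0 for every page type instead of special-casing the offset.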

[PATCH 1/8] mm: remove a pointless CONFIG_ZONE_DEVICE check in memremap_pages

2022-02-06 Thread Christoph Hellwig
memremap.c is only built when CONFIG_ZONE_DEVICE is set, so remove the superfluous extra check. Signed-off-by: Christoph Hellwig --- mm/memremap.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/mm/memremap.c b/mm/memremap.c index 6aa5f0c2d11fda..5f04a0709e436e 100644 ---

RE: [PATCH v1] drm/amdgpu: Print once if RAS unsupported

2022-02-06 Thread Zhou1, Tao
[AMD Official Use Only] Reviewed-by: Tao Zhou > -Original Message- > From: Tuikov, Luben > Sent: Friday, February 4, 2022 7:13 AM > To: amd-gfx@lists.freedesktop.org > Cc: Tuikov, Luben ; Deucher, Alexander > ; Zhang, Hawking ; > Clements, John ; Zhou1, Tao > ; Chai, Thomas > Subject:

RE: [PATCH] drm/amd/pm: add missing prototypes to amdgpu_dpm_internal

2022-02-06 Thread Quan, Evan
[AMD Official Use Only] Thanks for the fix! Reviewed-by: Evan Quan > -Original Message- > From: Maíra Canal > Sent: Thursday, February 3, 2022 8:40 AM > To: Quan, Evan ; Deucher, Alexander > ; Koenig, Christian > ; Pan, Xinhui ; > airl...@linux.ie; dan...@ffwll.ch; nat...@kernel.org; >

Re: [RFC v2 4/8] drm/amdgpu: Serialize non TDR gpu recovery with TDRs

2022-02-06 Thread Grodzovsky, Andrey
I already did, thanks to Shayun I already tested on XGMI SRIOV and it looks ok. What I need now is code review, mostly on the new patches (8-12). I hope you, Monk, Shayun, Lijo and Christian can help with that. Andrey From: Chen, JingWen Sent: 06 February 2022

RE: [PATCH] drm/amd/pm: fix error handling

2022-02-06 Thread Quan, Evan
[AMD Official Use Only] Reviewed-by: Evan Quan > -Original Message- > From: t...@redhat.com > Sent: Saturday, February 5, 2022 11:00 PM > To: Quan, Evan ; Deucher, Alexander > ; Koenig, Christian > ; Pan, Xinhui ; > airl...@linux.ie; dan...@ffwll.ch; nat...@kernel.org; >

[PATCH 2/2] drm/amdgpu: check the GART table before invalidating TLB

2022-02-06 Thread Aaron Liu
Bypass group programming (utcl2_harvest) aims to prevent UTCL2 from sending invalidation commands to harvested SE/SA. Once an invalidation command reaches a harvested SE/SA, the SE/SA does not respond and the system hangs. This patch adds a check that the GART table is already allocated before invalidating the TLB.
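
The guard described above, skipping the TLB invalidation while the GART table is not yet allocated, can be sketched as a small user-space model. The names `struct model_gart` and `model_gart_invalidate_tlb` are hypothetical and are not the amdgpu driver's API. The sketch only shows the shape of the check, not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the GART state; not the driver's structure. */
struct model_gart {
    void *table;          /* NULL until the table has been allocated */
    unsigned flush_count; /* counts invalidations actually issued */
};

/*
 * Issues a (modelled) TLB invalidation only if the table exists.
 * Returns true if an invalidation was issued, false if it was skipped.
 */
static bool model_gart_invalidate_tlb(struct model_gart *gart)
{
    assert(gart != NULL);
    if (gart->table == NULL)
        return false; /* no table yet, so skip the invalidation */
    gart->flush_count++;
    return true;
}
```

Returning early keeps the invalidation from being sent before the table exists, which is the ordering the patch description is concerned with.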

[PATCH 1/2] drm/amdgpu: add utcl2_harvest to gc 10.3.1

2022-02-06 Thread Aaron Liu
Confirmed with hardware team, there is harvesting for gc 10.3.1. Signed-off-by: Aaron Liu --- drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c

Re: [RFC v2 4/8] drm/amdgpu: Serialize non TDR gpu recovery with TDRs

2022-02-06 Thread JingWen Chen
Hi Andrey, I don't have any XGMI machines here, maybe you can reach out to shaoyun for help. On 2022/1/29 12:57 AM, Grodzovsky, Andrey wrote: > Just a gentle ping. > > Andrey >

RE: [PATCH 1/2] drm/amdgpu: Fixed the defect of soft lock caused by infinite loop

2022-02-06 Thread Chai, Thomas
OK -Original Message- From: Kuehling, Felix Sent: Tuesday, February 1, 2022 12:24 AM To: Zhou1, Tao ; Chai, Thomas ; amd-gfx@lists.freedesktop.org Cc: Clements, John ; Zhang, Hawking Subject: Re: [PATCH 1/2] drm/amdgpu: Fixed the defect of soft lock caused by infinite loop Am

Re: [BUG] gpu: drm: radeon: two possible deadlocks involving locking and waiting

2022-02-06 Thread Christian König
On 05.02.22 at 02:55, Jia-Ju Bai wrote: Hi Christian, Thanks for the reply :) On 2022/2/1 15:56, Christian König wrote: Hi Jia-Ju, interesting that you have found those issues with an automated tool. And yes that is a well-known design flaw within the radeon driver which can happen on

Re: Minimal GPU setup

2022-02-06 Thread Christian König
Hi Amol, On 05.02.22 at 10:47, Amol wrote: Hello, I am learning to program Radeon HD 7350 by reading the radeon driver source in Linux, and the guides/manuals from AMD. I understand the general flow of initialization the driver performs. I have also been able to understand and re-implement