Re: [PATCH 1/2] drm/amdgpu: update df_v3_6 for xgmi perfmons

2019-06-19 Thread Kuehling, Felix
to submit your changes. Please apply the two fixes I pointed out below in a separate patch. Thanks,   Felix On 2019-06-20 0:15, Kuehling, Felix wrote: > On 2019-06-19 20:53, Kim, Jonathan wrote: >> v4: fixed kzalloc error check and modified df func init to return error code > This c

Re: [PATCH 1/2] drm/amdgpu: update df_v3_6 for xgmi perfmons

2019-06-19 Thread Kuehling, Felix
On 2019-06-19 20:53, Kim, Jonathan wrote: > v4: fixed kzalloc error check and modified df func init to return error code This comment isn't applicable any more because there is no more kzalloc. Maybe remove the review version history and instead update the description of what this commit

Re: [PATCH 1/2] drm/amdgpu: update df_v3_6 for xgmi perfmons

2019-06-19 Thread Kuehling, Felix
One more possible simplification below. On 2019-06-19 6:50 p.m., Kim, Jonathan wrote: > v4: fixed kzalloc error check and modified df func init to return error code > > v3: fixed cleanup by adding fini to free up adev df config counters > > v2: simplified by removing xgmi references in function

Re: [PATCH 2/2] drm/amdgpu: add pmu counters

2019-06-19 Thread Kuehling, Felix
Found three more problems. With those fixed, this patch is Reviewed-by: Felix Kuehling On 2019-06-19 6:50 p.m., Kim, Jonathan wrote: > adding perf event counters > > Change-Id: Ib8800c2d652e3e788872bd721458ac5422859b62 > Signed-off-by: Jonathan Kim > --- > drivers/gpu/drm/amd/amdgpu/Makefile

Re: [PATCH] drm/amdkfd: Add procfs-style information for KFD processes

2019-06-19 Thread Kuehling, Felix
Thanks. To add to what Kent said, this is for providing per-process information in rocm-smi or other tools. It should allow us to do something like "ps" or "top" for compute processes running on the GPU. Currently it's only the PASID. We'll add per-process memory usage and probably user mode

Re: [PATCH v3 hmm 11/12] mm/hmm: Remove confusing comment and logic from hmm_release

2019-06-18 Thread Kuehling, Felix
On 2019-06-18 1:37, Christoph Hellwig wrote: > On Mon, Jun 17, 2019 at 09:45:09PM -0300, Jason Gunthorpe wrote: >> Am I looking at the wrong thing? Looks like it calls it through a work >> queue should should be OK.. > Yes, it calls it through a work queue. I guess that is fine because > it needs

Re: [PATCH] drm/amdgpu: add pmu counters

2019-06-18 Thread Kuehling, Felix
On 2019-06-18 17:34, Kim, Jonathan wrote: > v2: fixed missing break in switch statement > > add pmu counters > > Change-Id: I1aca271fd12cabce0ccfc076f771cde2d4cadd54 > Signed-off-by: Jonathan Kim > --- > drivers/gpu/drm/amd/amdgpu/Makefile| 2 +- >

Re: [PATCH] drm/amdgpu: update df_v3_6 for xgmi perfmons

2019-06-18 Thread Kuehling, Felix
Sorry, I caught another problem with error handling. See below. On 2019-06-18 15:24, Kim, Jonathan wrote: > v3: fixed cleanup by adding fini to free up adev df config counters > > v2: simplified by removing xgmi references in function names and moving to > generic df function names. fixed issue

Re: [PATCH] drm/amdgpu: improve HMM error -ENOMEM and -EBUSY handling

2019-06-17 Thread Kuehling, Felix
On 2019-06-14 9:52 p.m., Yang, Philip wrote: > Under memory pressure, hmm_range_fault may return error code -ENOMEM > or -EBUSY, change pr_info to pr_debug to remove unnecessary kernel log > message because we will retry restore again. > > Call get_user_pages_done if TTM get user pages failed will

Re: [PATCH] drm/amd/amdgpu: cast mem->num_pages to 64-bits when shifting

2019-06-17 Thread Kuehling, Felix
On 2019-06-17 3:28 p.m., Christian König wrote: > Am 17.06.19 um 21:15 schrieb Kuehling, Felix: >> Looks good to me. One cosmetic comment inline. With that fixed this >> patch is Reviewed-by:  Felix Kuehling >> >> On 2019-06-14 12:51 p.m., StDenis, Tom wrote: >>

Re: [PATCH] drm/amd/amdgpu: cast mem->num_pages to 64-bits when shifting

2019-06-17 Thread Kuehling, Felix
Looks good to me. One cosmetic comment inline. With that fixed this patch is Reviewed-by:  Felix Kuehling On 2019-06-14 12:51 p.m., StDenis, Tom wrote: > On 32-bit hosts mem->num_pages is 32-bits and can overflow > when shifted. Add a cast to avoid this. > > Signed-off-by: Tom St Denis > ---
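The overflow this patch describes can be reproduced in plain C. The sketch below is not the amdgpu code: `SKETCH_PAGE_SHIFT`, `bytes_truncated`, `bytes_widened` and `shift_demo` are hypothetical names, a 4 KiB page size is assumed, and a `uint32_t` page count stands in for `mem->num_pages` on a 32-bit host.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

/* Shift done in 32 bits: the high bits are lost before the result is widened. */
static inline uint64_t bytes_truncated(uint32_t num_pages)
{
    uint32_t bytes = num_pages << SKETCH_PAGE_SHIFT;
    return bytes;
}

/* Cast first, then shift: the whole computation happens in 64 bits. */
static inline uint64_t bytes_widened(uint32_t num_pages)
{
    return (uint64_t)num_pages << SKETCH_PAGE_SHIFT;
}

/* Usage: 0x100000 pages of 4 KiB is exactly 4 GiB, which needs 33 bits. */
static inline void shift_demo(void)
{
    assert(bytes_truncated(0x100000u) == 0);
    assert(bytes_widened(0x100000u) == 0x100000000ull);
}
```

For small page counts both helpers agree, which is why a bug of this kind tends to show up only with large buffers.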

Re: [PATCH] drm/amdgpu: only use kernel zone if need_dma32 is not required

2019-06-13 Thread Kuehling, Felix
On 2019-06-13 8:59, Yang, Philip wrote: > On 2019-06-13 4:54 a.m., Koenig, Christian wrote: >> Am 12.06.19 um 23:13 schrieb Yang, Philip: >>> On 2019-06-12 3:28 p.m., Christian König wrote: Am 12.06.19 um 17:13 schrieb Yang, Philip: > TTM create two zones, kernel zone and dma32 zone for

Re: [PATCH] drm/amdgpu: Reserve space for shared fence

2019-06-12 Thread Kuehling, Felix
On 2019-06-11 9:20, Zeng, Oak wrote: > Call reservation_object_reserve_shared to reserve > space for shared fence. Otherwise it will trigger > BUG_ON condition in reservation_object_add_shared_fence. > > Change-Id: Ib0fae16247e33ee68f95bffa723451c0cd619344 > Signed-off-by: Oak Zeng Reviewed-by:

Re: [PATCH] drm/amd/amdgpu: Bail out of BO node creation if not enough VRAM (v3)

2019-06-12 Thread Kuehling, Felix
On 2019-06-12 4:08, StDenis, Tom wrote: > (v2): Return 0 and set mem->mm_node to NULL. > (v3): Use atomic64_add_return instead. > > Signed-off-by: Tom St Denis > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 17 - > 1 file changed, 12 insertions(+), 5 deletions(-) > >

Re: [PATCH v2 hmm 00/11] Various revisions from a locking/code review

2019-06-12 Thread Kuehling, Felix
[+Philip] Hi Jason, I'm out of the office this week. Hi Philip, can you give this a go? Not sure how much you've been following this patch series review. Message or call me on Skype to discuss any questions. Thanks,   Felix On 2019-06-11 12:48, Jason Gunthorpe wrote: > On Thu, Jun 06, 2019

Re: [PATCH] drm/amdgpu: only use kernel zone if need_dma32 is not required

2019-06-12 Thread Kuehling, Felix
TTM itself has some logic for need_dma32 and TTM_PAGE_FLAG_DMA32. I believe that should already handle this. need_dma32 is passed from amdgpu to ttm_bo_device_init to bdev->need_dma32. ttm_tt_create translates that to page_flags |= TTM_PAGE_FLAG_DMA32 and passes that to

Re: [PATCH 1/2] drm/amdkfd: Initialize HSA_CAP_ATS_PRESENT capability in topology codes

2019-06-10 Thread Kuehling, Felix
On 2019-06-10 13:48, Zeng, Oak wrote: > Move HSA_CAP_ATS_PRESENT initialization logic from kfd iommu codes to > kfd topology codes. This removes kfd_iommu_device_init's dependency > on kfd_topology_add_device. Also remove duplicate code setting the > same. > > Change-Id:

Re: [PATCH] drm/amdkfd: Initialize dqm earlier

2019-06-06 Thread Kuehling, Felix
On 2019-06-06 5:51 p.m., Zeng, Oak wrote: > dqm is referenced in function kfd_toplogy_add_device. > Move dqm initialization up to avoid NULL pointer reference. This addresses a pretty unlikely race condition where someone looks at /sys/kernel/debug/kfd/hqds during the device initialization. We

Re: [PATCH 4/6] drm/amdkfd: Separate mqd allocation and initialization

2019-06-06 Thread Kuehling, Felix
On 2019-06-06 5:40 p.m., Zeng, Oak wrote: > Introduce a new mqd allocation interface and split the original > init_mqd function into two functions: allocate_mqd and init_mqd. > Also renamed uninit_mqd to free_mqd. This is preparation work to > fix a circular lock dependency. > > Change-Id:

Re: [PATCH 6/6] drm/amdkfd: Fix sdma queue allocate race condition

2019-06-06 Thread Kuehling, Felix
Patches 5 and 6 are Reviewed-by: Felix Kuehling On 2019-06-06 2:25 p.m., Zeng, Oak wrote: > SDMA queue allocation requires the dqm lock at it modify > the global dqm members. Move up the dqm_lock so sdma > queue allocation is enclosed in the critical section. Move > mqd allocation out of

Re: [PATCH 4/6] drm/amdkfd: Separate mqd allocation and initialization

2019-06-06 Thread Kuehling, Felix
On 2019-06-06 2:25 p.m., Zeng, Oak wrote: > Introduce a new mqd allocation interface and split the original > init_mqd function into two functions: allocate_mqd and init_mqd. > Also renamed uninit_mqd to free_mqd. This is preparation work to > fix a circular lock dependency. > > Change-Id:

Re: [PATCH 0/2] Two bug-fixes for HMM

2019-06-06 Thread Kuehling, Felix
[resent with correct address for Alex] On 2019-06-06 11:11 a.m., Jason Gunthorpe wrote: > On Fri, May 10, 2019 at 07:53:21PM +0000, Kuehling, Felix wrote: >> These problems were found in AMD-internal testing as we're working on >> adopting HMM. They are rebased against glisse/h

Re: [PATCH 0/2] Two bug-fixes for HMM

2019-06-06 Thread Kuehling, Felix
On 2019-06-06 11:11 a.m., Jason Gunthorpe wrote: > On Fri, May 10, 2019 at 07:53:21PM +0000, Kuehling, Felix wrote: >> These problems were found in AMD-internal testing as we're working on >> adopting HMM. They are rebased against glisse/hmm-5.2-v3. We'd like to get >> them

Re: [PATCH 6/6] drm/amdkfd: Fix sdma queue allocate race condition

2019-06-05 Thread Kuehling, Felix
I think the simpler way to fix this, is to restructure create_queue_cpsch similar to the nocpsch version where we allocate the MQD early and take the DQM lock right after that. That way you don't need locked and unlocked variants of allocate_sdma_queue and deallocate_sdma_queue. Regards,  

Re: [PATCH 5/6] drm/amdkfd: Fix a circular lock dependency

2019-06-05 Thread Kuehling, Felix
This patch looks good to me, but it'll probably change a little if you implement my suggestions for patch 4. Regards,   Felix On 2019-06-05 12:06 p.m., Zeng, Oak wrote: > The idea to break the circular lock dependency is to move allocate_mqd > out of dqm lock protection. See callstack #1

Re: [PATCH 3/6] drm/amdkfd: Refactor create_queue_nocpsch

2019-06-05 Thread Kuehling, Felix
Patches 1-3 are Reviewed-by: Felix Kuehling On 2019-06-05 12:06 p.m., Zeng, Oak wrote: > This is prepare work to fix a circular lock dependency. > No logic change > > Change-Id: I4e0ee918260e7780de972dd71f4ce787b4f6dde9 > Signed-off-by: Oak Zeng > --- >

Re: [PATCH 4/6] drm/amdkfd: Separate mqd allocation and initialization

2019-06-05 Thread Kuehling, Felix
Nice. Surprisingly this actually resulted in slightly fewer lines of code. Some suggestions: Let's also rename uninit_mqd to free_mqd, as it is the counterpart to allocate_mqd, not init_mqd. And unless I'm missing something, init_mqd and update_mqd cannot fail now that allocate_mqd has been

Re: [PATCH] drm/ttm: fix ttm_bo_unreserve

2019-06-05 Thread Kuehling, Felix
On 2019-06-05 9:56, Michel Dänzer wrote: > On 2019-06-05 1:24 p.m., Christian König wrote: >> Am 04.06.19 um 21:03 schrieb Zeng, Oak: >>> From: amd-gfx On Behalf Of >>> Kuehling, Felix >>> On 2019-06-04 11:23, Christian König wrote: [snip] >>> --

Re: [PATCH] drm/amdgpu: use new HMM APIs and helpers v4

2019-06-04 Thread Kuehling, Felix
On 2019-06-04 16:13, Yang, Philip wrote: > HMM provides new APIs and helps in kernel 5.2-rc1 to simplify driver > path. The old hmm APIs are deprecated and will be removed in future. > > Below are changes in driver: > > 1. Change hmm_vma_fault to hmm_range_register and hmm_range_fault which >

Re: [PATCH 0/3] KFD upstreaming

2019-06-04 Thread Kuehling, Felix
On 2019-06-04 16:15, Alex Deucher wrote: > On Tue, Jun 4, 2019 at 4:12 PM Kuehling, Felix wrote: >> There haven't been any objections to these changes. Does anyone want to >> add their Acked-by before I submit? > Series is: > Acked-by: Alex Deucher > > I wonder if the

Re: [PATCH 0/3] KFD upstreaming

2019-06-04 Thread Kuehling, Felix
There haven't been any objections to these changes. Does anyone want to add their Acked-by before I submit? Thanks,   Felix On 2019-05-28 18:38, Kuehling, Felix wrote: > New feature: queue priorities > > The eviction state logic change is preparation for some debugger support > w

Re: [PATCH] drm/ttm: fix ttm_bo_unreserve

2019-06-04 Thread Kuehling, Felix
On 2019-06-04 11:23, Christian König wrote: > Since we now keep BOs on the LRU we need to make sure > that they are removed when they are pinned. > > Signed-off-by: Christian König > --- > include/drm/ttm/ttm_bo_driver.h | 14 ++ > 1 file changed, 6 insertions(+), 8 deletions(-) >

Re: [PATCH] drm/amdgpu: use new HMM APIs and helpers v3

2019-06-03 Thread Kuehling, Felix
[+Jerome] On 2019-06-03 7:20 p.m., Yang, Philip wrote: > > On 2019-06-03 5:02 p.m., Kuehling, Felix wrote: >> On 2019-06-03 2:44 p.m., Yang, Philip wrote: >>> HMM provides new APIs and helps in kernel 5.2-rc1 to simplify driver >>> path. The old hmm APIs are

Re: [PATCH 4/5] drm/amdkfd: Fix a circular lock dependency

2019-06-03 Thread Kuehling, Felix
allocate_vmid, allocate_hqd and allocate_sdma_queue all work on data in the DQM structure. So it seems these need to be protected by the DQM lock. allocate_doorbell doesn't need the DQM lock because its data structures are in the process_device structure, which is protected by the process

Re: [PATCH 3/5] drm/amdkfd: Refactor create_queue_nocpsch

2019-06-03 Thread Kuehling, Felix
On 2019-06-03 1:51 p.m., Zeng, Oak wrote: > This is prepare work to fix a circular lock dependency. > No logic change > > Change-Id: I4e0ee918260e7780de972dd71f4ce787b4f6dde9 > Signed-off-by: Oak Zeng > --- > .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 171 > +++-- >

Re: [PATCH] drm/amdgpu: use new HMM APIs and helpers v3

2019-06-03 Thread Kuehling, Felix
On 2019-06-03 2:44 p.m., Yang, Philip wrote: > HMM provides new APIs and helps in kernel 5.2-rc1 to simplify driver > path. The old hmm APIs are deprecated and will be removed in future. > > Below are changes in driver: > > 1. Change hmm_vma_fault to hmm_range_register and hmm_range_fault which >

Re: [PATCH] drm/amdgpu: use new HMM APIs and helpers v2

2019-05-31 Thread Kuehling, Felix
On 2019-05-31 3:09 p.m., Yang, Philip wrote: > HMM provides new APIs and helps in kernel 5.2-rc1 to simplify driver > path. The old hmm APIs are deprecated and will be removed in future. > > Below are changes in driver: > > 1. Change hmm_vma_fault to hmm_range_register and hmm_range_fault which >

Re: [PATCH] drm/amdgpu: use new HMM APIs and helpers

2019-05-31 Thread Kuehling, Felix
On 2019-05-31 5:32 p.m., Yang, Philip wrote: > > On 2019-05-31 3:42 p.m., Kuehling, Felix wrote: >> On 2019-05-31 1:28 p.m., Yang, Philip wrote: >>> On 2019-05-30 6:36 p.m., Kuehling, Felix wrote: >>>>> >>>>> #if IS_ENABLED

Re: [PATCH 1/4] drm/amdkfd: Fix sdma queue allocate race condition

2019-05-31 Thread Kuehling, Felix
On 2019-05-31 5:31 p.m., Kuehling, Felix wrote: > On 2019-05-31 5:19 p.m., Zeng, Oak wrote: >> SDMA queue allocation requires the dqm lock as it modify >> the global dqm members. Introduce functions to allocate/deallocate >> in locked/unlocked circumst

Re: [PATCH 3/4] drm/amdkfd: Refactor create_queue_nocpsch

2019-05-31 Thread Kuehling, Felix
On 2019-05-31 5:19 p.m., Zeng, Oak wrote: > This is prepare work to fix a circular lock dependency. > No logic change > > Change-Id: I4e0ee918260e7780de972dd71f4ce787b4f6dde9 > Signed-off-by: Oak Zeng > --- > .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 172 > - >

Re: [PATCH 1/4] drm/amdkfd: Fix sdma queue allocate race condition

2019-05-31 Thread Kuehling, Felix
On 2019-05-31 5:19 p.m., Zeng, Oak wrote: > SDMA queue allocation requires the dqm lock as it modify > the global dqm members. Introduce functions to allocate/deallocate > in locked/unlocked circumstance. > > Change-Id: Id3084524c5f65d9629b12cf6b4862a7516945cb1 > Signed-off-by: Oak Zeng > --- >

Re: [PATCH 6/8] drm/amdkfd: New IOCTL to allocate queue GWS

2019-05-31 Thread Kuehling, Felix
On 2019-05-31 4:53 p.m., Dave Airlie wrote: > On Sat, 1 Jun 2019 at 06:04, Kuehling, Felix wrote: >> On 2019-05-30 11:13 p.m., Dave Airlie wrote: >>> On Sat, 25 May 2019 at 05:48, Kuehling, Felix >>> wrote: >>>> On 2019-05-23 6:41 p.m., Zeng, Oak wrote

Re: [PATCH 6/8] drm/amdkfd: New IOCTL to allocate queue GWS

2019-05-31 Thread Kuehling, Felix
On 2019-05-30 11:13 p.m., Dave Airlie wrote: > On Sat, 25 May 2019 at 05:48, Kuehling, Felix wrote: >> On 2019-05-23 6:41 p.m., Zeng, Oak wrote: >>> Add a new kfd ioctl to allocate queue GWS. Queue >>> GWS is released on queue destroy. >>> >>> Change-

Re: [PATCH] drm/amdgpu: use new HMM APIs and helpers

2019-05-31 Thread Kuehling, Felix
On 2019-05-31 1:28 p.m., Yang, Philip wrote: > > On 2019-05-30 6:36 p.m., Kuehling, Felix wrote: >>> >>> #if IS_ENABLED(CONFIG_DRM_AMDGPU_USERPTR) >>> - if (gtt->ranges && >>> - ttm->pages[0] == hmm_pfn_to_page(&gtt->ranges[0],

Re: [PATCH] drm/amdgpu: use new HMM APIs and helpers

2019-05-30 Thread Kuehling, Felix
This is a nice simplification. See a few comments inline. On 2019-05-30 10:41 a.m., Yang, Philip wrote: > HMM provides new APIs and helps in kernel 5.2-rc1 to simplify driver > path. The old hmm APIs are deprecated and will be removed in future. > > Below are changes in driver: > > 1. Change

Re: [PATCH 3/3] drm/amdkfd: remove duplicated PCIE atomics request

2019-05-29 Thread Kuehling, Felix
On 2019-05-29 2:32 a.m., Xiao, Jack wrote: > Since amdgpu has always requested PCIE atomics, kfd don't > need duplicated PCIE atomics enablement. Referring to amdgpu > request result is enough. > > Signed-off-by: Jack Xiao This patch is Reviewed-by: Felix Kuehling > --- >

Re: [PATCH][next] drm/amdkfd: fix null pointer dereference on dev

2019-05-29 Thread Kuehling, Felix
On 2019-05-29 11:07 a.m., Colin King wrote: > From: Colin Ian King > > The pointer dev is set to null yet it is being dereferenced when > checking dev->dqm->sched_policy. Fix this by performing the check > on dev->dqm->sched_policy after dev has been assigned and null > checked. Also remove the

Re: [PATCH v2 4/4] drm/amdkfd: Check against device cgroup

2019-05-29 Thread Kuehling, Felix
On 2019-05-28 3:02 p.m., Tejun Heo wrote: > Hello, > > On Fri, May 17, 2019 at 08:12:17PM +0000, Kuehling, Felix wrote: >> Patches 1,2,4 will be submitted through amd-staging-drm-next. Patch 3 >> goes through the cgroup tree. Patch 4 depends on patch 3. So submitting >>

[PATCH 1/2] mm/hmm.c: support automatic NUMA balancing

2019-05-28 Thread Kuehling, Felix
From: Philip Yang While the page is migrating by NUMA balancing, HMM failed to detect this condition and still return the old page. Application will use the new page migrated, but driver pass the old page physical address to GPU, this crash the application later. Use pte_protnone(pte) to

[PATCH 0/2] Two HMM patches from MMOTS

2019-05-28 Thread Kuehling, Felix
These are two important HMM bug fixes to fix the HMM-based userptr implementation. They are already staged in MMOTS: https://www.ozlabs.org/~akpm/mmots/broken-out/ Kuehling, Felix (1): mm/hmm.c: only set FAULT_FLAG_ALLOW_RETRY for non-blocking Philip Yang (1): mm/hmm.c: support automatic NUMA

[PATCH 2/2] mm/hmm.c: only set FAULT_FLAG_ALLOW_RETRY for non-blocking

2019-05-28 Thread Kuehling, Felix
From: "Kuehling, Felix" Don't set this flag by default in hmm_vma_do_fault. It is set conditionally just a few lines below. Setting it unconditionally can lead to handle_mm_fault doing a non-blocking fault, returning -EBUSY and unlocking mmap_sem unexpectedly. Link: http://lkml.ke

[PATCH 1/3] drm/amdkfd: Simplify eviction state logic

2019-05-28 Thread Kuehling, Felix
Always mark evicted queues with q->properties.is_evicted = true, even queues that are inactive for other reason. This simplifies maintaining the eviction state as it doesn't require updating is_evicted when other queue activation conditions change. On the other hand, we now need to check those

[PATCH 3/3] drm/amdkfd: Implement queue priority controls for gfx9

2019-05-28 Thread Kuehling, Felix
From: Jay Cornwall Ported from gfx8, no changes in register setup. Signed-off-by: Jay Cornwall Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git

[PATCH 0/3] KFD upstreaming

2019-05-28 Thread Kuehling, Felix
New feature: queue priorities The eviction state logic change is preparation for some debugger support we're working on but haven't settled on the final ABI yet. Felix Kuehling (1): drm/amdkfd: Simplify eviction state logic Jay Cornwall (1): drm/amdkfd: Implement queue priority controls for

[PATCH 2/3] drm/amdkfd: CP queue priority controls

2019-05-28 Thread Kuehling, Felix
From: ozeng Translate queue priority into pipe priority and write to MQDs. The priority values are used to perform queue and pipe arbitration. Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c | 20

Re: [PATCH] drm/amdkfd: Fix a potential circular lock

2019-05-28 Thread Kuehling, Felix
On 2019-05-28 2:28 p.m., Zeng, Oak wrote: > The idea to break the circular lock dependency is, unlock dqm > temporarily before calling init_mqd in call stack #1 (see below) > > [ 513.604034] == > [ 513.604205] WARNING: possible circular locking

Re: [PATCH] drm/amdgpu: enable PCIE atomics ops support

2019-05-27 Thread Kuehling, Felix
On 2019-05-27 7:51 a.m., Christian König wrote: > That idea sounds sane to me as well. > > By the way, do we somewhere signal to userspace if atomics are > supported or not? Yes. KFD topology (a flag in the iolink) provides that information to user mode. Regards,   Felix > > I mean would be

Re: [PATCH 10/10] drm/amdgpu: stop removing BOs from the LRU v3

2019-05-24 Thread Kuehling, Felix
is nothing easier to evict. ROCm applications like to use lots of memory. So it probably makes sense for us to stop removing our BOs from the LRU as well while we mass-validate our BOs in amdgpu_amdkfd_gpuvm_restore_process_bos. Regards,   Felix > > Christian. > > Am 22.05.19 um 21:43 sc

Re: [PATCH 6/8] drm/amdkfd: New IOCTL to allocate queue GWS

2019-05-24 Thread Kuehling, Felix
On 2019-05-23 6:41 p.m., Zeng, Oak wrote: > Add a new kfd ioctl to allocate queue GWS. Queue > GWS is released on queue destroy. > > Change-Id: I60153c26a577992ad873e4292e759e5c3d5bbd15 > Signed-off-by: Oak Zeng Reviewed-by: Felix Kuehling > --- > drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |

Re: [PATCH 5/8] drm/amdkfd: Add function to set queue gws

2019-05-24 Thread Kuehling, Felix
On 2019-05-23 6:41 p.m., Zeng, Oak wrote: > Add functions in process queue manager to > set/unset queue gws. Also set process's number > of gws used. Currently only one queue in > process can use and use all gws. > > Change-Id: I03e480c8692db3eabfc3a188cce8904d5d962ab7 > Signed-off-by: Oak Zeng >

Re: [PATCH] drm/amdgpu: add pmu counters

2019-05-24 Thread Kuehling, Felix
On 2019-05-24 3:12 p.m., Kim, Jonathan wrote: > add pmu counters to monitor amdgpu device performance > > Change-Id: I8449f4ea824c411ee24a5b783ac066189b9de08e > Signed-off-by: Jonathan Kim > --- > drivers/gpu/drm/amd/amdgpu/Makefile| 2 +- >

Re: [PATCH 8/8] drm/amdkfd: Use kfd fd to mmap mmio

2019-05-24 Thread Kuehling, Felix
Hi Oak, I'm not sure why this is part of the GWS patch series. It's unrelated to GWS. Anyway, see one comment inline. On 2019-05-23 6:41 p.m., Zeng, Oak wrote: > TTM doesn't support CPU mapping of sg type bo (under which > mmio bo is created). Switch mmaping of mmio page to kfd > device file. >

Re: [PATCH 6/7] drm/amdkfd: New IOCTL to allocate queue GWS

2019-05-23 Thread Kuehling, Felix
On 2019-05-23 2:41 p.m., Zeng, Oak wrote: > Add a new kfd ioctl to allocate queue GWS. Queue > GWS is released on queue destroy. > > Change-Id: I60153c26a577992ad873e4292e759e5c3d5bbd15 > Signed-off-by: Oak Zeng > --- > drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 30

Re: [PATCH 5/7] drm/amdkfd: Add function to set queue gws

2019-05-23 Thread Kuehling, Felix
On 2019-05-23 2:41 p.m., Zeng, Oak wrote: > Add functions in process queue manager to > set queue gws. Also set process's number > of gws used. Currently only one queue in > process can use and use all gws. > > Change-Id: I03e480c8692db3eabfc3a188cce8904d5d962ab7 > Signed-off-by: Oak Zeng > --- >

Re: [PATCH 4/7] drm/amdgpu: Add function to add/remove gws to kfd process

2019-05-22 Thread Kuehling, Felix
On 2019-05-22 11:51 a.m., Zeng, Oak wrote: > GWS bo is shared between all kfd processes. Add function to add gws > to kfd process's bo list so gws can be evicted from and restored > for process. > > Change-Id: I75d74cfdadb7075ff8b2b68634e205deb73dc1ea > Signed-off-by: Oak Zeng Reviewed-by: Felix

Re: [PATCH 6/7] drm/amdkfd: New IOCTL to allocate queue GWS

2019-05-22 Thread Kuehling, Felix
On 2019-05-22 11:51 a.m., Zeng, Oak wrote: > Add a new kfd ioctl to allocate queue GWS. Queue > GWS is released on queue destroy. > > Change-Id: I60153c26a577992ad873e4292e759e5c3d5bbd15 > Signed-off-by: Oak Zeng > --- > drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 39 >

Re: [PATCH 5/7] drm/amdkfd: Add function to set queue gws

2019-05-22 Thread Kuehling, Felix
On 2019-05-22 11:51 a.m., Zeng, Oak wrote: > Add functions in process queue manager to > set/get queue gws. Also set process's number > of gws used. Currently only one queue in > process can use and use all gws. > > Change-Id: I03e480c8692db3eabfc3a188cce8904d5d962ab7 > Signed-off-by: Oak Zeng >

Re: [PATCH 10/10] drm/amdgpu: stop removing BOs from the LRU v3

2019-05-22 Thread Kuehling, Felix
Can you explain how this avoids OOM situations? When is it safe to leave a reserved BO on the LRU list? Could we do the same thing in amdgpu_amdkfd_gpuvm.c? And if we did, what would be the expected side effects or consequences? Thanks,   Felix On 2019-05-22 8:59 a.m., Christian König wrote:

Re: [PATCH 1/7] drm/amdkfd: Add gws number to kfd topology node properties

2019-05-21 Thread Kuehling, Felix
On 2019-05-15 4:37 p.m., Zeng, Oak wrote: > Add amdgpu_amdkfd interface to get num_gws and add num_gws > to /sys/class/kfd/kfd/topology/nodes/x/properties. Only report > num_gws if MEC FW support GWS barriers. Currently it is > determined by a environment variable which will be replaced It's not

Re: [PATCH 4/7] drm/amdkfd: Add function to set queue gws

2019-05-21 Thread Kuehling, Felix
On 2019-05-15 4:37 p.m., Zeng, Oak wrote: > Add functions in process queue manager to > set/get queue gws. Also set process's number > of gws used. Currently only one queue in > process can use and use all gws. > > Change-Id: I03e480c8692db3eabfc3a188cce8904d5d962ab7 > Signed-off-by: Oak Zeng >

Re: [PATCH 6/7] drm/amdkfd: New IOCTL to allocate queue GWS

2019-05-21 Thread Kuehling, Felix
On 2019-05-15 4:37 p.m., Zeng, Oak wrote: > [CAUTION: External Email] > > Add a new kfd ioctl to allocate queue GWS. Queue > GWS is released on queue destroy. > > Change-Id: I60153c26a577992ad873e4292e759e5c3d5bbd15 > Signed-off-by: Oak Zeng > --- > drivers/gpu/drm/amd/amdkfd/kfd_chardev.c

Re: [PATCH 5/7] drm/amdgpu: Add function to add/remove gws to kfd process

2019-05-21 Thread Kuehling, Felix
On 2019-05-15 4:37 p.m., Zeng, Oak wrote: > GWS bo is shared between all kfd processes. Add function to add gws > to kfd process's bo list so gws can be evicted from and restored > for process. > > Change-Id: I75d74cfdadb7075ff8b2b68634e205deb73dc1ea > Signed-off-by: Oak Zeng > --- >

Re: [PATCH 4/7] drm/amdkfd: Add function to set queue gws

2019-05-21 Thread Kuehling, Felix
On 2019-05-15 4:37 p.m., Zeng, Oak wrote: > Add functions in process queue manager to > set/get queue gws. Also set process's number > of gws used. Currently only one queue in > process can use and use all gws. > > Change-Id: I03e480c8692db3eabfc3a188cce8904d5d962ab7 > Signed-off-by: Oak Zeng >

Re: [PATCH v2 4/4] drm/amdkfd: Check against device cgroup

2019-05-17 Thread Kuehling, Felix
Patches 1,2,4 will be submitted through amd-staging-drm-next. Patch 3 goes through the cgroup tree. Patch 4 depends on patch 3. So submitting patch 4 will need to wait until we rebase amd-staging-drm-next on a new enough kernel release that includes patch 3. Patch 1 and 2 could be submitted

Re: Hard lockups with ROCM

2019-05-16 Thread Kuehling, Felix
Hi Daniel, On 2019-05-12 9:44 p.m., Daniel Kasak wrote: > [CAUTION: External Email] > Hi all. I had version 2.2.0 of the ROCM stack running on a 5.0.x and > 5.1.0 kernel. Things were going great with various boinc GPU tasks. > But there is a setiathome GPU task which reliably gives me a hard >

Re: [PATCH 1/2] mm/hmm: support automatic NUMA balancing

2019-05-14 Thread Kuehling, Felix
On 2019-05-13 5:27 p.m., Andrew Morton wrote: > [CAUTION: External Email] > > On Fri, 10 May 2019 19:53:23 +0000 "Kuehling, Felix" > wrote: > >> From: Philip Yang >> >> While the page is migrating by NUMA balancing, HMM failed to detect this

Re: [PATCH 2/2] mm/hmm: Only set FAULT_FLAG_ALLOW_RETRY for non-blocking

2019-05-14 Thread Kuehling, Felix
> which did not have a clear line of sight for 5.2 either. When was that? I saw "Use HMM for userptr" in Dave's 5.2-rc1 pull request to Linus. Regards,   Felix > > Alex > > *From:* amd-gfx on

Re: [PATCH 2/2] mm/hmm: Only set FAULT_FLAG_ALLOW_RETRY for non-blocking

2019-05-13 Thread Kuehling, Felix
[Fixed Alex's email address, sorry for getting it wrong first] On 2019-05-13 3:49 p.m., Jerome Glisse wrote: > [CAUTION: External Email] > > Andrew can we get this 2 fixes line up for 5.2 ? > > On Mon, May 13, 2019 at 07:36:44PM +0000, Kuehling, Felix wrote: >> Hi Jerome,

Re: [PATCH 2/2] mm/hmm: Only set FAULT_FLAG_ALLOW_RETRY for non-blocking

2019-05-13 Thread Kuehling, Felix
7:53:24PM +0000, Kuehling, Felix wrote: >> Don't set this flag by default in hmm_vma_do_fault. It is set >> conditionally just a few lines below. Setting it unconditionally >> can lead to handle_mm_fault doing a non-blocking fault, returning >> -EBUSY and unlocking mmap_s

Re: [PATCH 7/8] drm/amdkfd: New IOCTL to allocate queue GWS

2019-05-13 Thread Kuehling, Felix
On 2019-05-13 12:03 p.m., Zeng, Oak wrote: > Hi Felix, > > See comments inline [Oak] > > Hi Kent, there is one FYI embedded, so be careful when you merge this change > back to kfd-staging branch. > > Regards, > Oak > > -----Original Message- > From: Kue

Re: [PATCH 7/8] drm/amdkfd: New IOCTL to allocate queue GWS

2019-05-10 Thread Kuehling, Felix
On 2019-05-10 12:01 p.m., Zeng, Oak wrote: > Add a new kfd ioctl to allocate queue GWS. Queue > GWS is released on queue destroy. > > Change-Id: I60153c26a577992ad873e4292e759e5c3d5bbd15 > Signed-off-by: Oak Zeng > --- > drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 45 >

Re: [PATCH 6/8] drm/amdkfd: Add function to add/remove gws to kfd process

2019-05-10 Thread Kuehling, Felix
The subject for this one should start with drm/amdgpu: because it's a change in amdgpu, not KFD. See two more comments inline. Regards,   Felix On 2019-05-10 12:01 p.m., Zeng, Oak wrote: > [CAUTION: External Email] > > GWS bo is shared between all kfd processes. Add function to add gws > to

[PATCH 1/2] mm/hmm: support automatic NUMA balancing

2019-05-10 Thread Kuehling, Felix
From: Philip Yang While the page is migrating by NUMA balancing, HMM failed to detect this condition and still return the old page. Application will use the new page migrated, but driver pass the old page physical address to GPU, this crash the application later. Use pte_protnone(pte) to return

[PATCH 2/2] mm/hmm: Only set FAULT_FLAG_ALLOW_RETRY for non-blocking

2019-05-10 Thread Kuehling, Felix
Don't set this flag by default in hmm_vma_do_fault. It is set conditionally just a few lines below. Setting it unconditionally can lead to handle_mm_fault doing a non-blocking fault, returning -EBUSY and unlocking mmap_sem unexpectedly. Signed-off-by: Felix Kuehling --- mm/hmm.c | 2 +- 1 file
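The flag handling described in this message can be modelled in a few lines of C. This is a simplified sketch, not the code in mm/hmm.c: the `SKETCH_FLAG_*` values are illustrative stand-ins for the kernel's fault flags, `SKETCH_FLAG_OTHER` represents whatever unrelated flags are set by default, and the `block` parameter is an assumed name for "the caller wants a blocking fault".

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins; the real flag values are defined by the kernel. */
#define SKETCH_FLAG_ALLOW_RETRY 0x1u
#define SKETCH_FLAG_OTHER       0x2u

/* Before the fix: the retry flag is in the default set, so the
 * conditional below can never take it away for a blocking caller. */
static inline unsigned int fault_flags_before(bool block)
{
    unsigned int flags = SKETCH_FLAG_ALLOW_RETRY | SKETCH_FLAG_OTHER;

    flags |= block ? 0u : SKETCH_FLAG_ALLOW_RETRY;
    return flags;
}

/* After the fix: the retry flag is set only for non-blocking callers. */
static inline unsigned int fault_flags_after(bool block)
{
    unsigned int flags = SKETCH_FLAG_OTHER;

    flags |= block ? 0u : SKETCH_FLAG_ALLOW_RETRY;
    return flags;
}

/* Usage: a blocking caller must not end up with the retry flag set. */
static inline void flags_demo(void)
{
    assert(fault_flags_before(true) & SKETCH_FLAG_ALLOW_RETRY);
    assert(!(fault_flags_after(true) & SKETCH_FLAG_ALLOW_RETRY));
    assert(fault_flags_after(false) & SKETCH_FLAG_ALLOW_RETRY);
}
```

In this model the two versions differ only for blocking callers, which matches the symptom in the message: a fault that was meant to block behaves as non-blocking.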

[PATCH 0/2] Two bug-fixes for HMM

2019-05-10 Thread Kuehling, Felix
These problems were found in AMD-internal testing as we're working on adopting HMM. They are rebased against glisse/hmm-5.2-v3. We'd like to get them applied to a mainline Linux kernel as well as drm-next and amd-staging-drm-next sooner rather than later. Currently the HMM in amd-staging-drm-next

[PATCH 1/1] drm/amdgpu: Improve error handling for HMM

2019-05-07 Thread Kuehling, Felix
Use unsigned long for number of pages. Check that pfns are valid after hmm_vma_fault. If they are not, return an error instead of continuing with invalid page pointers and PTEs. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 22 ++ 1 file
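The validity check described here follows a common pattern: scan the results after the fault call and bail out with an error on the first invalid entry. The sketch below only illustrates that pattern under stated assumptions. `SKETCH_PFN_VALID`, `check_pfns` and `pfn_demo` are hypothetical names, the valid-bit encoding is invented for the example, and `-EFAULT` is an assumed choice of error code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding: bit 0 marks an entry as valid. */
#define SKETCH_PFN_VALID 0x1ull

/* Return 0 if every entry is valid, -EFAULT on the first invalid one.
 * The count is an unsigned long, as the message suggests for page counts. */
static inline int check_pfns(const uint64_t *pfns, unsigned long npages)
{
    assert(pfns != NULL || npages == 0);

    for (unsigned long i = 0; i < npages; i++) {
        if (!(pfns[i] & SKETCH_PFN_VALID))
            return -EFAULT;
    }
    return 0;
}

/* Usage: one invalid entry is enough to reject the whole range. */
static inline void pfn_demo(void)
{
    const uint64_t good[] = { 0x1001, 0x2001, 0x3001 };
    const uint64_t bad[]  = { 0x1001, 0x2000, 0x3001 };

    assert(check_pfns(good, 3) == 0);
    assert(check_pfns(bad, 3) == -EFAULT);
}
```

Returning early keeps invalid entries from being turned into page pointers further down the path.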

Re: [PATCH v15 11/17] drm/amdgpu, arm64: untag user pointers

2019-05-07 Thread Kuehling, Felix
This patch untags user pointers in > amdgpu_gem_userptr_ioctl() for the GEM case and in > amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu() for the KFD case. This also makes sure that an > untagged pointer is passed to amdgpu_ttm_tt_get_user_pages(), which uses > it for vma lookups. > > Suggested-by: Kuehling,

Re: [PATCH v15 12/17] drm/radeon, arm64: untag user pointers in radeon_gem_userptr_ioctl

2019-05-07 Thread Kuehling, Felix
On 2019-05-06 12:30 p.m., Andrey Konovalov wrote: > [CAUTION: External Email] > > This patch is a part of a series that extends arm64 kernel ABI to allow to > pass tagged user pointers (with the top byte set to something else other > than 0x00) as syscall arguments. > > In

Re: [PATCH] drm/amdgpu: rename amdgpu_prime.[ch] into amdgpu_dma_buf.[ch]

2019-05-06 Thread Kuehling, Felix
On 2019-05-06 7:24 a.m., Christian König wrote: > [CAUTION: External Email] > > We are getting a dma-buf implementation completely separate from drm prime, > so rename the files now and cleanup the code a bit. > > No functional change. > > Signed-off-by: Christian König Acked-by: Felix Kuehling

[PATCH 1/1] drm/amdgpu: Reserve shared fence for eviction fence

2019-05-06 Thread Kuehling, Felix
Need to reserve space for the shared eviction fence when initializing a KFD VM. Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c

Re: [PATCH V2] drm/amdgpu: Fix VM clean check method

2019-05-01 Thread Kuehling, Felix
On 2019-04-30 10:48 a.m., Trigger Huang wrote: > [CAUTION: External Email] > > amdgpu_vm_make_compute is used to turn a GFX VM into a compute VM; > the prerequisite is that this VM is clean. Let's check whether some page tables > are already filled, rather than whether some mapping is already made. > >

Re: [PATCH 2/2] drm/amdkfd: Cosmetic cleanup

2019-05-01 Thread Kuehling, Felix
Thank you Kent! The series is Reviewed-by: Felix Kuehling On 2019-05-01 8:31 a.m., Russell, Kent wrote: > [CAUTION: External Email] > > Fix some spacing issues, log output, uses of !=NULL/==NULL, unneeded > extra lines and clean up a declaration from =1 to =true for clarity > > Change-Id:

Re: [PATCH v14 11/17] drm/amdgpu, arm64: untag user pointers

2019-04-30 Thread Kuehling, Felix
and in > amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu() for the KFD case. > > Suggested-by: Kuehling, Felix > Signed-off-by: Andrey Konovalov > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++ > drivers/gpu/drm/a

Re: [PATCH v14 12/17] drm/radeon, arm64: untag user pointers

2019-04-30 Thread Kuehling, Felix
On 2019-04-30 9:25 a.m., Andrey Konovalov wrote: > [CAUTION: External Email] > > This patch is a part of a series that extends arm64 kernel ABI to allow to > pass tagged user pointers (with the top byte set to something else other > than 0x00) as syscall arguments. > > radeon_ttm_tt_pin_userptr()

Re: [PATCH 27/27] drm/amdgpu: Fix GTT size calculation

2019-04-30 Thread Kuehling, Felix
On 2019-04-30 1:03 p.m., Koenig, Christian wrote: > Am 30.04.19 um 17:36 schrieb Kuehling, Felix: >> On 2019-04-30 5:32 a.m., Christian König wrote: >>> [CAUTION: External Email] >>> >>> Am 30.04.19 um 01:16 schrieb Kuehling, Felix: >>>> On 2019-

Re: [PATCH 27/27] drm/amdgpu: Fix GTT size calculation

2019-04-30 Thread Kuehling, Felix
On 2019-04-30 5:32 a.m., Christian König wrote: > [CAUTION: External Email] > > Am 30.04.19 um 01:16 schrieb Kuehling, Felix: >> On 2019-04-29 8:34 a.m., Christian König wrote: >>> Am 28.04.19 um 09:44 schrieb Kuehling, Felix: >>>> From: Kent Russell >

Re: [PATCH] drm/amdgpu: Unmap CSA under SR-IOV in KFD path

2019-04-29 Thread Kuehling, Felix
I remember a past discussion to change the CSA allocation/mapping scheme to avoid this issue in the first place. Can adding the CSA to the VM be delayed a little to a point after the VM gets converted to a compute VM? Maybe the first command submission? Regards,   Felix On 2019-04-28 6:25

Re: [PATCH 00/27] KFD upstreaming

2019-04-29 Thread Kuehling, Felix
,   Felix On 2019-04-28 3:44 a.m., Kuehling, Felix wrote: > Assorted KFD changes that have been accumulating on amd-kfd-staging. New > features and fixes included: > * Support for VegaM > * Support for systems with multiple PCI domains > * New SDMA queue type that's optimized for XGMI lin

Re: [PATCH 27/27] drm/amdgpu: Fix GTT size calculation

2019-04-29 Thread Kuehling, Felix
On 2019-04-29 8:34 a.m., Christian König wrote: > Am 28.04.19 um 09:44 schrieb Kuehling, Felix: >> From: Kent Russell >> >> GTT size is currently limited to the minimum of VRAM size or 3/4 of >> system memory. This severely limits the quantity of system memory

[PATCH 20/27] drm/amdkfd: Fix gfx8 MEM_VIOL exception handler

2019-04-28 Thread Kuehling, Felix
From: Jay Cornwall When MEM_VIOL is asserted the context save handler rewinds the program counter. This is incorrect for any source of the exception. MEM_VIOL may be raised in normal operation by out-of-bounds access to LDS or GDS and does not require special handling. Remove PC adjustment when
