[PATCH] drm/amdgpu: improve HMM error -ENOMEM and -EBUSY handling

2019-06-14 Thread Yang, Philip
Under memory pressure, hmm_range_fault may return -ENOMEM or -EBUSY. Change pr_info to pr_debug to remove the unnecessary kernel log message, because the restore will be retried. Calling get_user_pages_done after TTM get user pages has failed triggers a WARN_ONCE kernel call-stack dump.
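
A minimal userspace sketch of the "retry quietly on transient errors" idea, assuming -ENOMEM and -EBUSY can simply be retried up to a bound. demo_range_fault() is a hypothetical stand-in for hmm_range_fault(), and demo_debug() stands in for pr_debug(); none of this is the amdgpu code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Stand-in for pr_debug(): compiled out unless DEMO_DEBUG is defined. */
#ifdef DEMO_DEBUG
#define demo_debug(...) fprintf(stderr, __VA_ARGS__)
#else
#define demo_debug(...) do { } while (0)
#endif

/* Hypothetical stand-in for hmm_range_fault(): returns -EBUSY while
 * *busy_left is non-zero (simulated memory pressure), then succeeds. */
static int demo_range_fault(int *busy_left)
{
	if (*busy_left > 0) {
		(*busy_left)--;
		return -EBUSY;
	}
	return 0;
}

static int demo_is_transient(int r)
{
	return r == -ENOMEM || r == -EBUSY;
}

/* Retry transient errors without noisy logging; *attempts reports how
 * many calls were made. Success or any other error stops the loop. */
static int demo_get_user_pages(int *busy_left, int max_tries, int *attempts)
{
	int r;

	assert(max_tries > 0);
	*attempts = 0;
	do {
		r = demo_range_fault(busy_left);
		(*attempts)++;
		if (demo_is_transient(r))
			demo_debug("range fault returned %d (transient)\n", r);
	} while (demo_is_transient(r) && *attempts < max_tries);
	return r;
}
```

With two simulated -EBUSY results and a budget of five tries, the call succeeds on the third attempt and nothing is printed in a default build.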

Re: [PATCH 1/4] Revert "drm/amdkfd: Fix sdma queue allocate race condition"

2019-06-14 Thread Yang, Philip
I just figured out that the previous patch has an issue. The new patch is simple and looks good to me. This series is Reviewed-by: Philip.Yang On 2019-06-14 9:27 p.m., Zeng, Oak wrote: > This reverts commit 0a7c7281bdaae8cf63d77be26a4b46128114bdec. > This fix is not proper. allocate_mqd can't be moved before

[PATCH 2/4] Revert "drm/amdkfd: Fix a circular lock dependency"

2019-06-14 Thread Zeng, Oak
This reverts commit 49b7f386343b4da9d9b14d97061c34fdd3dd2628. This fix is not correct: allocate_mqd can't be moved before allocate_sdma_queue, as it depends on q->properties->sdma_id, which is set later. Change-Id: Ia99ec628e9df5abdf4c4c730e57d40cab0b6a4ad Signed-off-by: Oak Zeng ---

[PATCH 1/4] Revert "drm/amdkfd: Fix sdma queue allocate race condition"

2019-06-14 Thread Zeng, Oak
This reverts commit 0a7c7281bdaae8cf63d77be26a4b46128114bdec. This fix is not correct: allocate_mqd can't be moved before allocate_sdma_queue, as it depends on q->properties->sdma_id, which is set later. Change-Id: If4934afebda8cf37dfcde9b50ce53643d526584d Signed-off-by: Oak Zeng ---

[PATCH 3/4] drm/amdkfd: Fix a circular lock dependency

2019-06-14 Thread Zeng, Oak
The idea is to break the circular lock dependency by temporarily dropping the dqm lock before calling allocate_mqd. See call stack #1 below. [ 59.510149] [drm] Initialized amdgpu 3.30.0 20150101 for :04:00.0 on minor 0 [ 513.604034] == [

[PATCH 4/4] drm/amdkfd: Fix sdma queue allocate race condition

2019-06-14 Thread Zeng, Oak
SDMA queue allocation requires the dqm lock, as it modifies the global dqm members. Enclose it in the dqm_lock. Change-Id: I2fd37a60613c06333e08fcfe90b6ddb367ea43ee Signed-off-by: Oak Zeng --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 7 ++- 1 file changed, 6 insertions(+), 1
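
A minimal userspace sketch of why the allocation belongs inside the lock, assuming the free SDMA slots are tracked in a shared bitmap. struct demo_dqm and the demo_* helpers are hypothetical, and a pthread mutex stands in for dqm_lock(); this is not the kfd_device_queue_manager.c code.

```c
#include <assert.h>
#include <pthread.h>

#define DEMO_NUM_SDMA_QUEUES 8	/* arbitrary size for the sketch */

/* Hypothetical device queue manager: the free-slot bitmap is global
 * state shared by every process, so it is only touched under lock. */
struct demo_dqm {
	pthread_mutex_t lock;
	unsigned int sdma_bitmap;	/* bit set = slot free */
};

/* Must be called with dqm->lock held. Returns a slot id, or -1. */
static int demo_allocate_sdma_queue_locked(struct demo_dqm *dqm)
{
	for (int bit = 0; bit < DEMO_NUM_SDMA_QUEUES; bit++) {
		if (dqm->sdma_bitmap & (1u << bit)) {
			dqm->sdma_bitmap &= ~(1u << bit);
			return bit;
		}
	}
	return -1;
}

/* Test-and-clear happens inside one critical section, so two callers
 * can never be handed the same slot. */
static int demo_create_sdma_queue(struct demo_dqm *dqm)
{
	int id;

	pthread_mutex_lock(&dqm->lock);
	id = demo_allocate_sdma_queue_locked(dqm);
	pthread_mutex_unlock(&dqm->lock);
	return id;
}

static void demo_destroy_sdma_queue(struct demo_dqm *dqm, int id)
{
	pthread_mutex_lock(&dqm->lock);
	assert(!(dqm->sdma_bitmap & (1u << id)));	/* catch double free */
	dqm->sdma_bitmap |= 1u << id;
	pthread_mutex_unlock(&dqm->lock);
}
```

Without the lock, two threads could both observe the same bit as free before either clears it, which is the kind of race the patch describes.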

Re: [PATCH 06/59] drm/prime: Actually remove DRIVER_PRIME everywhere

2019-06-14 Thread Sam Ravnborg
Hi Daniel. Minor nitpick.. > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c > index 65d599065709..4fd09a9ad67a 100644 > --- a/drivers/gpu/drm/i915/i915_drv.c > +++ b/drivers/gpu/drm/i915/i915_drv.c > @@ -3193,7 +3193,7 @@ static struct drm_driver driver = { >

[PATCH 51/59] drm/radeon: Fill out gem_object->resv

2019-06-14 Thread Daniel Vetter
That way we can ditch our gem_prime_res_obj implementation. Since ttm absolutely needs the right reservation object all the boilerplate is already there and we just have to wire it up correctly. Note that gem/prime doesn't care when we do this, as long as we do it before the bo is registered and

[PATCH 09/59] drm/prime: Align gem_prime_export with obj_funcs.export

2019-06-14 Thread Daniel Vetter
The idea is that gem_prime_export is deprecated in favor of obj_funcs.export. That's much easier to do if both have matching function signatures. Signed-off-by: Daniel Vetter Cc: Russell King Cc: Maarten Lankhorst Cc: Maxime Ripard Cc: Sean Paul Cc: David Airlie Cc: Daniel Vetter Cc:

[PATCH 06/59] drm/prime: Actually remove DRIVER_PRIME everywhere

2019-06-14 Thread Daniel Vetter
Split out to make the functional changes stick out more. v2: amdgpu gained DRIVER_SYNCOBJ_TIMELINE. v3: amdgpu lost DRIVER_SYNCOBJ_TIMELINE. Signed-off-by: Daniel Vetter Cc: amd-gfx@lists.freedesktop.org Cc: etna...@lists.freedesktop.org Cc: freedr...@lists.freedesktop.org Cc:

Re: [PATCH] drm/amdgpu: extend AMDGPU_CTX_PRIORITY_NORMAL comment

2019-06-14 Thread Koenig, Christian
On 14.06.19 at 19:33, Emil Velikov wrote: > From: Emil Velikov > > Currently the AMDGPU_CTX_PRIORITY_* defines are used in both > drm_amdgpu_ctx_in::priority and drm_amdgpu_sched_in::priority. > > Extend the comment to mention the CAP_SYS_NICE or DRM_MASTER requirement > is only applicable with

[PATCH] drm/amdgpu: extend AMDGPU_CTX_PRIORITY_NORMAL comment

2019-06-14 Thread Emil Velikov
From: Emil Velikov Currently the AMDGPU_CTX_PRIORITY_* defines are used in both drm_amdgpu_ctx_in::priority and drm_amdgpu_sched_in::priority. Extend the comment to mention the CAP_SYS_NICE or DRM_MASTER requirement is only applicable with the former. Cc: Bas Nieuwenhuizen Cc: Christian König

[PATCH] drm/amd/amdgpu: cast mem->num_pages to 64-bits when shifting

2019-06-14 Thread StDenis, Tom
On 32-bit hosts, mem->num_pages is 32 bits wide and can overflow when shifted. Cast it to 64 bits before shifting to avoid this. Signed-off-by: Tom St Denis --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git
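
A minimal sketch of the overflow and the fix, assuming 4 KiB pages and modelling a 32-bit unsigned long with uint32_t. The helper names are hypothetical; this is not the amdgpu_vram_mgr.c code.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12	/* 4 KiB pages, assumed for the sketch */

/* Buggy: the shift happens in 32-bit arithmetic and wraps at 4 GiB
 * before the result is widened to 64 bits. */
static uint64_t demo_pages_to_bytes_buggy(uint32_t num_pages)
{
	return num_pages << DEMO_PAGE_SHIFT;
}

/* Fixed: widen to 64 bits first, then shift. */
static uint64_t demo_pages_to_bytes(uint32_t num_pages)
{
	return (uint64_t)num_pages << DEMO_PAGE_SHIFT;
}
```

For 0x100000 pages (4 GiB), the buggy helper returns 0 while the fixed one returns 0x100000000.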

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-14 Thread Emil Velikov
On 2019/06/14, Koenig, Christian wrote: > On 14.06.19 at 17:53, Emil Velikov wrote: > > On 2019/06/14, Koenig, Christian wrote: > >> On 14.06.19 at 14:09, Emil Velikov wrote: > >>> On 2019/05/27, Emil Velikov wrote: > >>> [SNIP] > >>> Hi Christian, > >>> > >>> > >>> In the following, I would

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-14 Thread Koenig, Christian
On 14.06.19 at 17:53, Emil Velikov wrote: > On 2019/06/14, Koenig, Christian wrote: >> On 14.06.19 at 14:09, Emil Velikov wrote: >>> On 2019/05/27, Emil Velikov wrote: >>> [SNIP] >>> Hi Christian, >>> >>> >>> In the following, I would like to summarise and emphasize the need for >>> DRM_AUTH

Re: [pull] amdgpu drm-fixes-5.2

2019-06-14 Thread Daniel Vetter
On Fri, Jun 14, 2019 at 5:49 PM Daniel Vetter wrote: > > On Wed, Jun 12, 2019 at 09:18:56PM -0500, Alex Deucher wrote: > > Hi Dave, Daniel, > > > > Fixes for 5.2: > > - Extend previous vce fix for resume to uvd and vcn > > - Fix bounds checking in ras debugfs interface > > - Fix a regression on

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-14 Thread Emil Velikov
On 2019/06/14, Koenig, Christian wrote: > On 14.06.19 at 14:09, Emil Velikov wrote: > > On 2019/05/27, Emil Velikov wrote: > > [SNIP] > > Hi Christian, > > > > > > In the following, I would like to summarise and emphasize the need for > > DRM_AUTH removal. I would kindly ask you to spend a couple

Re: [pull] amdgpu drm-fixes-5.2

2019-06-14 Thread Daniel Vetter
On Wed, Jun 12, 2019 at 09:18:56PM -0500, Alex Deucher wrote: > Hi Dave, Daniel, > > Fixes for 5.2: > - Extend previous vce fix for resume to uvd and vcn > - Fix bounds checking in ras debugfs interface > - Fix a regression on SI using amdgpu > > The following changes since commit

Re: [PATCH] drm/amd: fix hotplug race at startup

2019-06-14 Thread Daniel Vetter
On Fri, Jun 14, 2019 at 07:29:23PM +0800, Young Xiao wrote: > We should check mode_config_initialized flag in amdgpu_hotplug_work_func. > > See commit 7f98ca454ad3 ("drm/radeon: fix hotplug race at startup") for > details. > > Signed-off-by: Young Xiao <92siuy...@gmail.com> > --- >

[PATCH] drm/amd: fix hotplug race at startup

2019-06-14 Thread Young Xiao
We should check the mode_config_initialized flag in amdgpu_hotplug_work_func. See commit 7f98ca454ad3 ("drm/radeon: fix hotplug race at startup") for details. Signed-off-by: Young Xiao <92siuy...@gmail.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 3 +++ 1 file changed, 3 insertions(+) diff
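
A minimal sketch of the early-return guard, assuming the relevant state can be reduced to one flag on a hypothetical device struct. The real work function operates on the DRM device and is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device state: a hotplug interrupt can be delivered
 * before mode setting has finished initialising. */
struct demo_dev {
	bool mode_config_initialized;
	int hotplug_events_handled;
};

static void demo_hotplug_work_func(struct demo_dev *dev)
{
	/* Bail out early: connector state is not set up yet, so touching
	 * it here would race with initialisation. */
	if (!dev->mode_config_initialized)
		return;
	dev->hotplug_events_handled++;
}
```

An event that arrives before initialisation is ignored; once the flag is set, events are processed normally.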

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-14 Thread Michel Dänzer
On 2019-06-14 2:55 p.m., Koenig, Christian wrote: > On 14.06.19 at 14:09, Emil Velikov wrote: > >> That said, the proposal will not conflict with the DRM_AUTH removal. If >> anything it is step 0.5 of the grand master plan. > > That's the point I strongly disagree on. > > By lowering the

Re: [PATCH] drm/amdgpu: Need to set the baco cap before baco reset

2019-06-14 Thread Yang, Philip
Hi Emily, I am not familiar with the vbios and driver init part, so this is just based on my experience: the patch doesn't modify amdgpu_get_bios; it moves the amdgpu_get_bios call from amdgpu_device_init to amdgpu_device_ip_early_init, so amdgpu_get_bios is executed earlier. The kernel error message "BUG: kernel NULL

Re: [PATCH xf86-video-ati 3/3] Remove dri2_drawable_crtc parameter consider_disabled

2019-06-14 Thread Deucher, Alexander
Series is: Reviewed-by: Alex Deucher From: amd-gfx on behalf of Michel Dänzer Sent: Friday, June 14, 2019 5:27 AM To: amd-gfx@lists.freedesktop.org Subject: [PATCH xf86-video-ati 3/3] Remove dri2_drawable_crtc parameter consider_disabled From: Michel Dänzer

Re: [PATCH] drm/amd/powerplay: detect version of smu backend

2019-06-14 Thread William Lewis
On 6/14/19 2:01 AM, Prike Liang wrote: > Change-Id: Ib050c8cf0c2c5af4c1f747cf596860f9be01a2d3 > Signed-off-by: Prike Liang > --- > drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c | 1 + > drivers/gpu/drm/amd/powerplay/inc/hwmgr.h | 1 + >

Re: [PATCH] drm/amd/powerplay: detect version of smu backend

2019-06-14 Thread Deucher, Alexander
Reviewed-by: Alex Deucher From: amd-gfx on behalf of Prike Liang Sent: Friday, June 14, 2019 3:01 AM To: amd-gfx@lists.freedesktop.org Cc: Liang, Prike; Huang, Ray; Feng, Kenneth; Quan, Evan Subject: [PATCH] drm/amd/powerplay: detect version of smu backend

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-14 Thread Koenig, Christian
On 14.06.19 at 14:09, Emil Velikov wrote: > On 2019/05/27, Emil Velikov wrote: > [SNIP] > Hi Christian, > > > In the following, I would like to summarise and emphasize the need for > DRM_AUTH removal. I would kindly ask you to spend a couple of minutes > extra reading it. > > > Today DRM drivers*

Re: [PATCH v2] drm: radeon: add a missing break in evergreen_cs_handle_reg

2019-06-14 Thread Mathieu Malaterre
On Thu, Jan 17, 2019 at 9:40 PM Mathieu Malaterre wrote: > > In commit dd220a00e8bd ("drm/radeon/kms: add support for streamout v7") > case statements were added without a terminating break statement. This > commit adds the missing break. This was discovered during a compilation > with W=1. > >
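
A minimal sketch of the bug class, with hypothetical register ids standing in for the streamout registers; it is not the evergreen_cs_handle_reg() code. GCC's -Wimplicit-fallthrough, which -Wextra enables, reports a case that falls into the next one when its break is missing.

```c
#include <assert.h>

/* Hypothetical register ids for the sketch. */
enum { DEMO_REG_A = 1, DEMO_REG_B = 2 };

static int demo_handle_reg(int reg)
{
	int handled = 0;

	switch (reg) {
	case DEMO_REG_A:
		handled |= 0x1;
		break;	/* without this, REG_A would also run REG_B's code */
	case DEMO_REG_B:
		handled |= 0x2;
		break;
	default:
		return -1;
	}
	return handled;
}
```

With the first break removed, demo_handle_reg(DEMO_REG_A) would return 0x3 instead of 0x1.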

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-14 Thread Emil Velikov
On 2019/05/27, Emil Velikov wrote: > From: Emil Velikov > > Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the > render node. A seemingly deliberate design decision. > > Hence we can drop the DRM_AUTH all together (details in follow-up patch) > yet not all userspace checks

[PATCH xf86-video-ati 3/3] Remove dri2_drawable_crtc parameter consider_disabled

2019-06-14 Thread Michel Dänzer
From: Michel Dänzer All callers were passing TRUE. (Ported from amdgpu commit ea19a5207054bb159fc7fb6d88e0ceb10c3da010) Signed-off-by: Michel Dänzer --- src/radeon_dri2.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/src/radeon_dri2.c b/src/radeon_dri2.c

[PATCH xf86-video-ati 1/3] dri2: reply to client for WaitMSC request in any case

2019-06-14 Thread Michel Dänzer
From: Flora Cui Otherwise the client would wait for a reply forever and the desktop would appear to hang. Signed-off-by: Flora Cui (Ported from amdgpu commit fb06fb814700a47464abd756edcc76d0d776) Signed-off-by: Michel Dänzer --- src/radeon_dri2.c | 3 +++ 1 file changed, 3 insertions(+) diff --git

[PATCH xf86-video-ati 2/3] dri2: Re-use previous CRTC when possible if pick_best_crtc returns NULL

2019-06-14 Thread Michel Dänzer
From: Michel Dänzer This way, the MSC will continue ticking at the rate of (the last mode which was enabled for) that CRTC, instead of the client running unthrottled. (Ported from amdgpu commit 3109f088fdbd89c2ee8078625d4f073852492656) Signed-off-by: Michel Dänzer --- src/radeon_dri2.c | 6
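
A minimal sketch of the fallback, assuming the drawable remembers the last CRTC it used. The struct and function names are hypothetical, and the real radeon_dri2.c logic for deciding when re-use is "possible" is not reproduced.

```c
#include <assert.h>
#include <stddef.h>

struct demo_crtc {
	int id;
};

/* Hypothetical per-drawable private state. */
struct demo_drawable {
	struct demo_crtc *prev_crtc;	/* last CRTC this drawable used */
};

/* `best` is whatever a pick_best_crtc()-style helper returned. When it
 * is NULL, fall back to the previously used CRTC so the MSC keeps
 * ticking at that CRTC's last rate instead of leaving the client
 * unthrottled. */
static struct demo_crtc *demo_drawable_crtc(struct demo_drawable *draw,
					    struct demo_crtc *best)
{
	if (best) {
		draw->prev_crtc = best;
		return best;
	}
	return draw->prev_crtc;	/* may still be NULL on first use */
}
```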

Re: [PATCH] drm/amdgpu: Fix connector atomic_check compilation fail

2019-06-14 Thread Maarten Lankhorst
On 14-06-2019 at 02:27, Sean Paul wrote: > From: Sean Paul > > I missed amdgpu in my connector_helper_funcs->atomic_check conversion, > which is understandably causing compilation failures. > > Fixes: 6f3b62781bbd ("drm: Convert connector_helper_funcs->atomic_check to > accept

[PATCH v3 hmm 05/12] mm/hmm: Remove duplicate condition test before wait_event_timeout

2019-06-14 Thread Jason Gunthorpe
From: Jason Gunthorpe The wait_event_timeout macro already tests the condition as its first action, so there is no reason to open-code another version of this; all that does is skip the might_sleep() debugging in common cases, which is not helpful. Further, based on prior patches, we can now
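
A simplified, single-threaded model of the wait_event_timeout() contract, to show why a separate "if (!cond)" pre-check is redundant: the condition is evaluated before any sleep. The demo_* names are hypothetical, and a decrementing counter stands in for sleeping one tick.

```c
#include <assert.h>
#include <stdbool.h>

/* Return values follow the documented contract: 0 if the condition is
 * still false when the timeout expires, 1 if it became true just as the
 * timeout expired, otherwise the remaining ticks. */
static long demo_wait_event_timeout(bool (*cond)(void *), void *arg,
				    long timeout)
{
	while (!cond(arg)) {	/* condition checked first, every time */
		if (timeout == 0)
			return 0;
		timeout--;	/* stands in for sleeping one tick */
	}
	return timeout ? timeout : 1;
}

/* Test helper: a condition that becomes true after `true_after` checks. */
struct demo_counter {
	int calls;
	int true_after;
};

static bool demo_cond(void *arg)
{
	struct demo_counter *c = arg;

	return ++c->calls > c->true_after;
}
```

When the condition is already true, the model returns the full budget after a single evaluation, which is exactly what an open-coded pre-check would have achieved.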

[PATCH v3 hmm 09/12] mm/hmm: Poison hmm_range during unregister

2019-06-14 Thread Jason Gunthorpe
From: Jason Gunthorpe Trying to misuse a range outside its lifetime is a kernel bug. Use poison bytes to help detect this condition. Double unregister will reliably crash. Signed-off-by: Jason Gunthorpe Reviewed-by: Jérôme Glisse Reviewed-by: John Hubbard Acked-by: Souptick Joarder
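
A minimal userspace sketch of poisoning a back-pointer on unregister, assuming an arbitrary poison byte. The structs and helpers are hypothetical; in the kernel, dereferencing the poisoned pointer is itself what crashes, while this sketch uses assert() so the effect can be observed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_POISON_BYTE 0x5a	/* arbitrary poison value for the sketch */

struct demo_hmm {
	int nranges;
};

/* Hypothetical range with a back-pointer to its owner. */
struct demo_range {
	struct demo_hmm *hmm;
};

static int demo_range_is_poisoned(const struct demo_range *range)
{
	const unsigned char *p = (const unsigned char *)&range->hmm;

	for (size_t i = 0; i < sizeof(range->hmm); i++)
		if (p[i] != DEMO_POISON_BYTE)
			return 0;
	return 1;
}

static void demo_range_unregister(struct demo_range *range)
{
	/* A second unregister finds poison and aborts. */
	assert(!demo_range_is_poisoned(range));
	range->hmm->nranges--;
	/* Poison the back-pointer so any later use is detected. */
	memset(&range->hmm, DEMO_POISON_BYTE, sizeof(range->hmm));
}
```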

[PATCH v3 hmm 03/12] mm/hmm: Hold a mmgrab from hmm to mm

2019-06-14 Thread Jason Gunthorpe
From: Jason Gunthorpe So long as a struct hmm pointer exists, so should the struct mm it is linked to. Hold the mmgrab() as soon as a hmm is created, and mmdrop() it once the hmm refcount goes to zero. Since mmdrop() (i.e. a 0 kref on struct mm) is now impossible with a !NULL mm->hmm, delete the
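
A single-threaded model of the lifetime rule, assuming plain integer counts (the kernel uses atomic reference counts). demo_mmgrab()/demo_mmdrop() are hypothetical stand-ins for mmgrab()/mmdrop().

```c
#include <assert.h>
#include <stddef.h>

struct demo_mm {
	int mm_count;		/* models the count mmgrab/mmdrop adjust */
};

struct demo_hmm {
	struct demo_mm *mm;
	int kref;
};

static void demo_mmgrab(struct demo_mm *mm)
{
	mm->mm_count++;
}

static void demo_mmdrop(struct demo_mm *mm)
{
	assert(mm->mm_count > 0);
	mm->mm_count--;
}

/* Pin the mm as soon as the hmm exists... */
static void demo_hmm_init(struct demo_hmm *hmm, struct demo_mm *mm)
{
	demo_mmgrab(mm);
	hmm->mm = mm;
	hmm->kref = 1;
}

static void demo_hmm_get(struct demo_hmm *hmm)
{
	hmm->kref++;
}

/* ...and unpin it only when the last hmm reference goes away. */
static void demo_hmm_put(struct demo_hmm *hmm)
{
	assert(hmm->kref > 0);
	if (--hmm->kref == 0) {
		demo_mmdrop(hmm->mm);
		hmm->mm = NULL;
	}
}
```

Because the mm pin is tied to the hmm refcount rather than to any one caller, hmm->mm stays valid for as long as anyone holds the hmm.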

[PATCH v3 hmm 07/12] mm/hmm: Use lockdep instead of comments

2019-06-14 Thread Jason Gunthorpe
From: Jason Gunthorpe So we can check locking at runtime. Signed-off-by: Jason Gunthorpe Reviewed-by: Jérôme Glisse Reviewed-by: John Hubbard Reviewed-by: Ralph Campbell Acked-by: Souptick Joarder Tested-by: Philip Yang --- v2 - Fix missing & in lockdeps (Jason) --- mm/hmm.c | 4 ++-- 1
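
A minimal userspace sketch of turning a "caller must hold the lock" comment into a runtime check, in the spirit of lockdep_assert_held(). It keeps a per-thread list of held locks, loosely modelled on how lockdep tracks the locks each task holds; all demo_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define DEMO_MAX_HELD 8
static _Thread_local pthread_mutex_t *demo_held[DEMO_MAX_HELD];
static _Thread_local int demo_nheld;

static void demo_lock(pthread_mutex_t *m)
{
	pthread_mutex_lock(m);
	assert(demo_nheld < DEMO_MAX_HELD);
	demo_held[demo_nheld++] = m;
}

static void demo_unlock(pthread_mutex_t *m)
{
	for (int i = 0; i < demo_nheld; i++) {
		if (demo_held[i] == m) {
			demo_held[i] = demo_held[--demo_nheld];
			break;
		}
	}
	pthread_mutex_unlock(m);
}

static bool demo_lock_is_held(const pthread_mutex_t *m)
{
	for (int i = 0; i < demo_nheld; i++)
		if (demo_held[i] == m)
			return true;
	return false;
}

/* Before: a comment said "caller must hold lock". After: checked. */
static void demo_update_locked(pthread_mutex_t *lock, int *counter)
{
	assert(demo_lock_is_held(lock));
	(*counter)++;
}
```

Because the held-lock list is thread-local, the check answers "does the current thread hold this lock?", which is the question the comment was asking the reader to verify by hand.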

[PATCH v3 hmm 11/12] mm/hmm: Remove confusing comment and logic from hmm_release

2019-06-14 Thread Jason Gunthorpe
From: Jason Gunthorpe hmm_release() is called exactly once per hmm. ops->release() cannot accidentally trigger any action that would recurse back onto hmm->mirrors_sem. This fixes a use-after-free race of the form: CPU0 CPU1

[PATCH v3 hmm 02/12] mm/hmm: Use hmm_mirror not mm as an argument for hmm_range_register

2019-06-14 Thread Jason Gunthorpe
From: Jason Gunthorpe Ralph observes that hmm_range_register() can only be called by a driver while a mirror is registered. Make this clear in the API by passing in the mirror structure as a parameter. This also simplifies understanding the lifetime model for struct hmm, as the hmm pointer must

[PATCH v3 hmm 01/12] mm/hmm: fix use after free with struct hmm in the mmu notifiers

2019-06-14 Thread Jason Gunthorpe
From: Jason Gunthorpe mmu_notifier_unregister_no_release() is not a fence, and the mmu_notifier system will continue to reference hmm->mn until the srcu grace period expires, resulting in use-after-free races like this: CPU0 CPU1

[PATCH v3 hmm 12/12] mm/hmm: Fix error flows in hmm_invalidate_range_start

2019-06-14 Thread Jason Gunthorpe
From: Jason Gunthorpe If the trylock on the hmm->mirrors_sem fails, the function will return without decrementing the notifiers that were previously incremented. Since the caller will not call invalidate_range_end() on EAGAIN, this will result in notifiers becoming permanently incremented and
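
A minimal sketch of the unwind-on-error pattern, assuming a single notifier counter and a flag that models the trylock failing in a non-blocking context. The demo_* names are hypothetical; this is not the mm/hmm.c code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct demo_hmm {
	int notifiers;		/* active invalidations */
	bool mirrors_sem_busy;	/* models a trylock that fails */
};

static int demo_invalidate_range_start(struct demo_hmm *hmm, bool blockable)
{
	int ret;

	hmm->notifiers++;

	if (!blockable && hmm->mirrors_sem_busy) {
		ret = -EAGAIN;
		goto out_undo;
	}
	/* ... invoke each mirror's invalidation callback here ... */
	return 0;

out_undo:
	/* invalidate_range_end() is not called after -EAGAIN, so the
	 * increment must be undone here or it stays elevated forever. */
	hmm->notifiers--;
	return ret;
}

static void demo_invalidate_range_end(struct demo_hmm *hmm)
{
	assert(hmm->notifiers > 0);
	hmm->notifiers--;
}
```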

Re: [PATCH v2 hmm 00/11] Various revisions from a locking/code review

2019-06-14 Thread Jason Gunthorpe
On Wed, Jun 12, 2019 at 09:49:12PM +, Yang, Philip wrote: > Rebase to https://github.com/jgunthorpe/linux.git hmm branch, need some > changes because of interface hmm_range_register change. Then run a quick > amdgpu_test. Test is finished, result is ok. Great! Thanks I'll add your

[PATCH v3 hmm 06/12] mm/hmm: Hold on to the mmget for the lifetime of the range

2019-06-14 Thread Jason Gunthorpe
From: Jason Gunthorpe Range functions like hmm_range_snapshot() and hmm_range_fault() call find_vma, which requires holding the mmget() and the mmap_sem for the mm. Make this simpler for the callers by holding the mmget() inside the range for the lifetime of the range. Other functions that

[PATCH v3 hmm 00/12] mm/hmm: Various revisions from a locking/code review

2019-06-14 Thread Jason Gunthorpe
From: Jason Gunthorpe This patch series arose out of discussions with Jerome when looking at the ODP changes, particularly informed by use-after-free races we have already found and fixed in the ODP code (thanks to syzkaller) working with mmu notifiers, and the discussion with Ralph on how to

[PATCH v3 hmm 04/12] mm/hmm: Simplify hmm_get_or_create and make it reliable

2019-06-14 Thread Jason Gunthorpe
From: Jason Gunthorpe As coded, this function can false-fail in various racy situations. Make it reliable and simpler by running under the write side of the mmap_sem and avoiding the false-failing compare/exchange pattern. Due to the mmap_sem, this no longer has to avoid racing with a 2nd parallel
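
A minimal sketch of why holding an exclusive lock removes the need for a compare/exchange: with the lock held, no second thread can install an hmm concurrently, so a plain check-then-set cannot lose a race. A pthread mutex stands in for the write side of mmap_sem, and the demo_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct demo_hmm {
	int refs;
};

struct demo_mm {
	pthread_mutex_t mmap_sem;	/* models the write side of mmap_sem */
	struct demo_hmm *hmm;
};

/* Caller must hold mm->mmap_sem. Returns a referenced hmm, or NULL if
 * allocation fails; it never fails because of a concurrent creator. */
static struct demo_hmm *demo_hmm_get_or_create(struct demo_mm *mm)
{
	if (mm->hmm) {
		mm->hmm->refs++;
		return mm->hmm;
	}
	mm->hmm = calloc(1, sizeof(*mm->hmm));
	if (!mm->hmm)
		return NULL;
	mm->hmm->refs = 1;
	return mm->hmm;
}
```

Two calls under the same lock return the same object with two references, rather than one caller failing because it lost a compare/exchange.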

[PATCH v3 hmm 10/12] mm/hmm: Do not use list*_rcu() for hmm->ranges

2019-06-14 Thread Jason Gunthorpe
From: Jason Gunthorpe This list is always read and written while holding hmm->lock so there is no need for the confusing _rcu annotations. Signed-off-by: Jason Gunthorpe Reviewed-by: Jérôme Glisse Reviewed-by: John Hubbard Acked-by: Souptick Joarder Reviewed-by: Ralph Campbell Acked-by:

[PATCH v3 hmm 08/12] mm/hmm: Remove racy protection against double-unregistration

2019-06-14 Thread Jason Gunthorpe
From: Jason Gunthorpe No other register/unregister kernel API attempts to provide this kind of protection, as it is inherently racy, so just drop it. Callers should provide their own protection; it appears nouveau already does, but just in case drop a debugging POISON. Signed-off-by: Jason

[PATCH] drm/amd/powerplay: detect version of smu backend

2019-06-14 Thread Prike Liang
Change-Id: Ib050c8cf0c2c5af4c1f747cf596860f9be01a2d3 Signed-off-by: Prike Liang --- drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c | 1 + drivers/gpu/drm/amd/powerplay/inc/hwmgr.h | 1 + drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c| 1 +