Re: [PATCH 3/5] drm/amdgpu: Prevent race between late signaled fences and GPU reset.

2022-06-21 Thread Andrey Grodzovsky
You have a job in the pending list which is marked as not finished in drm_sched_stop (https://elixir.bootlin.com/linux/v5.16/source/drivers/gpu/drm/scheduler/sched_main.c#L420), s_fence signal cb removed and the job is kept in pending list. Later you will try to manually clear the HW fence of

[PATCH 2/2] drm/amd/pm: enable VR0 HOT support for SMU 13.0.0

2022-06-21 Thread Evan Quan
Enable VR0 Hot support for SMU 13.0.0. Signed-off-by: Evan Quan Acked-by: Alex Deucher Change-Id: I11a642033d6e0885877cf48c1f1e07f30de4622e --- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 2 ++ 1 file changed, 2 insertions(+) diff --git

[PATCH 1/2] drm/amdgpu: update GFX11 cs settings

2022-06-21 Thread Evan Quan
Update GFX11 cs related settings. Signed-off-by: Evan Quan Acked-by: Alex Deucher Change-Id: If99a46ef4178fb4cd686008038923d3b15efa452 --- drivers/gpu/drm/amd/amdgpu/clearstate_gfx11.h | 307 +- 1 file changed, 158 insertions(+), 149 deletions(-) diff --git

RE: [PATCH 3/5] drm/amdgpu: Prevent race between late signaled fences and GPU reset.

2022-06-21 Thread VURDIGERENATARAJ, CHANDAN
Hi, Is this a preventive fix, or did you find errors/oops/hangs? If you found errors/oops/hangs, can you please share the details? BR, Chandan V N >On 2022-06-21 03:25, Christian König wrote: >> On 21.06.22 at 00:03, Andrey Grodzovsky wrote: >>> Problem: >>> After we start handling timed out

[linux-next:master] BUILD REGRESSION 34d1d36073ea4d4c532e8c8345627a9702be799e

2022-06-21 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master branch HEAD: 34d1d36073ea4d4c532e8c8345627a9702be799e Add linux-next specific files for 20220621 Error/Warning reports: https://lore.kernel.org/linux-mm/202206212029.yr5m7cd3-...@intel.com https

Re: [PATCH 5/5] drm/amdgpu: Follow up change to previous drm scheduler change.

2022-06-21 Thread Andrey Grodzovsky
On 2022-06-21 03:28, Christian König wrote: On 21.06.22 at 00:03, Andrey Grodzovsky wrote: Align refcount behaviour for amdgpu_job embedded HW fence with classic pointer style HW fences by increasing refcount each time emit is called so amdgpu code doesn't need to make workarounds using
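A minimal sketch of the per-emit refcount idea described above, assuming a plain integer refcount. toy_fence, toy_emit() and toy_signal_or_clear() are hypothetical stand-ins, not amdgpu or dma_fence code. Each emit, including a resubmission after a reset, takes its own reference, and each signal or clear drops exactly one, so no run counter is needed to decide whether a put is owed.

```c
#include <assert.h>

/* Hypothetical toy model, not amdgpu code: a fence embedded in a job. */
struct toy_fence {
	int refcount;
	int freed;
};

static void toy_fence_get(struct toy_fence *f)
{
	assert(!f->freed);
	f->refcount++;
}

static void toy_fence_put(struct toy_fence *f)
{
	assert(f->refcount > 0); /* catches the negative-refcount bug class */
	if (--f->refcount == 0)
		f->freed = 1;
}

/* Job creation: the job itself owns one reference to its embedded fence. */
static void toy_job_init(struct toy_fence *f)
{
	f->refcount = 1;
	f->freed = 0;
}

/* Every emit (first run or resubmission) takes a reference on behalf of
 * the driver's fence array. */
static void toy_emit(struct toy_fence *f)
{
	toy_fence_get(f);
}

/* Signaling the fence, or clearing it from the array during a reset,
 * drops exactly the reference taken by the matching emit. */
static void toy_signal_or_clear(struct toy_fence *f)
{
	toy_fence_put(f);
}
```

With this scheme a job that runs twice simply sees two get/put pairs, and only the job's own final put frees the fence.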

Re: [PATCH 3/5] drm/amdgpu: Prevent race between late signaled fences and GPU reset.

2022-06-21 Thread Andrey Grodzovsky
On 2022-06-21 03:25, Christian König wrote: On 21.06.22 at 00:03, Andrey Grodzovsky wrote: Problem: After we start handling timed out jobs we assume their fences won't be signaled but we cannot be sure and sometimes they fire late. We need to prevent concurrent accesses to fence array from
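One generic way to keep a late fence signal and a reset-time clear from dropping the same array reference twice is for both paths to claim the slot with an atomic exchange. This is a userspace C11 sketch, not necessarily how the patch resolves the race (the excerpt is truncated before the fix). toy_slots, toy_emit(), toy_late_signal() and toy_reset_clear_all() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define TOY_NUM_SLOTS 4

/* Hypothetical fence with an atomic refcount. */
struct toy_fence {
	atomic_int refcount;
};

/* Hypothetical fence array; static storage starts out as NULL slots. */
static _Atomic(struct toy_fence *) toy_slots[TOY_NUM_SLOTS];

static void toy_put(struct toy_fence *f)
{
	int old = atomic_fetch_sub(&f->refcount, 1);
	assert(old > 0); /* a double put would underflow here */
}

/* The array takes its own reference when a fence is emitted into a slot. */
static void toy_emit(size_t idx, struct toy_fence *f)
{
	atomic_fetch_add(&f->refcount, 1);
	atomic_store(&toy_slots[idx], f);
}

/* Both paths claim a slot by swapping in NULL. Only the caller that gets
 * the non-NULL pointer back owns the array's reference and drops it. */
static int toy_take_and_put(size_t idx)
{
	struct toy_fence *f = atomic_exchange(&toy_slots[idx], NULL);

	if (!f)
		return 0; /* the other path already handled this slot */
	toy_put(f);
	return 1;
}

/* Hypothetical stand-in for a late fence-processing interrupt. */
static int toy_late_signal(size_t idx)
{
	return toy_take_and_put(idx);
}

/* Hypothetical stand-in for clearing all job fences during a GPU reset. */
static int toy_reset_clear_all(void)
{
	int cleared = 0;

	for (size_t i = 0; i < TOY_NUM_SLOTS; i++)
		cleared += toy_take_and_put(i);
	return cleared;
}
```

Whichever path runs first wins the slot. The loser sees NULL and does nothing, so the reference count cannot go negative regardless of ordering.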

Re: [PATCH 1/5] drm/amdgpu: Fix possible refcount leak for release of external_hw_fence

2022-06-21 Thread Andrey Grodzovsky
On 2022-06-21 03:19, Christian König wrote: On 21.06.22 at 00:02, Andrey Grodzovsky wrote: Problem: In amdgpu_job_submit_direct - The refcount should drop by 2 but it drops only by 1. amdgpu_ib_sched->emit -> refcount 1 from first fence init dma_fence_get -> refcount 2 dma_fence_put ->
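The refcount arithmetic in the problem statement can be checked with a toy counter. This sketch only models the quoted sequence (init to 1, get to 2, a single put back to 1). toy_fence and the toy_* helpers are hypothetical, and exactly where the real fix adds its put is cut off in the excerpt.

```c
#include <assert.h>

/* Hypothetical toy refcount standing in for dma_fence refcounting. */
struct toy_fence {
	int refcount;
};

static void toy_init(struct toy_fence *f) { f->refcount = 1; } /* emit */
static void toy_get(struct toy_fence *f) { f->refcount++; }

static void toy_put(struct toy_fence *f)
{
	assert(f->refcount > 0);
	f->refcount--;
}

/* Before the fix: the sequence quoted above leaves one reference behind. */
static int toy_submit_direct_before(void)
{
	struct toy_fence f;

	toy_init(&f); /* refcount 1 */
	toy_get(&f);  /* refcount 2 */
	toy_put(&f);  /* refcount 1: leaked */
	return f.refcount;
}

/* After the fix: an extra put balances the reference from the init. */
static int toy_submit_direct_after(void)
{
	struct toy_fence f;

	toy_init(&f); /* refcount 1 */
	toy_get(&f);  /* refcount 2 */
	toy_put(&f);  /* refcount 1 */
	toy_put(&f);  /* refcount 0: balanced */
	return f.refcount;
}
```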

Re: [PATCH] drm/amdkfd: To flush tlb for MMHUB of GFX9 series

2022-06-21 Thread philip yang
On 2022-06-20 22:25, Ji, Ruili wrote: From: Ruili Ji amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:40 vmid:8 pasid:32769, for process test_basic pid 3305 thread test_basic pid 3305) amdgpu: in page starting at address 0x7ff990003000 from IH client 0x12

Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-21 Thread David Hildenbrand
On 21.06.22 18:08, Sierra Guiza, Alejandro (Alex) wrote: > > On 6/21/2022 7:25 AM, David Hildenbrand wrote: >> On 21.06.22 13:55, Alistair Popple wrote: >>> David Hildenbrand writes: >>> On 21.06.22 13:25, Felix Kuehling wrote: > On 6/17/22 at 23:19, David Hildenbrand wrote: >> On

Re: [PATCH v6 17/22] drm/shmem-helper: Add generic memory shrinker

2022-06-21 Thread Dmitry Osipenko
20.06.2022 18:37, Rob Clark wrote: >> +static unsigned long >> +drm_gem_shmem_shrinker_scan_objects(struct shrinker *shrinker, >> + struct shrink_control *sc) >> +{ >> + unsigned long nr_to_scan = sc->nr_to_scan; >> + bool lock_contention = false; >> +

Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-21 Thread Sierra Guiza, Alejandro (Alex)
On 6/21/2022 7:25 AM, David Hildenbrand wrote: On 21.06.22 13:55, Alistair Popple wrote: David Hildenbrand writes: On 21.06.22 13:25, Felix Kuehling wrote: On 6/17/22 at 23:19, David Hildenbrand wrote: On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote: On 6/17/2022 12:33 PM,

Re: [PATCH 2/2] drm/amd/display: fix incorrect comparison in DML

2022-06-21 Thread Alex Deucher
Series is: Acked-by: Alex Deucher On Tue, Jun 21, 2022 at 11:23 AM Aurabindo Pillai wrote: > > [Why] > GCC 12 catches the following incorrect comparison in the if arm > > drivers/gpu/drm/amd/amdgpu/../dal-dev/dc/dml/dcn32/display_mode_vba_32.c: In > function

[PATCH 2/2] drm/amd/display: fix incorrect comparison in DML

2022-06-21 Thread Aurabindo Pillai
[Why] GCC 12 catches the following incorrect comparison in the if arm drivers/gpu/drm/amd/amdgpu/../dal-dev/dc/dml/dcn32/display_mode_vba_32.c: In function ‘dml32_ModeSupportAndSystemConfigurationFull’: drivers/gpu/drm/amd/amdgpu/../dal-dev/dc/dml/dcn32/display_mode_vba_32.c:3740:33: error: the

[PATCH 1/2] drm/amd/display: fix array index in DML

2022-06-21 Thread Aurabindo Pillai
[Why] When a 3D array is used by indexing with only one dimension in an if condition, the addresses get compared instead of the intended value stored in the array. GCC 12.1 caught this error: drivers/gpu/drm/amd/amdgpu/../dal-dev/dc/dml/dcn32/display_mode_vba_32.c: In function
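The C pitfall described above can be reproduced in isolation. The real DML variables are not visible in the excerpt, so toy_table and both helpers are hypothetical. This sketch uses a bare truthiness test, one common form of the bug; whether the original code used that form or an explicit == is not shown. Indexing a 3D array with one subscript yields a sub-array, which decays to a non-NULL pointer in a condition, so the test is always true.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a 3D DML table; static, so all zeros. */
static int toy_table[2][3][4];

/* Buggy: toy_table[i] is an int[3][4] sub-array. In the if condition it
 * decays to a pointer, which is never NULL, so this always returns true.
 * GCC 12 diagnoses this pattern (-Waddress). */
static bool toy_enabled_buggy(int i, int j, int k)
{
	(void)j;
	(void)k;
	assert(i >= 0 && i < 2);
	if (toy_table[i])
		return true;
	return false;
}

/* Fixed: index all three dimensions so the stored value is tested. */
static bool toy_enabled_fixed(int i, int j, int k)
{
	assert(i >= 0 && i < 2 && j >= 0 && j < 3 && k >= 0 && k < 4);
	if (toy_table[i][j][k])
		return true;
	return false;
}
```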

Re: [PATCH] amd/display/dc: Fix COLOR_ENCODING and COLOR_RANGE doing nothing for DCN20+

2022-06-21 Thread Harry Wentland
On 2022-06-21 09:15, Alex Deucher wrote: > On Mon, Jun 20, 2022 at 8:46 PM VURDIGERENATARAJ, CHANDAN > wrote: >> >> Hi Alex, >> >> I think this was pushed earlier by Harry. >> Not sure why it did not get merged. >> https://www.spinics.net/lists/stable/msg543116.html has the history. > > I

[PATCH] Revert "drm/amdgpu: remove ctx->lock"

2022-06-21 Thread Luben Tuikov
This reverts commit e68efb27647f2106d6b545667f35b2ea39746b57. We see that the bo validate list gets corrupted: in amdgpu_cs_list_validate(), the lobj->tv.bo is NULL. Then getting usermm on the next line references a NULL bo and we get a kernel oops. Bisecting leads to the commit being reverted as the

Re: [PATCH] drm/amdkfd: To flush tlb for MMHUB of GFX9 series

2022-06-21 Thread Alex Deucher
Acked-by: Alex Deucher On Mon, Jun 20, 2022 at 10:26 PM Ji, Ruili wrote: > > From: Ruili Ji > > amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:40 vmid:8 pasid:32769, > for process test_basic pid 3305 thread test_basic pid 3305) > amdgpu: in page starting at address 0x7ff990003000

Re: [PATCH] drm/amdgpu: vm - drop unexpected word "the" in the comments

2022-06-21 Thread Alex Deucher
Applied. Thanks! On Tue, Jun 21, 2022 at 9:17 AM Jiang Jian wrote: > > there is an unexpected word "the" in the comments that needs to be dropped > > file: drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > line: 57 > * the kernel tells the the ring what VMID to use for that command > changed to > *

[PATCH] drm/amdgpu: vm - drop unexpected word "the" in the comments

2022-06-21 Thread Jiang Jian
there is an unexpected word "the" in the comments that needs to be dropped file: drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c line: 57 * the kernel tells the the ring what VMID to use for that command changed to * the kernel tells the ring what VMID to use for that command Signed-off-by: Jiang Jian

Re: [PATCH] amd/display/dc: Fix COLOR_ENCODING and COLOR_RANGE doing nothing for DCN20+

2022-06-21 Thread Alex Deucher
On Mon, Jun 20, 2022 at 8:46 PM VURDIGERENATARAJ, CHANDAN wrote: > > Hi Alex, > > I think this was pushed earlier by Harry. > Not sure why it did not get merged. > https://www.spinics.net/lists/stable/msg543116.html has the history. I guess it never landed? I don't see it in the git history.

Re: [PATCH 01/14] ACPI: video: Add a native function parameter to acpi_video_get_backlight_type()

2022-06-21 Thread Hans de Goede
Hi, On 5/19/22 11:02, Jani Nikula wrote: > On Wed, 18 May 2022, Hans de Goede wrote: >> Hi, >> >> On 5/18/22 10:55, Jani Nikula wrote: >>> On Tue, 17 May 2022, Hans de Goede wrote: ATM on x86 laptops where we want userspace to use the acpi_video backlight device we often register both

Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-21 Thread David Hildenbrand
On 21.06.22 13:55, Alistair Popple wrote: > > David Hildenbrand writes: > >> On 21.06.22 13:25, Felix Kuehling wrote: >>> >>> On 6/17/22 at 23:19, David Hildenbrand wrote: On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote: > On 6/17/2022 12:33 PM, David Hildenbrand wrote: >>

Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-21 Thread Alistair Popple
David Hildenbrand writes: > On 21.06.22 13:25, Felix Kuehling wrote: >> >> On 6/17/22 at 23:19, David Hildenbrand wrote: >>> On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote: On 6/17/2022 12:33 PM, David Hildenbrand wrote: > On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex)

Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-21 Thread David Hildenbrand
On 21.06.22 13:25, Felix Kuehling wrote: > > On 6/17/22 at 23:19, David Hildenbrand wrote: >> On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote: >>> On 6/17/2022 12:33 PM, David Hildenbrand wrote: On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote: > On 6/17/2022 4:40 AM,

Re: [PATCH v5 01/13] mm: add zone device coherent type memory support

2022-06-21 Thread Felix Kuehling
On 6/17/22 at 23:19, David Hildenbrand wrote: On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote: On 6/17/2022 12:33 PM, David Hildenbrand wrote: On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote: On 6/17/2022 4:40 AM, David Hildenbrand wrote: On 31.05.22 22:00, Alex Sierra

Re: Using generic fbdev helpers breaks hibernation

2022-06-21 Thread Thomas Zimmermann
Hi, On 21.06.22 at 00:02, Alex Deucher wrote: Maybe someone more familiar with the generic drm fbdev helpers can help me understand why they don't work with hibernation, at least with AMD GPUs. We converted amdgpu to use the generic helpers instead of rolling our own in this patch[1], but it

Re: [PATCH 5/5] drm/amdgpu: Follow up change to previous drm scheduler change.

2022-06-21 Thread Christian König
On 21.06.22 at 00:03, Andrey Grodzovsky wrote: Align refcount behaviour for amdgpu_job embedded HW fence with classic pointer style HW fences by increasing refcount each time emit is called so amdgpu code doesn't need to make workarounds using amdgpu_job.job_run_counter to keep the HW fence

Re: [PATCH 4/5] drm/sched: Partial revert of 'drm/sched: Keep s_fence->parent pointer'

2022-06-21 Thread Christian König
On 21.06.22 at 00:03, Andrey Grodzovsky wrote: Problem: This patch caused negative refcount as described in [1] because for that case parent fence did not signal by the time of drm_sched_stop and hence kept in pending list the assumption was they will not signal and so fence was put to account

Re: [PATCH 3/5] drm/amdgpu: Prevent race between late signaled fences and GPU reset.

2022-06-21 Thread Christian König
On 21.06.22 at 00:03, Andrey Grodzovsky wrote: Problem: After we start handling timed out jobs we assume their fences won't be signaled but we cannot be sure and sometimes they fire late. We need to prevent concurrent accesses to fence array from amdgpu_fence_driver_clear_job_fences during GPU

Re: [PATCH 2/5] drm/amdgpu: Add put fence in amdgpu_fence_driver_clear_job_fences

2022-06-21 Thread Christian König
On 21.06.22 at 00:02, Andrey Grodzovsky wrote: This function should drop the fence refcount when it extracts the fence from the fence array, just as it's done in amdgpu_fence_process. Signed-off-by: Andrey Grodzovsky Reviewed-by: Christian König ---

Re: [PATCH 1/5] drm/amdgpu: Fix possible refcount leak for release of external_hw_fence

2022-06-21 Thread Christian König
On 21.06.22 at 00:02, Andrey Grodzovsky wrote: Problem: In amdgpu_job_submit_direct - The refcount should drop by 2 but it drops only by 1. amdgpu_ib_sched->emit -> refcount 1 from first fence init dma_fence_get -> refcount 2 dma_fence_put -> refcount 1 Fix: Add put for external_hw_fence in