RE: [PATCH] drm/amdgpu/vcn: fix gfxoff issue

2020-04-14 Thread Huang, Ray
[AMD Official Use Only - Internal Distribution Only] This workaround was added earlier to fix the S3 issue with video playback on Raven1. Changfeng, can you run a quick test to check whether we still need it now? Thanks, Ray -Original Message- From: Zhu, Changfeng Sent: Wednesday, April 15, 2

RE: [PATCH] drm/amdgpu/vcn: fix gfxoff issue

2020-04-14 Thread Zhang, Hawking
[AMD Official Use Only - Internal Distribution Only] This was actually introduced at a very early stage, when we enabled GFXOFF for the first time on the Raven platform. At that time gfxoff couldn't work with video playback (this is a general issue across all OSes). So we disabled gfxoff when there is workload on

Re: [PATCH v4] drm/amdkfd: Provide SMI events watch

2020-04-14 Thread Deucher, Alexander
[AMD Public Use] Some good advice on getting ioctls right: https://www.kernel.org/doc/html/v5.4-preprc-cpu/ioctl/botching-up-ioctls.html Alex From: amd-gfx on behalf of Felix Kuehling Sent: Tuesday, April 14, 2020 10:40 PM To: Lin, Amber ; amd-gfx@lists.freede

Re: [PATCH v4] drm/amdkfd: Provide SMI events watch

2020-04-14 Thread Felix Kuehling
Hi Amber, I understand that different processes can get the same FD. My statement about FD being unique is relative to one process. The main problem with the global client ID is that it allows process A to change the event mask of process B just by specifying process B's client ID. That can lead
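The concern in this message can be sketched with a small userspace model. This is not the amdkfd code: `struct client`, `struct process`, `set_mask_by_global_id` and `set_mask_by_fd` are hypothetical names, and the arrays only stand in for kernel state. The sketch assumes that a global-ID lookup performs no ownership check, which is the situation described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CLIENTS 4
#define MAX_FDS     4

/* Hypothetical per-registration state. */
struct client {
	int      owner_pid;
	uint64_t event_mask;
};

/* One table shared by every process, indexed by a global client ID. */
static struct client global_clients[MAX_CLIENTS];

/* Each process resolves FD numbers through its own table only. */
struct process {
	int            pid;
	struct client *fds[MAX_FDS];
};

/* Global-ID scheme: any caller that knows the ID can change the mask. */
static int set_mask_by_global_id(int client_id, uint64_t mask)
{
	if (client_id < 0 || client_id >= MAX_CLIENTS)
		return -1;
	global_clients[client_id].event_mask = mask;
	return 0;
}

/* FD scheme: the lookup can only reach clients the caller has opened. */
static int set_mask_by_fd(struct process *caller, int fd, uint64_t mask)
{
	assert(caller != NULL);
	if (fd < 0 || fd >= MAX_FDS || caller->fds[fd] == NULL)
		return -1;
	caller->fds[fd]->event_mask = mask;
	return 0;
}
```

In this model two processes may hold the same FD number for different clients, because the number is only meaningful inside one process's table. With the global ID, nothing ties the caller to the client it modifies.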

RE: [PATCH] drm/amdgpu/vcn: fix gfxoff issue

2020-04-14 Thread Zhu, Changfeng
[AMD Official Use Only - Internal Distribution Only] +Ray BR, Changfeng. -Original Message- From: Zhu, James Sent: Tuesday, April 14, 2020 11:00 PM To: Alex Deucher ; Zhu, James ; Zhang, Hawking Cc: amd-gfx list ; Zhu, Changfeng Subject: Re: [PATCH] drm/amdgpu/vcn: fix gfxoff issue

RE: [PATCH v4] drm/amdkfd: Provide SMI events watch

2020-04-14 Thread Lin, Amber
[AMD Official Use Only - Internal Distribution Only] Hi Felix, That was my assumption too, that each registration would get a different file descriptor, but it turns out not to be the case. When I started two processes and both registered gpu0 and gpu1, they both got fd=15. If I have process A register gpu0+gpu1, and

Re: [PATCH] drm/amd/display: Fix pageflip event race condition for DCN. (v2)

2020-04-14 Thread Matt Coffin
Hey everyone, This patch broke variable refresh rate in games (all that I've tried so far... Project CARS 2, DiRT Rally 2.0, Assetto Corsa Competizione) as well as a simple freesync tester application. FreeSync tester I've been using: https://github.com/Nixola/VRRTest I'm not at all familiar wit

Re: [PATCH v4] drm/amdkfd: Provide SMI events watch

2020-04-14 Thread Felix Kuehling
Hi Amber, Some general remarks about the multi-client support. You added a global client id that's separate from the file descriptor. That's problematic for two reasons: 1. A process could change a different process' event mask 2. The FD should already be unique per process, no need to invent

[PATCH v4] drm/amdkfd: Provide SMI events watch

2020-04-14 Thread Amber Lin
When compute is malfunctioning or performance drops, the system admin will use the SMI (System Management Interface) tool to monitor and diagnose what went wrong. This patch provides an event watch interface for user space to register devices and subscribe to the events they are interested in. After regist

RE: [PATCH] Revert "drm/amdgpu: use the BAR if possible in amdgpu_device_vram_access v2"

2020-04-14 Thread Kim, Jonathan
[AMD Official Use Only - Internal Distribution Only] If we're passing the test on the revert, then the only thing that's different is we're not invalidating HDP and doing a copy to host anymore in amdgpu_device_vram_access since the function is still called in ttm access_memory with BAR. Also

Re: [PATCH] Revert "drm/amdgpu: use the BAR if possible in amdgpu_device_vram_access v2"

2020-04-14 Thread Felix Kuehling
I wouldn't call it premature. Revert is a usual practice when there is a serious regression that isn't fully understood or root-caused. As far as I can tell, the problem has been reproduced on multiple systems, different GPUs, and clearly regressed to Christian's commit. I think that justifies reve

Re: [PATCH] Optimized division operation to shift operation

2020-04-14 Thread Alex Deucher
On Tue, Apr 14, 2020 at 9:05 AM Bernard Zhao wrote: > > On some processors, the / operator calls the compiler's division library, > which is inefficient. We can replace the / operation with a shift, > so that the call into the division library is replaced with one > shift assembly instruction. > > Sign

Re: [PATCH v2 00/11] new cgroup controller for gpu/drm subsystem

2020-04-14 Thread Daniel Vetter
On Tue, Apr 14, 2020 at 4:29 PM Kenny Ho wrote: > > On Tue, Apr 14, 2020 at 10:04 AM Daniel Vetter wrote: > > > > This has _nothing_ to do with Intel (I think over the past 25 years or > > so intel has implemented all 4 versions of gpu splitting that I > > listed, but not entirely sure). > > > >

Re: [PATCH] drm/amdgpu/vcn: fix gfxoff issue

2020-04-14 Thread James Zhu
+Hawking Hi Hawking, can we drop this WA? Thanks! James On 2020-04-14 10:52 a.m., James Zhu wrote: +Rex This was introduced by the patch below. commit 3fded222f4bf7f4c56ef4854872a39a4de08f7a8 Author: Rex Zhu Date: Fri Jul 27 17:00:02 2018 +0800 drm/amdgpu: Disable gfx off if VCN is busy

Re: [PATCH] drm/amdgpu/vcn: fix gfxoff issue

2020-04-14 Thread James Zhu
+Rex This was introduced by the patch below. commit 3fded222f4bf7f4c56ef4854872a39a4de08f7a8 Author: Rex Zhu Date: Fri Jul 27 17:00:02 2018 +0800 drm/amdgpu: Disable gfx off if VCN is busy this patch is a workaround for the gpu hang at video begin/end time if gfx off is enabled.

RE: [PATCH] Revert "drm/amdgpu: use the BAR if possible in amdgpu_device_vram_access v2"

2020-04-14 Thread Kim, Jonathan
[AMD Official Use Only - Internal Distribution Only] I think it's premature to push this revert. With more testing, I'm getting failures from different tests or sometimes none at all on my machine. Kent, let's continue the discussion on the original thread. Thanks, Jon From: Koenig, Christia

Re: [PATCH] Revert "drm/amdgpu: use the BAR if possible in amdgpu_device_vram_access v2"

2020-04-14 Thread Koenig, Christian
That's exactly my concern as well. This looks a bit like the test creates erroneous data somehow, but there doesn't seem to be a RAS check in the MM data path. And now that we use the BAR path it goes up in flames. I just don't see how we can create erroneous data in a test case? Christian.

Re: [PATCH] Revert "drm/amdgpu: use the BAR if possible in amdgpu_device_vram_access v2"

2020-04-14 Thread Deucher, Alexander
[AMD Public Use] If this causes an issue, any access to vram via the BAR could cause an issue. Alex From: amd-gfx on behalf of Russell, Kent Sent: Tuesday, April 14, 2020 10:19 AM To: Koenig, Christian ; amd-gfx@lists.freedesktop.org Cc: Kuehling, Felix ; Kim

Re: [PATCH] drm/scheduler: fix drm_sched_get_cleanup_job

2020-04-14 Thread Andrey Grodzovsky
Reviewed-by: Andrey Grodzovsky Andrey On 4/14/20 10:22 AM, Kent Russell wrote: From: Christian König We are racing to initialize sched->thread here, just always check the current thread. Signed-off-by: Christian Koenig Reviewed-by: Kent Russell --- drivers/gpu/drm/scheduler/sched_main.c

Re: [PATCH v2 00/11] new cgroup controller for gpu/drm subsystem

2020-04-14 Thread Kenny Ho
On Tue, Apr 14, 2020 at 10:04 AM Daniel Vetter wrote: > > This has _nothing_ to do with Intel (I think over the past 25 years or > so intel has implemented all 4 versions of gpu splitting that I > listed, but not entirely sure). > > So again pls less tribal fighting, more collaboration. If you can

Re: [PATCH] drm/amdgpu/vcn: fix gfxoff issue

2020-04-14 Thread Alex Deucher
On Tue, Apr 14, 2020 at 8:05 AM James Zhu wrote: > > Turn off gfxoff control when vcn is gated. > > Signed-off-by: James Zhu > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 8 +--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c >

[PATCH] drm/scheduler: fix drm_sched_get_cleanup_job

2020-04-14 Thread Kent Russell
From: Christian König We are racing to initialize sched->thread here, just always check the current thread. Signed-off-by: Christian Koenig Reviewed-by: Kent Russell --- drivers/gpu/drm/scheduler/sched_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/

RE: [PATCH] Revert "drm/amdgpu: use the BAR if possible in amdgpu_device_vram_access v2"

2020-04-14 Thread Russell, Kent
[AMD Official Use Only - Internal Distribution Only] On VG20 or MI100, as soon as we run the subtest, we get the dmesg output below, and then the kernel ends up hanging. I don't know enough about the test itself to know why this is occurring, but Jon Kim and Felix were discussing it on a separa

RE: [PATCH] drm/amdgpu/vcn: fix gfxoff issue

2020-04-14 Thread Zhu, Changfeng
[AMD Official Use Only - Internal Distribution Only] Tested-by: changzhu BR, Changfeng. -Original Message- From: Zhu, James Sent: Tuesday, April 14, 2020 8:05 PM To: amd-gfx@lists.freedesktop.org Cc: Zhu, James ; Zhu, Changfeng Subject: [PATCH] drm/amdgpu/vcn: fix gfxoff issue Turn

Re: [PATCH v2 00/11] new cgroup controller for gpu/drm subsystem

2020-04-14 Thread Daniel Vetter
On Tue, Apr 14, 2020 at 3:50 PM Kenny Ho wrote: > > Hi Daniel, > > I appreciate many of your reviews so far and I much prefer keeping > things technical but that is very difficult to do when I get Intel > developers calling my implementation "most AMD-specific solution > possible" and objecting to

Re: [PATCH] Revert "drm/amdgpu: use the BAR if possible in amdgpu_device_vram_access v2"

2020-04-14 Thread Christian König
Am 13.04.20 um 20:20 schrieb Kent Russell: This reverts commit c12b84d6e0d70f1185e6daddfd12afb671791b6e. The original patch causes a RAS event and subsequent kernel hard-hang when running the KFDMemoryTest.PtraceAccessInvisibleVram on VG20 and Arcturus dmesg output at hang time: [drm] RAS event

Re: [PATCH v2 00/11] new cgroup controller for gpu/drm subsystem

2020-04-14 Thread Kenny Ho
Hi Daniel, I appreciate many of your reviews so far and I much prefer keeping things technical but that is very difficult to do when I get Intel developers calling my implementation "most AMD-specific solution possible" and objecting to an implementation because their hardware cannot support it. C

Re: [PATCH v2 00/11] new cgroup controller for gpu/drm subsystem

2020-04-14 Thread Daniel Vetter
On Tue, Apr 14, 2020 at 3:14 PM Kenny Ho wrote: > > Ok. I was hoping you can clarify the contradiction between the > existence of the spec below and your "not something any other gpu can > reasonably support" statement. I mean, OneAPI is Intel's spec and > doesn't that at least make SubDevice su

Re: [PATCH v2 00/11] new cgroup controller for gpu/drm subsystem

2020-04-14 Thread Kenny Ho
Ok. I was hoping you can clarify the contradiction between the existence of the spec below and your "not something any other gpu can reasonably support" statement. I mean, OneAPI is Intel's spec and doesn't that at least make SubDevice support "reasonable" for one more vendor? Partisanship aside

Re: [regression 5.7-rc1] System does not power off, just halts

2020-04-14 Thread Alex Deucher
On Tue, Apr 14, 2020 at 4:21 AM Greg KH wrote: > > On Mon, Apr 13, 2020 at 01:48:58PM -0400, Alex Deucher wrote: > > On Mon, Apr 13, 2020 at 1:47 PM Paul Menzel wrote: > > > > > > Dear Prike, dear Alex, dear Linux folks, > > > > > > > > > Am 13.04.20 um 10:44 schrieb Paul Menzel: > > > > > > > A

Re: [regression 5.7-rc1] System does not power off, just halts

2020-04-14 Thread Greg KH
On Mon, Apr 13, 2020 at 01:48:58PM -0400, Alex Deucher wrote: > On Mon, Apr 13, 2020 at 1:47 PM Paul Menzel wrote: > > > > Dear Prike, dear Alex, dear Linux folks, > > > > > > Am 13.04.20 um 10:44 schrieb Paul Menzel: > > > > > A regression between causes a system with the AMD board MSI B350M MORT

Re: [PATCH 2/6] i915/gvt/kvm: a NULL ->mm does not mean a thread is a kthread

2020-04-14 Thread Yan Zhao
On Tue, Apr 14, 2020 at 09:00:13AM +0200, Christoph Hellwig wrote: > On Mon, Apr 13, 2020 at 08:04:10PM -0400, Yan Zhao wrote: > > > I can't think of another way for a kernel thread to have a mm indeed. > > for example, before calling to vfio_dma_rw(), a kernel thread has already > > called use_mm(

[PATCH] Optimized division operation to shift operation

2020-04-14 Thread Bernard Zhao
On some processors, the / operator calls the compiler's division library, which is inefficient. We can replace the / operation with a shift, so that the call into the division library is replaced with one shift assembly instruction. Signed-off-by: Bernard Zhao --- drivers/gpu/drm/amd/amdgpu/gmc_v6_0.
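As a hedged illustration of the idea in this commit message (not the patch itself, whose diff is cut off above): for unsigned integers, dividing by a power of two gives the same result as a right shift. The helper names `div_pow2` and `shr_pow2` are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Divide by 2^shift using the / operator. */
static uint64_t div_pow2(uint64_t value, unsigned int shift)
{
	assert(shift < 64); /* shifting by >= the type width is undefined */
	return value / ((uint64_t)1 << shift);
}

/* Same quotient computed with a single right shift. */
static uint64_t shr_pow2(uint64_t value, unsigned int shift)
{
	assert(shift < 64);
	return value >> shift;
}
```

Two caveats apply. Compilers commonly perform this transformation on their own when the divisor is a compile-time constant power of two, so the manual rewrite mainly matters where the compiler cannot see that. The equivalence also holds only for unsigned operands: signed division truncates toward zero, while a right shift of a negative value is implementation-defined in C.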

Re: [PATCH v2 00/11] new cgroup controller for gpu/drm subsystem

2020-04-14 Thread Daniel Vetter
On Tue, Apr 14, 2020 at 2:47 PM Kenny Ho wrote: > On Tue, Apr 14, 2020 at 8:20 AM Daniel Vetter wrote: > > My understanding from talking with a few other folks is that > > the cpumask-style CU-weight thing is not something any other gpu can > > reasonably support (and we have about 6+ of those in

Re: [PATCH v2 00/11] new cgroup controller for gpu/drm subsystem

2020-04-14 Thread Kenny Ho
Hi Daniel, On Tue, Apr 14, 2020 at 8:20 AM Daniel Vetter wrote: > My understanding from talking with a few other folks is that > the cpumask-style CU-weight thing is not something any other gpu can > reasonably support (and we have about 6+ of those in-tree) How does Intel plan to support the Su

Re: [PATCH v2 00/11] new cgroup controller for gpu/drm subsystem

2020-04-14 Thread Daniel Vetter
On Mon, Apr 13, 2020 at 03:11:36PM -0400, Tejun Heo wrote: > Hello, Kenny. > > On Tue, Mar 24, 2020 at 02:49:27PM -0400, Kenny Ho wrote: > > Can you elaborate more on what are the missing pieces? > > Sorry about the long delay, but I think we've been going in circles for quite > a while now. Let'

[PATCH] drm/amdgpu/vcn: fix gfxoff issue

2020-04-14 Thread James Zhu
Turn off gfxoff control when vcn is gated. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c index dab34f6..aa9a7a5 10

RE: [PATCH] drm/amdgpu: cache smu fw version info

2020-04-14 Thread Zhang, Hawking
[AMD Official Use Only - Internal Distribution Only] Reviewed-by: Hawking Zhang Regards, Hawking From: Clements, John Sent: Tuesday, April 14, 2020 15:54 To: amd-gfx@lists.freedesktop.org; Zhang, Hawking Subject: [PATCH] drm/amdgpu: cache smu fw version info [AMD Official Use Only - Internal

RE: [PATCH] drm/amdgpu: cache smu fw version info

2020-04-14 Thread Quan, Evan
Reviewed-by: Evan Quan From: amd-gfx On Behalf Of Clements, John Sent: Tuesday, April 14, 2020 3:54 PM To: amd-gfx@lists.freedesktop.org; Zhang, Hawking Subject: [PATCH] drm/amdgpu: cache smu fw version info [AMD Official Use Only - Internal Distribution Only] Submitting patch to save smu f

RE: [PATCH] series to refactor psp np fw loading

2020-04-14 Thread Quan, Evan
Hi John, Please limit this to RAS triggered gpu reset only. if (adev->in_gpu_reset) { As for non-RAS triggered gpu reset, smu fw reloading is not needed. Regards, Evan From: amd-gfx On Behalf Of Clements, John Sent: Tuesday, April 14, 2020 3:06 PM To: amd-gfx@lists.freedesktop.org; Zhang, Hawki

[PATCH] drm/amdgpu: cache smu fw version info

2020-04-14 Thread Clements, John
[AMD Official Use Only - Internal Distribution Only] Submitting patch to save smu fw version in local smu context to avoid multiple erroneous submissions to smu when requesting smu version info Thank you, John Clements 0001-drm-amdgpu-cache-smu-fw-version-info.patch Description: 0001-drm-amdgp

RE: [PATCH] series to refactor psp np fw loading

2020-04-14 Thread Zhang, Hawking
[AMD Official Use Only - Internal Distribution Only] Series is Reviewed-by: Hawking Zhang Regards, Hawking From: Clements, John Sent: Tuesday, April 14, 2020 15:06 To: amd-gfx@lists.freedesktop.org; Zhang, Hawking Subject: [PATCH] series to refactor psp np fw loading [AMD Official Use Only

Re: [PATCH 2/6] i915/gvt/kvm: a NULL ->mm does not mean a thread is a kthread

2020-04-14 Thread Christoph Hellwig
On Mon, Apr 13, 2020 at 08:04:10PM -0400, Yan Zhao wrote: > > I can't think of another way for a kernel thread to have a mm indeed. > for example, before calling to vfio_dma_rw(), a kernel thread has already > called use_mm(), then its current->mm is not null, and it has flag > PF_KTHREAD. > in thi
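The case quoted here can be modelled in plain userspace C. This is only a sketch: `struct task`, `is_kthread_by_mm` and `is_kthread_by_flag` are hypothetical stand-ins, not kernel code, and `PF_KTHREAD` is given the value it has in the kernel's `include/linux/sched.h` to the best of my knowledge. It contrasts a check based on the `mm` pointer with a check based on the flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PF_KTHREAD 0x00200000 /* assumed to match the kernel's definition */

/* Minimal stand-ins for the kernel structures. */
struct mm_struct { int unused; };

struct task {
	unsigned int      flags;
	struct mm_struct *mm;
};

/* Heuristic: treat "no mm" as "kernel thread". */
static bool is_kthread_by_mm(const struct task *t)
{
	assert(t != NULL);
	return t->mm == NULL;
}

/* Check the flag instead of inferring from the mm pointer. */
static bool is_kthread_by_flag(const struct task *t)
{
	assert(t != NULL);
	return (t->flags & PF_KTHREAD) != 0;
}
```

A task with `PF_KTHREAD` set and a non-NULL `mm` models the kernel thread that has already called `use_mm()`: the `mm`-based heuristic reports it as a user task, while the flag-based check still identifies it as a kernel thread.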

[PATCH] series to refactor psp np fw loading

2020-04-14 Thread Clements, John
[AMD Official Use Only - Internal Distribution Only] Submitting patch series to refactor psp np fw loading sequence and set MP1 state to unload in preparation for SMU FW loading during a GPU reset Thank you, John Clements 0001-drm-amdgpu-update-psp-fw-loading-sequence.patch Description: 0001-d