Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-02-01 Thread Guilherme G. Piccoli
On 01/02/2023 13:21, Luben Tuikov wrote:
> Hi Guilherme,
>
> Since setting sched->ready to false seems to be taking place directly in
> amdgpu_ring_fini() and indirectly in amdgpu_fence_driver_sw_fini(), as that
> function calls drm_sched_fini() which sets it to false, we seem to have two

Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-02-01 Thread Luben Tuikov
>>>>> Hi Piccoli,
>>>>>
>>>>> Please ignore my request of full dmesg log. I can reproduce the issue and
>>>>> get the same failure callstack by returning early with an error code
>>>>> prior to amdgpu_device_init_schedulers.

Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-02-01 Thread Christian König
c: amd-...@lists.freedesktop.org; ker...@gpiccoli.net; Pan, Xinhui; dri-devel@lists.freedesktop.org; Tuikov, Luben; Limonciello, Mario; kernel-...@igalia.com; Deucher, Alexander
Subject: Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
On 31.01.23 at 10:17, Chen, Guchu

Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-02-01 Thread Alex Deucher
tian
> > Sent: Tuesday, January 31, 2023 6:59 PM
> > To: Chen, Guchun; Alex Deucher; Guilherme G. Piccoli
> > Cc: amd-...@lists.freedesktop.org; ker...@gpiccoli.net; Pan, Xinhui;
> > dri-devel@lists.freedesktop.org; Tuikov, Luben; Limonciello, M

Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-01-31 Thread Christian König
amd-...@lists.freedesktop.org; ker...@gpiccoli.net; Pan, Xinhui; dri-devel@lists.freedesktop.org; Tuikov, Luben; Limonciello, Mario; kernel-...@igalia.com; Deucher, Alexander
Subject: Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
On 31.01.23 at 10:17, C

Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-01-31 Thread Alex Deucher
On Tue, Jan 31, 2023 at 1:23 PM Guilherme G. Piccoli wrote:
>
> On 31/01/2023 14:52, Alex Deucher wrote:
> > [...]
> >> (b) We can't use sched.ready, which would make sense...but amdgpu
> >> overrides its meaning, the driver manipulates this value for its own
> >> purposes of tracking ring init,

Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-01-31 Thread Guilherme G. Piccoli
On 31/01/2023 14:52, Alex Deucher wrote:
> [...]
>> (b) We can't use sched.ready, which would make sense...but amdgpu
>> overrides its meaning, the driver manipulates this value for its own
>> purposes of tracking ring init, or something like that.
>>
>> This is the tangential topic: what should

Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-01-31 Thread Alex Deucher
On Tue, Jan 31, 2023 at 9:32 AM Guilherme G. Piccoli wrote:
>
> On 31/01/2023 10:58, Chen, Guchun wrote:
> > Hi Christian,
> >
> > Do you think if it makes sense that we can set 'ring->sched.ready' to be
> > true in each ring init, even if before executing/setting up drm_sched_init
> > in

Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-01-31 Thread Alex Deucher
> > Cc: amd-...@lists.freedesktop.org; ker...@gpiccoli.net; Pan, Xinhui;
> > dri-devel@lists.freedesktop.org; Tuikov, Luben; Limonciello, Mario;
> > kernel-...@igalia.com; Deucher, Alexander; Koenig, Christian
> > Subject: RE: [PATCH] drm/amdgpu/

Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-01-31 Thread Guilherme G. Piccoli
On 31/01/2023 10:58, Chen, Guchun wrote:
> Hi Christian,
>
> Do you think if it makes sense that we can set 'ring->sched.ready' to be true
> in each ring init, even if before executing/setting up drm_sched_init in
> amdgpu_device_init_schedulers? As 'ready' is a member of gpu scheduler

RE: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-01-31 Thread Chen, Guchun
her, Alexander
Subject: Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
On 31.01.23 at 10:17, Chen, Guchun wrote:
> Hi Piccoli,
>
> Please ignore my request of full dmesg log. I can reproduce the issue and get
> the same failure callstack by returning earl

Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-01-31 Thread Christian König
Subject: RE: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
Hi Piccoli,
I agree with Alex's point, using ring->sched.name for such check is not a good way.
BTW, can you please attach a full dmesg log in bad case to help me understand more?
Regards,
Guc

RE: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-01-31 Thread Chen, Guchun
Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
On Mon, Jan 30, 2023 at 4:51 PM Guilherme G. Piccoli wrote:
>
> + Luben
>
> (sorry, missed that in the first submission).
>
> On 30/01/2023 18:45, Guilherme G. Piccoli wrote:
> > Currently amdgpu

RE: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-01-30 Thread Chen, Guchun
AM
To: Guilherme G. Piccoli
Cc: amd-...@lists.freedesktop.org; ker...@gpiccoli.net; Chen, Guchun; Pan, Xinhui; dri-devel@lists.freedesktop.org; Tuikov, Luben; Limonciello, Mario; kernel-...@igalia.com; Deucher, Alexander; Koenig, Christian
Subject: Re: [PATCH] drm/amdgpu/fence: Fix oops due to

Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-01-30 Thread Alex Deucher
On Mon, Jan 30, 2023 at 4:51 PM Guilherme G. Piccoli wrote:
>
> + Luben
>
> (sorry, missed that in the first submission).
>
> On 30/01/2023 18:45, Guilherme G. Piccoli wrote:
> > Currently amdgpu calls drm_sched_fini() from the fence driver sw fini
> > routine - such function is expected to be

Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-01-30 Thread Guilherme G. Piccoli
+ Luben
(sorry, missed that in the first submission).
On 30/01/2023 18:45, Guilherme G. Piccoli wrote:
> Currently amdgpu calls drm_sched_fini() from the fence driver sw fini
> routine - such function is expected to be called only after the
> respective init function - drm_sched_init() - was

[PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-01-30 Thread Guilherme G. Piccoli
Currently amdgpu calls drm_sched_fini() from the fence driver sw fini
routine - such function is expected to be called only after the
respective init function - drm_sched_init() - was executed successfully.
It happens that we faced a driver probe failure in the Steam Deck
recently, and the function
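The init/fini mismatch described in the patch can be illustrated with a small userspace sketch. Everything in it is hypothetical: `toy_sched`, `toy_ring`, the `initialized` flag and the `toy_*` functions are stand-ins invented for illustration, not kernel structures or the check the patch actually uses. It only shows the general idea that a teardown path which may run after a failed probe should call fini only when the matching init succeeded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a scheduler; NOT struct drm_gpu_scheduler. */
struct toy_sched {
    bool initialized;   /* set only when toy_sched_init() succeeds */
    int *queue;         /* resource that fini touches; NULL before init */
};

/* Hypothetical stand-in for a ring that embeds a scheduler. */
struct toy_ring {
    struct toy_sched sched;
};

static int toy_queue_storage[4];

/* Models an init routine that may fail; returns 0 on success. */
int toy_sched_init(struct toy_sched *s, bool simulate_failure)
{
    if (simulate_failure)
        return -1;      /* nothing was set up, s stays untouched */
    s->queue = toy_queue_storage;
    s->initialized = true;
    return 0;
}

/* Models a fini routine that assumes init ran: it dereferences s->queue,
 * which would be a NULL dereference if init never succeeded. */
void toy_sched_fini(struct toy_sched *s)
{
    assert(s->initialized);     /* precondition: matching init succeeded */
    s->queue[0] = 0;
    s->queue = NULL;
    s->initialized = false;
}

/* Teardown path that can run even if probe failed before init.
 * Returns true if fini was actually called. */
bool toy_ring_teardown(struct toy_ring *ring)
{
    if (!ring->sched.initialized)
        return false;           /* init never ran (or failed): skip fini */
    toy_sched_fini(&ring->sched);
    return true;
}
```

With a zero-initialized `struct toy_ring`, calling `toy_ring_teardown()` before any successful `toy_sched_init()` skips fini and returns false; after a successful init it runs fini exactly once. The sketch uses a dedicated flag because the thread notes that amdgpu already gives `sched.ready` its own meaning; which field the real driver should test is the open question in this discussion.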