Re: [PATCH] drm/amdgpu: Mark contexts guilty for any reset type

2023-04-26 Thread Marek Olšák
Perhaps I should clarify this. There are GL and Vulkan features for which, if an app uses them and its shaders are killed, the next IB will hang. One of them is Draw Indirect: if a shader is killed before storing the vertex count and instance count in memory, the next draw will hang with a high

Re: [PATCH] drm/amdgpu: Mark contexts guilty for any reset type

2023-04-26 Thread Michel Dänzer
On 4/25/23 21:11, Marek Olšák wrote:
> The last 3 comments in this thread contain arguments that are false and were
> specifically pointed out as false 6 comments ago: Soft resets are just as
> fatal as hard resets. There is nothing better about soft resets. If the VRAM
> is lost completely,

Re: [PATCH] drm/amdgpu: Mark contexts guilty for any reset type

2023-04-25 Thread Marek Olšák
The last 3 comments in this thread contain arguments that are false and were specifically pointed out as false 6 comments ago: Soft resets are just as fatal as hard resets. There is nothing better about soft resets. If the VRAM is lost completely, that's a different story, and if the hard reset is

Re: [PATCH] drm/amdgpu: Mark contexts guilty for any reset type

2023-04-25 Thread Christian König
On 25.04.23 at 14:14, Michel Dänzer wrote:
> On 4/25/23 14:08, Christian König wrote:
>> Well, signaling that something happened is not the question. We do this for both soft as well as hard resets.
>>
>> The question is whether errors result in blocking further submissions with the same context or not.
>>
>> In

Re: [PATCH] drm/amdgpu: Mark contexts guilty for any reset type

2023-04-25 Thread Michel Dänzer
On 4/25/23 14:08, Christian König wrote:
> Well signaling that something happened is not the question. We do this for
> both soft as well as hard resets.
>
> The question is if errors result in blocking further submissions with the
> same context or not.
>
> In case of a hard reset and

Re: [PATCH] drm/amdgpu: Mark contexts guilty for any reset type

2023-04-25 Thread Christian König
Well, signaling that something happened is not the question. We do this for both soft as well as hard resets.

The question is whether errors result in blocking further submissions with the same context or not.

In case of a hard reset and potential loss of state we have to kill the context,

Re: [PATCH] drm/amdgpu: Mark contexts guilty for any reset type

2023-04-25 Thread Marek Olšák
That supposedly depends on the compositor. There may be compositors for very specific cases (e.g. Steam Deck) that handle resets very well, and those would like to be properly notified of all resets because that's how they get the best outcome, e.g. no corruption. A soft reset that is unhandled by

Re: [PATCH] drm/amdgpu: Mark contexts guilty for any reset type

2023-04-25 Thread Michel Dänzer
On 4/24/23 18:45, Marek Olšák wrote:
> Soft resets are fatal just as hard resets, but no reset is "always fatal".
> There are cases when apps keep working depending on which features are being
> used. It's still unsafe.

Agreed, in theory. In practice, from a user PoV, right now there's pretty

Re: [PATCH] drm/amdgpu: Mark contexts guilty for any reset type

2023-04-24 Thread Marek Olšák
Soft resets are fatal just as hard resets, but no reset is "always fatal". There are cases when apps keep working depending on which features are being used. It's still unsafe.

Marek

On Mon, Apr 24, 2023, 03:03 Christian König wrote:
> On 24.04.23 at 03:43, André Almeida wrote:
> > When a DRM

Re: [PATCH] drm/amdgpu: Mark contexts guilty for any reset type

2023-04-24 Thread Michel Dänzer
On 4/24/23 15:26, André Almeida wrote:
>>
>> In addition to that, I currently don't consider soft-recovered submissions
>> as fatal and continue accepting submissions from that context, but already
>> wanted to talk with Marek about that behavior.
>>
>
> Interesting. I will try to test and

Re: [PATCH] drm/amdgpu: Mark contexts guilty for any reset type

2023-04-24 Thread André Almeida
Hi Christian, thank you for your comments.

On 24/04/2023 04:03, Christian König wrote:
> On 24.04.23 at 03:43, André Almeida wrote:
>> When a DRM job times out, the GPU is probably hung, and amdgpu has some ways to deal with that, ranging from soft recoveries to a full device reset. Anyway, when

Re: [PATCH] drm/amdgpu: Mark contexts guilty for any reset type

2023-04-24 Thread Christian König
On 24.04.23 at 03:43, André Almeida wrote:
> When a DRM job times out, the GPU is probably hung, and amdgpu has some ways to deal with that, ranging from soft recoveries to a full device reset. Anyway, when userspace asks the kernel for the state of the context (via AMDGPU_CTX_OP_QUERY_STATE), the kernel

Re: [PATCH] drm/amdgpu: Mark contexts guilty for any reset type

2023-04-23 Thread kernel test robot
-guilty-for-any-reset-type/20230424-094534
base: git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link: https://lore.kernel.org/r/20230424014324.218531-1-andrealmeid%40igalia.com
patch subject: [PATCH] drm/amdgpu: Mark contexts guilty for any reset type
config: s390-allyesconfig

[PATCH] drm/amdgpu: Mark contexts guilty for any reset type

2023-04-23 Thread André Almeida
When a DRM job times out, the GPU is probably hung, and amdgpu has some ways to deal with that, ranging from soft recoveries to a full device reset. Anyway, when userspace asks the kernel for the state of the context (via AMDGPU_CTX_OP_QUERY_STATE), the kernel reports that the device was reset, regardless