Re: [PATCH] drm/radeon: signal all fences after lockup to avoid endless waiting in GEM_WAIT

2013-10-14 Thread Christian König
Ok, that one was easy to fix. Please apply the attached patch as well. Going to send out both for inclusion in 3.12 in a minute. Christian. On 13.10.2013 22:16, Marek Olšák wrote: This seems to be better. It can do about 3-5 resets correctly, then the GPU resuming fails: [ 246.882780] [drm

[PATCH] drm/radeon: signal all fences after lockup to avoid endless waiting in GEM_WAIT

2013-10-13 Thread Marek Olšák
This seems to be better. It can do about 3-5 resets correctly, then the GPU resuming fails: [ 246.882780] [drm:cik_resume] *ERROR* cik startup failed on resume and then the GPU is being reset again and again endlessly without success. The dmesg of the endless resets is attached. Marek On Sun,

Re: [PATCH] drm/radeon: signal all fences after lockup to avoid endless waiting in GEM_WAIT

2013-10-13 Thread Christian König
I've figured out what was wrong with the patch. We need to reset the "needs_reset" flag earlier, otherwise the IB test might think we are in a lockup and abort the reset after waiting for the minimum timeout period. Please try the attached patch instead. Thanks, Christian. On 09.10.2013 14:0
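The ordering problem described in this message can be illustrated with a small user-space toy model. It is only a sketch: `toy_gpu`, `toy_ib_test` and the two `toy_reset_*` functions are hypothetical names made up for illustration, not the radeon driver's API, and the attached patch itself is not shown in this archive. The model assumes that the IB test gives up whenever the flag is still set.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not the radeon driver's API. */
struct toy_gpu {
    bool needs_reset;   /* set when a lockup has been detected */
};

/* Stand-in for an IB test: it fails if the device still looks locked up. */
static int toy_ib_test(const struct toy_gpu *gpu)
{
    return gpu->needs_reset ? -1 : 0;
}

/* Flag cleared only after the test: the test still sees the stale flag. */
static int toy_reset_clear_late(struct toy_gpu *gpu)
{
    int r = toy_ib_test(gpu);

    gpu->needs_reset = false;
    return r;
}

/* Flag cleared before the test, as the message suggests. */
static int toy_reset_clear_early(struct toy_gpu *gpu)
{
    gpu->needs_reset = false;
    return toy_ib_test(gpu);
}

/* Small self-check that contrasts the two orderings. */
static void toy_reset_demo(void)
{
    struct toy_gpu late = { .needs_reset = true };
    struct toy_gpu early = { .needs_reset = true };

    assert(toy_reset_clear_late(&late) != 0);   /* test aborts the reset */
    assert(toy_reset_clear_early(&early) == 0); /* test can pass */
}
```

Calling `toy_reset_demo()` from a `main` function exercises both orderings. In the real driver the IB test waits for a timeout rather than returning immediately, so this model only captures the ordering, not the timing.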

Re: [PATCH] drm/radeon: signal all fences after lockup to avoid endless waiting in GEM_WAIT

2013-10-09 Thread Marek Olšák
The ring test of the first compute ring always fails and it shouldn't affect the GPU reset in any way. I can't tell if the deadlock issue is fixed, because the GPU reset usually fails with your patch. It always succeeded without your patch. Marek On Wed, Oct 9, 2013 at 1:09 PM, Christian König

Re: [PATCH] drm/radeon: signal all fences after lockup to avoid endless waiting in GEM_WAIT

2013-10-09 Thread Christian König
Hmm, that doesn't look like anything related, but more like the reset of the compute ring didn't work. How often does that happen? And do you still get the problem where X waits for a fence that never comes back? Christian. On 09.10.2013 12:36, Marek Olšák wrote: I'm afraid your patch s

Re: [PATCH] drm/radeon: signal all fences after lockup to avoid endless waiting in GEM_WAIT

2013-10-09 Thread Marek Olšák
I'm afraid your patch sometimes causes the GPU reset to fail, which had never happened before IIRC. The dmesg log from the failure is attached. Marek On Tue, Oct 8, 2013 at 6:21 PM, Christian König wrote: > Hi Marek, > > please try the attached patch as a replacement for your signaling all fenc

Re: [PATCH] drm/radeon: signal all fences after lockup to avoid endless waiting in GEM_WAIT

2013-10-08 Thread Christian König
Hi Marek, please try the attached patch as a replacement for your signaling all fences patch. I'm not 100% sure if it fixes all issues, but it's at least a start. Thanks, Christian. On 07.10.2013 13:08, Christian König wrote: First of all, I can't complain about the reliability of the har

Re: [PATCH] drm/radeon: signal all fences after lockup to avoid endless waiting in GEM_WAIT

2013-10-07 Thread Christian König
First of all, I can't complain about the reliability of the hardware GPU reset. It's mostly the kernel driver that happens to run into a deadlock at the same time. Alex and I spent quite some time making this reliable again after activating more rings and adding VM support. The main problem

Re: [PATCH] drm/radeon: signal all fences after lockup to avoid endless waiting in GEM_WAIT

2013-10-02 Thread Marek Olšák
First of all, I can't complain about the reliability of the hardware GPU reset. It's mostly the kernel driver that happens to run into a deadlock at the same time. Regarding the issue with fences, the problem is that the GPU reset completes successfully according to dmesg, but X doesn't respond. I

Re: [PATCH] drm/radeon: signal all fences after lockup to avoid endless waiting in GEM_WAIT

2013-10-02 Thread Christian König
Possible, but I would rather guess that this doesn't work because the IB test runs into a deadlock situation and so the GPU reset never fully completes. Can you reproduce the problem? If you want to make GPU resets more reliable, I would rather suggest removing the ring lock dependency. Then we

Re: [PATCH] drm/radeon: signal all fences after lockup to avoid endless waiting in GEM_WAIT

2013-10-02 Thread Marek Olšák
I'm afraid signalling the fences with an IB test is not reliable. Marek On Wed, Oct 2, 2013 at 3:52 PM, Christian König wrote: > NAK, after recovering from a lockup the first thing we do is signalling all > remaining fences with an IB test. > > If we don't recover we indeed signal all fences ma

Re: [PATCH] drm/radeon: signal all fences after lockup to avoid endless waiting in GEM_WAIT

2013-10-02 Thread Christian König
NAK. After recovering from a lockup, the first thing we do is signal all remaining fences with an IB test. If we don't recover, we indeed signal all fences manually. Signalling all fences regardless of the outcome of the reset creates problems with both types of partial resets. Christian. M

[PATCH] drm/radeon: signal all fences after lockup to avoid endless waiting in GEM_WAIT

2013-10-02 Thread Marek Olšák
From: Marek Olšák After a lockup, fences are not signalled sometimes, causing the GEM_WAIT_IDLE ioctl to never return, which sometimes results in an X server freeze. This fixes only one of many deadlocks which can occur during a lockup. Signed-off-by: Marek Olšák --- drivers/gpu/drm/radeon/ra
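Why an unsignalled fence can block a wait forever is easier to see in a toy model. The sketch below assumes that fences are plain sequence numbers and that a waiter may return only once the last signalled number has caught up with the one it waits for. All names (`toy_fence_ring`, `toy_fence_emit`, `toy_fence_force_complete`, and so on) are hypothetical; the diff of the patch is truncated in this archive, so this is not a rendering of what the patch actually changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: hypothetical names, not the radeon driver's API. */
struct toy_fence_ring {
    uint64_t last_emitted;   /* highest fence number handed out */
    uint64_t last_signaled;  /* highest fence number seen as completed */
};

/* Emit a new fence and return its sequence number. */
static uint64_t toy_fence_emit(struct toy_fence_ring *ring)
{
    return ++ring->last_emitted;
}

/* A waiter may return only once its fence has been signalled. */
static bool toy_fence_signaled(const struct toy_fence_ring *ring, uint64_t seq)
{
    return ring->last_signaled >= seq;
}

/* Wait-for-idle condition: nothing emitted is still outstanding. */
static bool toy_ring_idle(const struct toy_fence_ring *ring)
{
    return ring->last_signaled >= ring->last_emitted;
}

/* After a lockup: mark every emitted fence as signalled so waiters wake up. */
static void toy_fence_force_complete(struct toy_fence_ring *ring)
{
    ring->last_signaled = ring->last_emitted;
}

/* Small self-check: a lockup leaves a fence pending until it is forced. */
static void toy_fence_demo(void)
{
    struct toy_fence_ring ring = { 0, 0 };
    uint64_t seq = toy_fence_emit(&ring);

    assert(!toy_fence_signaled(&ring, seq)); /* the GPU never got there */
    toy_fence_force_complete(&ring);
    assert(toy_fence_signaled(&ring, seq));
    assert(toy_ring_idle(&ring));
}
```

The model deliberately ignores partial resets, where forcing every fence to complete regardless of the reset outcome can cause problems, as the replies in this thread point out.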