On (04/12/15 16:58), Lukas Slebodnik wrote:
>On (04/12/15 16:02), Michal Židek wrote:
>>On 12/04/2015 03:07 PM, Lukas Slebodnik wrote:
>>>On (04/12/15 14:35), Michal Židek wrote:
>>>>On 12/04/2015 02:32 PM, Jakub Hrozek wrote:
>>>>>On Fri, Dec 04, 2015 at 02:29:49PM +0100, Michal Židek wrote:
>>>>>>On 12/04/2015 12:29 PM, Lukas Slebodnik wrote:
>>>>>>>On (04/12/15 12:11), Jakub Hrozek wrote:
>>>>>>>>On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote:
>>>>>>>>>On (03/12/15 20:22), Jakub Hrozek wrote:
>>>>>>>>>>On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote:
>>>>>>>>>>>On (02/12/15 17:10), Michal Židek wrote:
>>>>>>>>>>>>Hi!
>>>>>>>>>>>>
>>>>>>>>>>>>I saw some integration tests failures recently,
>>>>>>>>>>>>and I think there is a race condition between the
>>>>>>>>>>>>enumeration refresh timeout and the sleeps
>>>>>>>>>>>>after some operations that wait for this timeout.
>>>>>>>>>>>>SSSD fails to populate changes from LDAP in time
>>>>>>>>>>>>and some asserts can fail because of this.
>>>>>>>>>>>>
>>>>>>>>>>>>So far I saw 4 tests to fail like this, which
>>>>>>>>>>>>is already quite a lot.
>>>>>>>>>>>>
>>>>>>>>>>>>The attached patch modifies the timeout values
>>>>>>>>>>>>and hopefully removes the issue.
>>>>>>>>>>>>
>>>>>>>>>>>>Michal
>>>>>>>>>>>
>>>>>>>>>>>>From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 
>>>>>>>>>>>>2001
>>>>>>>>>>>>From: =?UTF-8?q?Michal=20=C5=BDidek?= <mzi...@redhat.com>
>>>>>>>>>>>>Date: Wed, 2 Dec 2015 16:44:48 +0100
>>>>>>>>>>>>Subject: [PATCH] ldap_test.py: Modify enum cache timeouts
>>>>>>>>>>>>
>>>>>>>>>>>>There is a race condation between ldap
>>>>>>>>>>>>enumeration refresh timeout and the sleeps
>>>>>>>>>>>>that wait for the ldap changes to populate
>>>>>>>>>>>>to SSSD if the timeout and the sleeps have
>>>>>>>>>>>>the same value.
>>>>>>>>>>>>---
>>>>>>>>>>>>src/tests/intg/ldap_test.py | 30 +++++++++++++++++-------------
>>>>>>>>>>>>1 file changed, 17 insertions(+), 13 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>>diff --git a/src/tests/intg/ldap_test.py 
>>>>>>>>>>>>b/src/tests/intg/ldap_test.py
>>>>>>>>>>>>index 757ee20..8ec8dbe 100644
>>>>>>>>>>>>--- a/src/tests/intg/ldap_test.py
>>>>>>>>>>>>+++ b/src/tests/intg/ldap_test.py
>>>>>>>>>>>>@@ -33,7 +33,11 @@ import ldap_ent
>>>>>>>>>>>>from util import *
>>>>>>>>>>>>
>>>>>>>>>>>>LDAP_BASE_DN = "dc=example,dc=com"
>>>>>>>>>>>>-INTERACTIVE_TIMEOUT = 4
>>>>>>>>>>>>+INTERACTIVE_TIMEOUT = 2
>>>>>>>>>>>>+
>>>>>>>>>>>>+
>>>>>>>>>>>>+def wait_for_ldap_enum_refresh():
>>>>>>>>>>>>+    time.sleep(INTERACTIVE_TIMEOUT + 4)
>>>>>>>>>>>Why does it need to be INTERACTIVE_TIMEOUT + 4
>>>>>>>>>>>
>>>>>>>>>>>Could it be INTERACTIVE_TIMEOUT + 3 or + 5
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>Regardless of the value we choose, can we move this patch forward? I 
>>>>>>>>>>see
>>>>>>>>>>the related failure quite often in SSSD.
>>>>>>>>>Adding timeout without real explanation is not a solution.
>>>>>>>>>
>>>>>>>>>The main problem is with empiric values.
>>>>>>>>>If they are very high then test are slow.
>>>>>>>>>And there still can be slow/fast machine where lower values caused 
>>>>>>>>>troubles.
>>>>>>>>>
>>>>>>>>>The ideal solution would be to get rid of enumeration
>>>>>>>>>in ldap tests.
>>>>>>>>
>>>>>>>>Enumeration is a codepath that is different from non-enumeration, so it
>>>>>>>>should be tested. Not as priority, not as the only ldap tests, but it's
>>>>>>>>a valid case, so it should be there.
>>>>>>>>
>>>>>>>>>If we want to test enumeration than it should be in separate
>>>>>>>>>test.
>>>>>>>>
>>>>>>>>Maybe, but we do have enumeration tests and we should fix them.
>>>>>>>>
>>>>>>>Adding sleep is not a fix. It's just a workaround
>>>>>>>because all sleep timeout are just an empiric values.
>>>>>>>and we should fix test and not adding workaround/hacks.
>>>>>>>
>>>>>>>If we cannot fix the test and don't want te rewrite test without 
>>>>>>>enumeration
>>>>>>>then we should remove/revert problematic tests.
>>>>>>>
>>>>>>>LS
>>>>>>
>>>>>>I will send a new patch with an explanation (sort of),
>>>>>>but it still will be a guess. I am not sure what the
>>>>>>real safe value should be, only that the sleep's
>>>>>>after operation should be longer than the ldap
>>>>>>refresh and enum cache timeouts (and that the
>>>>>>current values do not cope well wit the CI load).
>>>>>
>>>>>Would it be more acceptable then to define the ldap refresh and enum
>>>>>cache timeouts as variables in the test and sleep for (enum_timeout +
>>>>>cache_timeout + 1) ?
>>>>>
>>>>>At least that would be more readable than a magic constant..
>>>>
>>>>Will do. All will be derived from INTERACTIVE_TIMEOUT
>>>>so that we need to change just one value if needed in the
>>>>future.
>>>>
>>>Will it be reliable?
>>>
>>>Will it work on slow(arm) machines?
>>>
>>>I plan to run integration test in "%check" phase
>>>of rpm build. And koji/fedora has rpm machines.
>>
>>We can mark tests that may fail on slow machines
>>due to timeouts as "unsafe" and skip them in the
>>rpm build.
>It's not about rpmbuild, it's a general problem.
>And marking tests as unsafe does not solve anything.
>We need a reliable way how to find out that enumeratin task is finished.
>
>Even grepping log files in loop is better than using
>sleep
>
I seems that nobody wants to rewrite test to get rid of enumeration.
adding/increasing timeouts is not reliable.
I have some bad experiences in downstream.

So the ideal solution might be to disable unstable tests.
If there will not be better solution I will send patch in the middle of
next week.

LS
_______________________________________________
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org

Reply via email to