On (04/12/15 14:35), Michal Židek wrote:
>On 12/04/2015 02:32 PM, Jakub Hrozek wrote:
>>On Fri, Dec 04, 2015 at 02:29:49PM +0100, Michal Židek wrote:
>>>On 12/04/2015 12:29 PM, Lukas Slebodnik wrote:
>>>>On (04/12/15 12:11), Jakub Hrozek wrote:
>>>>>On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote:
>>>>>>On (03/12/15 20:22), Jakub Hrozek wrote:
>>>>>>>On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote:
>>>>>>>>On (02/12/15 17:10), Michal Židek wrote:
>>>>>>>>>Hi!
>>>>>>>>>
>>>>>>>>>I saw some integration tests failures recently,
>>>>>>>>>and I think there is a race condition between the
>>>>>>>>>enumeration refresh timeout and the sleeps
>>>>>>>>>after some operations that wait for this timeout.
>>>>>>>>>SSSD fails to populate changes from LDAP in time
>>>>>>>>>and some asserts can fail because of this.
>>>>>>>>>
>>>>>>>>>So far I saw 4 tests to fail like this, which
>>>>>>>>>is already quite a lot.
>>>>>>>>>
>>>>>>>>>The attached patch modifies the timeout values
>>>>>>>>>and hopefully removes the issue.
>>>>>>>>>
>>>>>>>>>Michal
>>>>>>>>
>>>>>>>>>From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001
>>>>>>>>>From: =?UTF-8?q?Michal=20=C5=BDidek?= <mzi...@redhat.com>
>>>>>>>>>Date: Wed, 2 Dec 2015 16:44:48 +0100
>>>>>>>>>Subject: [PATCH] ldap_test.py: Modify enum cache timeouts
>>>>>>>>>
>>>>>>>>>There is a race condation between ldap
>>>>>>>>>enumeration refresh timeout and the sleeps
>>>>>>>>>that wait for the ldap changes to populate
>>>>>>>>>to SSSD if the timeout and the sleeps have
>>>>>>>>>the same value.
>>>>>>>>>---
>>>>>>>>>src/tests/intg/ldap_test.py | 30 +++++++++++++++++-------------
>>>>>>>>>1 file changed, 17 insertions(+), 13 deletions(-)
>>>>>>>>>
>>>>>>>>>diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py
>>>>>>>>>index 757ee20..8ec8dbe 100644
>>>>>>>>>--- a/src/tests/intg/ldap_test.py
>>>>>>>>>+++ b/src/tests/intg/ldap_test.py
>>>>>>>>>@@ -33,7 +33,11 @@ import ldap_ent
>>>>>>>>>from util import *
>>>>>>>>>
>>>>>>>>>LDAP_BASE_DN = "dc=example,dc=com"
>>>>>>>>>-INTERACTIVE_TIMEOUT = 4
>>>>>>>>>+INTERACTIVE_TIMEOUT = 2
>>>>>>>>>+
>>>>>>>>>+
>>>>>>>>>+def wait_for_ldap_enum_refresh():
>>>>>>>>>+    time.sleep(INTERACTIVE_TIMEOUT + 4)
>>>>>>>>Why does it need to be INTERACTIVE_TIMEOUT + 4
>>>>>>>>
>>>>>>>>Could it be INTERACTIVE_TIMEOUT + 3 or + 5
>>>>>>>>
>>>>>>>
>>>>>>>Regardless of the value we choose, can we move this patch forward? I see
>>>>>>>the related failure quite often in SSSD.
>>>>>>Adding timeout without real explanation is not a solution.
>>>>>>
>>>>>>The main problem is with empiric values.
>>>>>>If they are very high then test are slow.
>>>>>>And there still can be slow/fast machine where lower values caused 
>>>>>>troubles.
>>>>>>
>>>>>>The ideal solution would be to get rid of enumeration
>>>>>>in ldap tests.
>>>>>
>>>>>Enumeration is a codepath that is different from non-enumeration, so it
>>>>>should be tested. Not as priority, not as the only ldap tests, but it's
>>>>>a valid case, so it should be there.
>>>>>
>>>>>>If we want to test enumeration than it should be in separate
>>>>>>test.
>>>>>
>>>>>Maybe, but we do have enumeration tests and we should fix them.
>>>>>
>>>>Adding sleep is not a fix. It's just a workaround
>>>>because all sleep timeout are just an empiric values.
>>>>and we should fix test and not adding workaround/hacks.
>>>>
>>>>If we cannot fix the test and don't want te rewrite test without enumeration
>>>>then we should remove/revert problematic tests.
>>>>
>>>>LS
>>>
>>>I will send a new patch with an explanation (sort of),
>>>but it still will be a guess. I am not sure what the
>>>real safe value should be, only that the sleep's
>>>after operation should be longer than the ldap
>>>refresh and enum cache timeouts (and that the
>>>current values do not cope well wit the CI load).
>>
>>Would it be more acceptable then to define the ldap refresh and enum
>>cache timeouts as variables in the test and sleep for (enum_timeout +
>>cache_timeout + 1) ?
>>
>>At least that would be more readable than a magic constant..
>
>Will do. All will be derived from INTERACTIVE_TIMEOUT
>so that we need to change just one value if needed in the
>future.
>
Will it be reliable?

Will it work on slow(arm) machines?

I plan to run integration test in "%check" phase
of rpm build. And koji/fedora has rpm machines.

I already have a POC patch.

LS
_______________________________________________
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org

Reply via email to