On (04/12/15 16:58), Lukas Slebodnik wrote: >On (04/12/15 16:02), Michal Židek wrote: >>On 12/04/2015 03:07 PM, Lukas Slebodnik wrote: >>>On (04/12/15 14:35), Michal Židek wrote: >>>>On 12/04/2015 02:32 PM, Jakub Hrozek wrote: >>>>>On Fri, Dec 04, 2015 at 02:29:49PM +0100, Michal Židek wrote: >>>>>>On 12/04/2015 12:29 PM, Lukas Slebodnik wrote: >>>>>>>On (04/12/15 12:11), Jakub Hrozek wrote: >>>>>>>>On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote: >>>>>>>>>On (03/12/15 20:22), Jakub Hrozek wrote: >>>>>>>>>>On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote: >>>>>>>>>>>On (02/12/15 17:10), Michal Židek wrote: >>>>>>>>>>>>Hi! >>>>>>>>>>>> >>>>>>>>>>>>I saw some integration tests failures recently, >>>>>>>>>>>>and I think there is a race condition between the >>>>>>>>>>>>enumeration refresh timeout and the sleeps >>>>>>>>>>>>after some operations that wait for this timeout. >>>>>>>>>>>>SSSD fails to populate changes from LDAP in time >>>>>>>>>>>>and some asserts can fail because of this. >>>>>>>>>>>> >>>>>>>>>>>>So far I saw 4 tests to fail like this, which >>>>>>>>>>>>is already quite a lot. >>>>>>>>>>>> >>>>>>>>>>>>The attached patch modifies the timeout values >>>>>>>>>>>>and hopefully removes the issue. >>>>>>>>>>>> >>>>>>>>>>>>Michal >>>>>>>>>>> >>>>>>>>>>>>From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 >>>>>>>>>>>>2001 >>>>>>>>>>>>From: =?UTF-8?q?Michal=20=C5=BDidek?= <mzi...@redhat.com> >>>>>>>>>>>>Date: Wed, 2 Dec 2015 16:44:48 +0100 >>>>>>>>>>>>Subject: [PATCH] ldap_test.py: Modify enum cache timeouts >>>>>>>>>>>> >>>>>>>>>>>>There is a race condation between ldap >>>>>>>>>>>>enumeration refresh timeout and the sleeps >>>>>>>>>>>>that wait for the ldap changes to populate >>>>>>>>>>>>to SSSD if the timeout and the sleeps have >>>>>>>>>>>>the same value. >>>>>>>>>>>>--- >>>>>>>>>>>>src/tests/intg/ldap_test.py | 30 +++++++++++++++++------------- >>>>>>>>>>>>1 file changed, 17 insertions(+), 13 deletions(-) >>>>>>>>>>>> >>>>>>>>>>>>diff --git a/src/tests/intg/ldap_test.py >>>>>>>>>>>>b/src/tests/intg/ldap_test.py >>>>>>>>>>>>index 757ee20..8ec8dbe 100644 >>>>>>>>>>>>--- a/src/tests/intg/ldap_test.py >>>>>>>>>>>>+++ b/src/tests/intg/ldap_test.py >>>>>>>>>>>>@@ -33,7 +33,11 @@ import ldap_ent >>>>>>>>>>>>from util import * >>>>>>>>>>>> >>>>>>>>>>>>LDAP_BASE_DN = "dc=example,dc=com" >>>>>>>>>>>>-INTERACTIVE_TIMEOUT = 4 >>>>>>>>>>>>+INTERACTIVE_TIMEOUT = 2 >>>>>>>>>>>>+ >>>>>>>>>>>>+ >>>>>>>>>>>>+def wait_for_ldap_enum_refresh(): >>>>>>>>>>>>+ time.sleep(INTERACTIVE_TIMEOUT + 4) >>>>>>>>>>>Why does it need to be INTERACTIVE_TIMEOUT + 4 >>>>>>>>>>> >>>>>>>>>>>Could it be INTERACTIVE_TIMEOUT + 3 or + 5 >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>Regardless of the value we choose, can we move this patch forward? I >>>>>>>>>>see >>>>>>>>>>the related failure quite often in SSSD. >>>>>>>>>Adding timeout without real explanation is not a solution. >>>>>>>>> >>>>>>>>>The main problem is with empiric values. >>>>>>>>>If they are very high then test are slow. >>>>>>>>>And there still can be slow/fast machine where lower values caused >>>>>>>>>troubles. >>>>>>>>> >>>>>>>>>The ideal solution would be to get rid of enumeration >>>>>>>>>in ldap tests. >>>>>>>> >>>>>>>>Enumeration is a codepath that is different from non-enumeration, so it >>>>>>>>should be tested. Not as priority, not as the only ldap tests, but it's >>>>>>>>a valid case, so it should be there. >>>>>>>> >>>>>>>>>If we want to test enumeration than it should be in separate >>>>>>>>>test. >>>>>>>> >>>>>>>>Maybe, but we do have enumeration tests and we should fix them. >>>>>>>> >>>>>>>Adding sleep is not a fix. It's just a workaround >>>>>>>because all sleep timeout are just an empiric values. >>>>>>>and we should fix test and not adding workaround/hacks. >>>>>>> >>>>>>>If we cannot fix the test and don't want te rewrite test without >>>>>>>enumeration >>>>>>>then we should remove/revert problematic tests. >>>>>>> >>>>>>>LS >>>>>> >>>>>>I will send a new patch with an explanation (sort of), >>>>>>but it still will be a guess. I am not sure what the >>>>>>real safe value should be, only that the sleep's >>>>>>after operation should be longer than the ldap >>>>>>refresh and enum cache timeouts (and that the >>>>>>current values do not cope well wit the CI load). >>>>> >>>>>Would it be more acceptable then to define the ldap refresh and enum >>>>>cache timeouts as variables in the test and sleep for (enum_timeout + >>>>>cache_timeout + 1) ? >>>>> >>>>>At least that would be more readable than a magic constant.. >>>> >>>>Will do. All will be derived from INTERACTIVE_TIMEOUT >>>>so that we need to change just one value if needed in the >>>>future. >>>> >>>Will it be reliable? >>> >>>Will it work on slow(arm) machines? >>> >>>I plan to run integration test in "%check" phase >>>of rpm build. And koji/fedora has rpm machines. >> >>We can mark tests that may fail on slow machines >>due to timeouts as "unsafe" and skip them in the >>rpm build. >It's not about rpmbuild, it's a general problem. >And marking tests as unsafe does not solve anything. >We need a reliable way how to find out that enumeratin task is finished. > >Even grepping log files in loop is better than using >sleep > I seems that nobody wants to rewrite test to get rid of enumeration. adding/increasing timeouts is not reliable. I have some bad experiences in downstream.
So the ideal solution might be to disable unstable tests. If there will not be better solution I will send patch in the middle of next week. LS _______________________________________________ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org