[ https://issues.apache.org/jira/browse/YUNIKORN-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Bacsko resolved YUNIKORN-2294.
------------------------------------
    Fix Version/s: 1.5.0
       Resolution: Fixed

> Flaky E2E Test: "Verify_Hard_GS_Failed_State" polling short-lived "Failing" application status
> ----------------------------------------------------------------------------------------------
>
>                 Key: YUNIKORN-2294
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2294
>             Project: Apache YuniKorn
>          Issue Type: Sub-task
>          Components: test - e2e
>            Reporter: Yu-Lin Chen
>            Assignee: Yu-Lin Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.5.0
>
>
> The gang_scheduling e2e test "Verify_Hard_GS_Failed_State" failed in the
> following runs:
> # https://github.com/apache/yunikorn-k8shim/actions/runs/7356744028/job/20027836104#step:6:972
>   (PR of YUNIKORN-2292)
> # https://github.com/apache/yunikorn-k8shim/actions/runs/7308989229/job/19960722817?pr=753#step:6:971
>   (PR of YUNIKORN-2247)
>
> The e2e test waits until the application status turns to 'Failing'.
> ([gang_scheduling_test.go#L288|https://github.com/apache/yunikorn-k8shim/blob/master/test/e2e/gang_scheduling/gang_scheduling_test.go#L288])
> However, the application does not stay in "Failing" for long. These are the
> durations I measured in local test runs:
> # 0.565 seconds
> # 0.519 seconds
> # 0.634 seconds
> # 0.604 seconds
> # 0.573 seconds
> # 0.586 seconds
> # 0.587 seconds
> # 0.640 seconds
> # 0.779 seconds
> # 0.584 seconds
> (Measured as the time between the two failApplication events,
> "Accept -> Failing" and "Failing -> Failed".)
>
> checkAppStatus() polls every 300 ms, which is shorter than every duration
> measured above, so this issue still cannot be reproduced in my local
> environment. However, nothing guarantees that the application stays in
> 'Failing' for longer than 300 ms.
> (The scheduler log dump for this e2e test is missing because of the issue
> described in YUNIKORN-2293: the e2e test did not call
> tests.LogYunikornContainer() in AfterEach. Once YUNIKORN-2293 is fixed, we
> will be able to check the failure log in the GitHub Actions run.)