[ 
https://issues.apache.org/jira/browse/YUNIKORN-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2294.
------------------------------------
    Fix Version/s: 1.5.0
       Resolution: Fixed

> Flaky E2E Test: "Verify_Hard_GS_Failed_State" polling short-lived "Failing" 
> application status
> ----------------------------------------------------------------------------------------------
>
>                 Key: YUNIKORN-2294
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2294
>             Project: Apache YuniKorn
>          Issue Type: Sub-task
>          Components: test - e2e
>            Reporter: Yu-Lin Chen
>            Assignee: Yu-Lin Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.5.0
>
>
> The gang_scheduling e2e test “Verify_Hard_GS_Failed_State” failed in the 
> following runs:
>  # [https://github.com/apache/yunikorn-k8shim/actions/runs/7356744028/job/20027836104#step:6:972] (PR of YUNIKORN-2292)
>  # [https://github.com/apache/yunikorn-k8shim/actions/runs/7308989229/job/19960722817?pr=753#step:6:971] (PR of YUNIKORN-2247)
> The e2e test waits until the application status turns into ‘Failing’ 
> ([gang_scheduling_test.go#L288|https://github.com/apache/yunikorn-k8shim/blob/master/test/e2e/gang_scheduling/gang_scheduling_test.go#L288]).
>  However, the application does not stay in ‘Failing’ for long. Below are my 
> local test results:
>  # 0.565 seconds
>  # 0.519 seconds
>  # 0.634 seconds
>  # 0.604 seconds
>  # 0.573 seconds
>  # 0.586 seconds
>  # 0.587 seconds
>  # 0.640 seconds
>  # 0.779 seconds
>  # 0.584 seconds
> (PS: Each value is the time between the two failApplication events, 
> "Accept -> Failing" and "Failing -> Failed".)
> Since checkAppStatus() polls every 300 ms, {color:#de350b}this issue 
> still can't be reproduced in my local environment.{color} However, there is 
> no guarantee that the application will stay in ‘Failing’ for longer than 
> 300 ms.
> (The dumped scheduler log of this e2e test is missing because of the issue 
> described in YUNIKORN-2293: the e2e test didn't call 
> tests.LogYunikornContainer() in AfterEach. Once YUNIKORN-2293 is fixed, we will 
> be able to check the failure logs in GitHub Actions.)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
