[ 
https://issues.apache.org/jira/browse/YUNIKORN-574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kinga Marton reassigned YUNIKORN-574:
-------------------------------------

    Assignee: Kinga Marton

> Wait for placeholder cleanup
> ----------------------------
>
>                 Key: YUNIKORN-574
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-574
>             Project: Apache YuniKorn
>          Issue Type: Sub-task
>          Components: core - scheduler
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Kinga Marton
>            Priority: Critical
>
> When we clean up the application in {{timeoutPlaceholderProcessing()}} we 
> have two cases:
>  * First case: we clean up all lingering placeholder allocations on the 
> running app.
>  * Second case: the failure of the app, which cleans up lingering asks (no 
> response needed from the shim) and all placeholders, after which we fail the app.
> The cleanup of the placeholders in both of these cases is initiated by the 
> core, and we need to wait for the cleanup to happen on the shim side before we 
> proceed. It is not like the removal of the app signalled by the RM: this comes 
> as an unexpected request for the shim, not when the app is deleted on the shim 
> side.
> For case 1 we do not have a problem. The placeholders are terminated, the 
> app runs as normal, and it is not moved to Completed until all is finished. 
> We do NOT have an issue in the states leading to Completed as we have already 
> handled it there (see below).
> For the failure case we immediately unlink the queue as we move into the 
> FAILED state, because the move calls {{moveTerminatedApp()}} via the callback. 
> That causes an issue: we should be waiting for the shim to respond to 
> the core with confirmation of the removal.
> This might require a new state to do this in two steps: trigger the cleanup and 
> move to the Failing state; when all is cleaned up, move to Failed.
> BTW: introducing a new Failing state should also include renaming 
> Waiting to Completing, as that is in line with what the state does and lines up 
> between the two final states.
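
The two-step failure handling proposed above could be sketched roughly as follows. This is only an illustration of the idea, not the YuniKorn implementation: the App type, its fields, and the FailWithCleanup and ConfirmRemoval methods are hypothetical names invented for this sketch, and the real core uses its own state machine and callbacks.

```go
package main

// State is a hypothetical application state; the names mirror the issue text.
type State int

const (
	Running State = iota
	Failing       // cleanup triggered, waiting for the shim to confirm removals
	Failed        // terminal: every placeholder removal has been confirmed
)

// App is a simplified stand-in for the core's application object (assumption).
type App struct {
	state       State
	pending     map[string]bool // placeholder IDs still awaiting confirmation
	queueLinked bool            // true while the app is still linked to its queue
}

// NewApp creates a running app that holds the given placeholder allocations.
func NewApp(placeholders ...string) *App {
	pending := make(map[string]bool, len(placeholders))
	for _, id := range placeholders {
		pending[id] = true
	}
	return &App{state: Running, pending: pending, queueLinked: true}
}

// FailWithCleanup is step one: the core has asked the shim to remove the
// placeholders, so the app moves to Failing instead of straight to Failed.
func (a *App) FailWithCleanup() {
	if a.state != Running {
		return
	}
	a.state = Failing
	a.finishIfClean() // covers the case where nothing needs confirming
}

// ConfirmRemoval records the shim's confirmation for one placeholder.
func (a *App) ConfirmRemoval(id string) {
	delete(a.pending, id)
	a.finishIfClean()
}

// finishIfClean is step two: only once all removals are confirmed does the
// app move to Failed and get unlinked from the queue.
func (a *App) finishIfClean() {
	if a.state == Failing && len(a.pending) == 0 {
		a.state = Failed
		a.queueLinked = false
	}
}
```

In this sketch an app created with two placeholders stays in Failing, still linked to its queue, after FailWithCleanup and after the first ConfirmRemoval; it moves to Failed and is unlinked only on the second confirmation. A confirmation that arrives while the app is still Running (case 1) just shrinks the pending set and leaves the state alone.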



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org
