Chaoran Yu created YUNIKORN-657:
-----------------------------------

Summary: Expose reason of application failure to pods
Key: YUNIKORN-657
URL: https://issues.apache.org/jira/browse/YUNIKORN-657
Project: Apache YuniKorn
Issue Type: Improvement
Components: shim - kubernetes
Reporter: Chaoran Yu
Assignee: Chaoran Yu
An application may fail for a number of reasons. For example:
* In gang scheduling, the placeholders expire before all of them can be successfully allocated.
* When no placement rules are defined (i.e. static queues are used), the application is submitted to a non-existent queue.
* The total amount of resources requested by a gang-scheduled application exceeds the capacity of its queue.

YK's finite state machine treats Failed as a terminal state for an application, which means YK never tries to bring a failed application back. As a consequence, the pods of a failed application stay stuck in Pending indefinitely. A better behavior would be for YK to mark those pods as failed too, and to pass the reason for the application failure on to them.
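A minimal sketch of the proposed behavior, written in Python for brevity rather than in the Go used by the YuniKorn shim. The state names, the `Application` and `Pod` classes, the transition table, and the `ApplicationFailed` reason string are all hypothetical; only the pod `phase`, `reason`, and `message` fields mirror real fields of a Kubernetes `PodStatus`. It models Failed as a terminal state (no outgoing transitions) and, when the application fails, copies the failure reason onto every pod that is still Pending.

```python
from dataclasses import dataclass, field
from enum import Enum


class AppState(Enum):
    # Hypothetical subset of application states, not YuniKorn's real set.
    NEW = "New"
    ACCEPTED = "Accepted"
    RUNNING = "Running"
    FAILED = "Failed"


# Allowed transitions. FAILED has no outgoing edges, so it is terminal.
TRANSITIONS = {
    AppState.NEW: {AppState.ACCEPTED, AppState.FAILED},
    AppState.ACCEPTED: {AppState.RUNNING, AppState.FAILED},
    AppState.RUNNING: {AppState.FAILED},
    AppState.FAILED: set(),
}


@dataclass
class Pod:
    # Mirrors the phase/reason/message fields of a Kubernetes PodStatus.
    name: str
    phase: str = "Pending"
    reason: str = ""
    message: str = ""


@dataclass
class Application:
    app_id: str
    state: AppState = AppState.NEW
    pods: list = field(default_factory=list)

    def transition(self, target: AppState) -> None:
        # Reject any transition the table does not allow, including
        # every attempt to leave the terminal FAILED state.
        if target not in TRANSITIONS[self.state]:
            raise ValueError(
                f"{self.app_id}: cannot move from {self.state.value} to {target.value}"
            )
        self.state = target

    def fail(self, reason: str) -> None:
        """Move the app to Failed and propagate the reason to its pending pods."""
        self.transition(AppState.FAILED)
        for pod in self.pods:
            # Only pods still waiting to be scheduled are marked failed;
            # pods that are already running or finished keep their phase.
            if pod.phase == "Pending":
                pod.phase = "Failed"
                pod.reason = "ApplicationFailed"  # hypothetical reason code
                pod.message = reason


if __name__ == "__main__":
    app = Application("app-1", pods=[Pod("driver"), Pod("executor-1", phase="Running")])
    app.transition(AppState.ACCEPTED)
    app.fail("placeholders expired before all were allocated")
    for pod in app.pods:
        print(f"{pod.name}: {pod.phase} {pod.reason} {pod.message}".rstrip())
```

Restricting the update to Pending pods matches the symptom described above: those are the pods that would otherwise wait forever, while pods that already started are left to finish or fail on their own.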