[ https://issues.apache.org/jira/browse/SPARK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127322#comment-14127322 ]
Andrew Or commented on SPARK-2425:
----------------------------------

Reopened a few times to change the fix version. There is no net change, so please disregard.

> Standalone Master is too aggressive in removing Applications
> ------------------------------------------------------------
>
>                 Key: SPARK-2425
>                 URL: https://issues.apache.org/jira/browse/SPARK-2425
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Mark Hamstra
>            Assignee: Mark Hamstra
>            Priority: Critical
>             Fix For: 1.1.1, 1.2.0
>
>
> When standalone Executors trying to run a particular Application fail a
> cumulative ApplicationState.MAX_NUM_RETRY times, Master will remove the
> Application. This is true even if a number of Executors are actually
> running the Application successfully. This makes long-running
> standalone-mode Applications in particular unnecessarily vulnerable to
> limited failures in the cluster. For example, a single bad node on which
> Executors repeatedly fail for any reason can prevent an Application from
> starting, or can cause a running Application to be removed even though it
> could continue to run successfully (just without making use of all
> potential Workers and Executors).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
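To make the failure mode concrete, here is a small Python model of the removal rule described above. It is a sketch, not Spark's Master code: the `App` class, the `ExecState` enum, the two policy functions, and the default limit of 10 are all hypothetical stand-ins, and the "tolerant" policy is only one possible alternative (remove the Application only when none of its Executors is running), not necessarily what shipped in 1.1.1/1.2.0. The scenario models the "single bad node" case: one healthy Executor keeps running while Executors on the bad node fail one after another.

```python
from dataclasses import dataclass, field
from enum import Enum


class ExecState(Enum):
    # Hypothetical subset of executor states, for illustration only.
    RUNNING = "RUNNING"
    FAILED = "FAILED"


@dataclass
class App:
    # Hypothetical stand-in for the Master's per-application bookkeeping.
    executors: dict = field(default_factory=dict)  # executor id -> ExecState
    retry_count: int = 0
    removed: bool = False


def on_executor_failed_cumulative(app: App, exec_id: str, max_num_retry: int) -> None:
    """Rule described in the issue: every failure counts toward one cumulative limit."""
    app.executors[exec_id] = ExecState.FAILED
    app.retry_count += 1
    if app.retry_count >= max_num_retry:
        # Removed even if other executors are still RUNNING.
        app.removed = True


def on_executor_failed_tolerant(app: App, exec_id: str, max_num_retry: int) -> None:
    """Hypothetical alternative: remove the app only if no executor is running."""
    app.executors[exec_id] = ExecState.FAILED
    app.retry_count += 1
    still_running = any(s is ExecState.RUNNING for s in app.executors.values())
    if app.retry_count >= max_num_retry and not still_running:
        app.removed = True


def simulate(policy, max_num_retry: int = 10, failures: int = 10,
             healthy_executors: int = 1) -> bool:
    """Run `failures` executor failures on a bad node; return whether the app was removed."""
    app = App()
    for i in range(healthy_executors):
        app.executors[f"good-{i}"] = ExecState.RUNNING
    for i in range(failures):
        # Each relaunch on the bad node gets a fresh executor id and fails again.
        policy(app, f"bad-{i}", max_num_retry)
    return app.removed
```

With these assumed defaults, `simulate(on_executor_failed_cumulative)` returns `True` (the Application is removed despite a healthy Executor), while `simulate(on_executor_failed_tolerant)` returns `False`. The tolerant rule still removes an Application when nothing is running at all, so a job that cannot start anywhere is not retried forever.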