[ https://issues.apache.org/jira/browse/SPARK-16554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Imran Rashid resolved SPARK-16554.
----------------------------------
    Resolution: Fixed
    Fix Version/s: 2.2.0

Issue resolved by pull request 16650
[https://github.com/apache/spark/pull/16650]

> Spark should kill executors when they are blacklisted
> -----------------------------------------------------
>
>                 Key: SPARK-16554
>                 URL: https://issues.apache.org/jira/browse/SPARK-16554
>             Project: Spark
>          Issue Type: New Feature
>          Components: Scheduler
>            Reporter: Imran Rashid
>            Assignee: Jose Soltren
>             Fix For: 2.2.0
>
>
> SPARK-8425 allows blacklisting faulty executors and nodes. However, these
> blacklisted executors will continue to run. This is bad for a few reasons:
> (1) Even if there is faulty hardware, if the cluster is under-utilized Spark
> may be able to request another executor on a different node.
> (2) If there is a faulty disk (the most common case of faulty hardware), the
> cluster manager may be able to allocate another executor on the same node if
> it can exclude the bad disk. (YARN does this with its disk-health checker.)
> With dynamic allocation this may seem less critical, since a blacklisted
> executor will stop running new tasks and eventually be reclaimed. However,
> if there is cached data on those executors, they will not be killed until
> {{spark.dynamicAllocation.cachedExecutorIdleTimeout}} expires, which is
> (effectively) infinite by default.
> Users may not *always* want to kill bad executors, so this must be
> configurable to some extent. At a minimum, it should be possible to enable /
> disable it; perhaps the executor should be killed after it has been
> blacklisted a configurable {{N}} times.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
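For readers wanting to try this, the fix gates the kill behavior behind configuration, alongside the existing blacklisting and dynamic-allocation properties mentioned above. A minimal sketch of a {{spark-defaults.conf}} fragment (the {{spark.blacklist.killBlacklistedExecutors}} property name matches the Spark 2.2 blacklisting config namespace; treat it and the values shown as illustrative assumptions, and check the Spark configuration docs for your version):

```
# Enable task/executor blacklisting (off by default in Spark 2.2).
spark.blacklist.enabled                                true

# New in this change: actively kill executors (and, on repeated failures,
# all executors on a node) once they are blacklisted, instead of leaving
# them idle. Off by default, since users may not always want kills.
spark.blacklist.killBlacklistedExecutors               true

# Without the kill behavior, executors holding cached blocks would only be
# reclaimed after this timeout, which is effectively infinite by default.
spark.dynamicAllocation.cachedExecutorIdleTimeout      30min
```

With dynamic allocation enabled, killing a blacklisted executor lets the cluster manager allocate a replacement, possibly on a healthier node or disk.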