[ https://issues.apache.org/jira/browse/SPARK-9193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Imran Rashid resolved SPARK-9193.
---------------------------------
    Resolution: Fixed
 Fix Version/s: 1.5.0

Issue resolved by pull request 7528
[https://github.com/apache/spark/pull/7528]

> Avoid assigning tasks to executors under killing
> ------------------------------------------------
>
>                 Key: SPARK-9193
>                 URL: https://issues.apache.org/jira/browse/SPARK-9193
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 1.4.0, 1.4.1
>            Reporter: Jie Huang
>            Assignee: Jie Huang
>             Fix For: 1.5.0
>
>
> Currently, when dynamic allocation kills executors, tasks can still be
> assigned to those lost executors. Such a mis-assignment causes task
> failures, or even job failure if the same task fails four times.
> The root cause is that killExecutors does not remove the executors being
> killed right away; it relies on the later OnDisassociated event to refresh
> the list of active executors. The length of the delay depends on cluster
> status (from a few milliseconds to under a minute). Any task scheduled
> during that window can be assigned to an executor that is still "active"
> but under killing, and the task then fails with "executor lost". The fix
> is to exclude executors under killing in makeOffers(), so that no tasks
> are assigned to executors that are about to be lost.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
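The fix described above can be sketched as follows. This is a minimal, self-contained model of the idea, not the actual Spark code: the names `executorsPendingToRemove` and `makeOffers` mirror the CoarseGrainedSchedulerBackend, but the `SchedulerBackendSketch` class and its helpers are hypothetical, introduced only to illustrate filtering out executors pending removal when making resource offers.

```scala
// Hypothetical minimal model of the SPARK-9193 fix: killExecutors marks
// executors as pending removal immediately, and makeOffers filters them
// out instead of waiting for the OnDisassociated event.

case class WorkerOffer(executorId: String, host: String, cores: Int)

class SchedulerBackendSketch {
  // executorId -> (host, free cores) for registered executors
  private val executorData =
    scala.collection.mutable.Map[String, (String, Int)]()
  // Executors that killExecutors() has asked to kill but whose
  // disassociation event has not arrived yet.
  private val executorsPendingToRemove =
    scala.collection.mutable.Set[String]()

  def registerExecutor(id: String, host: String, cores: Int): Unit =
    executorData(id) = (host, cores)

  // Mark executors as under killing right away, rather than relying on
  // the later OnDisassociated event to drop them from executorData.
  def killExecutors(ids: Seq[String]): Unit =
    executorsPendingToRemove ++= ids.filter(executorData.contains)

  // Exclude executors under killing, so new tasks are never offered
  // resources on an executor that is about to be lost.
  def makeOffers(): Seq[WorkerOffer] =
    executorData.iterator
      .filterNot { case (id, _) => executorsPendingToRemove(id) }
      .map { case (id, (host, cores)) => WorkerOffer(id, host, cores) }
      .toSeq
}
```

With this shape, a task scheduled between the kill request and the disassociation event simply never sees the dying executor in the offer list, so the "executor lost" task failures described in the report cannot occur for that reason.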