[ https://issues.apache.org/jira/browse/SPARK-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kay Ousterhout resolved SPARK-11460.
------------------------------------
    Resolution: Duplicate

> Locality waits should be based on task set creation time, not last launch time
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-11460
>                 URL: https://issues.apache.org/jira/browse/SPARK-11460
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 1.0.0, 1.0.1, 1.0.2, 1.1.0, 1.1.1, 1.2.0, 1.2.1, 1.2.2, 1.3.0, 1.3.1, 1.4.0, 1.4.1, 1.5.0, 1.5.1
>         Environment: YARN
>            Reporter: Shengyue Ji
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Spark waits for the spark.locality.wait period before falling back from RACK_LOCAL to ANY when selecting an executor for task assignment. The timeout is essentially reset each time a new assignment is made.
> We were running Spark Streaming on Kafka with a 10-second batch window, 32 Kafka partitions, and 16 executors. All executors were in the ANY group. At one point a RACK_LOCAL executor was added and all tasks were assigned to it. Each task took about 0.6 seconds to process, repeatedly resetting the spark.locality.wait timeout (3000ms). This caused the whole process to underutilize resources and created a growing backlog.
> spark.locality.wait should be based on the task set creation time, not the last launch time, so that 3000ms after creation all executors can have tasks assigned to them.
> For now we are specifying a zero timeout as a workaround to disable the locality optimization.
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L556

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
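The reported behavior versus the proposed fix can be contrasted in a minimal sketch. The function names and the simulation below are illustrative only, not Spark's actual TaskSetManager internals; the 3000ms wait and 0.6s task duration come from the scenario described in the issue:

```python
# Minimal sketch of the two locality-wait policies described in the issue.
# Names are illustrative, not Spark's actual TaskSetManager API.

LOCALITY_WAIT_MS = 3000  # spark.locality.wait


def allow_any_by_last_launch(last_launch_ms, now_ms):
    """Reported behavior: the wait is measured from the last task launch,
    so a steady stream of short tasks keeps resetting it."""
    return now_ms - last_launch_ms >= LOCALITY_WAIT_MS


def allow_any_by_creation(creation_ms, now_ms):
    """Proposed behavior: the wait is measured from task-set creation,
    so the fallback to ANY happens 3000ms after creation no matter how
    often tasks launch in the meantime."""
    return now_ms - creation_ms >= LOCALITY_WAIT_MS


if __name__ == "__main__":
    creation_ms = 0
    launches = list(range(0, 6001, 600))  # a task launches every 600ms

    # Last-launch policy: the 600ms gaps never reach 3000ms, so the
    # scheduler never falls back from RACK_LOCAL to ANY.
    ever_any_old = any(
        allow_any_by_last_launch(prev, now)
        for prev, now in zip(launches, launches[1:])
    )

    # Creation-time policy: ANY is allowed from t = 3000ms onward.
    first_any_new = next(
        (now for now in launches if allow_any_by_creation(creation_ms, now)),
        None,
    )

    print(ever_any_old)   # False
    print(first_any_new)  # 3000
```

Under the last-launch policy the single RACK_LOCAL executor monopolizes the task set indefinitely; under the creation-time policy the other 15 executors become eligible 3000ms after the task set is created.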