[ https://issues.apache.org/jira/browse/SPARK-19293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Damian Momot updated SPARK-19293:
---------------------------------
    Description: 
After upgrading from Spark 2.0.2 to 2.1.0 we've observed that jobs often fail when speculative mode is enabled.

In 2.0.2, speculative tasks were simply skipped if their result was not used (i.e. the other instance finished earlier), and it was clearly visible in the UI that those tasks were not counted as failures.

In 2.1.0, many tasks are marked failed/killed when speculative tasks start to run (that is, at the end of a stage when there are spare executors to use), which also leads to entire stage/job failures.

Disabling spark.speculation fixes the failures, but speculative mode is very useful, especially when different executors run on machines with varying load (for example on YARN).

  was:
After upgrading from Spark 2.0.2 to 2.1.0 we've observed that jobs often fail when speculative mode is enabled.

In 2.0.2, speculative tasks were simply skipped if their result was not used (i.e. the other instance finished earlier), and it was clearly visible in the UI that those tasks were not counted as failures.

In 2.1.0, many tasks are marked failed/killed when speculative tasks start to run (that is, at the end of a stage when there are spare executors to use), which also leads to entire stage/job failures.


> Spark 2.1.0 unstable with spark.speculation=true
> ------------------------------------------------
>
>                 Key: SPARK-19293
>                 URL: https://issues.apache.org/jira/browse/SPARK-19293
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 2.1.0
>            Reporter: Damian Momot
>            Priority: Critical
>
> After upgrading from Spark 2.0.2 to 2.1.0 we've observed that jobs often fail when speculative mode is enabled.
>
> In 2.0.2, speculative tasks were simply skipped if their result was not used (i.e. the other instance finished earlier), and it was clearly visible in the UI that those tasks were not counted as failures.
>
> In 2.1.0, many tasks are marked failed/killed when speculative tasks start to run (that is, at the end of a stage when there are spare executors to use), which also leads to entire stage/job failures.
>
> Disabling spark.speculation fixes the failures, but speculative mode is very useful, especially when different executors run on machines with varying load (for example on YARN).
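
For reference, a minimal sketch (not from the ticket itself) of how speculative execution is typically toggled when building a SparkSession. The keys spark.speculation, spark.speculation.interval, spark.speculation.multiplier and spark.speculation.quantile are standard Spark configuration properties; the values shown are illustrative defaults, and setting spark.speculation to false is the workaround the reporter describes.

{code:scala}
import org.apache.spark.sql.SparkSession

// Illustrative only: enable speculative execution with its usual tuning knobs.
// Set spark.speculation to "false" to apply the workaround mentioned above.
val spark = SparkSession.builder()
  .appName("speculation-example")
  .config("spark.speculation", "true")            // launch speculative copies of slow tasks
  .config("spark.speculation.interval", "100ms")  // how often to check for tasks to speculate
  .config("spark.speculation.multiplier", "1.5")  // how much slower than the median counts as slow
  .config("spark.speculation.quantile", "0.75")   // fraction of tasks that must finish before checking
  .getOrCreate()
{code}

The same properties can also be passed on the command line, e.g. spark-submit --conf spark.speculation=false, without changing application code.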