[ https://issues.apache.org/jira/browse/SPARK-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Or resolved SPARK-14468.
-------------------------------
          Resolution: Fixed
       Fix Version/s: 1.5.2
                      2.0.0
                      1.6.2
                      1.4.2
    Target Version/s: 1.5.2, 1.4.2, 1.6.2, 2.0.0  (was: 1.4.2, 1.5.2, 1.6.2, 2.0.0)

> Always enable OutputCommitCoordinator
> -------------------------------------
>
>                 Key: SPARK-14468
>                 URL: https://issues.apache.org/jira/browse/SPARK-14468
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Andrew Or
>            Assignee: Andrew Or
>             Fix For: 1.4.2, 1.6.2, 2.0.0, 1.5.2
>
>
> The OutputCommitCoordinator was originally introduced in SPARK-4879 because
> speculation causes the output of some partitions to be deleted. However, as
> we can see in SPARK-10063, speculation is not the only case where this can
> happen.
> More specifically, when we retry a stage we're not guaranteed to kill the
> tasks that are still running (we don't even interrupt their threads), so we
> may end up with multiple concurrent task attempts for the same task. This
> leads to problems like SPARK-8029; this fix is necessary to address them,
> but not sufficient on its own.
> In general, when we run into situations like these, we need the
> OutputCommitCoordinator because we don't control what the underlying file
> system does. Enabling it doesn't induce heavy performance costs, so there's
> little reason not to always enable it to ensure correctness.
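
To make the arbitration idea in the description concrete, here is a minimal sketch of a "first attempt to ask wins" commit coordinator. It is not Spark's implementation: the class name `CommitCoordinator`, the methods `can_commit` and `attempt_failed`, and the in-process lock are all hypothetical stand-ins chosen for illustration. It only shows why a single arbiter keeps two concurrent attempts for the same partition from both committing their output.

```python
import threading


class CommitCoordinator:
    """Toy first-asker-wins commit arbiter (hypothetical, not Spark's class)."""

    def __init__(self):
        self._lock = threading.Lock()
        # (stage, partition) -> attempt number that holds the right to commit
        self._authorized = {}

    def can_commit(self, stage, partition, attempt):
        """Return True only for the attempt that holds the commit right."""
        with self._lock:
            key = (stage, partition)
            # The first attempt to ask becomes the holder; later ones are denied.
            holder = self._authorized.setdefault(key, attempt)
            return holder == attempt

    def attempt_failed(self, stage, partition, attempt):
        """Release the commit right if the failed attempt was holding it."""
        with self._lock:
            key = (stage, partition)
            if self._authorized.get(key) == attempt:
                del self._authorized[key]


# Two concurrent attempts for the same partition, e.g. after a stage retry
# that did not kill the still-running original task.
coordinator = CommitCoordinator()
first = coordinator.can_commit(stage=1, partition=0, attempt=0)
second = coordinator.can_commit(stage=1, partition=0, attempt=1)
print(first, second)  # True False
```

In this sketch the losing attempt is expected to discard its output rather than commit it, and the commit right is released if the holder fails, so that a later attempt can still finish the partition. The check is cheap (one lock and one dictionary lookup per commit request), which is in the spirit of the description's point that always enabling coordination should not be costly.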