Steve Loughran created SPARK-41551:
--------------------------------------

             Summary: Improve/complete PathOutputCommitProtocol support for dynamic partitioning
                 Key: SPARK-41551
                 URL: https://issues.apache.org/jira/browse/SPARK-41551
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.1
            Reporter: Steve Loughran

Followup to SPARK-40034, as:
* that work is incomplete: it doesn't record the partitions.
* as long as the job doesn't call `newTaskTempFileAbsPath()`, and slow renames are OK, both s3a committers are actually OK to use.

It's only the `newTaskTempFileAbsPath()` operation which is unsupported in the s3a committers; the post-job dir rename is O(data), but file-by-file rename is correct for a non-atomic job commit.

# Cut PathOutputCommitProtocol.newTaskTempFile; to update super partitionPaths (needs a setter). The superclass can't just say `if (committer instanceof PathOutputCommitter)`, as spark-core needs to compile with older hadoop versions.
# Downgrade the failure in setup to a log message (info?).
# Retain the failure in the `newTaskTempFileAbsPath()` call.

Testing: yes
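
The three steps above can be sketched as a toy model. This is not Spark code: it is plain Python with simplified signatures, and every name in it (`BaseProtocolSketch`, `PathOutputProtocolSketch`, `add_partition_path`, `supports_abs_paths`) is hypothetical. It also assumes one reading of step 1, namely that `newTaskTempFile` should record each partition in the superclass through a setter, so the superclass never has to inspect the committer type.

```python
import logging

log = logging.getLogger("commit_protocol_sketch")


class BaseProtocolSketch:
    """Toy stand-in for the superclass that owns the partition path set."""

    def __init__(self, dynamic_partition_overwrite):
        self.dynamic_partition_overwrite = dynamic_partition_overwrite
        self._partition_paths = set()

    def add_partition_path(self, partition_dir):
        # Hypothetical setter (step 1): subclasses record partitions here,
        # so this class needs no knowledge of the committer's type.
        self._partition_paths.add(partition_dir)

    @property
    def partition_paths(self):
        return frozenset(self._partition_paths)


class PathOutputProtocolSketch(BaseProtocolSketch):
    """Toy stand-in for the protocol proposed in this issue."""

    def __init__(self, dynamic_partition_overwrite, supports_abs_paths):
        super().__init__(dynamic_partition_overwrite)
        # Assumption: False models a committer that cannot place task
        # output in an arbitrary absolute directory.
        self.supports_abs_paths = supports_abs_paths

    def setup_committer(self):
        # Step 2: setup only logs; it no longer rejects the job.
        if self.dynamic_partition_overwrite and not self.supports_abs_paths:
            log.info("dynamic partitioning enabled; absolute-path "
                     "temp files will fail if requested")

    def new_task_temp_file(self, work_dir, partition_dir, filename):
        if partition_dir is None:
            return f"{work_dir}/{filename}"
        if self.dynamic_partition_overwrite:
            # Step 1: remember which partitions this job wrote to.
            self.add_partition_path(partition_dir)
        return f"{work_dir}/{partition_dir}/{filename}"

    def new_task_temp_file_abs_path(self, absolute_dir, filename):
        # Step 3: the unsupported operation still fails, but only when
        # a job actually asks for it.
        if not self.supports_abs_paths:
            raise NotImplementedError(
                "absolute-path temp files are unsupported by this committer")
        return f"{absolute_dir}/{filename}"
```

With `supports_abs_paths=False`, a job that only ever calls `new_task_temp_file` runs to completion and leaves its partitions in `partition_paths`; the failure surfaces only if `new_task_temp_file_abs_path` is called.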