Hello, I'm trying to use DirectOutputCommitter with s3a in Spark 2.0. I've tried several configs and none of them seem to work: the output always goes through a _temporary directory, and the final rename is killing performance. I've also read some notes saying DirectOutputCommitter causes problems when speculation is turned on. Was this option removed entirely?
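To show why the _temporary step hurts on S3, here is a toy model. It is not Hadoop's committer code. It assumes, as is the case for S3A, that there is no atomic rename, so a rename is a copy followed by a delete. The real FileOutputCommitter uses a more deeply nested _temporary layout; this model collapses that into a single rename. The store class, the `CommitModel` object, and the part-file sizes are all hypothetical.

```scala
import scala.collection.mutable

// Toy in-memory object store: each key maps to a payload size in bytes.
// Assumption (true for S3A): there is no atomic rename, only copy + delete.
final class ToyObjectStore {
  private val objects = mutable.Map.empty[String, Long]
  var bytesCopied: Long = 0L

  def put(key: String, size: Long): Unit = objects(key) = size

  // "Rename" every object under srcPrefix to dstPrefix by copying, then deleting.
  def renamePrefix(srcPrefix: String, dstPrefix: String): Unit = {
    val moving = objects.keys.filter(_.startsWith(srcPrefix)).toList // snapshot before mutating
    moving.foreach { key =>
      val size = objects(key)
      objects(dstPrefix + key.stripPrefix(srcPrefix)) = size // copy: every byte is rewritten
      bytesCopied += size
      objects.remove(key) // delete the source object
    }
  }

  def keys: Set[String] = objects.keySet.toSet
}

object CommitModel {
  // Hypothetical task outputs: (file name, size in bytes).
  val parts: Seq[(String, Long)] = Seq("part-00000" -> 100L, "part-00001" -> 250L)

  // Two-phase style: tasks write under _temporary/, and the commit renames the files into place.
  def temporaryThenRename(out: String): ToyObjectStore = {
    val store = new ToyObjectStore
    parts.foreach { case (name, size) => store.put(s"$out/_temporary/$name", size) }
    store.renamePrefix(s"$out/_temporary/", s"$out/")
    store
  }

  // Direct style: tasks write straight to the final path, so there is nothing to rename.
  def direct(out: String): ToyObjectStore = {
    val store = new ToyObjectStore
    parts.foreach { case (name, size) => store.put(s"$out/$name", size) }
    store
  }

  def main(args: Array[String]): Unit = {
    val staged = temporaryThenRename("s3a://bucket/out")
    val straight = direct("s3a://bucket/out")
    println(s"temporary + rename copied ${staged.bytesCopied} bytes")
    println(s"direct write copied ${straight.bytesCopied} bytes")
  }
}
```

Both paths end with the same final keys, but the staged path copies every output byte one extra time. The direct path avoids that copy. However, it leaves no staging area where output from a failed or speculative task attempt could be discarded, which is why speculation is a concern.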
val spark = SparkSession.builder()
  .appName("MergeEntities")
  .config("spark.sql.warehouse.dir", mergeConfig.getString("sparkSqlWarehouseDir"))
  .config("fs.s3a.buffer.dir", "/tmp")
  .config("spark.hadoop.mapred.output.committer.class", classOf[DirectOutputCommitter].getCanonicalName)
  .config("mapred.output.committer.class", classOf[DirectOutputCommitter].getCanonicalName)
  .config("mapreduce.use.directfileoutputcommitter", "true")
  //.config("spark.sql.sources.outputCommitterClass", classOf[DirectOutputCommitter].getCanonicalName)
  .getOrCreate()

Srikanth