Hi,

Have you seen the previous thread?
https://www.mail-archive.com/user@spark.apache.org/msg56791.html
// maropu

On Sat, Sep 17, 2016 at 11:34 AM, Qiang Li <q...@appannie.com> wrote:
> Hi,
>
> I ran some jobs with Spark 2.0 on YARN. All tasks finished very quickly,
> but in the last step Spark spent a lot of time renaming or moving data
> from the S3 temporary directory to the real directory, so I tried to set
>
> spark.hadoop.spark.sql.parquet.output.committer.class=org.apache.spark.sql.execution.datasources.parquet.DirectParquetOutputCommitter
>
> or
>
> spark.sql.parquet.output.committer.class=org.apache.spark.sql.parquet.DirectParquetOutputCommitter
>
> but neither works. It looks like Spark 2.0 removed these configs. How can
> I make Spark write output directly, without a temporary directory?

--
---
Takeshi Yamamuro
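[For context, a sketch of the workaround commonly suggested at the time, not taken from the linked thread: DirectParquetOutputCommitter was removed in Spark 2.0 because writing directly to the final location can lose or corrupt data under task retries and speculative execution. A frequently cited mitigation is the Hadoop FileOutputCommitter "algorithm version 2", which promotes task output at task commit rather than in a single job-commit rename pass, shrinking the slow final step on S3. The job file name below is a placeholder.]

```shell
# Sketch only: pass the Hadoop committer setting through Spark's
# spark.hadoop.* prefix. Algorithm v2 reduces (but does not eliminate)
# rename overhead on S3; disabling speculation avoids duplicate task output.
spark-submit \
  --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 \
  --conf spark.speculation=false \
  your_job.py   # hypothetical job file
```

Note that S3 renames are still copy-then-delete, so even with v2 the commit is neither atomic nor free; writing to HDFS and copying to S3 afterwards was another common pattern.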