Re: S3 DirectParquetOutputCommitter + PartitionBy + SaveMode.Append

Takeshi Yamamuro Thu, 29 Sep 2016 17:12:41 -0700

Hi,

FYI: It seems
`sc.hadoopConfiguration.set("mapreduce.fileoutputcommitter.algorithm.version", "2")`
is only available in hadoop-2.7.3+.
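
For reference, a minimal sketch of how the setting might be applied in a Spark 2.0 job before an append. The bucket name, path, column name and sample data are hypothetical, and it assumes the Hadoop version on the classpath honours this property and that hadoop-aws and s3a credentials are already set up:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder().appName("committer-v2-sketch").getOrCreate()

// Ask the regular FileOutputCommitter to use algorithm version 2.
spark.sparkContext.hadoopConfiguration
  .set("mapreduce.fileoutputcommitter.algorithm.version", "2")

// Hypothetical sample data with a partition column named "date".
val df = spark.range(10).withColumn("date", lit("2016-09-29"))

// Append to a partitioned Parquet dataset over s3a:// (hypothetical bucket).
df.write
  .mode(SaveMode.Append)
  .partitionBy("date")
  .parquet("s3a://example-bucket/events")
```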


// maropu


On Thu, Sep 29, 2016 at 9:28 PM, joffe.tal <joffe....@gmail.com> wrote:

> You can write to a partition explicitly by appending
> "/<col_name>=<partition value>" to the end of the path you are writing to
> and then using overwrite mode.
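>
> A minimal sketch of that approach, assuming a DataFrame that still
> contains the partition column. The helper name overwritePartition and its
> parameters are hypothetical, not part of Spark:
>
> ```scala
> import org.apache.spark.sql.{DataFrame, SaveMode}
> import org.apache.spark.sql.functions.col
>
> // Hypothetical helper: overwrite one partition directory by writing
> // straight to "<basePath>/<colName>=<value>".
> def overwritePartition(df: DataFrame, basePath: String,
>                        colName: String, value: String): Unit = {
>   df.filter(col(colName) === value) // keep only rows for this partition
>     .drop(colName)                  // the value is already encoded in the path
>     .write
>     .mode(SaveMode.Overwrite)       // replaces only this partition directory
>     .parquet(s"$basePath/$colName=$value")
> }
> ```
>
> Only the targeted directory is replaced, so the other partitions under
> the base path are left as they are.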
>
> BTW in Spark 2.0 you just need to use:
>
> sc.hadoopConfiguration.set("mapreduce.fileoutputcommitter.algorithm.version", "2")
> and use s3a://
>
> and you can work with the regular output committer (in fact,
> DirectParquetOutputCommitter is no longer available in Spark 2.0),
>
> so if you are planning to upgrade, this could be another motivation.
>
>
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/S3-DirectParquetOutputCommitter-PartitionBy-SaveMode-Append-tp26398p27810.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.


-- 
---
Takeshi Yamamuro
