I will try this on Monday. Thanks for the tip.
On Fri, 15 Jan 2016, 18:58 Cheng Lian wrote:
> You may try DataFrame.repartition(partitionExprs: Column*) to shuffle all
> data belonging to a single (data) partition into a single (RDD) partition:
>
> df.coalesce(1).repartition($"entity", $"year", $"month", $"day", $"status") [...]
You may try DataFrame.repartition(partitionExprs: Column*) to shuffle
all data belonging to a single (data) partition into a single (RDD)
partition:
df.coalesce(1).repartition($"entity", $"year", $"month", $"day", $"status")
  .write.partitionBy("entity", "year", "month", "day", "status")
  .mode(SaveMode.Append).parquet(...)
Why do you need only one file? Spark does a good job of writing to many files.
On Fri, Jan 15, 2016 at 7:48 AM, Patrick McGloin wrote:
> Hi,
>
> I would like to repartition / coalesce my data so that it is saved into one
> Parquet file per partition. I would also like to use the Spark SQL
> partitionBy API. [...]
Hi,
I would like to repartition / coalesce my data so that it is saved into one
Parquet file per partition. I would also like to use the Spark SQL
partitionBy API. So I could do that like this:

df.coalesce(1).write.partitionBy("entity", "year", "month", "day", "status")
  .mode(SaveMode.Append).parquet(...)