When I do the following, Spark (2.4) doesn't put a _SUCCESS file in the
partition directory:
val outputPath = s"s3://mybucket/$table"
df
  .orderBy(time)
  .coalesce(numFiles)
  .write
  .partitionBy("partitionDate")
  .mode("overwrite")
  .format("parquet")
  .save(outputPath)
But when I remove 'partitionBy' and instead put the partition info in the
outputPath, as shown below, I do see the _SUCCESS file.
*Questions:*
1) Is this workaround acceptable?
2) Would dropping the 'partitionBy' clause cause problems elsewhere?
3) Is there a better way to ensure that a _SUCCESS file is created in
each partition directory?
val outputPath = s"s3://mybucket/$table/date=<some date>"
df
  .orderBy(time)
  .coalesce(numFiles)
  .write
  .mode("overwrite")
  .format("parquet")
  .save(outputPath)
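For context, here is how I'd generalize the workaround to multiple dates, so
that each partition directory is a separate write job and gets its own
_SUCCESS file. The `dates` collection and the "partitionDate" column name are
assumptions about my data; this is a sketch, not tested at scale:

import org.apache.spark.sql.functions.col

// Hypothetical generalization of the workaround: one write per date,
// each targeting its own partition path, so each path gets a _SUCCESS file.
val dates: Seq[String] = Seq("2020-01-01", "2020-01-02") // assumed input
dates.foreach { d =>
  df
    .filter(col("partitionDate") === d) // keep only this date's rows
    .orderBy(time)
    .coalesce(numFiles)
    .write
    .mode("overwrite")
    .format("parquet")
    .save(s"s3://mybucket/$table/date=$d")
}

The obvious downside is that this scans/filters the DataFrame once per date
instead of letting Spark split the data in a single pass with 'partitionBy'.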