When I do the following, Spark (2.4) doesn't put a _SUCCESS file in the
partition directory:
val outputPath = s"s3://mybucket/$table"
df
  .orderBy(time)
  .coalesce(numFiles)
  .write
  .partitionBy("partitionDate")
  .mode("overwrite")
  .format("parquet")
  .save(outputPath)
But when I remove 'partitionBy' and instead put the partition info in the
outputPath, as shown below, I do see the _SUCCESS file.
*Questions:*
1) Is this workaround acceptable?
2) Would dropping the 'partitionBy' clause cause problems elsewhere?
3) Is there a better way to ensure that a _SUCCESS file is created in
each partition directory?
val outputPath = s"s3://mybucket/$table/date=<some date>"
df
  .orderBy(time)
  .coalesce(numFiles)
  .write
  .mode("overwrite")
  .format("parquet")
  .save(outputPath)
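For context, here is how I'd generalize the workaround to multiple dates, so
that each partition directory is a separate write job and gets its own
_SUCCESS file. The `dates` collection and the "partitionDate" column name are
assumptions about my data; this is a sketch, not tested at scale:

import org.apache.spark.sql.functions.col

// Hypothetical generalization of the workaround: one write per date,
// each targeting its own partition path, so each path gets a _SUCCESS file.
val dates: Seq[String] = Seq("2020-01-01", "2020-01-02") // assumed input
dates.foreach { d =>
  df
    .filter(col("partitionDate") === d) // keep only this date's rows
    .orderBy(time)
    .coalesce(numFiles)
    .write
    .mode("overwrite")
    .format("parquet")
    .save(s"s3://mybucket/$table/date=$d")
}

The obvious downside is that this scans/filters the DataFrame once per date
instead of letting Spark split the data in a single pass with 'partitionBy'.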