Hi All,

I have a DataFrame of 2.7 TB (Parquet) that I need to partition by date; however, the Spark program below keeps failing with a *FileAlreadyExistsException*:
    df = spark.read.parquet(INPUT_PATH)
    df.repartition('date_field').write.partitionBy('date_field').mode('overwrite').parquet(PATH)

I did notice that a couple of tasks failed; could Spark have spun up retry attempts that then wrote into the same .staging directory?
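One thing I'm planning to try, just a sketch on the assumption that retried or speculative task attempts are what collide in .staging (I'm not even sure speculation is enabled on our cluster), is to disable speculative execution explicitly before the write:

    from pyspark.sql import SparkSession

    # Sketch: build the session with speculative execution disabled, so
    # duplicate speculative attempts can't race on the same output files.
    spark = (SparkSession.builder
             .appName('partition-by-date')
             .config('spark.speculation', 'false')
             .getOrCreate())

    # INPUT_PATH / PATH are the same placeholders as above.
    df = spark.read.parquet(INPUT_PATH)

    # Hash-shuffle by date_field so all rows for a given date land in the
    # same task, then write a Hive-style date_field=.../ directory layout.
    (df.repartition('date_field')
       .write
       .partitionBy('date_field')
       .mode('overwrite')
       .parquet(PATH))

--
Regards,
Rishi Shah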