Hi, writing a CSV to HDFS takes about 1 hour:
df.repartition(1).write.format('com.databricks.spark.csv').mode('overwrite').options(header='true').save(csv)

The generated CSV file is only about 150 KB. The job uses 3 containers (13 cores, 23 GB memory). Other people have reported similar issues, but I haven't seen a good explanation or solution. Any clue is highly appreciated! Thanks.