Hey Friends, I am trying to use df.write.parquet() (on DataFrames created through sqlContext) to write DataFrames out as Parquet files. I have the following questions.
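
For context, here is roughly what I am doing, as a minimal sketch (the DataFrame contents, app name, and output paths below are placeholders, not my real job):

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="parquet-write-sketch")
sqlContext = SQLContext(sc)

# Toy stand-in for my real DataFrame.
df = sqlContext.createDataFrame(
    [(2006, "a"), (2016, "b")],
    ["year", "value"])

# Write the DataFrame out as Parquet, one subdirectory per distinct year.
df.write.partitionBy("year").parquet("/tmp/example_output")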
1. Number of partitions

The default number of partitions seems to be 200. Is there any way other than df.repartition(n) to change this number? I was told repartition can be very expensive. (The first P.S. below sketches the knobs I already know about.)

2. Partition by size

When I use df.write.partitionBy('year') and only a few rows have "year=2006", the files under the "year=2006" directory come out very small. It would be very helpful if we could set a target size for each partition file. (The second P.S. below sketches the only workaround I have found.)

Thank you,
Wei
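
P.S. For question 1, here is a sketch of the two knobs I already know about, continuing from the placeholder df and sqlContext above, in case it clarifies what I am asking:

# (a) Lower the shuffle default of 200, so DataFrames produced by joins,
# aggregations, and other shuffles have fewer partitions to begin with.
sqlContext.setConf("spark.sql.shuffle.partitions", "50")

# (b) coalesce() merges existing partitions without the full shuffle that
# repartition(n) performs, so it should be considerably cheaper.
df.coalesce(10).write.parquet("/tmp/example_output_coalesced")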
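
P.S. For question 2, the only workaround I have found is to shuffle by the partition column before writing, so that all rows for a given year land in one task and therefore in one file. I believe this requires a Spark version where repartition() accepts a column, and it is exactly the kind of expensive shuffle I was hoping to avoid:

# All rows with the same year end up in a single partition, so
# partitionBy("year") writes one file per year directory instead of
# many tiny ones.
df.repartition("year") \
  .write.partitionBy("year") \
  .parquet("/tmp/example_output_compacted")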