I would like to know why it is faster to write out an RDD that has 30,000
partitions as 30,000 S3 files of 1 KB to 2 MB each than to coalesce it to
1,000 partitions and write out 1,000 S3 files of roughly 26 MB each, or even
to 100 partitions and 100 S3 files of roughly 260 MB each.
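
For reference, here is a quick sanity check of the sizes involved. It is only a rough sketch: the total of about 26 GB is an assumption derived from the 1,000 x 26 MB figure above, the data is assumed to be spread evenly, and the helper names are made up for illustration.

```python
# Rough size arithmetic for the three layouts mentioned above.
# Assumption: total output is about 1000 * 26 MB, as implied by the figures in the post.
TOTAL_MB = 1000 * 26
ORIGINAL_PARTITIONS = 30_000


def avg_file_mb(num_partitions: int) -> float:
    """Average output file size in MB if the data were spread evenly."""
    return TOTAL_MB / num_partitions


def inputs_per_output(original: int, target: int) -> float:
    """Number of original partitions merged into each coalesced partition."""
    return original / target


for n in (30_000, 1_000, 100):
    print(
        f"{n:>6} partitions: ~{avg_file_mb(n):.2f} MB per file, "
        f"{inputs_per_output(ORIGINAL_PARTITIONS, n):.0f} original partition(s) per output file"
    )
```

So coalescing to 1,000 partitions would mean each output task handles about 30 of the original partitions, and coalescing to 100 would mean about 300 each.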

The coalescing takes a long time.


Thanks,

Adnan
