Hi,
If you need the contents of all partitions saved as a single file with
saveAsTextFile, you can use coalesce(1, true).saveAsTextFile(). This means
Spark does the computation first and then coalesces the result into a single
partition. You can also use repartition(1), which is just a wrapper around
coalesce that sets the shuffle argument to true.
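
For reference, here is a rough sketch of the coalesce/repartition approach
described above, assuming a local SparkSession. The object name, the generated
stand-in data, and the temp output directory are all made up for illustration;
the target counts (1, 8, 16) are arbitrary, so swap in your own RDD and paths.

```scala
import java.io.File
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Hypothetical name, for illustration only.
object OutputPartitionsSketch {

  // Count the part-* files Spark wrote into an output directory
  // (skips _SUCCESS and the hidden .crc checksum files).
  private def partFiles(dir: String): Int =
    new File(dir).listFiles().count(_.getName.startsWith("part-"))

  def run(): Map[String, Int] = {
    // local[8] mirrors the 8-core local machine in the question (assumption).
    val spark = SparkSession.builder()
      .appName("output-partitions-sketch")
      .master("local[8]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Stand-in for the real input: an RDD that already has 193 partitions.
    val rdd = sc.parallelize(1 to 100000, numSlices = 193).map(_.toString)

    // Temp directory so that the output paths do not exist yet.
    val out = Files.createTempDirectory("spark-out").toString

    // One output file: the shuffle keeps the upstream work parallel
    // and only merges the results at the end.
    rdd.coalesce(1, shuffle = true).saveAsTextFile(s"$out/single")

    // repartition(n) is the same as coalesce(n, shuffle = true).
    rdd.repartition(8).saveAsTextFile(s"$out/eight")

    // coalesce without a shuffle merges existing partitions
    // when reducing the partition count.
    rdd.coalesce(16).saveAsTextFile(s"$out/sixteen")

    val counts = Seq("single", "eight", "sixteen")
      .map(name => name -> partFiles(s"$out/$name"))
      .toMap

    spark.stop()
    counts
  }

  def main(args: Array[String]): Unit =
    run().foreach { case (name, n) => println(s"$name: $n part files") }
}
```

Each saved RDD produces one part file per partition, so the number passed to
coalesce or repartition is what controls the number of output files.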
Hi all,
Could you tell me how I can tune the number of output partitions?
I run my Spark job on my local machine with 8 cores, and the input data is
6.5GB. It creates 193 tasks and puts the output into 193 partitions.
How can I change the number of tasks and, consequently, the number of output
files?
Best,
Soheil