Re: How to tune number of tasks

2017-01-26 Thread Md. Rezaul Karim
Hi, if you need all the partitions saved into a single file with saveAsTextFile, you can use coalesce(1, true).saveAsTextFile(). This does the computation and then coalesces the result into a single partition. You can also use repartition(1), which is just a wrapper for coalesce that sets the shuffle argument to true.

How to tune number of tasks

2017-01-26 Thread Soheila S.
Hi all, please tell me how I can tune the number of output partitions. I run my Spark job on my local machine with 8 cores, and the input data is 6.5 GB. It creates 193 tasks and puts the output into 193 partitions. How can I change the number of tasks and, consequently, the number of output files? Best, Soheil