> If you are doing a join/groupBy kind of operations then you need to make sure the keys are evenly distributed throughout the partitions.
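[Editor's note: the "evenly distributed keys" advice quoted above is usually addressed by key salting. The sketch below is not from the original thread; it illustrates the idea with plain Scala collections (`SaltingSketch`, `saltedCount`, and the sample data are all hypothetical). In Spark you would apply the same three steps with `map` and `reduceByKey` on an RDD, so a single hot key is spread across many partitions instead of landing on one executor.]

```scala
import scala.util.Random

object SaltingSketch {
  // Count values per key using salting: spread each key across `numSalts`
  // sub-keys, aggregate the sub-keys, then merge the partial results.
  def saltedCount(records: Seq[(String, Int)], numSalts: Int): Map[String, Int] = {
    val rnd = new Random(0)
    // Step 1: append a random salt to each key, so a hot key becomes
    // numSalts distinct keys.
    val salted = records.map { case (k, v) => ((k, rnd.nextInt(numSalts)), v) }
    // Step 2: first-level aggregation on (key, salt) -- in Spark this is the
    // shuffle stage, where the hot key's load now spreads over partitions.
    val partial = salted.groupBy(_._1).map { case (sk, vs) => (sk, vs.map(_._2).sum) }
    // Step 3: drop the salt and merge the partial sums per original key.
    partial.groupBy { case ((k, _), _) => k }
      .map { case (k, group) => (k, group.values.sum) }
  }

  def main(args: Array[String]): Unit = {
    // Skewed input: one key carries almost all the records.
    val skewed = Seq.fill(1000)(("hot", 1)) ++ Seq.fill(10)(("cold", 1))
    println(saltedCount(skewed, 4)) // totals: hot -> 1000, cold -> 10
  }
}
```

The trade-off is one extra aggregation pass; for a skewed join, the same trick requires replicating the smaller side once per salt value.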
Yes, I am doing join/groupBy operations. Can you point me to docs on how to do this?

Spark 1.5.2

First attempt (Aggregated Metrics by Executor):

Executor ID | Address                                            | Task Time ▾ | Total Tasks | Failed Tasks | Succeeded Tasks | Shuffle Read Size / Records | Shuffle Write Size / Records | Shuffle Spill (Memory) | Shuffle Spill (Disk)
32          | rc-spark-poc-w-3.c.dailymotion-data.internal:51748 | 1.2 h       | 18          | 0            | 18              | 4.4 MB / 167812             | 51.5 GB / 128713             | 153.1 GB               | 51.1 GB

Second attempt (Aggregated Metrics by Executor):

Executor ID | Address                                            | Task Time ▾ | Total Tasks | Failed Tasks | Succeeded Tasks | Shuffle Read Size / Records
5           | rc-spark-poc-w-1.c.dailymotion-data.internal:41061 | 47 min      | 8           | 0            | 8               | 3.9 MB / 95334

Best Regards,
Ram

From: Akhil Das <ak...@sigmoidanalytics.com>
Date: Saturday, December 5, 2015 at 1:32 AM
To: Ram VISWANADHA <ram.viswana...@dailymotion.com>
Cc: user <user@spark.apache.org>
Subject: Re: Improve saveAsTextFile performance

Which version of Spark are you using? Can you look at the event timeline and the DAG of the job and see where it's spending more time? .save simply triggers your entire pipeline. If you are doing a join/groupBy kind of operations then you need to make sure the keys are evenly distributed throughout the partitions.

Thanks
Best Regards

On Sat, Dec 5, 2015 at 8:24 AM, Ram VISWANADHA <ram.viswana...@dailymotion.com> wrote:

That didn't work :( Any help? I have documented some steps here:
http://stackoverflow.com/questions/34048340/spark-saveastextfile-last-stage-almost-never-finishes

Best Regards,
Ram

From: Sahil Sareen <sareen...@gmail.com>
Date: Wednesday, December 2, 2015 at 10:18 PM
To: Ram VISWANADHA <ram.viswana...@dailymotion.com>
Cc: Ted Yu <yuzhih...@gmail.com>, user <user@spark.apache.org>
Subject: Re: Improve saveAsTextFile performance

http://stackoverflow.com/questions/29213404/how-to-split-an-rdd-into-multiple-smaller-rdds-given-a-max-number-of-rows-per