Hi, I am trying to save an RDD to disk using saveAsNewAPIHadoopFile. It takes almost 20 minutes to write about 900 GB of data. Is there any parameter I can tune to make the save faster? I am running about 45 executors with 5 cores each on 5 Spark worker nodes, using Spark on YARN. Thanks for your help.
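For context, here is a rough back-of-the-envelope calculation of the write throughput implied by the numbers above. It is only a sketch: it assumes 1 GB = 1024 MB, that the full 20 minutes is spent in the save itself, and that the work is spread evenly across nodes and cores. The variable names are just for illustration.

```python
# Figures taken from the question above (all approximate).
data_gb = 900            # about 900 GB written
minutes = 20             # almost 20 minutes for the save
executors = 45
cores_per_executor = 5
worker_nodes = 5

# Assumption: 1 GB = 1024 MB, and the whole 20 minutes is the save.
seconds = minutes * 60
total_mb_per_s = data_gb * 1024 / seconds

# Assumption: load is spread evenly over nodes and cores.
total_cores = executors * cores_per_executor
per_node_mb_per_s = total_mb_per_s / worker_nodes
per_core_mb_per_s = total_mb_per_s / total_cores

print(f"aggregate: {total_mb_per_s:.1f} MB/s")
print(f"per node:  {per_node_mb_per_s:.1f} MB/s")
print(f"per core:  {per_core_mb_per_s:.2f} MB/s over {total_cores} cores")
```

If the per-node figure is already close to what the disks or network on each worker can sustain, tuning Spark parameters alone may not help much. I have not measured the hardware limits, so this is only a starting point for comparison.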
- Save a spark RDD to disk Elf Of Lothlorein
- Re: Save a spark RDD to disk Andrew Holway
- Re: Save a spark RDD to disk Michael Segel