Hi
I am trying to save a RDD to disk and I am using the saveAsNewAPIHadoopFile
for that. I am seeing that it takes almost 20 mins for about 900 GB of
data. Is there any parameter that I can tune to make this saving faster.
I am running about 45 executors with 5 cores each on 5 Spark worker nodes
and using Spark on YARN for this..
Thanks for your help.
C

Reply via email to