Hi, I am trying to understand the best parameter settings for processing a 12.5 GB file with my Spark cluster. I am using a 3-node cluster, with 8 cores and 30 GiB of RAM on each node.
I followed Cloudera's "top 5 mistakes" article and tried the following configuration:

spark.executor.instances = 6
spark.executor.cores = 5
spark.executor.memory = 15g
yarn.scheduler.maximum-allocation-mb = 10000

Can someone please confirm whether these are correct? As an aside, I would like to better understand how YARN works with Spark - could you explain the difference between a Container and an Executor in YARN?

Thanks!
Pooja
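For completeness, here is roughly how I am applying these settings - the first three go on the Spark side and the last one on the YARN side (this is just my understanding of where each property belongs; please correct me if I have it wrong):

```
# conf/spark-defaults.conf (Spark side)
spark.executor.instances   6
spark.executor.cores       5
spark.executor.memory      15g

# yarn-site.xml (YARN side), set as an XML property:
# <property>
#   <name>yarn.scheduler.maximum-allocation-mb</name>
#   <value>10000</value>
# </property>
```

I could equivalently pass the Spark ones on the command line via spark-submit's --conf flags, if that matters for the answer.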