Hi,

I am trying to understand the best parameter settings for processing a 12.5 GB
file with my Spark cluster. I am using a 3-node cluster with 8 cores and 30
GiB of RAM on each node (24 cores and 90 GiB in total).

I followed Cloudera's "top 5 mistakes" article and tried the following
configuration:
spark.executor.instances              6
spark.executor.cores                  5
spark.executor.memory                 15g
yarn.scheduler.maximum-allocation-mb  10000
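
For reference, here is a minimal sketch of how I would pass the first three
settings at submission time (the jar name and main class are just
placeholders; yarn.scheduler.maximum-allocation-mb is a YARN setting that
lives in yarn-site.xml rather than on the spark-submit command line):

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --num-executors 6 \
      --executor-cores 5 \
      --executor-memory 15g \
      --class com.example.MyJob \
      app.jar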

Can someone please confirm whether these are correct?

As an aside, I would like to better understand how Spark works on YARN. Could
someone explain the difference between a YARN container and a Spark executor?

Thanks!
Pooja
