Thanks Andrew. So if there is only one SparkContext, is there only one executor per machine? This seems to contradict Aaron's message from the link above:
"If each machine has 16 GB of RAM and 4 cores, for example, you might set spark.executor.memory between 2 and 3 GB, totaling 8-12 GB used by Spark.)" Am I reading this incorrectly? Anyway our configuration is 21 machines (one master and 20 slaves) each with 60Gb. We would like to use 4 cores per machine. This is pyspark so we want to leave say 16Gb on each machine for python processes. Thanks again for the advice! -- Martin Goodson | VP Data Science (0)20 3397 1240 [image: Inline image 1] On Wed, Jul 23, 2014 at 4:19 PM, Andrew Ash <and...@andrewash.com> wrote: > Hi Martin, > > In standalone mode, each SparkContext you initialize gets its own set of > executors across the cluster. So for example if you have two shells open, > they'll each get two JVMs on each worker machine in the cluster. > > As far as the other docs, you can configure the total number of cores > requested for the SparkContext, the amount of memory for the executor JVM > on each machine, the amount of memory for the Master/Worker daemons (little > needed since work is done in executors), and several other settings. > > Which of those are you interested in? What spec hardware do you have and > how do you want to configure it? > > Andrew > > > On Wed, Jul 23, 2014 at 6:10 AM, Martin Goodson <mar...@skimlinks.com> > wrote: > >> We are having difficulties configuring Spark, partly because we still >> don't understand some key concepts. For instance, how many executors are >> there per machine in standalone mode? This is after having closely read >> the documentation several times: >> >> *http://spark.apache.org/docs/latest/configuration.html >> <http://spark.apache.org/docs/latest/configuration.html>* >> *http://spark.apache.org/docs/latest/spark-standalone.html >> <http://spark.apache.org/docs/latest/spark-standalone.html>* >> *http://spark.apache.org/docs/latest/tuning.html >> <http://spark.apache.org/docs/latest/tuning.html>* >> *http://spark.apache.org/docs/latest/cluster-overview.html >> <http://spark.apache.org/docs/latest/cluster-overview.html>* >> >> The cluster overview has some information here about executors but is >> ambiguous about whether there are single executors or multiple executors on >> each machine. >> >> This message from Aaron Davidson implies that the executor memory >> should be set to total available memory on the machine divided by the >> number of cores: >> *http://mail-archives.apache.org/mod_mbox/spark-user/201312.mbox/%3CCANGvG8o5K1SxgnFMT_9DK=vj_plbve6zh_dn5sjwpznpbcp...@mail.gmail.com%3E >> <http://mail-archives.apache.org/mod_mbox/spark-user/201312.mbox/%3CCANGvG8o5K1SxgnFMT_9DK=vj_plbve6zh_dn5sjwpznpbcp...@mail.gmail.com%3E>* >> >> But other messages imply that the executor memory should be set to the >> *total* available memory of each machine. >> >> We would very much appreciate some clarity on this and the myriad of >> other memory settings available (daemon memory, worker memory etc). Perhaps >> a worked example could be added to the docs? I would be happy to provide >> some text as soon as someone can enlighten me on the technicalities! >> >> Thank you >> >> -- >> Martin Goodson | VP Data Science >> (0)20 3397 1240 >> [image: Inline image 1] >> > >