Great, thanks for the clarification, Aaron. The offer stands for me to write some documentation and an example that covers this without leaving *any* room for ambiguity.
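For instance, here is a rough first sketch of the kind of worked example I have in mind, based on our own cluster (20 worker nodes with 60GB RAM and 4 cores each, leaving roughly 16GB per node for the Python worker processes). All of the names and numbers below are illustrative assumptions on my part rather than tested settings:

    # conf/spark-env.sh on each worker node: the memory the standalone Worker
    # daemon is allowed to hand out to executors on that node (assumed here to
    # be 60GB minus ~16GB reserved for Python workers and the OS):
    #
    #   SPARK_WORKER_MEMORY=44g
    #
    # Per-application settings, requested when the SparkContext is created.
    # In standalone mode an application gets one executor per worker node, so
    # spark.executor.memory is the heap size of that single JVM on each node.
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setMaster("spark://master:7077")       # hypothetical master URL
            .setAppName("memory-settings-example")  # hypothetical app name
            .set("spark.executor.memory", "44g")    # executor JVM heap per node
            .set("spark.cores.max", "80"))          # 20 workers x 4 cores each

    sc = SparkContext(conf=conf)

I have also spelled out the arithmetic from Aaron's reply in a second small sketch at the bottom of this mail.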
--
Martin Goodson | VP Data Science
(0)20 3397 1240


On Thu, Jul 24, 2014 at 6:09 PM, Aaron Davidson <ilike...@gmail.com> wrote:

> Whoops, I was mistaken in my original post last year. By default, there is
> one executor per node per Spark Context, as you said.
> "spark.executor.memory" is the amount of memory that the application
> requests for each of its executors. SPARK_WORKER_MEMORY is the amount of
> memory a Spark Worker is willing to allocate in executors.
>
> So if you were to set SPARK_WORKER_MEMORY to 8g everywhere on your
> cluster, and spark.executor.memory to 4g, you would be able to run 2
> simultaneous Spark Contexts who get 4g per node. Similarly, if
> spark.executor.memory were 8g, you could only run 1 Spark Context at a time
> on the cluster, but it would get all the cluster's memory.
>
>
> On Thu, Jul 24, 2014 at 7:25 AM, Martin Goodson <mar...@skimlinks.com> wrote:
>
>> Thank you Nishkam,
>> I have read your code. So, for the sake of my understanding, it seems
>> that for each spark context there is one executor per node? Can anyone
>> confirm this?
>>
>>
>> --
>> Martin Goodson | VP Data Science
>> (0)20 3397 1240
>>
>>
>> On Thu, Jul 24, 2014 at 6:12 AM, Nishkam Ravi <nr...@cloudera.com> wrote:
>>
>>> See if this helps:
>>>
>>> https://github.com/nishkamravi2/SparkAutoConfig/
>>>
>>> It's a very simple tool for auto-configuring default parameters in
>>> Spark. Takes as input high-level parameters (like number of nodes, cores
>>> per node, memory per node, etc) and spits out default configuration, user
>>> advice and command line. Compile (javac SparkConfigure.java) and run (java
>>> SparkConfigure).
>>>
>>> Also cc'ing dev in case others are interested in helping evolve this
>>> over time (by refining the heuristics and adding more parameters).
>>>
>>>
>>> On Wed, Jul 23, 2014 at 8:31 AM, Martin Goodson <mar...@skimlinks.com> wrote:
>>>
>>>> Thanks Andrew,
>>>>
>>>> So if there is only one SparkContext there is only one executor per
>>>> machine? This seems to contradict Aaron's message from the link above:
>>>>
>>>> "If each machine has 16 GB of RAM and 4 cores, for example, you might
>>>> set spark.executor.memory between 2 and 3 GB, totaling 8-12 GB used by
>>>> Spark.)"
>>>>
>>>> Am I reading this incorrectly?
>>>>
>>>> Anyway our configuration is 21 machines (one master and 20 slaves) each
>>>> with 60Gb. We would like to use 4 cores per machine. This is pyspark so we
>>>> want to leave say 16Gb on each machine for python processes.
>>>>
>>>> Thanks again for the advice!
>>>>
>>>>
>>>> --
>>>> Martin Goodson | VP Data Science
>>>> (0)20 3397 1240
>>>>
>>>>
>>>> On Wed, Jul 23, 2014 at 4:19 PM, Andrew Ash <and...@andrewash.com> wrote:
>>>>
>>>>> Hi Martin,
>>>>>
>>>>> In standalone mode, each SparkContext you initialize gets its own set
>>>>> of executors across the cluster. So for example if you have two shells
>>>>> open, they'll each get two JVMs on each worker machine in the cluster.
>>>>>
>>>>> As far as the other docs, you can configure the total number of cores
>>>>> requested for the SparkContext, the amount of memory for the executor JVM
>>>>> on each machine, the amount of memory for the Master/Worker daemons
>>>>> (little needed since work is done in executors), and several other
>>>>> settings.
>>>>>
>>>>> Which of those are you interested in? What spec hardware do you have
>>>>> and how do you want to configure it?
>>>>>
>>>>> Andrew
>>>>>
>>>>>
>>>>> On Wed, Jul 23, 2014 at 6:10 AM, Martin Goodson <mar...@skimlinks.com> wrote:
>>>>>
>>>>>> We are having difficulties configuring Spark, partly because we still
>>>>>> don't understand some key concepts. For instance, how many executors
>>>>>> are there per machine in standalone mode? This is after having closely
>>>>>> read the documentation several times:
>>>>>>
>>>>>> http://spark.apache.org/docs/latest/configuration.html
>>>>>> http://spark.apache.org/docs/latest/spark-standalone.html
>>>>>> http://spark.apache.org/docs/latest/tuning.html
>>>>>> http://spark.apache.org/docs/latest/cluster-overview.html
>>>>>>
>>>>>> The cluster overview has some information here about executors but is
>>>>>> ambiguous about whether there are single executors or multiple
>>>>>> executors on each machine.
>>>>>>
>>>>>> This message from Aaron Davidson implies that the executor memory
>>>>>> should be set to total available memory on the machine divided by the
>>>>>> number of cores:
>>>>>> http://mail-archives.apache.org/mod_mbox/spark-user/201312.mbox/%3CCANGvG8o5K1SxgnFMT_9DK=vj_plbve6zh_dn5sjwpznpbcp...@mail.gmail.com%3E
>>>>>>
>>>>>> But other messages imply that the executor memory should be set to
>>>>>> the *total* available memory of each machine.
>>>>>>
>>>>>> We would very much appreciate some clarity on this and the myriad of
>>>>>> other memory settings available (daemon memory, worker memory etc).
>>>>>> Perhaps a worked example could be added to the docs? I would be happy
>>>>>> to provide some text as soon as someone can enlighten me on the
>>>>>> technicalities!
>>>>>>
>>>>>> Thank you
>>>>>>
>>>>>> --
>>>>>> Martin Goodson | VP Data Science
>>>>>> (0)20 3397 1240
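To make the arithmetic from Aaron's reply concrete, here is the second small sketch mentioned above (the numbers are simply the ones from his example, and the one-executor-per-node-per-application behaviour is as he describes for standalone mode):

    # Illustrative arithmetic only, using the values from Aaron's example above.
    worker_memory_gb = 8       # SPARK_WORKER_MEMORY on every node
    executor_memory_gb = 4     # spark.executor.memory requested per application

    # Each application gets one executor per node, so the number of
    # applications (SparkContexts) that can run at the same time is:
    concurrent_apps = worker_memory_gb // executor_memory_gb
    print(concurrent_apps)     # 2 here; with spark.executor.memory=8g it would be 1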