Great, thanks for the clarification, Aaron. The offer stands for me to write some documentation and an example that covers this without leaving *any* room for ambiguity.
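For instance, here is a rough first sketch of the kind of worked example I have in mind, based on our own cluster (20 worker nodes with 60GB RAM and 4 cores each, leaving roughly 16GB per node for the Python worker processes). All of the names and numbers below are illustrative assumptions on my part rather than tested settings:

    # conf/spark-env.sh on each worker node: the memory the standalone Worker
    # daemon is allowed to hand out to executors on that node (assumed here to
    # be 60GB minus ~16GB reserved for Python workers and the OS):
    #
    #   SPARK_WORKER_MEMORY=44g
    #
    # Per-application settings, requested when the SparkContext is created.
    # In standalone mode an application gets one executor per worker node, so
    # spark.executor.memory is the heap size of that single JVM on each node.
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setMaster("spark://master:7077")       # hypothetical master URL
            .setAppName("memory-settings-example")  # hypothetical app name
            .set("spark.executor.memory", "44g")    # executor JVM heap per node
            .set("spark.cores.max", "80"))          # 20 workers x 4 cores each

    sc = SparkContext(conf=conf)

I have also spelled out the arithmetic from Aaron's reply in a second small sketch at the bottom of this mail.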
--
Martin Goodson | VP Data Science
(0)20 3397 1240


On Thu, Jul 24, 2014 at 6:09 PM, Aaron Davidson <ilike...@gmail.com> wrote:

> Whoops, I was mistaken in my original post last year. By default, there is
> one executor per node per Spark Context, as you said.
> "spark.executor.memory" is the amount of memory that the application
> requests for each of its executors. SPARK_WORKER_MEMORY is the amount of
> memory a Spark Worker is willing to allocate in executors.
>
> So if you were to set SPARK_WORKER_MEMORY to 8g everywhere on your
> cluster, and spark.executor.memory to 4g, you would be able to run 2
> simultaneous Spark Contexts who get 4g per node. Similarly, if
> spark.executor.memory were 8g, you could only run 1 Spark Context at a time
> on the cluster, but it would get all the cluster's memory.
>
>
> On Thu, Jul 24, 2014 at 7:25 AM, Martin Goodson <mar...@skimlinks.com> wrote:
>
>> Thank you Nishkam,
>> I have read your code. So, for the sake of my understanding, it seems
>> that for each spark context there is one executor per node? Can anyone
>> confirm this?
>>
>>
>> --
>> Martin Goodson | VP Data Science
>> (0)20 3397 1240
>>
>>
>> On Thu, Jul 24, 2014 at 6:12 AM, Nishkam Ravi <nr...@cloudera.com> wrote:
>>
>>> See if this helps:
>>>
>>> https://github.com/nishkamravi2/SparkAutoConfig/
>>>
>>> It's a very simple tool for auto-configuring default parameters in
>>> Spark. Takes as input high-level parameters (like number of nodes, cores
>>> per node, memory per node, etc) and spits out default configuration, user
>>> advice and command line. Compile (javac SparkConfigure.java) and run (java
>>> SparkConfigure).
>>>
>>> Also cc'ing dev in case others are interested in helping evolve this
>>> over time (by refining the heuristics and adding more parameters).
>>>
>>>
>>> On Wed, Jul 23, 2014 at 8:31 AM, Martin Goodson <mar...@skimlinks.com> wrote:
>>>
>>>> Thanks Andrew,
>>>>
>>>> So if there is only one SparkContext there is only one executor per
>>>> machine? This seems to contradict Aaron's message from the link above:
>>>>
>>>> "If each machine has 16 GB of RAM and 4 cores, for example, you might
>>>> set spark.executor.memory between 2 and 3 GB, totaling 8-12 GB used by
>>>> Spark.)"
>>>>
>>>> Am I reading this incorrectly?
>>>>
>>>> Anyway our configuration is 21 machines (one master and 20 slaves) each
>>>> with 60Gb. We would like to use 4 cores per machine. This is pyspark so we
>>>> want to leave say 16Gb on each machine for python processes.
>>>>
>>>> Thanks again for the advice!
>>>>
>>>>
>>>> --
>>>> Martin Goodson | VP Data Science
>>>> (0)20 3397 1240
>>>>
>>>>
>>>> On Wed, Jul 23, 2014 at 4:19 PM, Andrew Ash <and...@andrewash.com> wrote:
>>>>
>>>>> Hi Martin,
>>>>>
>>>>> In standalone mode, each SparkContext you initialize gets its own set
>>>>> of executors across the cluster. So for example if you have two shells
>>>>> open, they'll each get two JVMs on each worker machine in the cluster.
>>>>>
>>>>> As far as the other docs, you can configure the total number of cores
>>>>> requested for the SparkContext, the amount of memory for the executor JVM
>>>>> on each machine, the amount of memory for the Master/Worker daemons
>>>>> (little needed since work is done in executors), and several other
>>>>> settings.
>>>>>
>>>>> Which of those are you interested in? What spec hardware do you have
>>>>> and how do you want to configure it?
>>>>>
>>>>> Andrew
>>>>>
>>>>>
>>>>> On Wed, Jul 23, 2014 at 6:10 AM, Martin Goodson <mar...@skimlinks.com> wrote:
>>>>>
>>>>>> We are having difficulties configuring Spark, partly because we still
>>>>>> don't understand some key concepts. For instance, how many executors
>>>>>> are there per machine in standalone mode? This is after having closely
>>>>>> read the documentation several times:
>>>>>>
>>>>>> http://spark.apache.org/docs/latest/configuration.html
>>>>>> http://spark.apache.org/docs/latest/spark-standalone.html
>>>>>> http://spark.apache.org/docs/latest/tuning.html
>>>>>> http://spark.apache.org/docs/latest/cluster-overview.html
>>>>>>
>>>>>> The cluster overview has some information here about executors but is
>>>>>> ambiguous about whether there are single executors or multiple
>>>>>> executors on each machine.
>>>>>>
>>>>>> This message from Aaron Davidson implies that the executor memory
>>>>>> should be set to total available memory on the machine divided by the
>>>>>> number of cores:
>>>>>> http://mail-archives.apache.org/mod_mbox/spark-user/201312.mbox/%3CCANGvG8o5K1SxgnFMT_9DK=vj_plbve6zh_dn5sjwpznpbcp...@mail.gmail.com%3E
>>>>>>
>>>>>> But other messages imply that the executor memory should be set to
>>>>>> the *total* available memory of each machine.
>>>>>>
>>>>>> We would very much appreciate some clarity on this and the myriad of
>>>>>> other memory settings available (daemon memory, worker memory etc).
>>>>>> Perhaps a worked example could be added to the docs? I would be happy
>>>>>> to provide some text as soon as someone can enlighten me on the
>>>>>> technicalities!
>>>>>>
>>>>>> Thank you
>>>>>>
>>>>>> --
>>>>>> Martin Goodson | VP Data Science
>>>>>> (0)20 3397 1240
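To make the arithmetic from Aaron's reply concrete, here is the second small sketch mentioned above (the numbers are simply the ones from his example, and the one-executor-per-node-per-application behaviour is as he describes for standalone mode):

    # Illustrative arithmetic only, using the values from Aaron's example above.
    worker_memory_gb = 8       # SPARK_WORKER_MEMORY on every node
    executor_memory_gb = 4     # spark.executor.memory requested per application

    # Each application gets one executor per node, so the number of
    # applications (SparkContexts) that can run at the same time is:
    concurrent_apps = worker_memory_gb // executor_memory_gb
    print(concurrent_apps)     # 2 here; with spark.executor.memory=8g it would be 1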