Thanks Andrew,

So if there is only one SparkContext, is there only one executor per
machine? This seems to contradict Aaron's message from the link in my
original email (quoted below):

"If each machine has 16 GB of RAM and 4 cores, for example, you might set
spark.executor.memory between 2 and 3 GB, totaling 8-12 GB used by Spark.)"

Am I reading this incorrectly?
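
Just to spell out the arithmetic I think Aaron has in mind (in case I've
misread it):

    16 GB RAM / 4 cores              ~= 4 GB per core
    spark.executor.memory = 2-3 GB      (a bit under the per-core share)
    4 x 2-3 GB                        = 8-12 GB used by Spark per machine

which only seems to add up if there are several executor JVMs (one per
core) on each machine.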

Anyway, our configuration is 21 machines (one master and 20 slaves), each
with 60 GB of RAM. We would like to use 4 cores per machine. This is PySpark,
so we want to leave, say, 16 GB on each machine for the Python worker
processes.
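
To make that concrete, here is roughly what I was planning to do from the
PySpark side, assuming one executor per machine gets the whole memory
allowance (the master URL and app name are just placeholders, and 44g is the
60 GB per slave minus the 16 GB reserved for Python; please correct this if
I've misunderstood how executor memory is counted):

    from pyspark import SparkConf, SparkContext

    # Sketch of the configuration we had in mind, not a tested setup.
    conf = (SparkConf()
            .setMaster("spark://our-master:7077")  # placeholder standalone master URL
            .setAppName("sizing-test")             # placeholder app name
            .set("spark.executor.memory", "44g")   # 60 GB per slave - 16 GB left for Python
            .set("spark.cores.max", "80"))         # 4 cores x 20 slaves
    sc = SparkContext(conf=conf)

Does that look like the right set of knobs, or should spark.executor.memory
be divided by the number of cores, as in Aaron's example?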

Thanks again for the advice!



-- 
Martin Goodson  |  VP Data Science
(0)20 3397 1240


On Wed, Jul 23, 2014 at 4:19 PM, Andrew Ash <and...@andrewash.com> wrote:

> Hi Martin,
>
> In standalone mode, each SparkContext you initialize gets its own set of
> executors across the cluster.  So for example if you have two shells open,
> you'll end up with two executor JVMs on each worker machine in the cluster,
> one per shell.
>
> As far as the other docs, you can configure the total number of cores
> requested for the SparkContext, the amount of memory for the executor JVM
> on each machine, the amount of memory for the Master/Worker daemons (little
> needed since work is done in executors), and several other settings.
>
> Which of those are you interested in?  What spec hardware do you have and
> how do you want to configure it?
>
> Andrew
>
>
> On Wed, Jul 23, 2014 at 6:10 AM, Martin Goodson <mar...@skimlinks.com>
> wrote:
>
>> We are having difficulties configuring Spark, partly because we still
>> don't understand some key concepts. For instance, how many executors are
>> there per machine in standalone mode? This is after having closely read
>> the documentation several times:
>>
>> http://spark.apache.org/docs/latest/configuration.html
>> http://spark.apache.org/docs/latest/spark-standalone.html
>> http://spark.apache.org/docs/latest/tuning.html
>> http://spark.apache.org/docs/latest/cluster-overview.html
>>
>> The cluster overview has some information about executors, but it is
>> ambiguous about whether there is a single executor or multiple executors
>> on each machine.
>>
>> This message from Aaron Davidson implies that the executor memory
>> should be set to total available memory on the machine divided by the
>> number of cores:
>> http://mail-archives.apache.org/mod_mbox/spark-user/201312.mbox/%3CCANGvG8o5K1SxgnFMT_9DK=vj_plbve6zh_dn5sjwpznpbcp...@mail.gmail.com%3E
>>
>> But other messages imply that the executor memory should be set to the
>> *total* available memory of each machine.
>>
>> We would very much appreciate some clarity on this and on the myriad
>> other memory settings available (daemon memory, worker memory, etc.).
>> Perhaps a worked example could be added to the docs? I would be happy to
>> provide some text as soon as someone can enlighten me on the technicalities!
>>
>> Thank you
>>
>> --
>> Martin Goodson  |  VP Data Science
>> (0)20 3397 1240
>>
>
>
