We are having difficulties configuring Spark, partly because we still don't
understand some key concepts. For instance, how many executors are there
per machine in standalone mode? This is after having closely read the
documentation several times:

http://spark.apache.org/docs/latest/configuration.html
http://spark.apache.org/docs/latest/spark-standalone.html
http://spark.apache.org/docs/latest/tuning.html
http://spark.apache.org/docs/latest/cluster-overview.html

The cluster overview has some information about executors but is
ambiguous about whether there is a single executor or multiple executors
on each machine.

This message from Aaron Davidson implies that the executor memory should
be set to the total available memory on the machine divided by the number
of cores:

http://mail-archives.apache.org/mod_mbox/spark-user/201312.mbox/%3CCANGvG8o5K1SxgnFMT_9DK=vj_plbve6zh_dn5sjwpznpbcp...@mail.gmail.com%3E

But other messages imply that the executor memory should be set to the
*total* available memory of each machine.
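
To make the question concrete, here is a made-up example (the machine
specs and values are invented purely to illustrate the two readings): on
a worker machine with 32 GB of RAM and 8 cores, the first reading would
suggest

    spark.executor.memory=4g     (32 GB / 8 cores)

while the second would suggest

    spark.executor.memory=32g    (the whole machine)

These differ by a factor of eight, so we would very much like to know
which is intended.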

We would very much appreciate some clarity on this and the myriad other
memory settings available (daemon memory, worker memory, etc.). Perhaps a
worked example could be added to the docs? I would be happy to provide
some text as soon as someone can enlighten me on the technicalities!
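
To illustrate the kind of worked example I have in mind (all values below
are invented, and the comments reflect only our current, possibly wrong,
understanding of each setting):

    # conf/spark-env.sh on each worker machine
    SPARK_DAEMON_MEMORY=1g      # memory for the master/worker daemons themselves
    SPARK_WORKER_MEMORY=28g     # total memory the worker may allocate to executors
    SPARK_WORKER_CORES=8        # total cores the worker may allocate to executors

    # per-application setting (spark-defaults.conf or SparkConf)
    spark.executor.memory=4g    # memory per executor -- or should this be 28g?

A short passage in the docs walking through numbers like these would, I
think, clear up most of the confusion.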

Thank you

-- 
Martin Goodson  |  VP Data Science
(0)20 3397 1240