We are having difficulties configuring Spark, partly because we still don't understand some key concepts. For instance, how many executors are there per machine in standalone mode? This is after having closely read the documentation several times:
http://spark.apache.org/docs/latest/configuration.html
http://spark.apache.org/docs/latest/spark-standalone.html
http://spark.apache.org/docs/latest/tuning.html
http://spark.apache.org/docs/latest/cluster-overview.html

The cluster overview has some information about executors, but it is ambiguous about whether there is a single executor or multiple executors on each machine. This message from Aaron Davidson implies that the executor memory should be set to the total available memory on the machine divided by the number of cores:

http://mail-archives.apache.org/mod_mbox/spark-user/201312.mbox/%3CCANGvG8o5K1SxgnFMT_9DK=vj_plbve6zh_dn5sjwpznpbcp...@mail.gmail.com%3E

But other messages imply that the executor memory should be set to the *total* available memory of each machine (see the postscript for a sketch of the two interpretations). We would very much appreciate some clarity on this, and on the myriad other memory settings available (daemon memory, worker memory, etc.). Perhaps a worked example could be added to the docs? I would be happy to provide some text as soon as someone can enlighten me on the technicalities!

Thank you

--
Martin Goodson | VP Data Science
(0)20 3397 1240
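
P.S. To make the two readings concrete, here is a minimal sketch of what we would write under each interpretation. Everything here is made up for illustration: a hypothetical standalone cluster of 4 workers, each with 8 cores and 32 GB of RAM, a master at spark://master:7077, and the memory figures that follow from each reading. Which one is correct?

    import org.apache.spark.{SparkConf, SparkContext}

    // Interpretation A (Aaron Davidson's message): executor memory is
    // the machine's total memory divided by its number of cores,
    // which would suggest one executor per core.
    val confA = new SparkConf()
      .setMaster("spark://master:7077")    // hypothetical master URL
      .setAppName("executor-memory-question")
      .set("spark.executor.memory", "4g")  // 32 GB / 8 cores

    // Interpretation B (other messages): one executor per machine, so
    // the executor gets (nearly) the machine's total memory.
    val confB = new SparkConf()
      .setMaster("spark://master:7077")
      .setAppName("executor-memory-question")
      .set("spark.executor.memory", "30g") // leave headroom for OS/daemons

    val sc = new SparkContext(confA)       // or confB -- which is right?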