
I have set up a Spark standalone cluster with 5 workers using the
spark-ec2 script.

After submitting my Spark application, I noticed that only one
worker seemed to run the application while the other 4 workers were doing
nothing. I confirmed this by checking CPU and memory usage on the
Spark Web UI (CPU usage indicates zero and memory is almost fully

This is the command used to launch the cluster:

$ ~/spark/ec2/spark-ec2 -k awesome-keypair-name \
    -i /path/to/.ssh/awesome-private-key.pem --region ap-northeast-1 \
    --zone=ap-northeast-1a --slaves 5 --instance-type m1.large \
    --hadoop-major-version yarn launch awesome-spark-cluster

And the commands to run the application:

$ ssh -i ~/path/to/awesome-private-key.pem root@ec2-master-host-name \
    "mkdir ~/awesome"
$ scp -i ~/path/to/awesome-private-key.pem spark.jar \
    root@ec2-master-host-name:~/awesome \
    && ssh -i ~/path/to/awesome-private-key.pem root@ec2-master-host-name \
    "~/spark-ec2/copy-dir ~/awesome"
$ ssh -i ~/path/to/awesome-private-key.pem root@ec2-master-host-name
"~/spark/bin/spark-submit --num-executors 5 --executor-cores 2
--executor-memory 5G --total-executor-cores 10 --driver-cores 2
--driver-memory 5G --class com.example.SparkIsAwesome

How do I get all of the workers to execute the app?

Or do I misunderstand what workers, slaves, and executors are?

My understanding is that the Spark driver (or maybe the master?) sends a
part of each job to every worker (== executor == slave), so a Spark
cluster automatically uses all resources available in the cluster. Is
this a misconception?
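
To make that mental model concrete, here is a toy sketch in plain Python.
It does not use Spark's API and is not how Spark's scheduler is actually
implemented. The function name assign_tasks, the worker names, and the
round-robin assignment are all assumptions made up for illustration. It
only shows what I expected to see: every one of the 5 workers getting a
share of the tasks.

```python
from collections import defaultdict


def assign_tasks(num_tasks, workers):
    """Toy model: hand task i to worker number (i mod len(workers))."""
    assignment = defaultdict(list)
    for task_id in range(num_tasks):
        worker = workers[task_id % len(workers)]
        assignment[worker].append(task_id)
    return dict(assignment)


# Hypothetical names standing in for the 5 workers in my cluster.
workers = [f"worker-{i}" for i in range(1, 6)]

# 10 tasks, matching the 10 total executor cores I requested.
plan = assign_tasks(10, workers)
for worker, tasks in plan.items():
    print(worker, tasks)
```

In this toy model no worker is ever idle, which is the opposite of what
the Web UI shows for my real cluster.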


