Hi,

I have set up a Spark standalone cluster of 5 workers using the
spark-ec2 script.

After submitting my Spark application, I noticed that only one worker
seemed to run the application while the other 4 workers were doing
nothing. I confirmed this by checking CPU and memory usage in the
Spark Web UI (CPU usage was zero and memory was almost fully
available on those workers).

This is the command I used to launch the cluster:

$ ~/spark/ec2/spark-ec2 -k awesome-keypair-name \
    -i /path/to/.ssh/awesome-private-key.pem \
    --region ap-northeast-1 --zone=ap-northeast-1a \
    --slaves 5 --instance-type m1.large \
    --hadoop-major-version yarn \
    launch awesome-spark-cluster

And these are the commands to run the application:

$ ssh -i ~/path/to/awesome-private-key.pem root@ec2-master-host-name \
    "mkdir ~/awesome"
$ scp -i ~/path/to/awesome-private-key.pem spark.jar \
    root@ec2-master-host-name:~/awesome && \
  ssh -i ~/path/to/awesome-private-key.pem root@ec2-master-host-name \
    "~/spark-ec2/copy-dir ~/awesome"
$ ssh -i ~/path/to/awesome-private-key.pem root@ec2-master-host-name \
    "~/spark/bin/spark-submit --num-executors 5 --executor-cores 2 \
      --executor-memory 5G --total-executor-cores 10 --driver-cores 2 \
      --driver-memory 5G --class com.example.SparkIsAwesome \
      awesome/spark.jar"

How do I get all of the workers to execute the app?

Or do I have a wrong understanding of what workers, slaves, and executors are?

My understanding is: the Spark driver (or maybe the master?) sends a
part of the job to each worker (== executor == slave), so a Spark
cluster automatically exploits all of the resources available in the
cluster. Is this some sort of misconception?
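
For reference, my application does something roughly like this (a
simplified sketch, not the actual code; the logic and names are
placeholders):

package com.example

import org.apache.spark.{SparkConf, SparkContext}

object SparkIsAwesome {
  def main(args: Array[String]): Unit = {
    // The master URL and resource settings are expected to come
    // from spark-submit / the environment, not hard-coded here.
    val conf = new SparkConf().setAppName("awesome")
    val sc = new SparkContext(conf)

    // 10 partitions, which I expect the scheduler to spread
    // across all 5 workers.
    val sum = sc.parallelize(1 to 1000000, 10).map(_ * 2).sum()
    println(s"sum = $sum")

    sc.stop()
  }
}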

Thanks,

--
Kyohey Hamaguchi
TEL:  080-6918-1708
Mail: tnzk.ma...@gmail.com
Blog: http://blog.tnzk.org/
