I've got a series of applications using a single standalone Spark cluster (v1.4.1). The cluster has one master and four workers (4 CPUs per worker node). I am using the start-slave.sh script to launch the worker process on each node, and I can see in the SparkUI that the nodes registered successfully. When I launch one of my applications, no matter what I set spark.cores.max to when instantiating the SparkContext in the driver, a single worker seems to get assigned to the application and to all the jobs it runs. For example, if I set spark.cores.max to 16, the SparkUI shows a single worker taking the whole load, with "4 (16 Used)" in its Cores column. How do I get my jobs to run across multiple nodes in the cluster?
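For reference, here is roughly how the driver sets up the SparkContext (the master URL, app name, and executor memory below are placeholders rather than the exact values from my code; the relevant part is that spark.cores.max is set before the context is created):

    import org.apache.spark.{SparkConf, SparkContext}

    // Standalone mode, Spark 1.4.1: spark.cores.max is set on the SparkConf
    // before the SparkContext is instantiated.
    val conf = new SparkConf()
      .setMaster("spark://<master-host>:7077")  // placeholder master URL
      .setAppName("App1")                       // placeholder app name
      .set("spark.cores.max", "16")             // cap on total cores for the app
      .set("spark.executor.memory", "28g")      // placeholder; matches the 28.0 GB shown per worker
    val sc = new SparkContext(conf)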
Here's a snippet from the SparkUI (IP addresses removed for privacy):

Workers

Worker Id                         Address     State   Cores         Memory
worker-20150920064814-***-33659   ***:33659   ALIVE   4 (0 Used)    28.0 GB (0.0 B Used)
worker-20151012175609-***37399    ***:37399   ALIVE   4 (16 Used)   28.0 GB (28.0 GB Used)
worker-20151012181934-***-36573   ***:36573   ALIVE   4 (4 Used)    28.0 GB (28.0 GB Used)
worker-20151030170514-***-45368   ***:45368   ALIVE   4 (0 Used)    28.0 GB (0.0 B Used)

Running Applications

Application ID            Name   Cores   Memory per Node   Submitted Time        User   State     Duration
app-20151102194733-0278   App1   16      28.0 GB           2015/11/02 19:47:33   ***    RUNNING   2 s
app-20151102164156-0274   App2   4       28.0 GB           2015/11/02 16:41:56   ***    RUNNING   3.1 h

Jeff