I forgot to mention that the distribution was generated on the master by running make-distribution.sh, and the resulting dist directory was then scp-ed to all worker nodes. So the worker nodes have only the dist directory.
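For anyone reproducing this setup, the deploy step above can be sketched roughly as follows (the worker hostnames and target path here are hypothetical placeholders, not from my actual setup; the `echo` makes it a dry run):

```shell
#!/bin/sh
# Sketch of the deploy described above: build the distribution on the
# master, then copy the dist/ directory to each worker node.
# Build step (run once on the master):
#   ./make-distribution.sh
WORKERS="worker1 worker2 worker3"   # hypothetical hostnames
for host in $WORKERS; do
  # Dry run: prints the copy command. Remove 'echo' to actually copy.
  echo scp -r dist/ "$host:~/spark-dist"
done
```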
On Mon, Jan 20, 2014 at 9:02 PM, Manoj Samel <manojsamelt...@gmail.com> wrote:
> Hi,
>
> I configured a Spark 0.8.1 cluster on AWS with one master node and 3 worker
> nodes. The cluster was configured as a standalone cluster following
> http://spark.incubator.apache.org/docs/latest/spark-standalone.html
>
> The distribution was generated.
> The master was started on the master host with ./bin/start-master.sh.
> Then on each of the worker nodes, I cd-ed into the spark-distro directory
> and ran
> ./spark-class org.apache.spark.deploy.worker.Worker spark://IPxxxx:7077
>
> In the browser, on the master's port 8080, I can see the 3 worker nodes ALIVE.
>
> Next I started a spark-shell on the master node with
> MASTER=spark://IPxxx:7077 ./spark-shell
>
> In it I created a simple RDD from a local text file with a few lines and ran
> countByKey(). The shell hangs. Ctrl-C gives:
>
> scala> credit.countByKey()
> java.lang.InterruptedException
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:485)
>         at org.apache.spark.scheduler.JobWaiter.awaitResult(JobWaiter.scala:73)
>         at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:318)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:840)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:909)
>         at org.apache.spark.rdd.RDD.reduce(RDD.scala:654)
>         at org.apache.spark.rdd.RDD.countByValue(RDD.scala:752)
>         at org.apache.spark.rdd.PairRDDFunctions.countByKey(PairRDDFunctions.scala:198)
>
> Note: the same works in a local shell (without a master).
>
> Any pointers? Do I have to set up anything else (network/logins)? Note I am
> *** NOT *** starting the slaves from the master node (using
> bin/start-slaves.sh) and thus have not set up passwordless ssh login, etc.
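As a sanity check while debugging the hang: countByKey on a pair RDD just returns a Map from each key to its count, so the expected result can be computed with plain Scala collections, no cluster needed. A minimal sketch (the `credit` data here is made up for illustration):

```scala
// Plain-Scala sketch of what countByKey computes on a pair collection.
// Illustrative only; the real call runs distributed across the workers.
object CountByKeySketch {
  def countByKey[K](pairs: Seq[(K, _)]): Map[K, Long] =
    // Group pairs by key, then count the entries in each group.
    pairs.groupBy(_._1).map { case (k, vs) => (k, vs.size.toLong) }

  def main(args: Array[String]): Unit = {
    val credit = Seq(("visa", 1), ("amex", 2), ("visa", 3)) // hypothetical data
    println(countByKey(credit))
  }
}
```

If this local equivalent gives the result you expect but the cluster call hangs, the problem is in task scheduling/worker communication rather than in the data or the operation itself.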