Could you try setting the MASTER variable in conf/spark-env.sh?

export MASTER=spark://<master-ip>:7077

For starting the standalone cluster, ./sbin/start-all.sh should work as long as you have passwordless SSH access to all the machines. Any errors there?
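Concretely, on Spark 0.9.x something along these lines should do it. <master-ip> is a placeholder for whatever node #1 resolves to, and the worker sizing values are made-up examples, not recommendations:

  # conf/spark-env.sh -- sourced by the launch scripts on every node
  export MASTER=spark://<master-ip>:7077    # the URL bin/spark-shell connects to
  export SPARK_MASTER_IP=<master-ip>        # address the master binds to
  export SPARK_MASTER_PORT=7077
  export SPARK_WORKER_CORES=8               # example values; size to your nodes
  export SPARK_WORKER_MEMORY=16g

  # conf/slaves -- on the master node, one worker hostname per line
  node01
  node02
  # ... and so on through node10

With those two files in place, running ./sbin/start-all.sh on #1 starts the master there and sshes out to every host listed in conf/slaves to start the workers.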
On Tue, Apr 22, 2014 at 10:10 PM, jaeholee <jho...@lbl.gov> wrote:
> No, I am not using AWS. I am using one of the national labs' clusters. But
> as I mentioned, I am pretty new to computer science, so I might not be
> answering your question right... but 7077 is accessible.
>
> Maybe I got it wrong from the get-go? I will just write down what I did...
>
> Basically I have a cluster with a bunch of nodes (call them #1 ~ #10), and I
> picked one node (call it #1) to be the master (and one of the workers).
>
> I updated the conf/spark-env.sh file with MASTER_IP, MASTER_PORT,
> MASTER_WEBUI_PORT, CORES, MEMORY, WORKER_PORT, WORKER_WEBUI_PORT.
>
> I start the master on #1 with ./sbin/start-master.sh
>
> But ./sbin/start-slaves.sh doesn't work for me, so I wrote a script that
> sshes into the worker nodes (#1 ~ #10) and starts a worker on each:
>
> for server in $(cat /somedirectory/hostnames.txt)
> do
>   ssh $server "nohup /somedirectory/somedirectory/spark-0.9.1/bin/spark-class \
>     org.apache.spark.deploy.worker.Worker spark://MASTER_IP:MASTER_PORT \
>     > /somedirectory/nohup.out & exit"
> done
>
> Then I go to #1 and start ./bin/spark-shell, and that's when I get that
> error message.
>
> Sorry if it got more confusing..
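Regarding the manual loop above: a common gotcha is that ssh can hold the session open because nohup's stdin and stderr are still attached to it. A rough, untested variant that detaches cleanly (the /somedirectory paths are the placeholders from your mail, and I am assuming the master listens on the default port 7077):

  for server in $(cat /somedirectory/hostnames.txt)
  do
    # Redirect stdin and stderr as well, so ssh returns immediately
    # instead of waiting on the remote worker process.
    ssh $server "nohup /somedirectory/somedirectory/spark-0.9.1/bin/spark-class \
      org.apache.spark.deploy.worker.Worker spark://<master-ip>:7077 \
      > /somedirectory/nohup.out 2>&1 < /dev/null &"
  done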