Could you try setting the MASTER variable in spark-env.sh?

export MASTER=spark://<master-ip>:7077
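
For reference, a minimal conf/spark-env.sh for standalone mode could look
roughly like the sketch below; <master-ip> stands for node #1's hostname/IP,
and the worker core/memory numbers are only example values to adjust for your
nodes:

# conf/spark-env.sh -- minimal sketch, example values only
export SPARK_MASTER_IP=<master-ip>        # hostname/IP of node #1
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=8               # cores each worker may use
export SPARK_WORKER_MEMORY=16g            # memory each worker may use
export MASTER=spark://<master-ip>:7077    # picked up by spark-shell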

For starting the standalone cluster, ./sbin/start-all.sh should work as long
as you have passwordless SSH access from the master to all machines. Any error
there?
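
For start-all.sh (and start-slaves.sh) to find the workers, the master node
also needs conf/slaves with one worker hostname per line, plus passwordless SSH
from the master to each of those hosts. A rough sketch, with placeholder
hostnames:

# conf/slaves -- one worker hostname per line (placeholders for nodes #1 ~ #10)
node01
node02
# ...and so on through node10

# passwordless SSH from the master, if it is not already set up
ssh-keygen -t rsa          # accept the defaults
ssh-copy-id node01         # repeat for each worker host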




On Tue, Apr 22, 2014 at 10:10 PM, jaeholee <jho...@lbl.gov> wrote:

> No, I am not using AWS. I am using one of the national lab's clusters. But
> as I mentioned, I am pretty new to computer science, so I might not be
> answering your question right... but port 7077 is accessible.
>
> Maybe I got it wrong from the get-go? I will just write down what I did...
>
> Basically I have a cluster with a bunch of nodes (call them #1 ~ #10), and I
> picked one node (call it #1) to be the master (and also one of the workers).
>
> I updated the conf/spark-env.sh file with MASTER_IP, MASTER_PORT,
> MASTER_WEBUI_PORT, CORES, MEMORY, WORKER_PORT, WORKER_WEBUI_PORT
>
> I start the master on #1 with ./sbin/start-master.sh
>
> But ./sbin/start-slaves.sh doesn't work for me, so I wrote a script that
> sshes into the worker nodes (#1 ~ #10) and starts a worker on each:
>
> for server in $(cat /somedirectory/hostnames.txt)
> do
>   ssh $server "nohup /somedirectory/somedirectory/spark-0.9.1/bin/spark-class \
>     org.apache.spark.deploy.worker.Worker spark://MASTER_IP:MASTER_PORT \
>     > /somedirectory/nohup.out & exit"
> done
>
>
> Then I go to #1 and start ./bin/spark-shell, and that's when I get that
> error message.
>
> Sorry if this got more confusing...
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/ERROR-TaskSchedulerImpl-Lost-an-executor-tp4566p4610.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
