I think the problem is the use of the loopback address:

export SPARK_LOCAL_IP=127.0.0.1
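For comparison, here is a rough sketch of a spark-env.sh that avoids the loopback address. It is only an illustration under assumptions: DRIVER_IP is a hypothetical variable name, and 192.0.2.10 is a placeholder from the documentation address range, not an address from this cluster, so substitute the address of the driver host that the Mesos slaves can actually reach. The loopback guard is my own addition; the other two exports are copied unchanged from the spark-env.sh quoted further down.

```shell
# spark-env.sh (sketch). DRIVER_IP is a hypothetical placeholder, not a value
# taken from this thread: replace it with an address the Mesos slaves can reach.
DRIVER_IP="192.0.2.10"

# Warn and skip the export if the address is a loopback one, because a
# process on another host would resolve it to itself.
case "$DRIVER_IP" in
  127.*|localhost)
    echo "SPARK_LOCAL_IP must not be a loopback address: $DRIVER_IP" >&2
    ;;
  *)
    export SPARK_LOCAL_IP="$DRIVER_IP"
    ;;
esac

# Unchanged from the original spark-env.sh.
export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
export SPARK_EXECUTOR_URI=http://100.125.5.93/sparkx.tgz
```

After editing the file, restart the Spark shell so the driver picks up the new value.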
In the stack trace from the slave, you see this:

... Reason: Connection refused: localhost/127.0.0.1:51849
akka.actor.ActorNotFound: Actor not found for:
ActorSelection[Anchor(akka.tcp://sparkDriver@localhost:51849/),
Path(/user/MapOutputTracker)]

It's trying to connect to an Akka actor on itself, using the loopback address.

Try changing SPARK_LOCAL_IP to the publicly routable IP address.

dean

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com

On Mon, Mar 23, 2015 at 7:37 PM, Anirudha Jadhav <anirudh...@gmail.com> wrote:

> My bad there, I was using the correct link for docs. The spark shell runs
> correctly, the framework is registered fine on mesos.
>
> is there some setting i am missing:
> this is my spark-env.sh>>>
>
> export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
> export SPARK_EXECUTOR_URI=http://100.125.5.93/sparkx.tgz
> export SPARK_LOCAL_IP=127.0.0.1
>
> here is what i see on the slave node.
> ----------------
> less
> 20150226-160708-788888932-5050-8971-S0/frameworks/20150323-205508-788888932-5050-29804-0012/executors/20150226-160708-788888932-5050-8971-S0/runs/cceea834-c4d9-49d6-a579-8352f1889b56/stderr
> >>>>>
>
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0324 02:30:29.389225 27755 fetcher.cpp:76] Fetching URI '
> http://100.125.5.93/sparkx.tgz'
> I0324 02:30:29.389361 27755 fetcher.cpp:126] Downloading '
> http://100.125.5.93/sparkx.tgz' to
> '/tmp/mesos/slaves/20150226-160708-788888932-5050-8971-S0/frameworks/20150323-205508-788888932-5050-29804-0012/executors/20150226-160708-788888932-5050-8971-S0/runs/cceea834-c4d9-49d6-a579-8352f1889b56/sparkx.tgz'
> I0324 02:30:35.353446 27755 fetcher.cpp:64] Extracted resource
> '/tmp/mesos/slaves/20150226-160708-788888932-5050-8971-S0/frameworks/20150323-205508-788888932-5050-29804-0012/executors/20150226-160708-788888932-5050-8971-S0/runs/cceea834-c4d9-49d6-a579-8352f1889b56/sparkx.tgz'
> into
> '/tmp/mesos/slaves/20150226-160708-788888932-5050-8971-S0/frameworks/20150323-205508-788888932-5050-29804-0012/executors/20150226-160708-788888932-5050-8971-S0/runs/cceea834-c4d9-49d6-a579-8352f1889b56'
> Spark assembly has been built with Hive, including Datanucleus jars on
> classpath
> Using Spark's default log4j profile:
> org/apache/spark/log4j-defaults.properties
> 15/03/24 02:30:37 INFO MesosExecutorBackend: Registered signal handlers
> for [TERM, HUP, INT]
> I0324 02:30:37.071077 27863 exec.cpp:132] Version: 0.21.1
> I0324 02:30:37.080971 27885 exec.cpp:206] Executor registered on slave
> 20150226-160708-788888932-5050-8971-S0
> 15/03/24 02:30:37 INFO MesosExecutorBackend: Registered with Mesos as
> executor ID 20150226-160708-788888932-5050-8971-S0 with 1 cpus
> 15/03/24 02:30:37 INFO SecurityManager: Changing view acls to: ubuntu
> 15/03/24 02:30:37 INFO SecurityManager: Changing modify acls to: ubuntu
> 15/03/24 02:30:37 INFO SecurityManager: SecurityManager: authentication
> disabled; ui acls disabled; users with view permissions: Set(ubuntu); users
> with modify permissions: Set(ubuntu)
> 15/03/24 02:30:37 INFO Slf4jLogger: Slf4jLogger started
> 15/03/24 02:30:37 INFO Remoting: Starting remoting
> 15/03/24 02:30:38 INFO Remoting: Remoting started; listening on addresses
> :[akka.tcp://sparkexecu...@mesos-si2.dny1.bcpc.bloomberg.com:50542]
> 15/03/24 02:30:38 INFO Utils: Successfully started service 'sparkExecutor'
> on port 50542.
> 15/03/24 02:30:38 INFO AkkaUtils: Connecting to MapOutputTracker:
> akka.tcp://sparkDriver@localhost:51849/user/MapOutputTracker
> 15/03/24 02:30:38 WARN Remoting: Tried to associate with unreachable
> remote address [akka.tcp://sparkDriver@localhost:51849]. Address is now
> gated for 5000 ms, all messages to this address will be delivered to dead
> letters. Reason: Connection refused: localhost/127.0.0.1:51849
> akka.actor.ActorNotFound: Actor not found for:
> ActorSelection[Anchor(akka.tcp://sparkDriver@localhost:51849/),
> Path(/user/MapOutputTracker)]
>     at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
>     at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
>     at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>     at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
>     at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
>     at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>     at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>     at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>     at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
>     at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
>     at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
>     at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
>     at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
>     at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
>     at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:267)
>     at akka.actor.EmptyLocalActorRef.specialHandle(ActorRef.scala:508)
>     at akka.actor.DeadLetterActorRef.specialHandle(ActorRef.scala:541)
>     at akka.actor.DeadLetterActorRef.$bang(ActorRef.scala:531)
>     at akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef.$bang(RemoteActorRefProvider.scala:87)
>
> On Mar 23, 2015, at 3:02 PM, Dean Wampler <deanwamp...@gmail.com> wrote:
>
> That's a very old page, try this instead:
>
> http://spark.apache.org/docs/latest/running-on-mesos.html
>
> When you run your Spark job on Mesos, tasks will be started on the slave
> nodes as needed, since "fine-grained" mode is the default.
>
> For a job like your example, very few tasks will be needed. Actually only
> one would be enough, but the default number of partitions will be used. I
> believe 8 is the default for Mesos. For local mode ("local[*]"), it's the
> number of cores. You can also set the property "spark.default.parallelism".
>
> HTH,
>
> Dean
>
> Dean Wampler, Ph.D.
> Author: Programming Scala, 2nd Edition
> <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
> Typesafe <http://typesafe.com>
> @deanwampler <http://twitter.com/deanwampler>
> http://polyglotprogramming.com
>
> On Mon, Mar 23, 2015 at 11:46 AM, Anirudha Jadhav <aniru...@nyu.edu>
> wrote:
>
>> i have a mesos cluster, which i deploy spark to by using instructions on
>> http://spark.apache.org/docs/0.7.2/running-on-mesos.html
>>
>> after that the spark shell starts up fine.
>> then i try the following on the shell:
>>
>> val data = 1 to 10000
>>
>> val distData = sc.parallelize(data)
>>
>> distData.filter(_< 10).collect()
>>
>> open spark web ui at host:4040 and see an active job.
>>
>> NOW, how do i start workers or spark workers on mesos ? who completes my
>> job?
>> thanks,
>>
>> --
>> Ani