I don't think it's a red herring... (btw, spark.driver.host needs to be set
to the IP or FQDN of the machine where you're running the program).

I am running 0.9.2 on CDH4. I am not specifically setting
spark.driver.host or the port, but depending on how your machine is set up
you might need to. The beginning of my executor log looks like below (I've
obfuscated the IP; this is the log from executor a100-2-200-245, and my
driver is running on a100-2-200-238):

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/10/03 18:14:48 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/10/03 18:14:48 INFO Remoting: Starting remoting
14/10/03 18:14:48 INFO Remoting: Remoting started; listening on
addresses :[akka.tcp://sparkExecutor@a100-2-200-245:56760]
14/10/03 18:14:48 INFO Remoting: Remoting now listens on addresses:
**14/10/03 18:14:48 INFO executor.CoarseGrainedExecutorBackend:
Connecting to driver:**
14/10/03 18:14:48 INFO worker.WorkerWatcher: Connecting to worker
14/10/03 18:14:48 INFO worker.WorkerWatcher: Successfully connected to
**14/10/03 18:14:49 INFO executor.CoarseGrainedExecutorBackend:
Successfully registered with driver**
14/10/03 18:14:49 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/10/03 18:14:49 INFO Remoting: Starting remoting

If you look at the lines marked with **, this is where the executor
successfully connects to and registers with the driver. At this point you
should see your app show up in the UI under "Running applications". The
worker log you're posting: is that the log that is stored under
work/app-<id>/<executor-id>/stderr? The first line you show in that log is

 INFO worker.Worker: Executor
    app-20141002131901-0002/9 finished with state FAILED

but I imagine something prior to that would say why the executor failed?
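
If you want to rule out the network side separately (i.e. whether the
driver's host/port is reachable from the worker machines), a plain TCP
connect from a worker is usually enough. Below is a minimal sketch in Python,
nothing Spark-specific; the helper name can_connect is just something I made
up, and the host and port are whatever you pass in (for example the driver
address that shows up in your executor log). It only tells you whether
something accepts connections on that port at that moment, not that Spark
itself is configured correctly.

```python
import socket
import sys


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the hostname and attempts a TCP connect
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers connection refused, timeouts and name resolution failures
        return False


if __name__ == "__main__" and len(sys.argv) == 3:
    # hypothetical usage: python check_port.py <driver-host> <driver-port>
    host, port = sys.argv[1], int(sys.argv[2])
    state = "reachable" if can_connect(host, port) else "NOT reachable"
    print(host, port, state)
```

Run it on one of the worker machines while the driver is up. If it reports
the port as not reachable, I would look at hostname resolution and firewall
rules before anything else.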

> Yana, many thanks for looking into this!
> I am not running spark-shell in local mode; I really am starting
> spark-shell with --master spark://master:7077 and running in cluster mode.
> Second, I tried setting "spark.driver.host" to "master" both in the
> Scala app when creating the context and in the conf/spark-defaults.conf
> file, but this did not make any difference. Worker logs still have the
> same messages:
> 14/10/03 13:37:30 ERROR remote.EndpointWriter: AssociationError
> [akka.tcp://sparkWorker@host2:51414] -> 
> [akka.tcp://sparkExecutor@host2:53851]:
> Error [Association failed with [akka.tcp://sparkExecutor@host2:53851]] [
> akka.remote.EndpointAssociationException: Association failed with
> [akka.tcp://sparkExecutor@host2:53851]
> Caused by: 
> akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
> Connection refused: host2/xxx.xx.xx.xx:53851
> ]
> Note that host1, host2, etc. are slave hostnames, and each slave has an
> error message about itself: host1:<some random port> cannot connect to
> host1:<some random port>.
> However, I noticed that after SparkPi runs successfully, its log is also
> populated with similar "connection refused" messages, but this does not
> lead to application death... So these worker logs are probably a false clue.
>> when you're running spark-shell and the example, are you actually
>> specifying --master spark://master:7077 as shown here:
>> http://spark.apache.org/docs/latest/programming-guide.html#initializing-spark
>> because if you're not, your spark-shell is running in local mode and not
>> actually connecting to the cluster. Also, if you run spark-shell against
>> the cluster, you'll see it listed under the Running applications in the
>> master UI. It would be pretty odd for spark shell to connect
>> successfully to the cluster but for your app to not connect...(which is
>> why I suspect that you're running spark-shell local)
>> Another thing to check, the executors need to connect back to your
>> driver, so it could be that you have to set the driver host or driver
>> port...in fact looking at your executor log, this seems fairly likely:
>> is host1/xxx.xx.xx.xx:45542 the machine where your driver is running? is
>> that host/port reachable from the worker machines?
>>     Hi,
>>     I have set up a Spark 0.9.2 standalone cluster using CDH5 and the
>>     pre-built Spark distribution archive for Hadoop 2. I was not using
>>     the spark-ec2 scripts because I am not on the EC2 cloud.
>>     Spark-shell seems to be working properly -- I am able to perform
>>     simple RDD operations, and the SparkPi standalone example
>>     works well when run via `run-example`. The web UI shows all workers
>>     connected.
>>     However, my standalone Scala application gets "connection refused"
>>     messages. I think this has something to do with configuration,
>>     because spark-shell and SparkPi work well. I verified that
>>     .setMaster and .setSparkHome are properly assigned within the Scala app.
>>     Is there anything else in the configuration of a standalone Scala app
>>     on Spark that I am missing?
>>     I would very much appreciate any clues.
>>     Namely, I am trying to run the MovieLensALS.scala example from the
>>     AMPCamp big data mini course
>>     (http://ampcamp.berkeley.edu/big-data-mini-course/movie-recommendation-with-mllib.html).
>>     Here is the error I get when I try to run the compiled jar:
>>     ---------------
>>     root@master:~/machine-learning/scala# sbt/sbt package "run /movielens/medium"
>>     Launching sbt from sbt/sbt-launch-0.12.4.jar
>>     [info] Loading project definition from /root/training/machine-learning/scala/project
>>     [info] Set current project to movielens-als (in build file:/root/training/machine-learning/scala/)
>>     [info] Compiling 1 Scala source to /root/training/machine-learning/scala/target/scala-2.10/classes...
>>     [warn] there were 2 deprecation warning(s); re-run with -deprecation
>>     for details
>>     [warn] one warning found
>>     [info] Packaging /root/training/machine-learning/scala/target/scala-2.10/movielens-als_2.10-0.0.jar
>>     ...
>>     [info] Done packaging.
>>     [success] Total time: 6 s, completed Oct 2, 2014 1:19:00 PM
>>     [info] Running MovieLensALS /movielens/medium
>>     master = spark://master:7077
>>     log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jLogger).
>>     log4j:WARN Please initialize the log4j system properly.
>>     log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
>>     14/10/02 13:19:01 WARN NativeCodeLoader: Unable to load
>>     native-hadoop library for your platform... using builtin-java
>>     classes where applicable
>>     HERE
>>     THERE
>>     14/10/02 13:19:02 INFO FileInputFormat: Total input paths to process : 1
>>     14/10/02 13:19:03 ERROR TaskSchedulerImpl: Lost executor 0 on host2:
>>     remote Akka client disassociated
>>     14/10/02 13:19:03 WARN TaskSetManager: Lost TID 1 (task 0.0:1)
>>     14/10/02 13:19:03 WARN TaskSetManager: Lost TID 0 (task 0.0:0)
>>     14/10/02 13:19:03 ERROR TaskSchedulerImpl: Lost executor 4 on host5:
>>     remote Akka client disassociated
>>     14/10/02 13:19:03 WARN TaskSetManager: Lost TID 3 (task 0.0:1)
>>     14/10/02 13:19:03 ERROR TaskSchedulerImpl: Lost executor 1 on host4:
>>     remote Akka client disassociated
>>     14/10/02 13:19:03 WARN TaskSetManager: Lost TID 2 (task 0.0:0)
>>     14/10/02 13:19:03 WARN TaskSetManager: Lost TID 4 (task 0.0:1)
>>     14/10/02 13:19:03 ERROR TaskSchedulerImpl: Lost executor 3 on host3:
>>     remote Akka client disassociated
>>     14/10/02 13:19:03 WARN TaskSetManager: Lost TID 6 (task 0.0:0)
>>     14/10/02 13:19:03 ERROR TaskSchedulerImpl: Lost executor 2 on host1:
>>     remote Akka client disassociated
>>     14/10/02 13:19:03 WARN TaskSetManager: Lost TID 5 (task 0.0:1)
>>     14/10/02 13:19:03 WARN TaskSetManager: Lost TID 7 (task 0.0:0)
>>     14/10/02 13:19:04 ERROR TaskSchedulerImpl: Lost executor 6 on host4:
>>     remote Akka client disassociated
>>     14/10/02 13:19:04 WARN TaskSetManager: Lost TID 8 (task 0.0:0)
>>     14/10/02 13:19:04 WARN TaskSetManager: Lost TID 9 (task 0.0:1)
>>     14/10/02 13:19:04 ERROR TaskSchedulerImpl: Lost executor 5 on host2:
>>     remote Akka client disassociated
>>     14/10/02 13:19:04 WARN TaskSetManager: Lost TID 10 (task 0.0:1)
>>     14/10/02 13:19:04 ERROR TaskSchedulerImpl: Lost executor 7 on host5:
>>     remote Akka client disassociated
>>     14/10/02 13:19:04 WARN TaskSetManager: Lost TID 11 (task 0.0:0)
>>     14/10/02 13:19:04 WARN TaskSetManager: Lost TID 12 (task 0.0:1)
>>     14/10/02 13:19:04 ERROR TaskSchedulerImpl: Lost executor 8 on host3:
>>     remote Akka client disassociated
>>     14/10/02 13:19:04 WARN TaskSetManager: Lost TID 13 (task 0.0:1)
>>     14/10/02 13:19:04 ERROR TaskSchedulerImpl: Lost executor 9 on host1:
>>     remote Akka client disassociated
>>     14/10/02 13:19:04 WARN TaskSetManager: Lost TID 14 (task 0.0:0)
>>     14/10/02 13:19:04 WARN TaskSetManager: Lost TID 15 (task 0.0:1)
>>     14/10/02 13:19:05 ERROR AppClient$ClientActor: Master removed our
>>     application: FAILED; stopping client
>>     14/10/02 13:19:05 WARN SparkDeploySchedulerBackend: Disconnected
>>     from Spark cluster! Waiting for reconnection...
>>     14/10/02 13:19:06 ERROR TaskSchedulerImpl: Lost executor 11 on
>>     host5: remote Akka client disassociated
>>     14/10/02 13:19:06 WARN TaskSetManager: Lost TID 17 (task 0.0:0)
>>     14/10/02 13:19:06 WARN TaskSetManager: Lost TID 16 (task 0.0:1)
>>     ---------------
>>     And this is the error log on one of the workers:
>>     ---------------
>>     14/10/02 13:19:05 INFO worker.Worker: Executor
>>     app-20141002131901-0002/9 finished with state FAILED message Command
>>     exited with code 1 exitStatus 1
>>     14/10/02 13:19:05 INFO actor.LocalActorRef: Message
>>     [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying]
>>     from Actor[akka://sparkWorker/deadLetters] to
>>     Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%40xxx.xx.xx.xx%3A57719-15#1504298502]
>>     was not delivered. [6] dead letters encountered. This logging can be
>>     turned off or adjusted with configuration settings
>>     'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
>>     14/10/02 13:19:05 ERROR remote.EndpointWriter: AssociationError
>>     [akka.tcp://sparkWorker@host1:47421] ->
>>     [akka.tcp://sparkExecutor@host1:45542]:
>>     Error [Association failed with [akka.tcp://sparkExecutor@host1:45542]] [
>>     akka.remote.EndpointAssociationException: Association failed with
>>     [akka.tcp://sparkExecutor@host1:45542]
>>     Caused by:
>>     akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
>>     Connection refused: host1/xxx.xx.xx.xx:45542
>>     ]
>>     14/10/02 13:19:05 ERROR remote.EndpointWriter: AssociationError
>>     [akka.tcp://sparkWorker@host1:47421] ->
>>     [akka.tcp://sparkExecutor@host1:45542]:
>>     Error [Association failed with [akka.tcp://sparkExecutor@host1:45542]] [
>>     akka.remote.EndpointAssociationException: Association failed with
>>     [akka.tcp://sparkExecutor@host1:45542]
>>     Caused by:
>>     akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
>>     Connection refused: host1/xxx.xx.xx.xx:45542
>>     ]
>>     14/10/02 13:19:05 ERROR remote.EndpointWriter: AssociationError
>>     [akka.tcp://sparkWorker@host1:47421] ->
>>     [akka.tcp://sparkExecutor@host1:45542]:
>>     Error [Association failed with [akka.tcp://sparkExecutor@host1:45542]] [
>>     akka.remote.EndpointAssociationException: Association failed with
>>     [akka.tcp://sparkExecutor@host1:45542]
>>     Caused by:
>>     akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
>>     Connection refused: host1/xxx.xx.xx.xx:45542
>>     ---------------
>>     Thanks!
>>     Irina