Yana, many thanks for looking into this!

I am not running spark-shell in local mode: I really do start spark-shell with --master spark://master:7077, so it runs against the cluster.

Second, I tried setting "spark.driver.host" to "master", both in the Scala app when creating the context and in the conf/spark-defaults.conf file, but this made no difference. The worker logs still show the same messages:

14/10/03 13:37:30 ERROR remote.EndpointWriter: AssociationError [akka.tcp://sparkWorker@host2:51414] -> [akka.tcp://sparkExecutor@host2:53851]: Error [Association failed with [akka.tcp://sparkExecutor@host2:53851]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor@host2:53851]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: host2/xxx.xx.xx.xx:53851
]
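
For reference, the change in the app looks roughly like this (a sketch; the app name is a placeholder):

import org.apache.spark.{SparkConf, SparkContext}

// Attempted fix: point spark.driver.host explicitly at the master machine.
val conf = new SparkConf()
  .setMaster("spark://master:7077")
  .setAppName("MovieLensALS")          // placeholder app name
  .set("spark.driver.host", "master")
val sc = new SparkContext(conf)

and the equivalent line in conf/spark-defaults.conf:

spark.driver.host    master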

Note that host1, host2, etc. are slave hostnames, and each slave has the error message about itself: host1:<some random port> cannot connect to host1:<some random port>.

However, I noticed that after a successful SparkPi run the worker log is also populated with similar "connection refused" messages, yet this does not lead to application death... So these worker logs are probably a false clue.



On 03.10.14 19:37, Yana Kadiyska wrote:
When you're running spark-shell and the example, are you actually
specifying --master spark://master:7077 as shown here:
http://spark.apache.org/docs/latest/programming-guide.html#initializing-spark

Because if you're not, your spark-shell is running in local mode and not
actually connecting to the cluster. Also, if you run spark-shell against
the cluster, you'll see it listed under Running Applications in the
master UI. It would be pretty odd for spark-shell to connect
successfully to the cluster but for your app not to connect... (which is
why I suspect you're running spark-shell in local mode.)

Another thing to check: the executors need to connect back to your
driver, so it could be that you have to set the driver host or driver
port... In fact, looking at your executor log, this seems fairly likely:
is host1/xxx.xx.xx.xx:45542 the machine where your driver is running? Is
that host/port reachable from the worker machines?
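
For example, something along these lines in the driver app (a sketch; the hostname and port are placeholders):

// Pin the driver's host and port so executors can connect back,
// e.g. when ephemeral ports are blocked between machines.
val conf = new SparkConf()
  .set("spark.driver.host", "driver-host") // the machine running the driver
  .set("spark.driver.port", "51000")       // any port reachable from the workers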

On Fri, Oct 3, 2014 at 5:32 AM, Irina Fedulova <fedul...@gmail.com> wrote:

    Hi,

    I have set up a Spark 0.9.2 standalone cluster using CDH5 and the
    pre-built Spark distribution archive for Hadoop 2. I was not using
    the spark-ec2 scripts because I am not on EC2.

    Spark-shell seems to be working properly -- I am able to perform
    simple RDD operations, and e.g. the SparkPi standalone example
    works well when run via `run-example`. The web UI shows all workers
    connected.

    However, my standalone Scala application gets "connection refused"
    messages. I think this has something to do with configuration,
    because spark-shell and SparkPi work well. I verified that
    .setMaster and .setSparkHome are properly set within the Scala app.
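
    The context is created roughly like this (a sketch; the Spark home
    path is a placeholder):

    import org.apache.spark.{SparkConf, SparkContext}

    // Master URL and Spark home, as verified in the app.
    val conf = new SparkConf()
      .setMaster("spark://master:7077")
      .setAppName("MovieLensALS")
      .setSparkHome("/usr/lib/spark")   // placeholder path
    val sc = new SparkContext(conf)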

    Is there anything else in the configuration of a standalone Scala
    app on Spark that I am missing?
    I would very much appreciate any clues.

    Namely, I am trying to run the MovieLensALS.scala example from the
    AMP Camp big data mini course
    (http://ampcamp.berkeley.edu/big-data-mini-course/movie-recommendation-with-mllib.html).
    Here is the error I get when I try to run the compiled jar:
    ---------------
    root@master:~/machine-learning/scala# sbt/sbt package "run /movielens/medium"
    Launching sbt from sbt/sbt-launch-0.12.4.jar
    [info] Loading project definition from /root/training/machine-learning/scala/project
    [info] Set current project to movielens-als (in build file:/root/training/machine-learning/scala/)
    [info] Compiling 1 Scala source to /root/training/machine-learning/scala/target/scala-2.10/classes...
    [warn] there were 2 deprecation warning(s); re-run with -deprecation for details
    [warn] one warning found
    [info] Packaging /root/training/machine-learning/scala/target/scala-2.10/movielens-als_2.10-0.0.jar ...
    [info] Done packaging.
    [success] Total time: 6 s, completed Oct 2, 2014 1:19:00 PM
    [info] Running MovieLensALS /movielens/medium
    master = spark://master:7077
    log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jLogger).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
    14/10/02 13:19:01 WARN NativeCodeLoader: Unable to load
    native-hadoop library for your platform... using builtin-java
    classes where applicable
    HERE
    THERE
    14/10/02 13:19:02 INFO FileInputFormat: Total input paths to process : 1
    14/10/02 13:19:03 ERROR TaskSchedulerImpl: Lost executor 0 on host2:
    remote Akka client disassociated
    14/10/02 13:19:03 WARN TaskSetManager: Lost TID 1 (task 0.0:1)
    14/10/02 13:19:03 WARN TaskSetManager: Lost TID 0 (task 0.0:0)
    14/10/02 13:19:03 ERROR TaskSchedulerImpl: Lost executor 4 on host5:
    remote Akka client disassociated
    14/10/02 13:19:03 WARN TaskSetManager: Lost TID 3 (task 0.0:1)
    14/10/02 13:19:03 ERROR TaskSchedulerImpl: Lost executor 1 on host4:
    remote Akka client disassociated
    14/10/02 13:19:03 WARN TaskSetManager: Lost TID 2 (task 0.0:0)
    14/10/02 13:19:03 WARN TaskSetManager: Lost TID 4 (task 0.0:1)
    14/10/02 13:19:03 ERROR TaskSchedulerImpl: Lost executor 3 on host3:
    remote Akka client disassociated
    14/10/02 13:19:03 WARN TaskSetManager: Lost TID 6 (task 0.0:0)
    14/10/02 13:19:03 ERROR TaskSchedulerImpl: Lost executor 2 on host1:
    remote Akka client disassociated
    14/10/02 13:19:03 WARN TaskSetManager: Lost TID 5 (task 0.0:1)
    14/10/02 13:19:03 WARN TaskSetManager: Lost TID 7 (task 0.0:0)
    14/10/02 13:19:04 ERROR TaskSchedulerImpl: Lost executor 6 on host4:
    remote Akka client disassociated
    14/10/02 13:19:04 WARN TaskSetManager: Lost TID 8 (task 0.0:0)
    14/10/02 13:19:04 WARN TaskSetManager: Lost TID 9 (task 0.0:1)
    14/10/02 13:19:04 ERROR TaskSchedulerImpl: Lost executor 5 on host2:
    remote Akka client disassociated
    14/10/02 13:19:04 WARN TaskSetManager: Lost TID 10 (task 0.0:1)
    14/10/02 13:19:04 ERROR TaskSchedulerImpl: Lost executor 7 on host5:
    remote Akka client disassociated
    14/10/02 13:19:04 WARN TaskSetManager: Lost TID 11 (task 0.0:0)
    14/10/02 13:19:04 WARN TaskSetManager: Lost TID 12 (task 0.0:1)
    14/10/02 13:19:04 ERROR TaskSchedulerImpl: Lost executor 8 on host3:
    remote Akka client disassociated
    14/10/02 13:19:04 WARN TaskSetManager: Lost TID 13 (task 0.0:1)
    14/10/02 13:19:04 ERROR TaskSchedulerImpl: Lost executor 9 on host1:
    remote Akka client disassociated
    14/10/02 13:19:04 WARN TaskSetManager: Lost TID 14 (task 0.0:0)
    14/10/02 13:19:04 WARN TaskSetManager: Lost TID 15 (task 0.0:1)
    14/10/02 13:19:05 ERROR AppClient$ClientActor: Master removed our
    application: FAILED; stopping client
    14/10/02 13:19:05 WARN SparkDeploySchedulerBackend: Disconnected
    from Spark cluster! Waiting for reconnection...
    14/10/02 13:19:06 ERROR TaskSchedulerImpl: Lost executor 11 on
    host5: remote Akka client disassociated
    14/10/02 13:19:06 WARN TaskSetManager: Lost TID 17 (task 0.0:0)
    14/10/02 13:19:06 WARN TaskSetManager: Lost TID 16 (task 0.0:1)
    ---------------

    And this is the error log on one of the workers:
    ---------------
    14/10/02 13:19:05 INFO worker.Worker: Executor app-20141002131901-0002/9 finished with state FAILED message Command exited with code 1 exitStatus 1
    14/10/02 13:19:05 INFO actor.LocalActorRef: Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkWorker/deadLetters] to Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%40xxx.xx.xx.xx%3A57719-15#1504298502] was not delivered. [6] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
    14/10/02 13:19:05 ERROR remote.EndpointWriter: AssociationError [akka.tcp://sparkWorker@host1:47421] -> [akka.tcp://sparkExecutor@host1:45542]: Error [Association failed with [akka.tcp://sparkExecutor@host1:45542]] [
    akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor@host1:45542]
    Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: host1/xxx.xx.xx.xx:45542
    ]
    (the same AssociationError block is repeated two more times)
    ---------------

    Thanks!
    Irina
