Yana, many thanks for looking into this!
I am not running spark-shell in local mode; I really am starting
spark-shell with --master spark://master:7077 and running against the cluster.
Second, I tried setting "spark.driver.host" to "master" both in the
Scala app when creating the context (see the sketch below) and in
conf/spark-defaults.conf, but this made no difference. The worker logs
still show the same messages:
14/10/03 13:37:30 ERROR remote.EndpointWriter: AssociationError
[akka.tcp://sparkWorker@host2:51414] ->
[akka.tcp://sparkExecutor@host2:53851]: Error [Association failed with
[akka.tcp://sparkExecutor@host2:53851]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkExecutor@host2:53851]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: host2/xxx.xx.xx.xx:53851
]
Note that host1, host2, etc. are slave hostnames, and each slave logs the
error about itself: host1:<some random port> cannot connect to
host1:<some random port>.
However, I noticed that after a successful SparkPi run the worker logs
are also populated with similar "connection refused" messages, yet this
does not kill the application... So these worker logs are probably a
false clue.
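For reference, the context creation in the app was roughly along these
lines (the app name and hostname are illustrative; "master" stands for
the machine the driver runs on):

import org.apache.spark.{SparkConf, SparkContext}

// Rough sketch of the driver-side setup used when trying the
// spark.driver.host workaround.
val conf = new SparkConf()
  .setMaster("spark://master:7077")
  .setAppName("MovieLensALS")
  .set("spark.driver.host", "master")
val sc = new SparkContext(conf)

The same property was also tried as a line in conf/spark-defaults.conf:
spark.driver.host master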
On 03.10.14 19:37, Yana Kadiyska wrote:
When you're running spark-shell and the example, are you actually
specifying --master spark://master:7077 as shown here:
http://spark.apache.org/docs/latest/programming-guide.html#initializing-spark
If you're not, your spark-shell is running in local mode and not
actually connecting to the cluster. Also, if you run spark-shell against
the cluster, you'll see it listed under Running Applications in the
master UI. It would be pretty odd for spark-shell to connect
successfully to the cluster but for your app not to connect... (which is
why I suspect you're running spark-shell in local mode.)
Another thing to check: the executors need to connect back to your
driver, so it could be that you have to set the driver host or driver
port... In fact, looking at your executor log, this seems fairly likely:
is host1/xxx.xx.xx.xx:45542 the machine where your driver is running? Is
that host/port reachable from the worker machines?
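One quick way to test that from a worker node is a plain TCP connect to
the address the executors are trying to reach (the hostname and port
below are just placeholders taken from the log):

import java.net.{InetSocketAddress, Socket}

// Illustrative reachability check: run on a worker, try to open a TCP
// connection to the driver's host/port and report the outcome.
object ReachabilityCheck {
  def main(args: Array[String]): Unit = {
    val host = "host1" // placeholder: machine the driver runs on
    val port = 45542   // placeholder: port taken from the error log
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), 5000)
      println("Connected to " + host + ":" + port)
    } catch {
      case e: Exception =>
        println("Cannot connect to " + host + ":" + port + ": " + e.getMessage)
    } finally {
      socket.close()
    }
  }
}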
On Fri, Oct 3, 2014 at 5:32 AM, Irina Fedulova <fedul...@gmail.com> wrote:
Hi,
I have set up a Spark 0.9.2 standalone cluster using CDH5 and the
pre-built Spark distribution archive for Hadoop 2. I was not using the
spark-ec2 scripts because I am not on EC2.
Spark-shell seems to be working properly: I am able to perform
simple RDD operations, and the SparkPi standalone example
works well when run via `run-example`. The web UI shows all workers
connected.
However, a standalone Scala application gets "connection refused"
messages. I think this has something to do with configuration,
because spark-shell and SparkPi work well. I verified that
.setMaster and .setSparkHome are properly set within the Scala app.
Is there anything else in the configuration of a standalone Scala app
on Spark that I am missing?
I would very much appreciate any clues.
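For context, the context creation in the app is roughly the following
(the master URL matches my cluster; the Spark home path is illustrative):

import org.apache.spark.{SparkConf, SparkContext}

// Rough sketch of how the standalone app builds its SparkContext;
// the Spark home path is a placeholder for the actual install location.
val conf = new SparkConf()
  .setAppName("MovieLensALS")
  .setMaster("spark://master:7077")
  .setSparkHome("/opt/spark") // placeholder path
val sc = new SparkContext(conf)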
Namely, I am trying to run the MovieLensALS.scala example from the AMPCamp
big data mini course
(http://ampcamp.berkeley.edu/big-data-mini-course/movie-recommendation-with-mllib.html).
Here is the error I get when I try to run the compiled jar:
---------------
root@master:~/machine-learning/scala# sbt/sbt package "run /movielens/medium"
Launching sbt from sbt/sbt-launch-0.12.4.jar
[info] Loading project definition from
/root/training/machine-learning/scala/project
[info] Set current project to movielens-als (in build
file:/root/training/machine-learning/scala/)
[info] Compiling 1 Scala source to
/root/training/machine-learning/scala/target/scala-2.10/classes...
[warn] there were 2 deprecation warning(s); re-run with -deprecation
for details
[warn] one warning found
[info] Packaging
/root/training/machine-learning/scala/target/scala-2.10/movielens-als_2.10-0.0.jar
...
[info] Done packaging.
[success] Total time: 6 s, completed Oct 2, 2014 1:19:00 PM
[info] Running MovieLensALS /movielens/medium
master = spark://master:7077
log4j:WARN No appenders could be found for logger
(akka.event.slf4j.Slf4jLogger).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See
http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
14/10/02 13:19:01 WARN NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java
classes where applicable
HERE
THERE
14/10/02 13:19:02 INFO FileInputFormat: Total input paths to process : 1
14/10/02 13:19:03 ERROR TaskSchedulerImpl: Lost executor 0 on host2:
remote Akka client disassociated
14/10/02 13:19:03 WARN TaskSetManager: Lost TID 1 (task 0.0:1)
14/10/02 13:19:03 WARN TaskSetManager: Lost TID 0 (task 0.0:0)
14/10/02 13:19:03 ERROR TaskSchedulerImpl: Lost executor 4 on host5:
remote Akka client disassociated
14/10/02 13:19:03 WARN TaskSetManager: Lost TID 3 (task 0.0:1)
14/10/02 13:19:03 ERROR TaskSchedulerImpl: Lost executor 1 on host4:
remote Akka client disassociated
14/10/02 13:19:03 WARN TaskSetManager: Lost TID 2 (task 0.0:0)
14/10/02 13:19:03 WARN TaskSetManager: Lost TID 4 (task 0.0:1)
14/10/02 13:19:03 ERROR TaskSchedulerImpl: Lost executor 3 on host3:
remote Akka client disassociated
14/10/02 13:19:03 WARN TaskSetManager: Lost TID 6 (task 0.0:0)
14/10/02 13:19:03 ERROR TaskSchedulerImpl: Lost executor 2 on host1:
remote Akka client disassociated
14/10/02 13:19:03 WARN TaskSetManager: Lost TID 5 (task 0.0:1)
14/10/02 13:19:03 WARN TaskSetManager: Lost TID 7 (task 0.0:0)
14/10/02 13:19:04 ERROR TaskSchedulerImpl: Lost executor 6 on host4:
remote Akka client disassociated
14/10/02 13:19:04 WARN TaskSetManager: Lost TID 8 (task 0.0:0)
14/10/02 13:19:04 WARN TaskSetManager: Lost TID 9 (task 0.0:1)
14/10/02 13:19:04 ERROR TaskSchedulerImpl: Lost executor 5 on host2:
remote Akka client disassociated
14/10/02 13:19:04 WARN TaskSetManager: Lost TID 10 (task 0.0:1)
14/10/02 13:19:04 ERROR TaskSchedulerImpl: Lost executor 7 on host5:
remote Akka client disassociated
14/10/02 13:19:04 WARN TaskSetManager: Lost TID 11 (task 0.0:0)
14/10/02 13:19:04 WARN TaskSetManager: Lost TID 12 (task 0.0:1)
14/10/02 13:19:04 ERROR TaskSchedulerImpl: Lost executor 8 on host3:
remote Akka client disassociated
14/10/02 13:19:04 WARN TaskSetManager: Lost TID 13 (task 0.0:1)
14/10/02 13:19:04 ERROR TaskSchedulerImpl: Lost executor 9 on host1:
remote Akka client disassociated
14/10/02 13:19:04 WARN TaskSetManager: Lost TID 14 (task 0.0:0)
14/10/02 13:19:04 WARN TaskSetManager: Lost TID 15 (task 0.0:1)
14/10/02 13:19:05 ERROR AppClient$ClientActor: Master removed our
application: FAILED; stopping client
14/10/02 13:19:05 WARN SparkDeploySchedulerBackend: Disconnected
from Spark cluster! Waiting for reconnection...
14/10/02 13:19:06 ERROR TaskSchedulerImpl: Lost executor 11 on
host5: remote Akka client disassociated
14/10/02 13:19:06 WARN TaskSetManager: Lost TID 17 (task 0.0:0)
14/10/02 13:19:06 WARN TaskSetManager: Lost TID 16 (task 0.0:1)
---------------
And this is the error log on one of the workers:
---------------
14/10/02 13:19:05 INFO worker.Worker: Executor
app-20141002131901-0002/9 finished with state FAILED message Command
exited with code 1 exitStatus 1
14/10/02 13:19:05 INFO actor.LocalActorRef: Message
[akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying]
from Actor[akka://sparkWorker/deadLetters] to
Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%40xxx.xx.xx.xx%3A57719-15#1504298502]
was not delivered. [6] dead letters encountered. This logging can be
turned off or adjusted with configuration settings
'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/10/02 13:19:05 ERROR remote.EndpointWriter: AssociationError
[akka.tcp://sparkWorker@host1:47421] ->
[akka.tcp://sparkExecutor@host1:45542]: Error [Association failed
with [akka.tcp://sparkExecutor@host1:45542]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkExecutor@host1:45542]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: host1/xxx.xx.xx.xx:45542
]
14/10/02 13:19:05 ERROR remote.EndpointWriter: AssociationError
[akka.tcp://sparkWorker@host1:47421] ->
[akka.tcp://sparkExecutor@host1:45542]: Error [Association failed
with [akka.tcp://sparkExecutor@host1:45542]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkExecutor@host1:45542]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: host1/xxx.xx.xx.xx:45542
]
14/10/02 13:19:05 ERROR remote.EndpointWriter: AssociationError
[akka.tcp://sparkWorker@host1:47421] ->
[akka.tcp://sparkExecutor@host1:45542]: Error [Association failed
with [akka.tcp://sparkExecutor@host1:45542]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkExecutor@host1:45542]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: host1/xxx.xx.xx.xx:45542
]
---------------
Thanks!
Irina
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org