But if I'm reading his email correctly, he's saying that:

1. The master and slave are on the same box (so network hiccups are an
unlikely culprit)
2. The failures are intermittent, i.e. the program works for a while, then
the worker gets disassociated...

Is it possible that the master restarted? We used to have problems like
this: we'd restart the master process, it wouldn't be listening on 7077
for some time, the worker process would keep trying to connect, and by the
time the master was up the worker had given up...
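To make that race concrete, here's a rough Python simulation. The retry count and spacing are made-up parameters, not Spark's actual worker settings, and `RetryPolicy` / `worker_registers` are hypothetical names. The only point is that if the master comes back after the worker's last attempt, the worker never registers, even though nothing is "wrong" by the time you look.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class RetryPolicy:
    """Hypothetical retry policy: one initial attempt, then `retries`
    further attempts spaced `interval_s` seconds apart, then give up."""
    retries: int
    interval_s: float


def attempt_times(policy: RetryPolicy, start_s: float = 0.0) -> list[float]:
    # Attempt 0 is the initial connect; attempts 1..retries follow at a fixed spacing.
    return [start_s + i * policy.interval_s for i in range(policy.retries + 1)]


def worker_registers(policy: RetryPolicy, master_up_at_s: float,
                     start_s: float = 0.0) -> Optional[float]:
    """Time of the first attempt made once the master is listening again,
    or None if every attempt lands while the master is still down."""
    for t in attempt_times(policy, start_s):
        if t >= master_up_at_s:
            return t
    return None  # the worker gave up before the master came back


policy = RetryPolicy(retries=3, interval_s=10.0)
# Master back 25 s after the worker starts: the attempt at t=30 gets through.
print(worker_registers(policy, master_up_at_s=25.0))  # prints 30.0
# Master back after 45 s: attempts at 0/10/20/30 s are all refused.
print(worker_registers(policy, master_up_at_s=45.0))  # prints None
```

If that's what's happening, either making the worker retry for longer or starting the worker only after the master is confirmed up should make the problem go away.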


On Wed, May 20, 2015 at 5:16 AM, Evo Eftimov <evo.efti...@isecc.com> wrote:

> Check whether the master's hostname can be resolved via the /etc/hosts
> file (or DNS) on the worker.
>
> (The same, btw, applies to the node where you run the driver app: all
> other nodes must be able to resolve its name.)
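A quick way to do that check from the worker box is a few lines of Python that go through the OS resolver, so /etc/hosts and DNS are consulted according to the system's configuration. `resolve` is just a hypothetical helper name for this sketch:

```python
from __future__ import annotations

import socket


def resolve(hostname: str) -> list[str]:
    """Addresses `hostname` resolves to via the OS resolver on this machine
    (which consults /etc/hosts and/or DNS, per the system's configuration).
    Returns an empty list if the name cannot be resolved."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


# Run this on the worker (and on every node that must reach the driver),
# substituting the master's or driver's hostname for "localhost".
print(resolve("localhost"))
```

An empty list means the name doesn't resolve on that node; a different address than the one the master is bound to is just as suspicious.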
>
> *From:* Stephen Boesch [mailto:java...@gmail.com]
> *Sent:* Wednesday, May 20, 2015 10:07 AM
> *To:* user
> *Subject:* Intermittent difficulties for Worker to contact Master on same
> machine in standalone
>
> What conditions would cause the following delays/failures when the Worker
> tries to contact the Master on a standalone machine/cluster?
>
> 15/05/20 02:02:53 INFO WorkerWebUI: Started WorkerWebUI at
> http://10.0.0.3:8081
>
> 15/05/20 02:02:53 INFO Worker: Connecting to master
> akka.tcp://sparkMaster@mellyrn.local:7077/user/Master...
>
> 15/05/20 02:02:53 WARN Remoting: Tried to associate with unreachable
> remote address [akka.tcp://sparkMaster@mellyrn.local:7077]. Address is
> now gated for 5000 ms, all messages to this address will be delivered to
> dead letters. Reason: Connection refused: mellyrn.local/10.0.0.3:7077
>
> 15/05/20 02:03:04 INFO Worker: Retrying connection to master (attempt # 1)
>
> ..
>
> ..
>
> 15/05/20 02:03:26 INFO Worker: Retrying connection to master (attempt # 3)
>
> 15/05/20 02:03:26 INFO Worker: Connecting to master
> akka.tcp://sparkMaster@mellyrn.local:7077/user/Master...
>
> 15/05/20 02:03:26 WARN Remoting: Tried to associate with unreachable
> remote address [akka.tcp://sparkMaster@mellyrn.local:7077]. Address is
> now gated for 5000 ms, all messages to this address will be delivered to
> dead letters. Reason: Connection refused: mellyrn.local/10.0.0.3:7077
>
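Note the log shows mellyrn.local resolving to 10.0.0.3, so name resolution seems fine here; "Connection refused" generally means nothing was accepting connections on port 7077 at that moment. To test the restart theory, you could poll the port from the worker box and see how long the master takes to start accepting again. A Python sketch; `port_open` and `wait_for_port` are hypothetical helpers (not Spark APIs), and the 2 s timeout and 1 s poll interval are arbitrary:

```python
import socket
import time
from typing import Optional


def port_open(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """True if a TCP connection to host:port is accepted within timeout_s.
    'Connection refused' (nothing listening yet) or a timeout gives False."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # ConnectionRefusedError and socket.timeout are both OSError subclasses.
        return False


def wait_for_port(host: str, port: int, deadline_s: float,
                  poll_s: float = 1.0) -> Optional[float]:
    """Poll host:port until it accepts a connection. Returns the seconds
    waited, or None if it never opened within deadline_s."""
    start = time.monotonic()
    while time.monotonic() - start < deadline_s:
        if port_open(host, port):
            return time.monotonic() - start
        time.sleep(poll_s)
    return None
```

For example, run `wait_for_port("mellyrn.local", 7077, deadline_s=120)` right after restarting the master and compare the result with how long the worker keeps retrying before it gives up.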
