But if I'm reading his email correctly, he's saying that:

1. The master and slave are on the same box (so network hiccups are an unlikely culprit).
2. The failures are intermittent, i.e. the program works for a while and then the worker gets disassociated...

Is it possible that the master restarted? We used to have problems like this: we'd restart the master process, it wouldn't be listening on 7077 for some time, the worker process would keep trying to connect, and by the time the master was up the worker had given up...

On Wed, May 20, 2015 at 5:16 AM, Evo Eftimov <evo.efti...@isecc.com> wrote:

> Check whether the name can be resolved in the /etc/hosts file (or DNS) of
> the worker
>
> (the same btw applies for the Node where you run the driver app – all
> other nodes must be able to resolve its name)
>
> *From:* Stephen Boesch [mailto:java...@gmail.com]
> *Sent:* Wednesday, May 20, 2015 10:07 AM
> *To:* user
> *Subject:* Intermittent difficulties for Worker to contact Master on same
> machine in standalone
>
> What conditions would cause the following delays / failure for a
> standalone machine/cluster to have the Worker contact the Master?
>
> 15/05/20 02:02:53 INFO WorkerWebUI: Started WorkerWebUI at
> http://10.0.0.3:8081
>
> 15/05/20 02:02:53 INFO Worker: Connecting to master
> akka.tcp://sparkMaster@mellyrn.local:7077/user/Master...
>
> 15/05/20 02:02:53 WARN Remoting: Tried to associate with unreachable
> remote address [akka.tcp://sparkMaster@mellyrn.local:7077]. Address is
> now gated for 5000 ms, all messages to this address will be delivered to
> dead letters. Reason: Connection refused: mellyrn.local/10.0.0.3:7077
>
> 15/05/20 02:03:04 INFO Worker: Retrying connection to master (attempt # 1)
>
> ..
>
> ..
>
> 15/05/20 02:03:26 INFO Worker: Retrying connection to master (attempt # 3)
>
> 15/05/20 02:03:26 INFO Worker: Connecting to master
> akka.tcp://sparkMaster@mellyrn.local:7077/user/Master...
>
> 15/05/20 02:03:26 WARN Remoting: Tried to associate with unreachable
> remote address [akka.tcp://sparkMaster@mellyrn.local:7077]. Address is
> now gated for 5000 ms, all messages to this address will be delivered to
> dead letters. Reason: Connection refused: mellyrn.local/10.0.0.3:7077
>
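
If it helps to tell the two theories in this thread apart (master not yet listening after a restart vs. the name not resolving), here is a rough Python sketch that could be run on the worker box. It is not part of Spark; the function names (resolve, port_is_open, wait_for_port) are made up for illustration, and the host and port in the usage comment are simply the ones from the log above.

```python
import socket
import time


def resolve(host):
    """Return the sorted IP addresses `host` resolves to, or [] if it does not resolve."""
    try:
        infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and resolution failures.
        return False


def wait_for_port(host, port, attempts=30, delay=1.0):
    """Poll host:port; return the attempt number that succeeded, or None if none did."""
    for attempt in range(1, attempts + 1):
        if port_is_open(host, port):
            return attempt
        time.sleep(delay)
    return None


# Hypothetical usage, with the values from the log above:
#   print(resolve("mellyrn.local"))
#   print(wait_for_port("mellyrn.local", 7077))
```

If resolve() comes back empty, Evo's /etc/hosts suggestion is the place to look. If the name resolves but wait_for_port() only succeeds after several attempts, that would be consistent with the master not listening yet when the worker first tried.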