Re: Connecting the channel failed: Connection refused

2015-06-25 Thread Stephan Ewen
That makes perfect sense, thanks! On 25.06.2015 21:39, "Aaron Jackson" wrote: > So the JobManager was running on host1. This also explains why I didn't > see the problem until I had asked for a sizeable degree of parallelism > since it probably never assigned a task to host3. > > Thanks for you

Re: Connecting the channel failed: Connection refused

2015-06-25 Thread Aaron Jackson
So the JobManager was running on host1. This also explains why I didn't see the problem until I had asked for a sizeable degree of parallelism, since it probably never assigned a task to host3. Thanks for your help. On Thu, Jun 25, 2015 at 3:34 AM, Stephan Ewen wrote: > Nice! > > TaskManagers ne

Re: Connecting the channel failed: Connection refused

2015-06-25 Thread Stephan Ewen
Nice! TaskManagers need to announce where they listen for connections. We do not yet block "localhost" as an acceptable address, so as not to prohibit local test setups. There are some routines that try to select an interface that can communicate with the outside world. Is host3 running on the same m
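To illustrate the kind of interface selection Stephan mentions, here is a minimal sketch using only `java.net.NetworkInterface`. It is not Flink's actual routine. It assumes the simplest possible heuristic (take the first IPv4 address on an interface that is up, is not a loopback device, and is not link-local), and the class and method names (`AnnounceAddress`, `firstExternalIPv4`) are hypothetical. Requires Java 8 or later.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;
import java.util.Optional;

// Hypothetical helper, NOT Flink's address-selection code: returns the first
// IPv4 address on an interface that is up and not a loopback device,
// skipping link-local (169.254.x.x) addresses.
public class AnnounceAddress {

    static Optional<InetAddress> firstExternalIPv4() {
        try {
            Enumeration<NetworkInterface> nifs = NetworkInterface.getNetworkInterfaces();
            if (nifs == null) {
                return Optional.empty(); // the OS reported no interfaces at all
            }
            for (NetworkInterface nif : Collections.list(nifs)) {
                if (!nif.isUp() || nif.isLoopback()) {
                    continue; // down interfaces and "lo" are useless to remote peers
                }
                for (InetAddress addr : Collections.list(nif.getInetAddresses())) {
                    if (addr instanceof Inet4Address
                            && !addr.isLoopbackAddress()
                            && !addr.isLinkLocalAddress()) {
                        return Optional.of(addr);
                    }
                }
            }
        } catch (SocketException e) {
            // Treat an error while listing interfaces as "nothing suitable found".
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // If nothing suitable is found, falling back to loopback would announce
        // an address that other hosts cannot connect to.
        InetAddress chosen = firstExternalIPv4().orElse(InetAddress.getLoopbackAddress());
        System.out.println("would announce: " + chosen.getHostAddress());
    }
}
```

Real selection logic has to cope with multiple NICs, IPv6-only hosts and containers, which is why a plain "first match" heuristic like this can pick the wrong interface.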

Re: Connecting the channel failed: Connection refused

2015-06-24 Thread Aaron Jackson
That was it. host3 was showing localhost; I looked a little further and found it was missing an entry in /etc/hosts. Thanks for looking into this. Aaron On Wed, Jun 24, 2015 at 2:13 PM, Stephan Ewen wrote: > Aaron, > > Can you check how the TaskManagers register at the JobManager? When you > look at
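One way to catch this before starting a cluster is to check, on every worker, what the machine's own hostname resolves to. The sketch below is a hypothetical diagnostic (the `HostnameCheck` class is not part of Flink). It relies only on the standard `InetAddress.getLocalHost()`, which resolves the hostname through the system resolver (and therefore /etc/hosts on typical Linux setups), and it assumes Java 7 or later.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic: prints what this machine's hostname resolves to
// and warns if the result is an address remote hosts could not connect to.
public class HostnameCheck {

    // Loopback (127.0.0.0/8, ::1) and wildcard (0.0.0.0, ::) addresses mean
    // nothing to a remote peer, so announcing one of them breaks connections.
    static boolean notReachableFromOtherHosts(InetAddress addr) {
        return addr.isLoopbackAddress() || addr.isAnyLocalAddress();
    }

    public static void main(String[] args) {
        try {
            InetAddress local = InetAddress.getLocalHost();
            System.out.println(local.getHostName() + " -> " + local.getHostAddress());
            if (notReachableFromOtherHosts(local)) {
                System.out.println("WARNING: hostname resolves to a loopback or wildcard "
                        + "address; check /etc/hosts (or DNS) on this host.");
            }
        } catch (UnknownHostException e) {
            // The hostname has no mapping at all, e.g. no /etc/hosts entry and no DNS record.
            System.out.println("Hostname does not resolve: " + e.getMessage());
        }
    }
}
```

Running it on host1, host2 and host3 and comparing the output would show a misconfigured host such as host3 directly, without first submitting a job with high parallelism.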

Re: Connecting the channel failed: Connection refused

2015-06-24 Thread Stephan Ewen
Aaron, Can you check how the TaskManagers register at the JobManager? When you look at the 'TaskManagers' section in the JobManager's web interface (at port 8081), what does it say as the TaskManager host names? Does it list "host1", "host2", "host3"...? Thanks, Stephan On 24.06.2015 20:31,

Re: Connecting the channel failed: Connection refused

2015-06-24 Thread Ufuk Celebi
On 24 Jun 2015, at 16:22, Aaron Jackson wrote: > Thanks. My setup is actually 3 task managers x 4 slots. I played with the > parallelism and found that at low values, the error did not occur. I can > only conclude that there is some form of data shuffling that is occurring > that is sensiti

Re: Connecting the channel failed: Connection refused

2015-06-24 Thread Aaron Jackson
us. > > > > > > I noticed this exception in one of the Travis CI builds, so I'm hoping > it's something obvious I've missed. > > > > > > 06/23/2015 05:03:00 Join (Join at run(Job.java:137))(11/12) > switched to RUNNING > > > 06/23

Re: Connecting the channel failed: Connection refused

2015-06-24 Thread Ufuk Celebi
n (Join at run(Job.java:137))(11/12) switched to > > RUNNING > > 06/23/2015 05:03:00 Join (Join at run(Job.java:176))(9/12) switched to > > RUNNING > > 06/23/2015 05:03:00 Join (Join at run(Job.java:176))(12/12) switched to > > R

Re: Connecting the channel failed: Connection refused

2015-06-23 Thread Aaron Jackson
))(9/12) switched > to RUNNING > > 06/23/2015 05:03:00 Join (Join at run(Job.java:176))(12/12) switched > to RUNNING > > 06/23/2015 05:03:00 Join (Join at run(Job.java:137))(12/12) switched > to FAILED > > java.lang.Exception: The data preparation for

Re: Connecting the channel failed: Connection refused

2015-06-22 Thread Ufuk Celebi
oin at run(Job.java:176))(9/12) switched to > RUNNING > 06/23/2015 05:03:00 Join (Join at run(Job.java:176))(12/12) switched to > RUNNING > 06/23/2015 05:03:00 Join (Join at run(Job.java:137))(12/12) switched to > FAILED > java.lang.Exception: The data preparation for ta

Connecting the channel failed: Connection refused

2015-06-22 Thread Aaron Jackson
6/23/2015 05:03:00 Join (Join at run(Job.java:137))(12/12) switched to FAILED java.lang.Exception: The data preparation for task 'Join (Join at run(Job.java:137))' , caused an error: Connecting the channel failed: Connection refused: localhost/127.0.0.1:46229 at org.