Re: TaskManager unable to register with JobManager

Stephan Ewen Wed, 03 Feb 2016 08:48:28 -0800

Looks like the network configuration is not correct.

I would try setting the full host name (like "master.abc.xyz.com") as
jobmanager.rpc.address.
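
A quick way to check this from the worker is to test whether the name
resolves and whether the JobManager port accepts a TCP connection. The
following is only a minimal sketch in Python: the helper names resolve and
can_connect are made up for illustration and are not part of Flink, and the
port 6123 is taken from the log lines quoted below.

```python
import socket


def resolve(host):
    """Return the sorted addresses a host name resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

Calling resolve("master.abc.xyz.com") and
can_connect("master.abc.xyz.com", 6123) on the worker (with the real host
name substituted) should show whether the address that would go into
jobmanager.rpc.address in conf/flink-conf.yaml is reachable at all. If the
name resolves but the connection fails, a firewall or a JobManager bound to
a different interface would be the next things to look at.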


Greetings,
Stephan


On Wed, Feb 3, 2016 at 5:43 PM, Ravinder Kaur <neetu0...@gmail.com> wrote:

>
> Hello Community,
>
> I'm a student and new to Apache Flink. I'm trying to learn and have set up
> a 2-node standalone Flink (0.10.1) cluster (one master and one worker). I'm
> facing the following issue.
>
> Cluster: consists of 2 VMs (one master and one worker)
>
> The configurations are done as per
> https://ci.apache.org/projects/flink/flink-docs-release-0.10/setup/cluster_setup.html
>
> When I start the cluster both the JobManager and the TaskManager are
> started on the master and worker respectively.
>
> Command to start the cluster : bin/start-cluster.sh
>
> JPS shows all the processes running.
>
> Then I run the following command to run a WordCount example job: ./bin/flink
> run ./examples/WordCount.jar
>
> The result is attached to this mail.
>
> The error is
> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
> Not enough free slots available to run the job
> ....................... Resources available to scheduler: Number of
> instances=0, total number of slots=0, available slots=0
>
> Therefore I suppose that the JobManager does not find the TaskManager. I
> checked the TaskManager logs, which indeed show that the TaskManager is
> unable to register at the JobManager for quite a long time. The log
> contains these messages:
>
> org.apache.flink.runtime.net.ConnectionUtils:
> Failed to connect from localhost: Connect timed out
>
> org.apache.flink.runtime.net.ConnectionUtils:
> Failed to connect from address localhost: Network is Unreachable
>
> Later, after a number of attempts, the TaskManager starts up and tries to
> register at the JobManager, which also fails after many attempts with the
> following messages:
>
> org.apache.flink.runtime.taskmanager.TaskManager:
> Trying to register at JobManager akka.tcp://flink@master:6123/user/jobmanager
> (attempt:92, timeout:30seconds)
>
> org.apache.flink.runtime.taskmanager.TaskManager:
> Tried to associate with unreachable remote host
> [akka.tcp://flink@master:6123/user/jobmanager].
> Address is now gated for 5000ms, all messages to this address will be
> delivered to dead letters. Reason: Connection timed out: /master:6123
>
> I searched the internet for these messages and found the following links
> helpful:
> http://stackoverflow.com/questions/33601020/flink-job-wont-run-with-higher-taskmanager-heap-mb
> and https://issues.apache.org/jira/browse/FLINK-1119.
> Stephan Ewen, who provided the solution in both links, explains that the
> TaskManagers can take quite some time to register at the JobManager, so I
> waited as long as 20 minutes after starting the cluster before running the
> job. Even after waiting that long I get the same error.
>
> Another suggestion was to run the cluster in streaming mode. So I tried it
> with the command: bin/start-cluster-streaming.sh and ran the job, but I
> get the same error.
>
> I re-checked all the configurations but could not find anything wrong.
> I also checked port 6123 on the master, which is in LISTEN state, while the
> TCP request from the worker to the master shows SYN_SENT state (using the
> netstat -na and lsof -i commands).
>
> I opened the webpage on master http://localhost:8081 but it shows nothing
> and localhost:8080 says connection refused.
>
> Kindly help me out as it is very important for me. Let me know if you have
> any questions.
>
> Kind Regards,
> Ravinder Kaur
>
>
