Fwd: TaskManager unable to register with JobManager

Ravinder Kaur Wed, 03 Feb 2016 08:44:12 -0800

Hello Community,

I'm a student and new to Apache Flink. To learn it, I have set up a
2-node standalone Flink (0.10.1) cluster (one master and one worker). I'm
facing the following issue.

Cluster: 2 VMs (one master and one worker)

The configurations are done as per
https://ci.apache.org/projects/flink/flink-docs-release-0.10/setup/cluster_setup.html

When I start the cluster both the JobManager and the TaskManager are
started on the master and worker respectively.

Command to start the cluster: bin/start-cluster.sh

jps shows all the processes running.

Then I run a WordCount example job with the following command: ./bin/flink
run ./examples/WordCount.jar

The result is attached to this mail.

The error is
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
Not enough free slots available to run the job
....................... Resources available to scheduler: Number of
instances=0, total number of slots=0, available slots=0

I therefore suspect that the JobManager cannot find the TaskManager. The
TaskManager log confirms that the TaskManager is unable to register at the
JobManager for a long time. During startup it contains these messages:

org.apache.flink.runtime.net.ConnectionUtils:
Failed to connect from localhost: Connect timed out
org.apache.flink.runtime.net.ConnectionUtils:
Failed to connect from address localhost: Network is Unreachable

After a number of attempts the TaskManager starts up and tries to register
at the JobManager, which also fails after many attempts with the following
messages:

org.apache.flink.runtime.taskmanager.TaskManager:
Trying to register at JobManager akka.tcp://flink@master:6123/user/jobmanager
(attempt:92, timeout:30seconds)
org.apache.flink.runtime.taskmanager.TaskManager:
Tried to associate with unreachable remote host
[akka.tcp://flink@master:6123/user/jobmanager].
Address is now gated for 5000ms, all messages to this address will be
delivered to dead letters. Reason: Connection timed out: /master:6123
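
Because the messages above mention both "localhost" and "master", one thing
worth checking on each VM is what those hostnames actually resolve to. The
following is only a rough sketch, assuming Python 3 is available on the VMs;
the helper names resolve_ipv4, is_loopback and describe are made up for
illustration and are not part of Flink.

```python
import ipaddress
import socket


def resolve_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses for hostname."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def is_loopback(address):
    """True for loopback addresses (127.0.0.0/8), which other machines cannot reach."""
    return ipaddress.ip_address(address).is_loopback


def describe(hostname):
    """Return one line per resolved address, flagging loopback entries."""
    try:
        addresses = resolve_ipv4(hostname)
    except socket.gaierror as exc:
        return [f"{hostname}: cannot resolve ({exc})"]
    return [
        f"{hostname} -> {addr}" + (" (loopback)" if is_loopback(addr) else "")
        for addr in addresses
    ]
```

Printing describe("master") and describe(socket.gethostname()) on both the
master and the worker would show whether either name maps to a loopback
address there. That is one possible cause of this kind of failure, not a
confirmed diagnosis.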

I searched the internet for these messages and found the following links
helpful:

http://stackoverflow.com/questions/33601020/flink-job-wont-run-with-higher-taskmanager-heap-mb
https://issues.apache.org/jira/browse/FLINK-1119

Stephan Ewen, who provided the solution in both links, explains that the
TaskManagers can take quite some time to register at the JobManager. I
therefore waited as long as 20 minutes after starting the cluster before
running the job, but I still get the same error.

Another suggestion was to run the cluster in streaming mode. So I started it
with the command bin/start-cluster-streaming.sh and ran the job, but I get
the same error. I have rechecked all the configurations but I'm unable to
find the fault.

I also checked port 6123 with the netstat -na and lsof -i commands. On the
master the port is in the LISTEN state, while the TCP connection from the
worker to the master stays in the SYN_SENT state.
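
The same port check can be repeated from the worker with a small script. This
is a minimal sketch, assuming Python 3 on the worker and that "master" and
6123 are the JobManager host and port as in the log messages; probe_tcp is a
hypothetical helper name, not a Flink tool.

```python
import socket


def probe_tcp(host, port, timeout=5.0):
    """Attempt one TCP connection and return a short status string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer to the SYN within the timeout.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        # Covers name resolution failures, unreachable networks, etc.
        return f"error: {exc}"
```

Running probe_tcp("master", 6123) on the worker should return "open" if the
JobManager port is reachable. A "timeout" result would match the SYN_SENT
state seen in netstat, which usually means the connection request gets no
reply at all, for example because a firewall drops it or the name points to
the wrong address. A "refused" result would instead mean the host answers but
no process listens on that port.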

I opened the web page http://localhost:8081 on the master, but it shows
nothing, and localhost:8080 reports connection refused.

Kindly help me out as it is very important for me. Let me know if you have
any questions.

Kind Regards,
Ravinder Kaur
