Hello Community,

I'm a student and new to Apache Flink. To learn it, I have set up a 2-node standalone Flink (0.10.1) cluster (one master and one worker), and I'm facing the following issue.
Cluster: 2 VMs (one master and one worker), configured as per https://ci.apache.org/projects/flink/flink-docs-release-0.10/setup/cluster_setup.html

When I start the cluster, the JobManager and the TaskManager are started on the master and the worker respectively. Command to start the cluster:

    bin/start-cluster.sh

jps shows all the processes running. Then I run the following command to run the WordCount example job:

    ./bin/flink run ./examples/WordCount.jar

The result is attached to this mail. The error is:

    org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job ... Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0

I therefore suppose that the JobManager does not find the TaskManager. I checked the logs of the TaskManager, which indeed show that the TaskManager has been unable to register at the JobManager for quite a long time. The log contains these messages:

    org.apache.flink.runtime.net.ConnectionUtils: Failed to connect from localhost: Connect timed out
    org.apache.flink.runtime.net.ConnectionUtils: Failed to connect from address localhost: Network is Unreachable

Later, after a number of attempts, the TaskManager starts up and tries to register at the JobManager, which also fails after many attempts with the following messages:

    org.apache.flink.runtime.taskmanager.TaskManager: Trying to register at JobManager akka.tcp://flink@master:6123/user/jobmanager (attempt:92, timeout:30seconds)
    org.apache.flink.runtime.taskmanager.TaskManager: Tried to associate with unreachable remote host [akka.tcp://flink@master:6123/user/jobmanager]. Address is now gated for 5000ms, all messages to this address will be delivered to dead letters.
    Reason: Connection timed out: /master:6123

I searched for these errors and found the following links helpful:

    http://stackoverflow.com/questions/33601020/flink-job-wont-run-with-higher-taskmanager-heap-mb
    https://issues.apache.org/jira/browse/FLINK-1119

Stephan Ewen, who provided the solution in both links, gives a good explanation that the TaskManagers can take quite some time to register at the JobManager. I therefore waited as long as 20 minutes after starting the cluster before running the job, but I still get the same error.

Another suggestion was to run the cluster in streaming mode, so I started it with

    bin/start-cluster-streaming.sh

and ran the job, but I get the same error.

I have rechecked all the configurations but could not find anything wrong. I also checked port 6123 on the master, which is in LISTEN state, while the TCP connection from the worker to the master stays in SYN_SENT state (checked with netstat -na and lsof -i). I opened the web page http://localhost:8081 on the master, but it shows nothing, and localhost:8080 says connection refused.

Kindly help me out, as this is very important for me. Let me know if you have any questions.

Kind Regards,
Ravinder Kaur
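
P.S. In case it helps anyone reproduce the connectivity check described above without netstat, here is a minimal sketch in Python. It only attempts a plain TCP connection, which is what the worker needs in order to reach the JobManager. The host name master and port 6123 come from the log messages above; the function name can_connect and the 5-second timeout are illustrative choices of mine and are not part of Flink.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, unreachable networks and DNS failures.
        return False
```

Called on the worker as can_connect("master", 6123), a result of False would correspond to the timed-out connection seen in the TaskManager log, while True would suggest the problem lies elsewhere.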