It looks like the network configuration is not correct. I would try setting the fully qualified host name (like "master.abc.xyz.com") as jobmanager.rpc.address.
Greetings,
Stephan

On Wed, Feb 3, 2016 at 5:43 PM, Ravinder Kaur <neetu0...@gmail.com> wrote:

> Hello Community,
>
> I'm a student and new to Apache Flink. I'm trying to learn and have set up
> a 2-node standalone Flink (0.10.1) cluster (one master and one worker). I'm
> facing the following issue.
>
> Cluster: consists of 2 VMs (one master and one worker)
>
> The configurations are done as per
> https://ci.apache.org/projects/flink/flink-docs-release-0.10/setup/cluster_setup.html
>
> When I start the cluster, the JobManager and the TaskManager are started
> on the master and worker respectively.
>
> Command to start the cluster: bin/start-cluster.sh
>
> jps shows all the processes running.
>
> Then I run the following command to run a WordCount example job:
> ./bin/flink run ./examples/WordCount.jar
>
> The result is attached to this mail.
>
> The error is
> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
> Not enough free slots available to run the job
> ....................... Resources available to scheduler: Number of
> instances=0, total number of slots=0, available slots=0
>
> Therefore I suppose that the JobManager does not find the TaskManager. I
> checked the logs of the TaskManager, which indeed show that the
> TaskManager is unable to register at the JobManager for quite a long
> time. There are
> org.apache.flink.runtime.net.ConnectionUtils:
> Failed to connect from localhost: Connect timed out
> and
> org.apache.flink.runtime.net.ConnectionUtils:
> Failed to connect from address localhost: Network is Unreachable
> messages in the log of the TaskManager. Later, when it starts up after a
> number of attempts, it tries to register at the JobManager, which also
> fails after a lot of attempts, showing the following messages:
> org.apache.flink.runtime.taskmanager.TaskManager:
> Trying to register at JobManager akka.tcp://flink@master:6123/user/jobmanager
> (attempt:92, timeout:30seconds)
> and
> org.apache.flink.runtime.taskmanager.TaskManager:
> Tried to associate with unreachable remote host
> [akka.tcp://flink@master:6123/user/jobmanager].
> Address is now gated for 5000ms, all messages to this address will be
> delivered to dead letters. Reason: Connection timed out: /master:6123
>
> I browsed the internet for these and found the following links helpful:
>
> http://stackoverflow.com/questions/33601020/flink-job-wont-run-with-higher-taskmanager-heap-mb
> https://issues.apache.org/jira/browse/FLINK-1119
>
> Stephan Ewen, who provided the solution in both links, gives a good
> explanation that the TaskManagers take quite some time to register at
> the JobManager, and therefore I waited for as long as 20 minutes after
> starting the cluster before running the job. But even after waiting so
> long I get the same error.
>
> Another suggestion was to run the cluster in streaming mode. So I tried
> it with the command bin/start-cluster-streaming.sh and ran the job, but I
> get the same error.
>
> I rechecked all the configurations but could not find anything wrong.
> I also checked port 6123 on the master, which is in LISTEN state, while
> the TCP request from worker to master shows SYN_SENT state (using the
> netstat -na and lsof -i commands).
>
> I opened the web page on the master at http://localhost:8081 but it shows
> nothing, and localhost:8080 says connection refused.
>
> Kindly help me out as it is very important for me. Let me know if you
> have any questions.
>
> Kind Regards,
> Ravinder Kaur
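
The symptoms described in the quoted message (port 6123 in LISTEN state on the master, SYN_SENT on the worker) can be cross-checked from the worker with a small script. The following is only a rough sketch, not part of Flink: the function name check_endpoint is made up for illustration, and it simply resolves a host name and attempts one TCP connection using Python's standard socket module.

```python
import socket


def check_endpoint(host, port, timeout=3.0):
    """Resolve `host`, then try one TCP connection to host:port.

    Hypothetical helper for illustration only; it is not part of Flink.
    Returns a dict with the resolved addresses, a reachable flag, and
    an error string if resolution or the connection attempt failed.
    """
    result = {
        "host": host,
        "port": port,
        "addresses": [],
        "reachable": False,
        "error": None,
    }
    try:
        # Shows which IP addresses the name maps to on this machine.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        result["error"] = f"name resolution failed: {exc}"
        return result
    result["addresses"] = sorted({info[4][0] for info in infos})
    try:
        # A timeout here matches a connection stuck in SYN_SENT.
        with socket.create_connection((host, port), timeout=timeout):
            result["reachable"] = True
    except OSError as exc:
        result["error"] = f"connect failed: {exc}"
    return result
```

Calling check_endpoint("master", 6123) on the worker (host name and port taken from the log lines above) shows what "master" resolves to there. If the resolved address is a loopback address or the connection times out, the host name mapping or a firewall between the two VMs would be a plausible place to look, and the same check can be repeated with the full host name.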