Hi,

the TaskManager is starting up, but it's not able to register at the JobManager. Did you check the JobManager log? Do you see anything suspicious there? Are the ports matching (jobmanager.rpc.port in flink-conf.yaml on both nodes)?
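The port question can be answered from the worker itself with a small stand-alone check, run with the same JDK as the TaskManager. The sketch below does two things. First, it resolves the JobManager host name; an `UnknownHostException` at this step is the same kind of failure as the "Cannot resolve the JobManager hostname" error further down in this thread. Second, it tries a plain TCP connect to the RPC port with a timeout. The class name `JobManagerProbe`, the 2-second timeout and the argument handling are hypothetical choices, not part of Flink; 6123 is only the port that appears in the logs here.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

// Hypothetical diagnostic, not part of Flink: checks that the JobManager host
// name resolves on this machine and that its RPC port accepts TCP connections.
public class JobManagerProbe {

    /** Returns "OK: ...", "UNRESOLVED: ..." or "UNREACHABLE: ..." for one probe. */
    public static String probe(String host, int port, int timeoutMs) {
        InetAddress address;
        try {
            // Name lookup: fails here if the host name is not resolvable from this node.
            address = InetAddress.getByName(host);
        } catch (UnknownHostException e) {
            return "UNRESOLVED: " + host;
        }
        try (Socket socket = new Socket()) {
            // TCP connect with a timeout; a timeout here corresponds to a socket
            // that stays in SYN_SENT on the worker.
            socket.connect(new InetSocketAddress(address, port), timeoutMs);
            return "OK: " + address.getHostAddress() + ":" + port;
        } catch (IOException e) {
            return "UNREACHABLE: " + address.getHostAddress() + ":" + port + " (" + e.getMessage() + ")";
        }
    }

    public static void main(String[] args) {
        // Usage: java JobManagerProbe <jobmanager-host> [port]
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 6123;
        System.out.println(probe(host, port, 2000));
    }
}
```

Run it on the worker as, for example, `java JobManagerProbe hostname-of-master 6123`. If the name resolves but the connect times out while `netstat` on the master shows 6123 in LISTEN state, that often points at a firewall between the VMs, or at the JobManager listening on a different address than the one the worker is dialing.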
On Wed, Feb 3, 2016 at 9:23 PM, Ravinder Kaur <neetu0...@gmail.com> wrote:

> Hello,
>
> Thank you for pointing it out. I had made a small typo in the hostname when I edited flink-conf.yaml. I've corrected it, and the TaskManager now starts up.
> But I still can't run the WordCount example; it throws the same NoResourceAvailableException:
>
> Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Task to schedule: < Attempt #0 (CHAIN DataSource (at getDefaultTextLineDataSet(WordCountData.java:70) (org.apache.flink.api.java.io.CollectionInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72) (1/1)) @ (unassigned) - [SCHEDULED] > with groupID < 31e497f2f68c9cee5864c8fddaff3d59 > in sharing group < SlotSharingGroup [f9ed1aab933e061a8ce1ecaa3534f18c, 037bb78a1902f7edea69a978ad7b54ce, 31e497f2f68c9cee5864c8fddaff3d59] >. Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0
>     at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:256)
>     at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleImmediately(Scheduler.java:131)
>     at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:298)
>     at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:458)
>     at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:322)
>     at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:679)
>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:982)
>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:962)
>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:962)
>     ... 8 more
>
> The TaskManager log again shows the same errors as before:
>
> 20:58:58,457 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/slave-IP': connect timed out
> 20:58:58,458 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/0:0:0:0:0:0:0:1%1': Network is unreachable
> 20:58:58,458 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument
> 20:58:59,048 WARN org.apache.flink.runtime.net.ConnectionUtils - Could not connect to /master-IP:6123. Selecting a local address using heuristics.
> 20:58:59,050 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager will use hostname/address 'hostname-of-slave' (slave-IP) for communication.
> 20:58:59,051 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager in streaming mode BATCH_ONLY
> 20:58:59,052 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor system at slave_IP:0
> 20:58:59,776 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
> 20:58:59,842 INFO Remoting - Starting remoting
> 20:59:00,094 INFO Remoting - Remoting started; listening on addresses :[akka.tcp://flink@slave-IP:33813]
> 20:59:00,100 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor
> 20:59:00,125 INFO org.apache.flink.runtime.io.network.netty.NettyConfig - NettyConfig [server address: hostname-of-master/master-IP, server port: 49030, memory segment size (bytes): 32768, transport type: NIO, number of server threads: 0 (use Netty's default), number of client threads: 0 (use Netty's default), server connect backlog: 0 (use Netty's default), client connect timeout (sec): 120, send/receive buffer size (bytes): 0 (use Netty's default)]
> 20:59:00,131 INFO org.apache.flink.runtime.taskmanager.TaskManager - Messages between TaskManager and JobManager have a max timeout of 100000 milliseconds
> 20:59:00,142 INFO org.apache.flink.runtime.taskmanager.TaskManager - Temporary file directory '/tmp': total 4 GB, usable 1 GB (25.00% usable)
> 20:59:00,210 INFO org.apache.flink.runtime.io.network.buffer.NetworkBufferPool - Allocated 64 MB for network buffer pool (number of memory segments: 2048, bytes per segment: 32768).
> 20:59:00,323 INFO org.apache.flink.runtime.taskmanager.TaskManager - Using 0.7 of the currently free heap space for Flink managed heap memory (293 MB).
> 20:59:00,565 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager uses directory /tmp/flink-io-c7796b82-6676-4604-97fd-df09001a84e8 for spill files.
> 20:59:00,578 INFO org.apache.flink.runtime.filecache.FileCache - User file cache uses directory /tmp/flink-dist-cache-13ed3e76-cf1e-46fa-9ba2-5177e801429e
> 20:59:00,908 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor at akka://flink/user/taskmanager#-157676733.
> 20:59:00,908 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager data connection information: hostname-of-master (dataPort=49030)
> 20:59:00,909 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager has 1 task slot(s).
> 20:59:00,910 INFO org.apache.flink.runtime.taskmanager.TaskManager - Memory usage stats: [HEAP: 376/491/491 MB, NON HEAP: 24/49/304 MB (used/committed/max)]
> 20:59:00,917 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@master-IP:6123/user/jobmanager (attempt 1, timeout: 500 milliseconds)
> 20:59:01,443 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@master-IP:6123/user/jobmanager (attempt 2, timeout: 1000 milliseconds)
> 20:59:02,873 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@master-IP:6123/user/jobmanager (attempt 3, timeout: 2000 milliseconds)
> 20:59:04,893 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@master-IP:6123/user/jobmanager (attempt 4, timeout: 4000 milliseconds)
> 20:59:08,914 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@master-IP:6123/user/jobmanager (attempt 5, timeout: 8000 milliseconds)
>
> Kind Regards,
> Ravinder Kaur
>
> On Wed, Feb 3, 2016 at 8:12 PM, Stephan Ewen <se...@apache.org>
> wrote:
>
>> This looks like the reason:
>>
>> java.net.UnknownHostException: Cannot resolve the JobManager hostname 'hostname-of-master' specified in the configuration
>>
>> On Wed, Feb 3, 2016 at 7:29 PM, Ravinder Kaur <neetu0...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> The TaskManager log file now shows the following:
>>>
>>> 18:27:10,082 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> 18:27:10,244 INFO org.apache.flink.runtime.taskmanager.TaskManager - --------------------------------------------------------------------------------
>>> 18:27:10,244 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager (Version: 0.10.1, Rev:2e9b231, Date:22.11.2015 @ 12:41:12 CET)
>>> 18:27:10,244 INFO org.apache.flink.runtime.taskmanager.TaskManager - Current user: flink
>>> 18:27:10,245 INFO org.apache.flink.runtime.taskmanager.TaskManager - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.7/24.91-b01
>>> 18:27:10,245 INFO org.apache.flink.runtime.taskmanager.TaskManager - Maximum heap size: 491 MiBytes
>>> 18:27:10,245 INFO org.apache.flink.runtime.taskmanager.TaskManager - JAVA_HOME: /usr/lib/jvm/java-1.7.0-openjdk-amd64
>>> 18:27:10,247 INFO org.apache.flink.runtime.taskmanager.TaskManager - Hadoop version: 2.7.0
>>> 18:27:10,247 INFO org.apache.flink.runtime.taskmanager.TaskManager - JVM Options:
>>> 18:27:10,247 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Xms512M
>>> 18:27:10,247 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Xmx512M
>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager - -XX:MaxDirectMemorySize=8388607T
>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager - -XX:MaxPermSize=256m
>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Dlog.file=/home/flink/flink-0.10.1/log/flink-flink-taskmanager-0-vm-10-155-208-137.cloud.mwn.de.log
>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Dlog4j.configuration=file:/home/flink/flink-0.10.1/conf/log4j.properties
>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Dlogback.configurationFile=file:/home/flink/flink-0.10.1/conf/logback.xml
>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager - Program Arguments:
>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager - --configDir
>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager - /home/flink/flink-0.10.1/conf
>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager - --streamingMode
>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager - batch
>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager - Classpath: /home/flink/flink-0.10.1/lib/flink-dist_2.11-0.10.1.jar:/home/flink/flink-0.10.1/lib/flink-python_2.11-0.10.1.jar:/home/flink/flink-0.10.1/lib/log4j-1.2.17.jar:/home/flink/flink-0.10.1/lib/slf4j-log4j12-1.7.7.jar:/usr/lib/jvm/java-1.7.0-openjdk-amd64/lib/tools.jar::
>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager - --------------------------------------------------------------------------------
>>> 18:27:10,252 INFO org.apache.flink.runtime.taskmanager.TaskManager - Maximum number of open file descriptors is 4096
>>> 18:27:10,277 INFO org.apache.flink.runtime.taskmanager.TaskManager - Loading configuration from /home/flink/flink-0.10.1/conf
>>> 18:27:10,356 INFO org.apache.flink.runtime.taskmanager.TaskManager - Security is not enabled. Starting non-authenticated TaskManager.
>>> 18:27:10,365 ERROR org.apache.flink.runtime.taskmanager.TaskManager - Failed to run TaskManager.
>>> java.net.UnknownHostException: Cannot resolve the JobManager hostname 'hostname-of-master' specified in the configuration
>>>     at org.apache.flink.runtime.util.StandaloneUtils.createLeaderRetrievalService(StandaloneUtils.java:79)
>>>     at org.apache.flink.runtime.util.StandaloneUtils.createLeaderRetrievalService(StandaloneUtils.java:48)
>>>     at org.apache.flink.runtime.util.LeaderRetrievalUtils.createLeaderRetrievalService(LeaderRetrievalUtils.java:69)
>>>     at org.apache.flink.runtime.taskmanager.TaskManager$.selectNetworkInterfaceAndPort(TaskManager.scala:1351)
>>>     at org.apache.flink.runtime.taskmanager.TaskManager$.selectNetworkInterfaceAndRunTaskManager(TaskManager.scala:1328)
>>>     at org.apache.flink.runtime.taskmanager.TaskManager$.main(TaskManager.scala:1240)
>>>     at org.apache.flink.runtime.taskmanager.TaskManager.main(TaskManager.scala)
>>>
>>> Kind Regards,
>>> Ravinder Kaur
>>>
>>> On Wed, Feb 3, 2016 at 7:19 PM, Stephan Ewen <se...@apache.org> wrote:
>>>
>>>> What do the TaskManager logs say?
>>>>
>>>> On Wed, Feb 3, 2016 at 6:34 PM, Ravinder Kaur <neetu0...@gmail.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Thanks for the quick reply. I tried setting jobmanager.rpc.address in flink-conf.yaml to the hostname of the master node on both nodes.
>>>>>
>>>>> Now the TaskManager does not start on the worker node at all. When I start the cluster using ./bin/start-cluster.sh on the master, it shows the normal output for starting the JobManager and the TaskManager, but when I run jps on the nodes, no TaskManager is running on the slave.
>>>>>
>>>>> Running the WordCount example again fails with the same error. Stopping the cluster says there is no TaskManager to stop.
>>>>>
>>>>> Kind Regards,
>>>>> Ravinder Kaur
>>>>>
>>>>> On Wed, Feb 3, 2016 at 5:47 PM, Stephan Ewen <se...@apache.org> wrote:
>>>>>
>>>>>> Looks like the network configuration is not correct.
>>>>>>
>>>>>> I would try setting the full host name (like "master.abc.xyz.com") as jobmanager.rpc.address.
>>>>>>
>>>>>> Greetings,
>>>>>> Stephan
>>>>>>
>>>>>> On Wed, Feb 3, 2016 at 5:43 PM, Ravinder Kaur <neetu0...@gmail.com> wrote:
>>>>>>
>>>>>>> Hello Community,
>>>>>>>
>>>>>>> I'm a student and new to Apache Flink. I'm trying to learn it and have set up a 2-node standalone Flink (0.10.1) cluster (one master and one worker). I'm facing the following issue.
>>>>>>>
>>>>>>> Cluster: 2 VMs (one master and one worker)
>>>>>>>
>>>>>>> The configuration was done as described in https://ci.apache.org/projects/flink/flink-docs-release-0.10/setup/cluster_setup.html
>>>>>>>
>>>>>>> When I start the cluster, the JobManager and the TaskManager are started on the master and the worker, respectively.
>>>>>>>
>>>>>>> Command to start the cluster: bin/start-cluster.sh
>>>>>>>
>>>>>>> jps shows all the processes running.
>>>>>>>
>>>>>>> Then I run the following command to run the WordCount example job: ./bin/flink run ./examples/WordCount.jar
>>>>>>>
>>>>>>> The result is attached to this mail.
>>>>>>>
>>>>>>> The error is
>>>>>>> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job ... Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0
>>>>>>>
>>>>>>> I therefore suspected that the JobManager does not find the TaskManager, and checked the TaskManager logs, which indeed show that the TaskManager is unable to register at the JobManager for quite a long time. The TaskManager log contains messages such as
>>>>>>> org.apache.flink.runtime.net.ConnectionUtils: Failed to connect from localhost: Connect timed out
>>>>>>> and
>>>>>>> org.apache.flink.runtime.net.ConnectionUtils: Failed to connect from address localhost: Network is Unreachable.
>>>>>>> Later, when it starts up after a number of attempts, it tries to register at the JobManager, which also fails after many attempts with the following messages:
>>>>>>> org.apache.flink.runtime.taskmanager.TaskManager: Trying to register at JobManager akka.tcp://flink@master:6123/user/jobmanager (attempt:92, timeout:30seconds)
>>>>>>> and
>>>>>>> org.apache.flink.runtime.taskmanager.TaskManager: Tried to associate with unreachable remote host [akka.tcp://flink@master:6123/user/jobmanager]. Address is now gated for 5000ms, all messages to this address will be delivered to dead letters. Reason: Connection timed out: /master:6123
>>>>>>>
>>>>>>> I searched the internet for these messages and found http://stackoverflow.com/questions/33601020/flink-job-wont-run-with-higher-taskmanager-heap-mb and https://issues.apache.org/jira/browse/FLINK-1119 helpful. Stephan Ewen, who provided the solution in both links, explains that TaskManagers can take quite some time to register at the JobManager, so I waited as long as 20 minutes after starting the cluster before running the job. But even after waiting that long I get the same error.
>>>>>>>
>>>>>>> Another suggestion was to run the cluster in streaming mode, so I started it with bin/start-cluster-streaming.sh and ran the job, but I get the same error. I have rechecked all the configurations but I'm unable to find the fault.
>>>>>>>
>>>>>>> I also checked port 6123 on the master, which is in LISTEN state, while the TCP connection from the worker to the master stays in SYN_SENT state (checked with netstat -na and lsof -i).
>>>>>>>
>>>>>>> I opened http://localhost:8081 on the master, but it shows nothing, and localhost:8080 says connection refused.
>>>>>>>
>>>>>>> Kindly help me out, as this is very important for me. Let me know if you have any questions.
>>>>>>>
>>>>>>> Kind Regards,
>>>>>>> Ravinder Kaur
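The registration log lines quoted in this thread hint at why waiting 20 minutes did not help. The TaskManager keeps retrying with a growing timeout (500, 1000, 2000, 4000 and 8000 ms for attempts 1 to 5, and 30 seconds by attempt 92), so extra waiting only produces more attempts against an address it cannot reach. The sketch below reproduces that capped exponential backoff. The 500 ms start and the 30 s ceiling are read off those log lines, not taken from Flink's source, and `RegistrationBackoff` is a hypothetical name.

```java
// Hypothetical illustration of the retry schedule seen in the TaskManager log;
// not Flink code. Both constants are inferred from the log lines in this thread.
public class RegistrationBackoff {

    static final long INITIAL_TIMEOUT_MS = 500;   // "attempt 1, timeout: 500 milliseconds"
    static final long MAX_TIMEOUT_MS = 30_000;    // assumed ceiling, from "attempt:92, timeout:30seconds"

    /** Timeout used for the given 1-based registration attempt. */
    public static long timeoutForAttempt(int attempt) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt must be >= 1");
        }
        long timeout = INITIAL_TIMEOUT_MS;
        // Double once per previous attempt, capping at every step so the value never overflows.
        for (int i = 1; i < attempt; i++) {
            timeout = Math.min(timeout * 2, MAX_TIMEOUT_MS);
        }
        return timeout;
    }

    /** Sum of the timeouts of attempts 1..attempts, i.e. the time spent if every attempt times out. */
    public static long totalTimeoutMs(int attempts) {
        long total = 0;
        for (int a = 1; a <= attempts; a++) {
            total += timeoutForAttempt(a);
        }
        return total;
    }

    public static void main(String[] args) {
        for (int attempt = 1; attempt <= 7; attempt++) {
            System.out.println("attempt " + attempt + ": " + timeoutForAttempt(attempt) + " ms");
        }
        System.out.println("attempt 92: " + timeoutForAttempt(92) + " ms");
        System.out.println("timeouts of attempts 1-92: " + totalTimeoutMs(92) / 60_000.0 + " min");
    }
}
```

Running `main` prints 500, 1000, 2000, 4000, 8000 and 16000 ms for attempts 1 to 6, then 30000 ms from attempt 7 onwards. Once the ceiling is reached every further attempt is identical, so a TaskManager that cannot reach `master:6123` will not succeed by waiting longer; the network path has to be fixed first.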