Hi Ravinder, please have a look at the configuration documentation:
--> https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#jobmanager-amp-taskmanager

Best, Fabian

2016-02-10 13:55 GMT+01:00 Ravinder Kaur <neetu0...@gmail.com>:

> Hello All,
>
> I need to know the range of ports that are used for master/slave
> communication in the Flink cluster. Also, is there a way to specify a
> range of ports at the slaves, so that they connect to the master only
> within this range?
>
> Kind Regards,
> Ravinder Kaur
>
>
> On Wed, Feb 3, 2016 at 10:09 PM, Stephan Ewen <se...@apache.org> wrote:
>
>> Can machines connect to port 6123? The firewall may block that port, but
>> permit SSH.
>>
>> On Wed, Feb 3, 2016 at 9:52 PM, Ravinder Kaur <neetu0...@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> Here is the log file of the JobManager. I did not see anything suspicious,
>>> and as it suggests, the ports are also listening.
>>>
>>> 20:58:46,906 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager on IP-of-master:6123 with execution mode CLUSTER and streaming mode BATCH_ONLY
>>> 20:58:46,978 INFO org.apache.flink.runtime.jobmanager.JobManager - Security is not enabled. Starting non-authenticated JobManager.
>>> 20:58:46,979 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager
>>> 20:58:46,980 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager actor system at 10.155.208.138:6123
>>> 20:58:48,196 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
>>> 20:58:48,295 INFO Remoting - Starting remoting
>>> 20:58:48,541 INFO Remoting - Remoting started; listening on addresses :[akka.tcp://flink@IP-of-master:6123]
>>> 20:58:48,549 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManger web frontend
>>> 20:58:48,690 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Using directory /tmp/flink-web-876a4755-4f38-4ff7-8202-f263afa9b986 for the web interface files
>>> 20:58:48,691 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Serving job manager log from /home/flink/flink-0.10.1/log/flink-flink-jobmanager-0-hostname.log
>>> 20:58:48,691 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Serving job manager stdout from /home/flink/flink-0.10.1/log/flink-flink-jobmanager-0-hostname.out
>>> 20:58:49,044 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Web frontend listening at 0:0:0:0:0:0:0:0:8081
>>> 20:58:49,045 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager actor
>>> 20:58:49,052 INFO org.apache.flink.runtime.blob.BlobServer - Created BLOB server storage directory /tmp/blobStore-e0c52bfb-2411-4a83-ac8d-5664a5894258
>>> 20:58:49,054 INFO org.apache.flink.runtime.blob.BlobServer - Started BLOB server at 0.0.0.0:43683 - max concurrent requests: 50 - max backlog: 1000
>>> 20:58:49,075 INFO org.apache.flink.runtime.jobmanager.MemoryArchivist - Started memory archivist akka://flink/user/archive
>>> 20:58:49,075 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager at akka.tcp://flink@IP-of-master:6123/user/jobmanager.
>>> 20:58:49,081 INFO org.apache.flink.runtime.jobmanager.JobManager - JobManager akka.tcp://flink@IP-of-master:6123/user/jobmanager was granted leadership with leader session ID None.
>>> 20:58:49,082 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Starting with JobManager akka.tcp://flink@IP-of-master:6123/user/jobmanager on port 8081
>>> 20:58:49,083 INFO org.apache.flink.runtime.webmonitor.JobManagerRetriever - New leader reachable under akka.tcp://flink@IP-of-master:6123/user/jobmanager:null.
>>> 20:59:22,794 INFO org.apache.flink.runtime.jobmanager.JobManager - Submitting job 72733d69588678ec224003ab5577cab8 (Flink Java Job at Wed Feb 03 20:59:22 CET 2016).
>>> 20:59:22,853 INFO org.apache.flink.runtime.jobmanager.JobManager - Scheduling job 72733d69588678ec224003ab5577cab8 (Flink Java Job at Wed Feb 03 20:59:22 CET 2016).
>>> 20:59:22,857 INFO org.apache.flink.runtime.jobmanager.JobManager - Status of job 72733d69588678ec224003ab5577cab8 (Flink Java Job at Wed Feb 03 20:59:22 CET 2016) changed to RUNNING.
>>> 20:59:22,859 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - CHAIN DataSource (at getDefaultTextLineDataSet(WordCountData.java:70) (org.apache.flink.api.java.io.CollectionInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72) (1/1) (23fb37019a504fd6c7bf95e46a8cd7a3) switched from CREATED to SCHEDULED
>>> 20:59:22,881 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - CHAIN DataSource (at getDefaultTextLineDataSet(WordCountData.java:70) (org.apache.flink.api.java.io.CollectionInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72) (1/1) (23fb37019a504fd6c7bf95e46a8cd7a3) switched from SCHEDULED to CANCELED
>>> 20:59:22,881 INFO org.apache.flink.runtime.jobmanager.JobManager - Status of job 72733d69588678ec224003ab5577cab8 (Flink Java Job at Wed Feb 03 20:59:22 CET 2016) changed to FAILING.
>>> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Task to schedule: < Attempt #0 (CHAIN DataSource (at getDefaultTextLineDataSet(WordCountData.java:70) (org.apache.flink.api.java.io.CollectionInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72) (1/1)) @ (unassigned) - [SCHEDULED] > with groupID < 31e497f2f68c9cee5864c8fddaff3d59 > in sharing group < SlotSharingGroup [f9ed1aab933e061a8ce1ecaa3534f18c, 037bb78a1902f7edea69a978ad7b54ce, 31e497f2f68c9cee5864c8fddaff3d59] >. Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0
>>>     at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:256)
>>>     at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleImmediately(Scheduler.java:131)
>>>     at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:298)
>>>     at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:458)
>>>     at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:322)
>>>     at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:679)
>>>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:982)
>>>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:962)
>>>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:962)
>>>     at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>>>     at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>>>     at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>>>     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
>>>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>> 20:59:22,886 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - CHAIN Reduce (SUM(1), at main(WordCount.java:72) -> FlatMap (collect()) (1/1) (824b6e3771304cd0f92aea4ab763a11d) switched from CREATED to CANCELED
>>> 20:59:22,887 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - DataSink (collect() sink) (1/1) (1bb64a2edc6f68ad716acd9f8d2d7d67) switched from CREATED to CANCELED
>>> 20:59:22,890 INFO org.apache.flink.runtime.jobmanager.JobManager - Status of job 72733d69588678ec224003ab5577cab8 (Flink Java Job at Wed Feb 03 20:59:22 CET 2016) changed to FAILED.
>>> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Task to schedule: < Attempt #0 (CHAIN DataSource (at getDefaultTextLineDataSet(WordCountData.java:70) (org.apache.flink.api.java.io.CollectionInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72) (1/1)) @ (unassigned) - [SCHEDULED] > with groupID < 31e497f2f68c9cee5864c8fddaff3d59 > in sharing group < SlotSharingGroup [f9ed1aab933e061a8ce1ecaa3534f18c, 037bb78a1902f7edea69a978ad7b54ce, 31e497f2f68c9cee5864c8fddaff3d59] >. Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0
>>>     at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:256)
>>>     at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleImmediately(Scheduler.java:131)
>>>     at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:298)
>>>     at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:458)
>>>     at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:322)
>>>     at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:679)
>>>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:982)
>>>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:962)
>>>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:962)
>>>     at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>>>     at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>>>
>>>
>>> On Wed, Feb 3, 2016 at 9:27 PM, Robert Metzger <rmetz...@apache.org>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> the TaskManager is starting up, but it's not able to register at the
>>>> JobManager. Did you check the JobManager log? Do you see anything
>>>> suspicious there? Are the ports matching?
>>>>
>>>>
>>>> On Wed, Feb 3, 2016 at 9:23 PM, Ravinder Kaur <neetu0...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Thank you for pointing it out. I made a small typo while editing the
>>>>> hostname in flink-conf.yaml. I've fixed it and the TaskManager started up.
>>>>> But I still can't run the WordCount example, and it throws the same
>>>>> NoResourceAvailableException.
>>>>>
>>>>> Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Task to schedule: < Attempt #0 (CHAIN DataSource (at getDefaultTextLineDataSet(WordCountData.java:70) (org.apache.flink.api.java.io.CollectionInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72) (1/1)) @ (unassigned) - [SCHEDULED] > with groupID < 31e497f2f68c9cee5864c8fddaff3d59 > in sharing group < SlotSharingGroup [f9ed1aab933e061a8ce1ecaa3534f18c, 037bb78a1902f7edea69a978ad7b54ce, 31e497f2f68c9cee5864c8fddaff3d59] >. Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0
>>>>>     at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:256)
>>>>>     at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleImmediately(Scheduler.java:131)
>>>>>     at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:298)
>>>>>     at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:458)
>>>>>     at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:322)
>>>>>     at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:679)
>>>>>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:982)
>>>>>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:962)
>>>>>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:962)
>>>>>     ... 8 more
>>>>>
>>>>> The log of the TaskManager again has the same errors as before.
>>>>>
>>>>> 20:58:58,457 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/slave-IP': connect timed out
>>>>> 20:58:58,458 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/0:0:0:0:0:0:0:1%1': Network is unreachable
>>>>> 20:58:58,458 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument
>>>>> 20:58:59,048 WARN org.apache.flink.runtime.net.ConnectionUtils - Could not connect to /master-IP:6123. Selecting a local address using heuristics.
>>>>> 20:58:59,050 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager will use hostname/address 'hostname-of-slave' (slave-IP) for communication.
>>>>> 20:58:59,051 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager in streaming mode BATCH_ONLY
>>>>> 20:58:59,052 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor system at slave_IP:0
>>>>> 20:58:59,776 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
>>>>> 20:58:59,842 INFO Remoting - Starting remoting
>>>>> 20:59:00,094 INFO Remoting - Remoting started; listening on addresses :[akka.tcp://flink@slave-IP:33813]
>>>>> 20:59:00,100 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor
>>>>> 20:59:00,125 INFO org.apache.flink.runtime.io.network.netty.NettyConfig - NettyConfig [server address: hostname-of-master/master-IP, server port: 49030, memory segment size (bytes): 32768, transport type: NIO, number of server threads: 0 (use Netty's default), number of client threads: 0 (use Netty's default), server connect backlog: 0 (use Netty's default), client connect timeout (sec): 120, send/receive buffer size (bytes): 0 (use Netty's default)]
>>>>> 20:59:00,131 INFO org.apache.flink.runtime.taskmanager.TaskManager - Messages between TaskManager and JobManager have a max timeout of 100000 milliseconds
>>>>> 20:59:00,142 INFO org.apache.flink.runtime.taskmanager.TaskManager - Temporary file directory '/tmp': total 4 GB, usable 1 GB (25.00% usable)
>>>>> 20:59:00,210 INFO org.apache.flink.runtime.io.network.buffer.NetworkBufferPool - Allocated 64 MB for network buffer pool (number of memory segments: 2048, bytes per segment: 32768).
>>>>> 20:59:00,323 INFO org.apache.flink.runtime.taskmanager.TaskManager - Using 0.7 of the currently free heap space for Flink managed heap memory (293 MB).
>>>>> 20:59:00,565 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager uses directory /tmp/flink-io-c7796b82-6676-4604-97fd-df09001a84e8 for spill files.
>>>>> 20:59:00,578 INFO org.apache.flink.runtime.filecache.FileCache - User file cache uses directory /tmp/flink-dist-cache-13ed3e76-cf1e-46fa-9ba2-5177e801429e
>>>>> 20:59:00,908 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor at akka://flink/user/taskmanager#-157676733.
>>>>> 20:59:00,908 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager data connection information: hostname-of-master (dataPort=49030)
>>>>> 20:59:00,909 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager has 1 task slot(s).
>>>>> 20:59:00,910 INFO org.apache.flink.runtime.taskmanager.TaskManager - Memory usage stats: [HEAP: 376/491/491 MB, NON HEAP: 24/49/304 MB (used/committed/max)]
>>>>> 20:59:00,917 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@master-IP:6123/user/jobmanager (attempt 1, timeout: 500 milliseconds)
>>>>> 20:59:01,443 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@master-IP:6123/user/jobmanager (attempt 2, timeout: 1000 milliseconds)
>>>>> 20:59:02,873 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@master-IP:6123/user/jobmanager (attempt 3, timeout: 2000 milliseconds)
>>>>> 20:59:04,893 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@master-IP:6123/user/jobmanager (attempt 4, timeout: 4000 milliseconds)
>>>>> 20:59:08,914 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@master-IP:6123/user/jobmanager (attempt 5, timeout: 8000 milliseconds)
>>>>>
>>>>>
>>>>> Kind Regards,
>>>>> Ravinder Kaur
>>>>>
>>>>> On Wed, Feb 3, 2016 at 8:12 PM, Stephan Ewen <se...@apache.org> wrote:
>>>>>
>>>>>> This looks like the reason:
>>>>>>
>>>>>> java.net.UnknownHostException: Cannot resolve the JobManager hostname
>>>>>> 'hostname-of-master' specified in the configuration
>>>>>>
>>>>>> On Wed, Feb 3, 2016 at 7:29 PM, Ravinder Kaur <neetu0...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> The log file of the TaskManager now shows the following:
>>>>>>>
>>>>>>> 18:27:10,082 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>>>>> 18:27:10,244 INFO org.apache.flink.runtime.taskmanager.TaskManager - --------------------------------------------------------------------------------
>>>>>>> 18:27:10,244 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager (Version: 0.10.1, Rev:2e9b231, Date:22.11.2015 @ 12:41:12 CET)
>>>>>>> 18:27:10,244 INFO org.apache.flink.runtime.taskmanager.TaskManager - Current user: flink
>>>>>>> 18:27:10,245 INFO org.apache.flink.runtime.taskmanager.TaskManager - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.7/24.91-b01
>>>>>>> 18:27:10,245 INFO org.apache.flink.runtime.taskmanager.TaskManager - Maximum heap size: 491 MiBytes
>>>>>>> 18:27:10,245 INFO org.apache.flink.runtime.taskmanager.TaskManager - JAVA_HOME: /usr/lib/jvm/java-1.7.0-openjdk-amd64
>>>>>>> 18:27:10,247 INFO org.apache.flink.runtime.taskmanager.TaskManager - Hadoop version: 2.7.0
>>>>>>> 18:27:10,247 INFO org.apache.flink.runtime.taskmanager.TaskManager - JVM Options:
>>>>>>> 18:27:10,247 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Xms512M
>>>>>>> 18:27:10,247 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Xmx512M
>>>>>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager - -XX:MaxDirectMemorySize=8388607T
>>>>>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager - -XX:MaxPermSize=256m
>>>>>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Dlog.file=/home/flink/flink-0.10.1/log/flink-flink-taskmanager-0-vm-10-155-208-137.cloud.mwn.de.log
>>>>>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Dlog4j.configuration=file:/home/flink/flink-0.10.1/conf/log4j.properties
>>>>>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Dlogback.configurationFile=file:/home/flink/flink-0.10.1/conf/logback.xml
>>>>>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager - Program Arguments:
>>>>>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager - --configDir
>>>>>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager - /home/flink/flink-0.10.1/conf
>>>>>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager - --streamingMode
>>>>>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager - batch
>>>>>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager - Classpath: /home/flink/flink-0.10.1/lib/flink-dist_2.11-0.10.1.jar:/home/flink/flink-0.10.1/lib/flink-python_2.11-0.10.1.jar:/home/flink/flink-0.10.1/lib/log4j-1.2.17.jar:/home/flink/flink-0.10.1/lib/slf4j-log4j12-1.7.7.jar:/usr/lib/jvm/java-1.7.0-openjdk-amd64/lib/tools.jar::
>>>>>>> 18:27:10,248 INFO org.apache.flink.runtime.taskmanager.TaskManager - --------------------------------------------------------------------------------
>>>>>>> 18:27:10,252 INFO org.apache.flink.runtime.taskmanager.TaskManager - Maximum number of open file descriptors is 4096
>>>>>>> 18:27:10,277 INFO org.apache.flink.runtime.taskmanager.TaskManager - Loading configuration from /home/flink/flink-0.10.1/conf
>>>>>>> 18:27:10,356 INFO org.apache.flink.runtime.taskmanager.TaskManager - Security is not enabled. Starting non-authenticated TaskManager.
>>>>>>> 18:27:10,365 ERROR org.apache.flink.runtime.taskmanager.TaskManager - Failed to run TaskManager.
>>>>>>> java.net.UnknownHostException: Cannot resolve the JobManager hostname 'hostname-of-master' specified in the configuration
>>>>>>>     at org.apache.flink.runtime.util.StandaloneUtils.createLeaderRetrievalService(StandaloneUtils.java:79)
>>>>>>>     at org.apache.flink.runtime.util.StandaloneUtils.createLeaderRetrievalService(StandaloneUtils.java:48)
>>>>>>>     at org.apache.flink.runtime.util.LeaderRetrievalUtils.createLeaderRetrievalService(LeaderRetrievalUtils.java:69)
>>>>>>>     at org.apache.flink.runtime.taskmanager.TaskManager$.selectNetworkInterfaceAndPort(TaskManager.scala:1351)
>>>>>>>     at org.apache.flink.runtime.taskmanager.TaskManager$.selectNetworkInterfaceAndRunTaskManager(TaskManager.scala:1328)
>>>>>>>     at org.apache.flink.runtime.taskmanager.TaskManager$.main(TaskManager.scala:1240)
>>>>>>>     at org.apache.flink.runtime.taskmanager.TaskManager.main(TaskManager.scala)
>>>>>>>
>>>>>>> Kind Regards,
>>>>>>> Ravinder Kaur
>>>>>>>
>>>>>>> On Wed, Feb 3, 2016 at 7:19 PM, Stephan Ewen <se...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> What do the TaskManager logs say?
>>>>>>>>
>>>>>>>> On Wed, Feb 3, 2016 at 6:34 PM, Ravinder Kaur <neetu0...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> Thanks for the quick reply. I tried to set jobmanager.rpc.address
>>>>>>>>> in flink-conf.yaml to the hostname of the master node on both nodes.
>>>>>>>>>
>>>>>>>>> Now it does not start the TaskManager at the worker node at all.
>>>>>>>>> When I start the cluster using ./bin/start-cluster.sh on the master, it
>>>>>>>>> shows the normal output of starting the JobManager and TaskManager, but
>>>>>>>>> when I run jps on the nodes, the slave does not have the TaskManager
>>>>>>>>> running.
>>>>>>>>>
>>>>>>>>> Running the WordCount example again fails, showing the same error.
>>>>>>>>> Stopping the cluster says there is no TaskManager to stop.
>>>>>>>>>
>>>>>>>>> Kind Regards,
>>>>>>>>> Ravinder Kaur
>>>>>>>>>
>>>>>>>>> On Wed, Feb 3, 2016 at 5:47 PM, Stephan Ewen <se...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Looks like the network configuration is not correct.
>>>>>>>>>>
>>>>>>>>>> I would try setting the full host name (like "master.abc.xyz.com")
>>>>>>>>>> as jobmanager.rpc.address.
>>>>>>>>>>
>>>>>>>>>> Greetings,
>>>>>>>>>> Stephan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Feb 3, 2016 at 5:43 PM, Ravinder Kaur <neetu0...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hello Community,
>>>>>>>>>>>
>>>>>>>>>>> I'm a student and new to Apache Flink. I'm trying to learn and
>>>>>>>>>>> have set up a 2-node standalone Flink (0.10.1) cluster (one master
>>>>>>>>>>> and one worker). I'm facing the following issue.
>>>>>>>>>>>
>>>>>>>>>>> Cluster: consists of 2 VMs (one master and one worker)
>>>>>>>>>>>
>>>>>>>>>>> The configuration follows
>>>>>>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-0.10/setup/cluster_setup.html
>>>>>>>>>>>
>>>>>>>>>>> When I start the cluster, the JobManager and the TaskManager
>>>>>>>>>>> are started on the master and worker respectively.
>>>>>>>>>>>
>>>>>>>>>>> Command to start the cluster: bin/start-cluster.sh
>>>>>>>>>>>
>>>>>>>>>>> jps shows all the processes running.
>>>>>>>>>>>
>>>>>>>>>>> Then I run the following command to run a WordCount example job:
>>>>>>>>>>> ./bin/flink run ./examples/WordCount.jar
>>>>>>>>>>>
>>>>>>>>>>> The result is attached to this mail.
>>>>>>>>>>>
>>>>>>>>>>> The error is
>>>>>>>>>>> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
>>>>>>>>>>> Not enough free slots available to run the job
>>>>>>>>>>> ....................... Resources available to scheduler: Number of
>>>>>>>>>>> instances=0, total number of slots=0, available slots=0
>>>>>>>>>>>
>>>>>>>>>>> I therefore suppose that the JobManager does not find the
>>>>>>>>>>> TaskManager. I checked the logs of the TaskManager, which indeed
>>>>>>>>>>> show that the TaskManager is unable to register at the JobManager
>>>>>>>>>>> for quite a long time. The TaskManager log contains these messages:
>>>>>>>>>>>
>>>>>>>>>>> org.apache.flink.runtime.net.ConnectionUtils: Failed to connect
>>>>>>>>>>> from localhost: Connect timed out
>>>>>>>>>>>
>>>>>>>>>>> org.apache.flink.runtime.net.ConnectionUtils: Failed to connect
>>>>>>>>>>> from address localhost: Network is Unreachable
>>>>>>>>>>>
>>>>>>>>>>> Later, after a number of attempts, it starts up and tries to
>>>>>>>>>>> register at the JobManager, which also fails after a lot of
>>>>>>>>>>> attempts, showing the following messages:
>>>>>>>>>>>
>>>>>>>>>>> org.apache.flink.runtime.taskmanager.TaskManager: Trying to
>>>>>>>>>>> register at JobManager akka.tcp://flink@master:6123/user/jobmanager
>>>>>>>>>>> (attempt:92, timeout:30seconds)
>>>>>>>>>>>
>>>>>>>>>>> org.apache.flink.runtime.taskmanager.TaskManager: Tried to
>>>>>>>>>>> associate with unreachable remote host
>>>>>>>>>>> [akka.tcp://flink@master:6123/user/jobmanager]. Address is now
>>>>>>>>>>> gated for 5000ms, all messages to this address will be delivered
>>>>>>>>>>> to dead letters. Reason: Connection timed out: /master:6123
>>>>>>>>>>>
>>>>>>>>>>> I searched the internet for these messages and found the
>>>>>>>>>>> following links helpful:
>>>>>>>>>>>
>>>>>>>>>>> http://stackoverflow.com/questions/33601020/flink-job-wont-run-with-higher-taskmanager-heap-mb
>>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-1119
>>>>>>>>>>>
>>>>>>>>>>> Stephan Ewen, who provided the solution in both links, explains
>>>>>>>>>>> that the TaskManagers can take quite some time to register at the
>>>>>>>>>>> JobManager, so I waited as long as 20 minutes after starting the
>>>>>>>>>>> cluster before running the job. But even after waiting that long
>>>>>>>>>>> I get the same error.
>>>>>>>>>>>
>>>>>>>>>>> Another suggestion was to run the cluster in streaming mode. So
>>>>>>>>>>> I tried it with the command bin/start-cluster-streaming.sh and
>>>>>>>>>>> ran the job, but I get the same error. I have rechecked all the
>>>>>>>>>>> configurations, but I'm unable to find the fault.
>>>>>>>>>>>
>>>>>>>>>>> I also checked port 6123 on the master, which is in LISTEN state,
>>>>>>>>>>> while the TCP request from the worker to the master shows
>>>>>>>>>>> SYN_SENT state (using the netstat -na and lsof -i commands).
>>>>>>>>>>>
>>>>>>>>>>> I opened the web page on the master at http://localhost:8081, but
>>>>>>>>>>> it shows nothing, and localhost:8080 says connection refused.
>>>>>>>>>>>
>>>>>>>>>>> Kindly help me out, as it is very important for me. Let me know
>>>>>>>>>>> if you have any questions.
>>>>>>>>>>>
>>>>>>>>>>> Kind Regards,
>>>>>>>>>>> Ravinder Kaur
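
Stephan's question earlier in the thread (can the machines connect to port 6123, or does a firewall block it while permitting SSH?) can be checked independently of Flink. What follows is only a minimal sketch, assuming Python 3 is available on the worker node. The function name check_endpoint and its three result strings are made up for illustration and are not part of Flink; master.abc.xyz.com is the placeholder host name from Stephan's mail.

```python
import socket


def check_endpoint(host, port, timeout=3.0):
    """Classify a TCP endpoint as 'unresolvable', 'unreachable' or 'ok'.

    Hypothetical helper for illustration only; it is not a Flink tool.
    """
    # Step 1: name resolution. A failure here corresponds to the
    # java.net.UnknownHostException seen in the TaskManager log.
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "unresolvable"

    # Step 2: TCP connect. A timeout or refusal here corresponds to the
    # "connect timed out" messages and the SYN_SENT state in netstat.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError:
        return "unreachable"


# Example call, to be run on the worker (host name is a placeholder):
# print(check_endpoint("master.abc.xyz.com", 6123))
```

Run from the worker, a result of "unresolvable" would match the UnknownHostException reported for 'hostname-of-master', while "unreachable" would match the connection timeouts on master:6123, for example when a firewall drops packets to that port. The same call can be repeated for any other port the TaskManager needs to reach on the master.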