Hi Abhi, I have used Flink on EMR via YARN a couple of times without problems. I started a Flink YARN session like this:
./bin/yarn-session.sh -n 4 -jm 1024 -tm 4096 This will start five YARN containers (1 JobManager with 1024MB, 4 Taskmanagers with 4096MB). See more config options in the documentation [1]. In one of the last lines of the std-out output you should find a line that tells you the IP and port of the JobManager. With the IP and port, you can submit a job as follows: ./bin/flink run -m jmIP:jmPort -p 4 jobJarFile.jar <arguments> This will send the job to the JobManager specified by IP and port and execute the program with a parallelism of 4. See more config options in the documentation [2]. If this does not help, could you share the exact command that you use to start the YARN session and submit the job? Best, Fabian [1] https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/yarn_setup.html [2] https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/cli.html 2016-03-08 0:25 GMT+01:00 Bajaj, Abhinav <abhinav.ba...@here.com>: > Hi, > > I am a newbie to Flink and trying to use it in AWS. > I have created a YARN cluster on AWS EC2 machines. > Trying to submit Flink job to the remote YARN cluster using the Flink > Client running on my local machine. > > The Jobmanager start successfully on the YARN container but the client is > not able to connect to the Jobmanager. > > Flink Client Logs - > > 13:57:34,877 INFO org.apache.flink.yarn.FlinkYarnClient > - Deploying cluster, current state ACCEPTED > 13:57:35,951 INFO org.apache.flink.yarn.FlinkYarnClient > - Deploying cluster, current state ACCEPTED > 13:57:37,027 INFO org.apache.flink.yarn.FlinkYarnClient > - YARN application has been deployed successfully. > 13:57:37,100 INFO org.apache.flink.yarn.FlinkYarnCluster > - Start actor system. > 13:57:37,532 INFO org.apache.flink.yarn.FlinkYarnCluster > - Start application client. > YARN cluster started > JobManager web interface address > http://ec2-XX-XX-XX-XX.compute-1.amazonaws.com:8088/proxy/application_1456184947990_0003/ > Waiting until all TaskManagers have connected > 13:57:37,540 INFO org.apache.flink.yarn.ApplicationClient > - Notification about new leader address > akka.tcp://flink@54.35.41.12:41292/user/jobmanager > with session ID null. > No status updates from the YARN cluster received so far. Waiting ... > 13:57:37,543 INFO org.apache.flink.yarn.ApplicationClient > - Received address of new leader > akka.tcp://flink@54.35.41.12:41292/user/jobmanager > with session ID null. > 13:57:37,543 INFO org.apache.flink.yarn.ApplicationClient > - Disconnect from JobManager null. > 13:57:37,545 INFO org.apache.flink.yarn.ApplicationClient > - Trying to register at JobManager akka.tcp://flink@54.35.41.12 > :41292/user/jobmanager. > No status updates from the YARN cluster received so far. Waiting ... > > The logs of the Jobmanager contains the following - > > 21:57:39,142 ERROR akka.remote.EndpointWriter > - dropping message [class akka.actor.ActorSelectionMessage] for non-local > recipient [Actor[akka.tcp://flink@54.35.41.12:41292/]] arriving at > [akka.tcp://flink@54.35.41.12:41292] inbound addresses are > [akka.tcp://flink@172.31.23.18:41292] > 21:57:40,782 INFO org.apache.flink.runtime.instance.InstanceManager > - Registered TaskManager at ec2-54-35-41-12 > (akka.tcp://flink@172.31.23.18:60565/user/taskmanager) as > 72101dd2ee94caa7a5ec5a75488359aa. Current number of registered hosts is 1. > Current number of alive task slots is 1. > 21:57:41,162 ERROR akka.remote.EndpointWriter > - dropping message [class akka.actor.ActorSelectionMessage] for non-local > recipient [Actor[akka.tcp://flink@54.35.41.12:41292/]] arriving at > [akka.tcp://flink@54.35.41.12:41292] inbound addresses are > [akka.tcp://flink@172.31.23.18:41292] > > It seems the problem is in the mismatch of the Jobmanager Akka actors > system running address and the one user by the Client. > 172.31.23.18 – is the internal private IP of the EC2 machine where the > Jobmanager container is running. > 54.35.41.12 – is the external IP of the EC2 machine, used by Flink client > to submit the Job. > Because of this mismatch the messages are ignored by the Akka actor System. > > Can someone please help me with this issue. > I can share the detailed logs, if required. > > Thanks, > Abhi > >