Hi Abhi,

I have used Flink on EMR via YARN a couple of times without problems.
I started a Flink YARN session like this:

./bin/yarn-session.sh -n 4 -jm 1024 -tm 4096

This will start five YARN containers (1 JobManager with 1024MB, 4
Taskmanagers with 4096MB). See more config options in the documentation [1].
In one of the last lines of the std-out output you should find a line that
tells you the IP and port of the JobManager.

With the IP and port, you can submit a job as follows:

./bin/flink run -m jmIP:jmPort -p 4 jobJarFile.jar <arguments>

This will send the job to the JobManager specified by IP and port and
execute the program with a parallelism of 4. See more config options in the
documentation [2].

If this does not help, could you share the exact command that you use to
start the YARN session and submit the job?

Best, Fabian

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/yarn_setup.html
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/cli.html

2016-03-08 0:25 GMT+01:00 Bajaj, Abhinav <abhinav.ba...@here.com>:

> Hi,
>
> I am a newbie to Flink and trying to use it in AWS.
> I have created a YARN cluster on AWS EC2 machines.
> Trying to submit Flink job to the remote YARN cluster using the Flink
> Client running on my local machine.
>
> The Jobmanager start successfully on the YARN container but the client is
> not able to connect to the Jobmanager.
>
> Flink Client Logs -
>
> 13:57:34,877 INFO  org.apache.flink.yarn.FlinkYarnClient
>       - Deploying cluster, current state ACCEPTED
> 13:57:35,951 INFO  org.apache.flink.yarn.FlinkYarnClient
>       - Deploying cluster, current state ACCEPTED
> 13:57:37,027 INFO  org.apache.flink.yarn.FlinkYarnClient
>       - YARN application has been deployed successfully.
> 13:57:37,100 INFO  org.apache.flink.yarn.FlinkYarnCluster
>        - Start actor system.
> 13:57:37,532 INFO  org.apache.flink.yarn.FlinkYarnCluster
>        - Start application client.
> YARN cluster started
> JobManager web interface address
> http://ec2-XX-XX-XX-XX.compute-1.amazonaws.com:8088/proxy/application_1456184947990_0003/
> Waiting until all TaskManagers have connected
> 13:57:37,540 INFO  org.apache.flink.yarn.ApplicationClient
>       - Notification about new leader address 
> akka.tcp://flink@54.35.41.12:41292/user/jobmanager
> with session ID null.
> No status updates from the YARN cluster received so far. Waiting ...
> 13:57:37,543 INFO  org.apache.flink.yarn.ApplicationClient
>       - Received address of new leader 
> akka.tcp://flink@54.35.41.12:41292/user/jobmanager
> with session ID null.
> 13:57:37,543 INFO  org.apache.flink.yarn.ApplicationClient
>       - Disconnect from JobManager null.
> 13:57:37,545 INFO  org.apache.flink.yarn.ApplicationClient
>       - Trying to register at JobManager akka.tcp://flink@54.35.41.12
> :41292/user/jobmanager.
> No status updates from the YARN cluster received so far. Waiting ...
>
> The logs of the Jobmanager contains the following -
>
> 21:57:39,142 ERROR akka.remote.EndpointWriter                                 
>    - dropping message [class akka.actor.ActorSelectionMessage] for non-local 
> recipient [Actor[akka.tcp://flink@54.35.41.12:41292/]] arriving at 
> [akka.tcp://flink@54.35.41.12:41292] inbound addresses are 
> [akka.tcp://flink@172.31.23.18:41292]
> 21:57:40,782 INFO  org.apache.flink.runtime.instance.InstanceManager          
>    - Registered TaskManager at ec2-54-35-41-12 
> (akka.tcp://flink@172.31.23.18:60565/user/taskmanager) as 
> 72101dd2ee94caa7a5ec5a75488359aa. Current number of registered hosts is 1. 
> Current number of alive task slots is 1.
> 21:57:41,162 ERROR akka.remote.EndpointWriter                                 
>    - dropping message [class akka.actor.ActorSelectionMessage] for non-local 
> recipient [Actor[akka.tcp://flink@54.35.41.12:41292/]] arriving at 
> [akka.tcp://flink@54.35.41.12:41292] inbound addresses are 
> [akka.tcp://flink@172.31.23.18:41292]
>
> It seems the problem is in the mismatch of the Jobmanager Akka actors
> system running address and the one user by the Client.
> 172.31.23.18 – is the internal private IP of the EC2 machine where the
> Jobmanager container is running.
> 54.35.41.12 – is the external IP of the EC2 machine, used by Flink client
> to submit the Job.
> Because of this mismatch the messages are ignored by the Akka actor System.
>
> Can someone please help me with this issue.
> I can share the detailed logs, if required.
>
> Thanks,
> Abhi
>
>

Reply via email to