I am trying to run Flink on YARN on MapR. My previous issue <http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Flink-with-Yarn-on-MapR-tt15448.html> has been resolved, and I have updated the original post accordingly.
Following that resolution, I modified pom.xml to change the ZooKeeper version to the *mapr zookeeper jar* version, which in my case was *3.4.5-mapr-1604*. I then built Flink (*flink-1.3-SNAPSHOT*) as follows:

*mvn clean install -DskipTests -Pvendor-repos -Dhadoop.version=2.7.0-mapr-1607*

The build is successful. Then I try to run *./bin/yarn-session.sh -n 3* and get the following error:

/2017-02-02 16:11:10,717 INFO org.apache.flink.yarn.YarnClusterDescriptor - Using values:
2017-02-02 16:11:10,718 INFO org.apache.flink.yarn.YarnClusterDescriptor - TaskManager count = 3
2017-02-02 16:11:10,718 INFO org.apache.flink.yarn.YarnClusterDescriptor - JobManager memory = 1024
2017-02-02 16:11:10,718 INFO org.apache.flink.yarn.YarnClusterDescriptor - TaskManager memory = 1024
2017-02-02 16:11:10,928 INFO com.mapr.util.zookeeper.ZKDataRetrieval - Process path: null. Event state: SyncConnected. Event type: None
2017-02-02 16:11:10,928 INFO com.mapr.util.zookeeper.ZKDataRetrieval - Connected to ZK: ip-10-101-2-111.ec2.internal:5181,ip-10-101-2-112.ec2.internal:5181,ip-10-101-2-113.ec2.internal:5181
2017-02-02 16:11:10,929 INFO com.mapr.util.zookeeper.ZKDataRetrieval - Getting serviceData for master node of resourcemanager
2017-02-02 16:11:10,935 INFO com.mapr.util.zookeeper.ZKDataRetrieval - Process path: null. Event state: SaslAuthenticated. Event type: None
2017-02-02 16:11:10,948 INFO org.apache.hadoop.yarn.client.MapRZKBasedRMFailoverProxyProvider - Updated RM address to ip-10-101-2-111.ec2.internal/10.101.2.111:8032
2017-02-02 16:11:11,216 WARN org.apache.flink.yarn.YarnClusterDescriptor - The configuration directory ('/home/ubuntu/flink/flink-dist/target/flink-1.3-SNAPSHOT-bin/flink-1.3-SNAPSHOT/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.
2017-02-02 16:11:11,225 INFO org.apache.flink.yarn.Utils - Copying from file:///home/ubuntu/flink/flink-dist/target/flink-1.3-SNAPSHOT-bin/flink-1.3-SNAPSHOT/conf/log4j.properties to maprfs:/user/ubuntu/.flink/application_1485984594262_0007/log4j.properties
2017-02-02 16:11:11,249 INFO org.apache.flink.yarn.Utils - Copying from file:///home/ubuntu/flink/flink-dist/target/flink-1.3-SNAPSHOT-bin/flink-1.3-SNAPSHOT/lib to maprfs:/user/ubuntu/.flink/application_1485984594262_0007/lib
2017-02-02 16:11:11,680 INFO org.apache.flink.yarn.Utils - Copying from file:///home/ubuntu/flink/flink-dist/target/flink-1.3-SNAPSHOT-bin/flink-1.3-SNAPSHOT/conf/logback.xml to maprfs:/user/ubuntu/.flink/application_1485984594262_0007/logback.xml
2017-02-02 16:11:11,685 INFO org.apache.flink.yarn.Utils - Copying from file:///home/ubuntu/flink/flink-dist/target/flink-1.3-SNAPSHOT-bin/flink-1.3-SNAPSHOT/lib/flink-dist_2.10-1.3-SNAPSHOT.jar to maprfs:/user/ubuntu/.flink/application_1485984594262_0007/flink-dist_2.10-1.3-SNAPSHOT.jar
2017-02-02 16:11:12,932 INFO org.apache.flink.yarn.Utils - Copying from /home/ubuntu/flink/flink-dist/target/flink-1.3-SNAPSHOT-bin/flink-1.3-SNAPSHOT/conf/flink-conf.yaml to maprfs:/user/ubuntu/.flink/application_1485984594262_0007/flink-conf.yaml
2017-02-02 16:11:12,949 INFO org.apache.flink.yarn.YarnClusterDescriptor - Submitting application master application_1485984594262_0007
2017-02-02 16:11:12,977 INFO org.apache.hadoop.yarn.security.ExternalTokenManagerFactory - Initialized external token manager class - com.mapr.hadoop.yarn.security.MapRTicketManager
2017-02-02 16:11:13,195 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1485984594262_0007
2017-02-02 16:11:13,195 INFO org.apache.flink.yarn.YarnClusterDescriptor - Waiting for the cluster to be allocated
2017-02-02 16:11:13,196 INFO org.apache.flink.yarn.YarnClusterDescriptor - Deploying cluster, current state ACCEPTED
Error while deploying YARN cluster:
Couldn't deploy Yarn cluster
java.lang.RuntimeException: Couldn't deploy Yarn cluster
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:425)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:620)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:476)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:473)
    at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:473)
Caused by: org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1485984594262_0007 failed 1 times due to AM Container for appattempt_1485984594262_0007_000001 exited with exitCode: 255
For more detailed output, check application tracking page:http://ip-10-101-2-111.ec2.internal:8088/cluster/app/application_1485984594262_0007Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e05_1485984594262_0007_01_000001
Exit code: 255
Stack trace: ExitCodeException exitCode=255:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
    at org.apache.hadoop.util.Shell.run(Shell.java:456)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:304)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:354)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:87)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Shell output: main : command provided 1
main : user is ubuntu
main : requested yarn user is ubuntu
Container exited with a non-zero exit code 255
Failing this attempt. Failing the application.
If log aggregation is enabled on your cluster, use this command to further investigate the issue:
yarn logs -applicationId application_1485984594262_0007
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.startAppMaster(AbstractYarnClusterDescriptor.java:888)
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:557)
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:423)
    ... 9 more
2017-02-02 16:11:18,231 INFO org.apache.flink.yarn.YarnClusterDescriptor - Cancelling deployment from Deployment Failure Hook
2017-02-02 16:11:18,231 INFO org.apache.flink.yarn.YarnClusterDescriptor - Killing YARN application
2017-02-02 16:11:18,235 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Killed application application_1485984594262_0007
2017-02-02 16:11:18,336 INFO org.apache.flink.yarn.YarnClusterDescriptor - Deleting files in maprfs:/user/ubuntu/.flink/application_1485984594262_0007/

So I checked the YARN container logs, which contain the following error:

/Uncaught error from thread [flink-akka.remote.default-remote-dispatcher-5] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[flink]
java.lang.VerifyError: (class: org/jboss/netty/channel/socket/nio/NioWorkerPool, method: createWorker signature: (Ljava/util/concurrent/Executor;)Lorg/jboss/netty/channel/socket/nio/AbstractNioWorker;) Wrong return type in function
    at akka.remote.transport.netty.NettyTransport.<init>(NettyTransport.scala:295)
    at akka.remote.transport.netty.NettyTransport.<init>(NettyTransport.scala:251)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$2.apply(DynamicAccess.scala:78)
    at scala.util.Try$.apply(Try.scala:161)
    at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:73)
    at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
    at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
    at scala.util.Success.flatMap(Try.scala:200)
    at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:84)
    at akka.remote.EndpointManager$$anonfun$9.apply(Remoting.scala:749)
    at akka.remote.EndpointManager$$anonfun$9.apply(Remoting.scala:741)
    at scala.collection.TraversableLike$WithFilter$$anonfun$map$2.apply(TraversableLike.scala:722)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at scala.collection.TraversableLike$WithFilter.map(TraversableLike.scala:721)
    at akka.remote.EndpointManager.akka$remote$EndpointManager$$listens(Remoting.scala:741)
    at akka.remote.EndpointManager$$anonfun$receive$2.applyOrElse(Remoting.scala:500)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
    at akka.remote.EndpointManager.aroundReceive(Remoting.scala:404)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
    at akka.dispatch.Mailbox.run(Mailbox.scala:220)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)/

So I suspected that this is caused by clashing netty versions between Flink and MapR’s ZooKeeper jar.
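One way to test such a suspicion is to check which jars actually ship the class named in the VerifyError. The following is only a minimal sketch under stated assumptions: the helper name jars_containing and the script name in the usage comment are hypothetical and not part of Flink or MapR, and the jars of interest are assumed to sit under a directory such as Flink's lib folder. It looks only for the exact class file entry, so shaded or relocated copies under another package path would not be reported.

```python
import sys
import zipfile
from pathlib import Path

# Class named in the VerifyError, written as an entry path inside a jar.
DEFAULT_ENTRY = "org/jboss/netty/channel/socket/nio/NioWorkerPool.class"


def jars_containing(lib_dir, entry=DEFAULT_ENTRY):
    """Return the sorted paths of jars under lib_dir that contain `entry`."""
    hits = []
    for jar in sorted(Path(lib_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(str(jar))
        except zipfile.BadZipFile:
            # Skip files that are not valid archives.
            continue
    return hits


if __name__ == "__main__":
    # Hypothetical usage: python find_class.py /path/to/flink/lib /path/to/other/lib
    for directory in sys.argv[1:]:
        for hit in jars_containing(directory):
            print(hit)
```

If more than one jar on the container classpath is reported, two copies of the same class are competing, which would be consistent with a VerifyError of this kind.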
Indeed, MapR’s ZooKeeper jar has the following version of netty:

/<groupId>org.jboss.netty</groupId>
<artifactId>netty</artifactId>
<version>3.2.2.Final</version>/

And Flink has the following:

/<groupId>io.netty</groupId>
<artifactId>netty-all</artifactId>
<version>4.0.27.Final</version>/

So I changed Flink’s pom.xml to exclude netty from the ZooKeeper dependency:

/<exclusion>
<groupId>org.jboss.netty</groupId>
<artifactId>netty</artifactId>
</exclusion>/

Then I ran ./bin/yarn-session.sh -n 3 again and got the following error:

/2017-02-02 15:44:03,540 INFO org.apache.flink.yarn.YarnClusterDescriptor - Using values:
2017-02-02 15:44:03,541 INFO org.apache.flink.yarn.YarnClusterDescriptor - TaskManager count = 3
2017-02-02 15:44:03,541 INFO org.apache.flink.yarn.YarnClusterDescriptor - JobManager memory = 1024
2017-02-02 15:44:03,541 INFO org.apache.flink.yarn.YarnClusterDescriptor - TaskManager memory = 1024
2017-02-02 15:44:03,728 INFO com.mapr.util.zookeeper.ZKDataRetrieval - Process path: null. Event state: SyncConnected. Event type: None
2017-02-02 15:44:03,728 INFO com.mapr.util.zookeeper.ZKDataRetrieval - Connected to ZK: ip-10-101-2-111.ec2.internal:5181,ip-10-101-2-112.ec2.internal:5181,ip-10-101-2-113.ec2.internal:5181
2017-02-02 15:44:03,729 INFO com.mapr.util.zookeeper.ZKDataRetrieval - Getting serviceData for master node of resourcemanager
2017-02-02 15:44:03,733 INFO com.mapr.util.zookeeper.ZKDataRetrieval - Process path: null. Event state: SaslAuthenticated. Event type: None
2017-02-02 15:44:03,745 INFO org.apache.hadoop.yarn.client.MapRZKBasedRMFailoverProxyProvider - Updated RM address to ip-10-101-2-111.ec2.internal/10.101.2.111:8032
2017-02-02 15:44:04,016 WARN org.apache.flink.yarn.YarnClusterDescriptor - The configuration directory ('/home/ubuntu/yarn_flink/flink/flink-dist/target/flink-1.3-SNAPSHOT-bin/flink-1.3-SNAPSHOT/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.
2017-02-02 15:44:04,025 INFO org.apache.flink.yarn.Utils - Copying from file:///home/ubuntu/yarn_flink/flink/flink-dist/target/flink-1.3-SNAPSHOT-bin/flink-1.3-SNAPSHOT/lib to maprfs:/user/ubuntu/.flink/application_1485984594262_0006/lib
2017-02-02 15:44:04,446 INFO org.apache.flink.yarn.Utils - Copying from file:///home/ubuntu/yarn_flink/flink/flink-dist/target/flink-1.3-SNAPSHOT-bin/flink-1.3-SNAPSHOT/conf/logback.xml to maprfs:/user/ubuntu/.flink/application_1485984594262_0006/logback.xml
2017-02-02 15:44:04,452 INFO org.apache.flink.yarn.Utils - Copying from file:///home/ubuntu/yarn_flink/flink/flink-dist/target/flink-1.3-SNAPSHOT-bin/flink-1.3-SNAPSHOT/conf/log4j.properties to maprfs:/user/ubuntu/.flink/application_1485984594262_0006/log4j.properties
2017-02-02 15:44:04,457 INFO org.apache.flink.yarn.Utils - Copying from file:///home/ubuntu/yarn_flink/flink/flink-dist/target/flink-1.3-SNAPSHOT-bin/flink-1.3-SNAPSHOT/lib/flink-dist_2.10-1.3-SNAPSHOT.jar to maprfs:/user/ubuntu/.flink/application_1485984594262_0006/flink-dist_2.10-1.3-SNAPSHOT.jar
2017-02-02 15:44:05,826 INFO org.apache.flink.yarn.Utils - Copying from /home/ubuntu/yarn_flink/flink/flink-dist/target/flink-1.3-SNAPSHOT-bin/flink-1.3-SNAPSHOT/conf/flink-conf.yaml to maprfs:/user/ubuntu/.flink/application_1485984594262_0006/flink-conf.yaml
2017-02-02 15:44:05,842 INFO org.apache.flink.yarn.YarnClusterDescriptor - Submitting application master application_1485984594262_0006
2017-02-02 15:44:05,870 INFO org.apache.hadoop.yarn.security.ExternalTokenManagerFactory - Initialized external token manager class - com.mapr.hadoop.yarn.security.MapRTicketManager
2017-02-02 15:44:06,088 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1485984594262_0006
2017-02-02 15:44:06,089 INFO org.apache.flink.yarn.YarnClusterDescriptor - Waiting for the cluster to be allocated
2017-02-02 15:44:06,090 INFO org.apache.flink.yarn.YarnClusterDescriptor - Deploying cluster, current state ACCEPTED
Error while deploying YARN cluster: Couldn't deploy Yarn cluster
java.lang.RuntimeException: Couldn't deploy Yarn cluster
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:428)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:620)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:476)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:473)
    at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:473)
Caused by: org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1485984594262_0006 failed 1 times due to AM Container for appattempt_1485984594262_0006_000001 exited with exitCode: 31
For more detailed output, check application tracking page:http://ip-10-101-2-111.ec2.internal:8088/cluster/app/application_1485984594262_0006Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e05_1485984594262_0006_01_000001
Exit code: 31
Stack trace: ExitCodeException exitCode=31:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
    at org.apache.hadoop.util.Shell.run(Shell.java:456)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:304)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:354)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:87)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Shell output: main : command provided 1
main : user is ubuntu
main : requested yarn user is ubuntu
Container exited with a non-zero exit code 31
Failing this attempt. Failing the application.
If log aggregation is enabled on your cluster, use this command to further investigate the issue:
yarn logs -applicationId application_1485984594262_0006
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.startAppMaster(AbstractYarnClusterDescriptor.java:891)
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:560)
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:426)
    ... 9 more
2017-02-02 15:44:12,635 INFO org.apache.flink.yarn.YarnClusterDescriptor - Cancelling deployment from Deployment Failure Hook
2017-02-02 15:44:12,635 INFO org.apache.flink.yarn.YarnClusterDescriptor - Killing YARN application
2017-02-02 15:44:12,641 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Killed application application_1485984594262_0006
2017-02-02 15:44:12,742 INFO org.apache.flink.yarn.YarnClusterDescriptor - Deleting files in maprfs:/user/ubuntu/.flink/application_1485984594262_0006/

So I checked the YARN container logs again, and this time they contain the following error:

/2017-02-02 15:44:11,3521 ERROR JniCommon fs/client/fileclient/cc/jni_MapRClient.cc:580 Thread: 19306 Client initialization failed due to mismatch of libraries. Please make sure that the java library version matches the native build version 5.2.0.39122.GA and native patch version $Id: mapr-version: 5.2.0.39122.GA 40967:64c8e3c8ee67 $/

So, is there any way I can resolve this netty conflict to make Flink work on YARN with MapR? Meanwhile, I am also raising this issue with the MapR community.

Thanks in advance

--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Netty-issues-while-deploying-Flink-with-Yarn-on-MapR-tp11411.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.