Hi,

Any thoughts on this? Do you need any further details about the setup?

Thanks,

Juan

On Tue, Nov 21, 2017 at 8:59 AM, Juan Rodríguez Hortalá <[email protected]> wrote:

> Hi,
>
> Could anyone help a newbie ramping up with Ignite on YARN?
>
> Thanks,
>
> Juan
>
> On Sun, Nov 19, 2017 at 7:34 PM, Juan Rodríguez Hortalá <[email protected]> wrote:
>
>> Hi,
>>
>> I'm trying to run Ignite on AWS EMR as a YARN application, using
>> ZooKeeper for node discovery. I have compiled Ignite with
>>
>> ```
>> mvn clean package -DskipTests -Dignite.edition=hadoop -Dhadoop.version=2.7.3
>> ```
>>
>> I'm using ignite_yarn.properties
>>
>> ```
>> # The number of nodes in the cluster.
>> IGNITE_NODE_COUNT=3
>>
>> # The number of CPU Cores for each Apache Ignite node.
>> IGNITE_RUN_CPU_PER_NODE=1
>>
>> # The number of Megabytes of RAM for each Apache Ignite node.
>> IGNITE_MEMORY_PER_NODE=500
>>
>> IGNITE_PATH=hdfs:///user/hadoop/ignite/apache-ignite-2.3.0-hadoop-2.7.3.zip
>>
>> IGNITE_XML_CONFIG=hdfs:///user/hadoop/ignite/ignite_conf.xml
>>
>> # Local path
>> IGNITE_WORK_DIR=/mnt
>>
>> # Local path
>> IGNITE_RELEASES_DIR=/mnt
>>
>> IGNITE_WORKING_DIR=/mnt
>> ```
>>
>> and ignite_conf.xml as
>>
>> ```
>> <?xml version="1.0" encoding="UTF-8"?>
>> <beans xmlns="http://www.springframework.org/schema/beans"
>>        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>        xsi:schemaLocation="
>>         http://www.springframework.org/schema/beans
>>         http://www.springframework.org/schema/beans/spring-beans.xsd">
>>   <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
>>     <property name="cacheConfiguration">
>>       <list>
>>         <!-- Partitioned replicated cache configuration (Atomic mode). -->
>>         <bean class="org.apache.ignite.configuration.CacheConfiguration">
>>           <property name="name" value="default"/>
>>           <property name="atomicityMode" value="ATOMIC"/>
>>           <property name="backups" value="3"/>
>>           <property name="cacheMode" value="PARTITIONED"/>
>>         </bean>
>>       </list>
>>     </property>
>>
>>     <!-- Explicitly configure TCP discovery SPI to provide list of initial nodes. -->
>>     <property name="discoverySpi">
>>       <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
>>         <property name="ipFinder">
>>           <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.zk.TcpDiscoveryZookeeperIpFinder">
>>             <!-- FIXME change to master internal API (as used by YARN), e.g. ip-10-0-0-154.ec2.internal:2181 -->
>>             <property name="zkConnectionString" value="ip-10-0-0-173.ec2.internal:2181"/>
>>           </bean>
>>         </property>
>>       </bean>
>>     </property>
>>     <property name="gridLogger">
>>       <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
>>         <!-- Default path relative to IGNITE_HOME, assuming IGNITE_HOME is set to the
>>              root of the Ignite installation -->
>>         <constructor-arg type="java.lang.String" value="config/ignite-log4j2.xml"/>
>>       </bean>
>>     </property>
>>   </bean>
>> </beans>
>> ```
>>
>> Then I launch the YARN job as
>>
>> ```
>> IGNITE_YARN_JAR=/mnt/ignite/apache-ignite-2.3.0-src/modules/yarn/target/ignite-yarn-2.3.0.jar
>> yarn jar ${IGNITE_YARN_JAR} ${IGNITE_YARN_JAR} /mnt/ignite/ignite_yarn.properties
>> ```
>>
>> The app launches and the application master is outputting logs, but
>> containers run for only a few seconds, and the application keeps
>> requesting more containers. For example, in the application master log:
>>
>> ```
>> Nov 20, 2017 3:08:30 AM org.apache.ignite.yarn.ApplicationMaster onContainersAllocated
>> INFO: Launching container: container_1511142795395_0005_01_017079.
>> 17/11/20 03:08:30 INFO impl.ContainerManagementProtocolProxy: Opening proxy : ip-10-0-0-230.ec2.internal:8041
>> Nov 20, 2017 3:08:30 AM org.apache.ignite.yarn.ApplicationMaster onContainersAllocated
>> INFO: Launching container: container_1511142795395_0005_01_017080.
>> 17/11/20 03:08:30 INFO impl.ContainerManagementProtocolProxy: Opening proxy : ip-10-0-0-78.ec2.internal:8041
>> Nov 20, 2017 3:08:30 AM org.apache.ignite.yarn.ApplicationMaster onContainersAllocated
>> INFO: Launching container: container_1511142795395_0005_01_017081.
>> 17/11/20 03:08:30 INFO impl.ContainerManagementProtocolProxy: Opening proxy : ip-10-0-0-193.ec2.internal:8041
>> Nov 20, 2017 3:08:31 AM org.apache.ignite.yarn.ApplicationMaster onContainersCompleted
>> INFO: Container completed. Container id: container_1511142795395_0005_01_017080. State: COMPLETE.
>> Nov 20, 2017 3:08:31 AM org.apache.ignite.yarn.ApplicationMaster onContainersCompleted
>> INFO: Container completed. Container id: container_1511142795395_0005_01_017081. State: COMPLETE.
>> Nov 20, 2017 3:08:31 AM org.apache.ignite.yarn.ApplicationMaster onContainersCompleted
>> INFO: Container completed. Container id: container_1511142795395_0005_01_017079. State: COMPLETE.
>> Nov 20, 2017 3:08:31 AM org.apache.ignite.yarn.ApplicationMaster onContainersAllocated
>> ```
>>
>> In the logs for a node manager I see that containers seem to fail at
>> launch because the corresponding bash command is not well formed:
>>
>> ```
>> 2017-11-20 03:08:47,810 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl (AsyncDispatcher event handler): Container container_1511142795395_0005_01_017281 transitioned from LOCALIZED to RUNNING
>> 2017-11-20 03:08:47,811 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor (ContainersLauncher #4): launchContainer: [bash, /mnt/yarn/usercache/hadoop/appcache/application_1511142795395_0005/container_1511142795395_0005_01_017281/default_container_executor.sh]
>> 2017-11-20 03:08:47,819 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor (ContainersLauncher #4): Exit code from container container_1511142795395_0005_01_017281 is : 2
>> 2017-11-20 03:08:47,819 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor (ContainersLauncher #4): Exception from container-launch with container ID: container_1511142795395_0005_01_017281 and exit code: 2
>> ExitCodeException exitCode=2: /mnt/yarn/usercache/hadoop/appcache/application_1511142795395_0005/container_1511142795395_0005_01_017281/launch_container.sh: line 4: syntax error near unexpected token `('
>> /mnt/yarn/usercache/hadoop/appcache/application_1511142795395_0005/container_1511142795395_0005_01_017281/launch_container.sh: line 4: `export BASH_FUNC_run_prestart()="() { su -s /bin/bash $SVC_USER -c "cd $WORKING_DIR && $EXEC_PATH --config '$CONF_DIR' start $DAEMON_FLAGS"'
>>
>>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
>>     at org.apache.hadoop.util.Shell.run(Shell.java:479)
>>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
>>     at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
>>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>     at java.lang.Thread.run(Thread.java:748)
>> 2017-11-20 03:08:47,819 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher #4): Exception from container-launch.
>> 2017-11-20 03:08:47,819 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher #4): Container id: container_1511142795395_0005_01_017281
>> 2017-11-20 03:08:47,819 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher #4): Exit code: 2
>> 2017-11-20 03:08:47,819 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher #4): Exception message: /mnt/yarn/usercache/hadoop/appcache/application_1511142795395_0005/container_1511142795395_0005_01_017281/launch_container.sh: line 4: syntax error near unexpected token `('
>> 2017-11-20 03:08:47,819 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher #4): /mnt/yarn/usercache/hadoop/appcache/application_1511142795395_0005/container_1511142795395_0005_01_017281/launch_container.sh: line 4: `export BASH_FUNC_run_prestart()="() { su -s /bin/bash $SVC_USER -c "cd $WORKING_DIR && $EXEC_PATH --config '$CONF_DIR' start $DAEMON_FLAGS"'
>> 2017-11-20 03:08:47,819 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher #4):
>> 2017-11-20 03:08:47,819 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher #4): Stack trace: ExitCodeException exitCode=2: /mnt/yarn/usercache/hadoop/appcache/application_1511142795395_0005/container_1511142795395_0005_01_017281/launch_container.sh: line 4: syntax error near unexpected token `('
>> ```
>>
>> When I launch Ignite manually on the master it starts fine and
>> connects to ZooKeeper, but I see a topology with just 1 node.
>>
>> Any thoughts on what I might be doing wrong here?
>>
>> Thanks in advance.
>>
>> Juan Rodriguez
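
The failing line 4 of `launch_container.sh` exports a name containing `()`, which is not a valid shell identifier. One possible reading, offered as an assumption rather than a confirmed diagnosis, is that an exported bash function called `run_prestart` sits in an environment that ends up copied into the container launch script. The sketch below lists environment entries whose names could not be written as a plain `export NAME=...` line. The helper name `list_unexportable_names` is hypothetical, and the parsing is line-based, so multi-line values may produce spurious entries.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ignite or YARN): reads NAME=VALUE lines on
# stdin and prints every NAME that is not a valid shell identifier.
list_unexportable_names() {
  grep -E '^[^=]+=' \
    | cut -d= -f1 \
    | grep -Ev '^[A-Za-z_][A-Za-z0-9_]*$' \
    || true   # finding no invalid names is not an error
}

# Usage (left commented out so the file can be sourced safely):
#   env | list_unexportable_names
```

Running `env | list_unexportable_names` as the user that submits the job, and again in the environment of the NodeManager process, would show whether a name such as `BASH_FUNC_run_prestart()` is present there. If it is, that would be consistent with the syntax error in the log above, although it would not by itself show where the variable is introduced.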
