Hi,

Could anyone help a newbie ramping up with Ignite on YARN?

Thanks,

Juan


On Sun, Nov 19, 2017 at 7:34 PM, Juan Rodríguez Hortalá <
[email protected]> wrote:

> Hi,
>
> I'm trying to run Ignite on AWS EMR as a YARN application, using ZooKeeper
> for node discovery. I have compiled Ignite with
>
> ```
> mvn clean package -DskipTests -Dignite.edition=hadoop -Dhadoop.version=2.7.3
> ```
>
> I'm using ignite_yarn.properties
>
> ```
> # The number of nodes in the cluster.
> IGNITE_NODE_COUNT=3
>
> # The number of CPU Cores for each Apache Ignite node.
> IGNITE_RUN_CPU_PER_NODE=1
>
> # The number of Megabytes of RAM for each Apache Ignite node.
> IGNITE_MEMORY_PER_NODE=500
>
> IGNITE_PATH=hdfs:///user/hadoop/ignite/apache-ignite-2.3.0-hadoop-2.7.3.zip
>
> IGNITE_XML_CONFIG=hdfs:///user/hadoop/ignite/ignite_conf.xml
>
> # Local path
> IGNITE_WORK_DIR=/mnt
>
> # Local path
> IGNITE_RELEASES_DIR=/mnt
>
> IGNITE_WORKING_DIR=/mnt
> ```
>
> and ignite_conf.xml as
>
> ```
> <?xml version="1.0" encoding="UTF-8"?>
> <beans xmlns="http://www.springframework.org/schema/beans"
>        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>        xsi:schemaLocation="
>         http://www.springframework.org/schema/beans
>         http://www.springframework.org/schema/beans/spring-beans.xsd">
>     <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
>         <property name="cacheConfiguration">
>             <list>
>                 <!-- Partitioned replicated cache configuration (Atomic mode). -->
>                 <bean class="org.apache.ignite.configuration.CacheConfiguration">
>                     <property name="name" value="default"/>
>                     <property name="atomicityMode" value="ATOMIC"/>
>                     <property name="backups" value="3"/>
>                     <property name="cacheMode" value="PARTITIONED"/>
>                 </bean>
>             </list>
>         </property>
>
>         <!-- Explicitly configure TCP discovery SPI to provide list of initial nodes. -->
>         <property name="discoverySpi">
>             <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
>               <property name="ipFinder">
>               <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.zk.TcpDiscoveryZookeeperIpFinder">
>                    <!-- FIXME change to master internal API (as used by YARN), e.g. ip-10-0-0-154.ec2.internal:2181 -->
>                   <property name="zkConnectionString" value="ip-10-0-0-173.ec2.internal:2181"/>
>               </bean>
>               </property>
>             </bean>
>         </property>
>         <property name="gridLogger">
>           <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
>             <!-- Default path relative to IGNITE_HOME, assuming IGNITE_HOME is set to the
>           root of the Ignite installation  -->
>             <constructor-arg type="java.lang.String" value="config/ignite-log4j2.xml"/>
>           </bean>
>         </property>
>     </bean>
> </beans>
> ```
>
> Then I launch the yarn job as
>
>
> ```
> IGNITE_YARN_JAR=/mnt/ignite/apache-ignite-2.3.0-src/modules/yarn/target/ignite-yarn-2.3.0.jar
> yarn jar ${IGNITE_YARN_JAR} ${IGNITE_YARN_JAR} /mnt/ignite/ignite_yarn.properties
> ```
>
> The app launches and the application master outputs logs, but the
> containers run for only a few seconds, and the application keeps
> requesting more containers. For example, in the application
> master log
>
> ```
>
> Nov 20, 2017 3:08:30 AM org.apache.ignite.yarn.ApplicationMaster 
> onContainersAllocated
> INFO: Launching container: container_1511142795395_0005_01_017079.
> 17/11/20 03:08:30 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
> ip-10-0-0-230.ec2.internal:8041
> Nov 20, 2017 3:08:30 AM org.apache.ignite.yarn.ApplicationMaster 
> onContainersAllocated
> INFO: Launching container: container_1511142795395_0005_01_017080.
> 17/11/20 03:08:30 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
> ip-10-0-0-78.ec2.internal:8041
> Nov 20, 2017 3:08:30 AM org.apache.ignite.yarn.ApplicationMaster 
> onContainersAllocated
> INFO: Launching container: container_1511142795395_0005_01_017081.
> 17/11/20 03:08:30 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
> ip-10-0-0-193.ec2.internal:8041
> Nov 20, 2017 3:08:31 AM org.apache.ignite.yarn.ApplicationMaster 
> onContainersCompleted
> INFO: Container completed. Container id: 
> container_1511142795395_0005_01_017080. State: COMPLETE.
> Nov 20, 2017 3:08:31 AM org.apache.ignite.yarn.ApplicationMaster 
> onContainersCompleted
> INFO: Container completed. Container id: 
> container_1511142795395_0005_01_017081. State: COMPLETE.
> Nov 20, 2017 3:08:31 AM org.apache.ignite.yarn.ApplicationMaster 
> onContainersCompleted
> INFO: Container completed. Container id: 
> container_1511142795395_0005_01_017079. State: COMPLETE.
> Nov 20, 2017 3:08:31 AM org.apache.ignite.yarn.ApplicationMaster 
> onContainersAllocated
>
> ```
>
> In the logs for a node manager I see that the containers fail as soon as
> they are launched, because the generated launch_container.sh is not valid bash
>
> ```
> 2017-11-20 03:08:47,810 INFO org.apache.hadoop.yarn.server.
> nodemanager.containermanager.container.ContainerImpl (AsyncDispatcher
> event handler): Container container_1511142795395_0005_01_017281
> transitioned from LOCALIZED to RUNNING
> 2017-11-20 03:08:47,811 INFO 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor
> (ContainersLauncher #4): launchContainer: [bash, /mnt/yarn/usercache/hadoop/
> appcache/application_1511142795395_0005/container_
> 1511142795395_0005_01_017281/default_container_executor.sh]
> 2017-11-20 03:08:47,819 WARN 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor
> (ContainersLauncher #4): Exit code from container
> container_1511142795395_0005_01_017281 is : 2
> 2017-11-20 03:08:47,819 WARN 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor
> (ContainersLauncher #4): Exception from container-launch with container ID:
> container_1511142795395_0005_01_017281 and exit code: 2
> ExitCodeException exitCode=2: /mnt/yarn/usercache/hadoop/
> appcache/application_1511142795395_0005/container_
> 1511142795395_0005_01_017281/launch_container.sh: line 4: syntax error
> near unexpected token `('
> /mnt/yarn/usercache/hadoop/appcache/application_
> 1511142795395_0005/container_1511142795395_0005_01_017281/launch_container.sh:
> line 4: `export BASH_FUNC_run_prestart()="() {  su -s /bin/bash $SVC_USER
> -c "cd $WORKING_DIR && $EXEC_PATH --config '$CONF_DIR' start $DAEMON_FLAGS"'
>
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
>         at org.apache.hadoop.util.Shell.run(Shell.java:479)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(
> Shell.java:773)
>         at org.apache.hadoop.yarn.server.nodemanager.
> DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:
> 212)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.
> launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.
> launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> 2017-11-20 03:08:47,819 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor
> (ContainersLauncher #4): Exception from container-launch.
> 2017-11-20 03:08:47,819 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor
> (ContainersLauncher #4): Container id: container_1511142795395_0005_
> 01_017281
> 2017-11-20 03:08:47,819 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor
> (ContainersLauncher #4): Exit code: 2
> 2017-11-20 03:08:47,819 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor
> (ContainersLauncher #4): Exception message: /mnt/yarn/usercache/hadoop/
> appcache/application_1511142795395_0005/container_
> 1511142795395_0005_01_017281/launch_container.sh: line 4: syntax error
> near unexpected token `('
> 2017-11-20 03:08:47,819 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor
> (ContainersLauncher #4): /mnt/yarn/usercache/hadoop/appcache/application_
> 1511142795395_0005/container_1511142795395_0005_01_017281/launch_container.sh:
> line 4: `export BASH_FUNC_run_prestart()="() {  su -s /bin/bash $SVC_USER
> -c "cd $WORKING_DIR && $EXEC_PATH --config '$CONF_DIR' start $DAEMON_FLAGS"'
> 2017-11-20 03:08:47,819 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor
> (ContainersLauncher #4):
> 2017-11-20 03:08:47,819 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor
> (ContainersLauncher #4): Stack trace: ExitCodeException exitCode=2:
> /mnt/yarn/usercache/hadoop/appcache/application_
> 1511142795395_0005/container_1511142795395_0005_01_017281/launch_container.sh:
> line 4: syntax error near unexpected token `('
> ```
>
> When I launch Ignite manually on the master node, it starts fine and
> connects to ZooKeeper, but I see a topology with just 1 node.
>
> Any thoughts on what I might be doing wrong here?
>
> Thanks in advance.
>
> Juan Rodriguez
>
>

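One observation on the node manager log quoted above, offered as a guess rather than a confirmed diagnosis: the line that bash rejects is `export BASH_FUNC_run_prestart()=...`, and a name containing parentheses is not a valid identifier for `export`. The assumption here is that this entry comes from a bash function exported in the environment of the process that submits or hosts the application, and that it gets copied into launch_container.sh; nothing in the logs confirms where it originates. A minimal shell sketch to look for such entries follows. The function name `is_exportable_name` is made up for illustration, and the one-liner relies on awk's standard `ENVIRON` array.

```shell
# Hypothetical helper: succeeds only when "$1" is a valid shell identifier,
# i.e. a name that a generated `export NAME="..."` line can handle.
is_exportable_name() {
  case "$1" in
    '' | [!A-Za-z_]* | *[!A-Za-z0-9_]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Print every environment entry whose name is not a valid identifier,
# for example the BASH_FUNC_run_prestart() entry seen in the log above.
awk 'BEGIN { for (k in ENVIRON) if (k !~ /^[A-Za-z_][A-Za-z0-9_]*$/) print k }'
```

If this prints a name such as `BASH_FUNC_run_prestart()`, running `unset -f run_prestart` in the shell before `yarn jar` might be worth a try, though whether that is the right place to remove it is untested.
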