Hi,

Any thoughts on this? Do you need any further details about the setup?

Thanks,

Juan

On Tue, Nov 21, 2017 at 8:59 AM, Juan Rodríguez Hortalá <
[email protected]> wrote:

> Hi,
>
> Could anyone help a newbie ramping up with Ignite on YARN?
>
> Thanks,
>
> Juan
>
>
> On Sun, Nov 19, 2017 at 7:34 PM, Juan Rodríguez Hortalá <
> [email protected]> wrote:
>
>> Hi,
>>
>> I'm trying to run Ignite on AWS EMR as a YARN application, using
>> ZooKeeper for node discovery. I have compiled Ignite with
>>
>> ```
>> mvn clean package -DskipTests -Dignite.edition=hadoop -Dhadoop.version=2.7.3
>> ```
>>
>> I'm using ignite_yarn.properties
>>
>> ```
>> # The number of nodes in the cluster.
>> IGNITE_NODE_COUNT=3
>>
>> # The number of CPU Cores for each Apache Ignite node.
>> IGNITE_RUN_CPU_PER_NODE=1
>>
>> # The number of Megabytes of RAM for each Apache Ignite node.
>> IGNITE_MEMORY_PER_NODE=500
>>
>> IGNITE_PATH=hdfs:///user/hadoop/ignite/apache-ignite-2.3.0-hadoop-2.7.3.zip
>>
>> IGNITE_XML_CONFIG=hdfs:///user/hadoop/ignite/ignite_conf.xml
>>
>> # Local path
>> IGNITE_WORK_DIR=/mnt
>>
>> # Local path
>> IGNITE_RELEASES_DIR=/mnt
>>
>> IGNITE_WORKING_DIR=/mnt
>> ```
>>
>> and ignite_conf.xml as
>>
>> ```
>> <?xml version="1.0" encoding="UTF-8"?>
>> <beans xmlns="http://www.springframework.org/schema/beans"
>>        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>        xsi:schemaLocation="
>>         http://www.springframework.org/schema/beans
>>         http://www.springframework.org/schema/beans/spring-beans.xsd">
>>     <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
>>         <property name="cacheConfiguration">
>>             <list>
>>                 <!-- Partitioned replicated cache configuration (Atomic mode). -->
>>                 <bean class="org.apache.ignite.configuration.CacheConfiguration">
>>                     <property name="name" value="default"/>
>>                     <property name="atomicityMode" value="ATOMIC"/>
>>                     <property name="backups" value="3"/>
>>                     <property name="cacheMode" value="PARTITIONED"/>
>>                 </bean>
>>             </list>
>>         </property>
>>
>>         <!-- Explicitly configure TCP discovery SPI to provide list of initial nodes. -->
>>         <property name="discoverySpi">
>>             <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
>>               <property name="ipFinder">
>>               <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.zk.TcpDiscoveryZookeeperIpFinder">
>>                    <!-- FIXME change to master internal API (as used by YARN), e.g. ip-10-0-0-154.ec2.internal:2181 -->
>>                   <property name="zkConnectionString" value="ip-10-0-0-173.ec2.internal:2181"/>
>>               </bean>
>>               </property>
>>             </bean>
>>         </property>
>>         <property name="gridLogger">
>>           <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
>>             <!-- Default path relative to IGNITE_HOME, assuming IGNITE_HOME is set to the
>>           root of the Ignite installation  -->
>>             <constructor-arg type="java.lang.String" value="config/ignite-log4j2.xml"/>
>>           </bean>
>>         </property>
>>     </bean>
>> </beans>
>> ```
>>
>> Then I launch the yarn job as
>>
>>
>> ```
>> IGNITE_YARN_JAR=/mnt/ignite/apache-ignite-2.3.0-src/modules/yarn/target/ignite-yarn-2.3.0.jar
>> yarn jar ${IGNITE_YARN_JAR} ${IGNITE_YARN_JAR} /mnt/ignite/ignite_yarn.properties
>> ```
>>
>> The app launches and the application master writes logs, but the
>> containers run for only a few seconds, and the application keeps
>> requesting new containers. For example, the application master log
>> shows
>>
>> ```
>>
>> Nov 20, 2017 3:08:30 AM org.apache.ignite.yarn.ApplicationMaster onContainersAllocated
>> INFO: Launching container: container_1511142795395_0005_01_017079.
>> 17/11/20 03:08:30 INFO impl.ContainerManagementProtocolProxy: Opening proxy : ip-10-0-0-230.ec2.internal:8041
>> Nov 20, 2017 3:08:30 AM org.apache.ignite.yarn.ApplicationMaster onContainersAllocated
>> INFO: Launching container: container_1511142795395_0005_01_017080.
>> 17/11/20 03:08:30 INFO impl.ContainerManagementProtocolProxy: Opening proxy : ip-10-0-0-78.ec2.internal:8041
>> Nov 20, 2017 3:08:30 AM org.apache.ignite.yarn.ApplicationMaster onContainersAllocated
>> INFO: Launching container: container_1511142795395_0005_01_017081.
>> 17/11/20 03:08:30 INFO impl.ContainerManagementProtocolProxy: Opening proxy : ip-10-0-0-193.ec2.internal:8041
>> Nov 20, 2017 3:08:31 AM org.apache.ignite.yarn.ApplicationMaster onContainersCompleted
>> INFO: Container completed. Container id: container_1511142795395_0005_01_017080. State: COMPLETE.
>> Nov 20, 2017 3:08:31 AM org.apache.ignite.yarn.ApplicationMaster onContainersCompleted
>> INFO: Container completed. Container id: container_1511142795395_0005_01_017081. State: COMPLETE.
>> Nov 20, 2017 3:08:31 AM org.apache.ignite.yarn.ApplicationMaster onContainersCompleted
>> INFO: Container completed. Container id: container_1511142795395_0005_01_017079. State: COMPLETE.
>> Nov 20, 2017 3:08:31 AM org.apache.ignite.yarn.ApplicationMaster onContainersAllocated
>>
>> ```
>>
>> In the logs of a node manager I see that containers fail as soon as
>> they are launched, because the generated bash launch script is malformed
>>
>> ```
>> 2017-11-20 03:08:47,810 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl (AsyncDispatcher event handler): Container container_1511142795395_0005_01_017281 transitioned from LOCALIZED to RUNNING
>> 2017-11-20 03:08:47,811 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor (ContainersLauncher #4): launchContainer: [bash, /mnt/yarn/usercache/hadoop/appcache/application_1511142795395_0005/container_1511142795395_0005_01_017281/default_container_executor.sh]
>> 2017-11-20 03:08:47,819 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor (ContainersLauncher #4): Exit code from container container_1511142795395_0005_01_017281 is : 2
>> 2017-11-20 03:08:47,819 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor (ContainersLauncher #4): Exception from container-launch with container ID: container_1511142795395_0005_01_017281 and exit code: 2
>> ExitCodeException exitCode=2: /mnt/yarn/usercache/hadoop/appcache/application_1511142795395_0005/container_1511142795395_0005_01_017281/launch_container.sh: line 4: syntax error near unexpected token `('
>> /mnt/yarn/usercache/hadoop/appcache/application_1511142795395_0005/container_1511142795395_0005_01_017281/launch_container.sh: line 4: `export BASH_FUNC_run_prestart()="() {  su -s /bin/bash $SVC_USER -c "cd $WORKING_DIR && $EXEC_PATH --config '$CONF_DIR' start $DAEMON_FLAGS"'
>>
>>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
>>         at org.apache.hadoop.util.Shell.run(Shell.java:479)
>>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
>>         at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
>>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>         at java.lang.Thread.run(Thread.java:748)
>> 2017-11-20 03:08:47,819 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher #4): Exception from container-launch.
>> 2017-11-20 03:08:47,819 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher #4): Container id: container_1511142795395_0005_01_017281
>> 2017-11-20 03:08:47,819 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher #4): Exit code: 2
>> 2017-11-20 03:08:47,819 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher #4): Exception message: /mnt/yarn/usercache/hadoop/appcache/application_1511142795395_0005/container_1511142795395_0005_01_017281/launch_container.sh: line 4: syntax error near unexpected token `('
>> 2017-11-20 03:08:47,819 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher #4): /mnt/yarn/usercache/hadoop/appcache/application_1511142795395_0005/container_1511142795395_0005_01_017281/launch_container.sh: line 4: `export BASH_FUNC_run_prestart()="() {  su -s /bin/bash $SVC_USER -c "cd $WORKING_DIR && $EXEC_PATH --config '$CONF_DIR' start $DAEMON_FLAGS"'
>> 2017-11-20 03:08:47,819 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher #4):
>> 2017-11-20 03:08:47,819 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher #4): Stack trace: ExitCodeException exitCode=2: /mnt/yarn/usercache/hadoop/appcache/application_1511142795395_0005/container_1511142795395_0005_01_017281/launch_container.sh: line 4: syntax error near unexpected token `('
>> ```
>>
>> When I launch Ignite manually on the master node, it starts fine and
>> connects to ZooKeeper, but I see a topology with just 1 node.
>>
>> Any thoughts on what I might be doing wrong here?
>>
>> Thanks in advance.
>>
>> Juan Rodriguez
>>
>>
>
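
One possible reading of the `launch_container.sh` error quoted above, offered as an assumption rather than a confirmed diagnosis: the name `BASH_FUNC_run_prestart()` looks like a bash function that was exported into the environment of the process that submitted the job, and a name containing parentheses cannot appear in a plain `export NAME=value` line. The minimal Python sketch below lists the environment variables on a host whose names would not survive such a line. The helper names `is_exportable` and `split_env` are hypothetical and are not part of Ignite or YARN.

```python
import os
import re

# Names a POSIX shell accepts on the left-hand side of `export NAME=value`:
# a letter or underscore, followed by letters, digits, or underscores.
_VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_exportable(name):
    """Return True if `name` is a valid shell identifier."""
    return _VALID_NAME.fullmatch(name) is not None


def split_env(env):
    """Split an environment mapping into (exportable, rejected) dicts."""
    exportable = {k: v for k, v in env.items() if is_exportable(k)}
    rejected = {k: v for k, v in env.items() if not is_exportable(k)}
    return exportable, rejected


if __name__ == "__main__":
    # Print the names in the current environment that are not valid identifiers.
    _, rejected = split_env(dict(os.environ))
    for name in sorted(rejected):
        print(name)
```

If running this in the shell used to submit the job prints names such as `BASH_FUNC_run_prestart()`, launching from an environment without those entries may be worth trying. Whether that resolves the container failures is not established by the logs above.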
