OK, that was indeed a classpath issue, which I solved by exporting the output of hadoop classpath (i.e. the list of needed jars, see http://doc.mapr.com/display/MapR/hadoop+classpath) directly into HADOOP_CLASSPATH in hadoop-env.sh and yarn-env.sh.
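In case it helps someone else, the change amounts to something like this in both files (a sketch of what I did, assuming HADOOP_INSTALL is set as in the env snippets quoted further down):

    # Make the full jar list visible to the daemons and to the containers
    # they launch; "hadoop classpath" prints the same colon-separated list
    # of directories and wildcards that the client resolves on its side.
    export HADOOP_CLASSPATH=$("$HADOOP_INSTALL/bin/hadoop" classpath)

Pasting the literal output of hadoop classpath instead of using the command substitution works just as well; the point is that the containers YARN launches see the same jars as the client, which is what the CompositeService NoClassDefFoundError below was complaining about.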
With this fixed, the stuck issue came back, so I will study Adam's suggestion.
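For anyone finding this thread later: as far as I understand it, Adam's suggestion (quoted below) is about these memory settings, which have to be mutually consistent. The property names are the standard YARN/MRv2 keys; the values in the comments are examples, not my actual configuration:

    # yarn.nodemanager.resource.memory-mb   (yarn-site.xml)   memory one NM offers, e.g. 8192
    # yarn.scheduler.maximum-allocation-mb  (yarn-site.xml)   largest single container, e.g. 8192
    # yarn.app.mapreduce.am.resource.mb     (mapred-site.xml) what the MR AM requests, e.g. 1536
    # If the AM request fits under the scheduler cap but exceeds what any
    # NodeManager offers, the RM can never place the AM and the job hangs
    # at "map 0% reduce 0%". A quick way to inspect the current values:
    grep -A 1 "memory-mb\|resource\.mb" "$HADOOP_CONF_DIR"/*-site.xml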
On 11 December 2013 10:01, Silvina Caíno Lores <silvi.ca...@gmail.com> wrote:

> Actually, now it seems to be running (or at least attempting to run), but I get further errors:
>
> hadoop jar ~/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-SNAPSHOT.jar pi 1 100
>
> INFO mapreduce.Job: Job job_1386751964857_0001 failed with state FAILED due to: Application application_1386751964857_0001 failed 2 times due to AM Container for appattempt_1386751964857_0001_000002 exited with exitCode: 1 due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException:
>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:504)
>     at org.apache.hadoop.util.Shell.run(Shell.java:417)
>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:636)
>     at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:724)
>
> I guess it's some sort of classpath issue, judging by this log:
>
> /scratch/HDFS-scaino-2/logs/application_1386751964857_0001/container_1386751964857_0001_01_000001$ cat stderr
> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/service/CompositeService
>     at java.lang.ClassLoader.defineClass1(Native Method)
>     at java.lang.ClassLoader.defineClass(ClassLoader.java:792)
>     at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>     at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
>     at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.service.CompositeService
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     ... 13 more
>
> I haven't found a solution yet, even though the classpath looks fine:
>
> hadoop classpath
> /home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop:/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/common/lib/*:/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/common/*:/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/hdfs:/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/hdfs/lib/*:/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/hdfs/*:/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/yarn/lib/*:/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/yarn/*:/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/lib/*:/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar
>
> Could that be related to the previous launch errors?
>
> Thanks in advance :)
>
> On 11 December 2013 00:29, Adam Kawa <kawa.a...@gmail.com> wrote:
>
>> It sounds like the job was successfully submitted to the cluster, but there was some problem starting or running the AM, so no progress is made. It happened to me once, when I was playing with YARN on a cluster of very small machines: I had misconfigured YARN to allocate more memory to the AM than was actually available on any machine in the cluster, so the RM was not able to start the AM anywhere because it could not find a big enough container.
>>
>> Could you show the logs from the job? The link should be available on your console after you submit a job, e.g.:
>> 13/12/10 10:41:21 INFO mapreduce.Job: The url to track the job: http://compute-7-2:8088/proxy/application_1386668372725_0001/
>>
>> 2013/12/10 Silvina Caíno Lores <silvi.ca...@gmail.com>
>>
>>> Thank you! I realized that, although I exported the variables in the scripts, there were a few errors and my desired configuration wasn't being used (which explained other strange behavior).
>>>
>>> However, I'm still getting the same issue with the examples, for instance:
>>>
>>> hadoop jar ~/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-SNAPSHOT.jar pi 1 100
>>> Number of Maps  = 1
>>> Samples per Map = 100
>>> 13/12/10 10:41:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> Wrote input for Map #0
>>> Starting Job
>>> 13/12/10 10:41:19 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
>>> 13/12/10 10:41:20 INFO input.FileInputFormat: Total input paths to process : 1
>>> 13/12/10 10:41:20 INFO mapreduce.JobSubmitter: number of splits:1
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
>>> 13/12/10 10:41:20 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
>>> 13/12/10 10:41:20 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1386668372725_0001
>>> 13/12/10 10:41:20 INFO impl.YarnClientImpl: Submitted application application_1386668372725_0001 to ResourceManager at /0.0.0.0:8032
>>> 13/12/10 10:41:21 INFO mapreduce.Job: The url to track the job: http://compute-7-2:8088/proxy/application_1386668372725_0001/
>>> 13/12/10 10:41:21 INFO mapreduce.Job: Running job: job_1386668372725_0001
>>> 13/12/10 10:41:31 INFO mapreduce.Job: Job job_1386668372725_0001 running in uber mode : false
>>> 13/12/10 10:41:31 INFO mapreduce.Job:  map 0% reduce 0%
>>> ---- stuck here ----
>>>
>>> I hope the problem is not in the environment files. I have the following at the beginning of hadoop-env.sh:
>>>
>>> # The java implementation to use.
>>> export JAVA_HOME=/home/software/jdk1.7.0_25/
>>>
>>> # The jsvc implementation to use. Jsvc is required to run secure datanodes.
>>> #export JSVC_HOME=${JSVC_HOME}
>>>
>>> export HADOOP_INSTALL=/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT
>>> export HADOOP_HDFS_HOME=$HADOOP_INSTALL
>>> export HADOOP_COMMON_HOME=$HADOOP_INSTALL
>>> export HADOOP_CONF_DIR=$HADOOP_INSTALL"/etc/hadoop"
>>>
>>> and this in yarn-env.sh:
>>>
>>> export JAVA_HOME=/home/software/jdk1.7.0_25/
>>> export HADOOP_INSTALL=/home/scaino/hadoop-2.2.0-maven/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT
>>> export HADOOP_HDFS_HOME=$HADOOP_INSTALL
>>> export HADOOP_COMMON_HOME=$HADOOP_INSTALL
>>> export HADOOP_CONF_DIR=$HADOOP_INSTALL"/etc/hadoop"
>>>
>>> I'm not sure what to do about HADOOP_YARN_USER though, since I don't have a dedicated user to run the daemons.
>>>
>>> Thanks!
>>> On 10 December 2013 10:10, Taka Shinagawa <taka.epsi...@gmail.com> wrote:
>>>
>>>> I had a similar problem after setting up Hadoop 2.2.0 based on the instructions at http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
>>>>
>>>> Although it's not documented on that page, I needed to edit hadoop-env.sh and yarn-env.sh as well, to update JAVA_HOME, HADOOP_CONF_DIR, HADOOP_YARN_USER and YARN_CONF_DIR.
>>>>
>>>> Once these variables were set, I was able to run the example successfully.
>>>>
>>>> On Mon, Dec 9, 2013 at 11:37 PM, Silvina Caíno Lores <silvi.ca...@gmail.com> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> I'm having trouble running the Hadoop examples on a single node. All the executions get stuck in the running state at 0% map and reduce, and the logs don't seem to indicate any issue, apart from the node manager needing to be killed:
>>>>>
>>>>> compute-0-7-3: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
>>>>>
>>>>> RM
>>>>>
>>>>> 2013-12-09 11:52:22,466 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Command to launch container container_1386585879247_0001_01_000001 : $JAVA_HOME/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=<LOG_DIR> -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr
>>>>> 2013-12-09 11:52:22,882 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Done launching container Container: [ContainerId: container_1386585879247_0001_01_000001, NodeId: compute-0-7-3:8010, NodeHttpAddress: compute-0-7-3:8042, Resource: <memory:2000, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 10.0.7.3:8010 }, ] for AM appattempt_1386585879247_0001_000001
>>>>> 2013-12-09 11:52:22,883 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1386585879247_0001_000001 State change from ALLOCATED to LAUNCHED
>>>>> 2013-12-09 11:52:23,371 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1386585879247_0001_01_000001 Container Transitioned from ACQUIRED to RUNNING
>>>>> 2013-12-09 11:52:30,922 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1386585879247_0001_000001 (auth:SIMPLE)
>>>>> 2013-12-09 11:52:30,938 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: AM registration appattempt_1386585879247_0001_000001
>>>>> 2013-12-09 11:52:30,939 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=scaino IP=10.0.7.3 OPERATION=Register App Master TARGET=ApplicationMasterService RESULT=SUCCESS APPID=application_1386585879247_0001 APPATTEMPTID=appattempt_1386585879247_0001_000001
>>>>> 2013-12-09 11:52:30,941 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1386585879247_0001_000001 State change from LAUNCHED to RUNNING
>>>>> 2013-12-09 11:52:30,941 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1386585879247_0001 State change from ACCEPTED to RUNNING
>>>>>
>>>>> NM
>>>>>
>>>>> 2013-12-10 08:26:02,100 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_STOP for appId application_1386585879247_0001
>>>>> 2013-12-10 08:26:02,102 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /scratch/HDFS-scaino-2/tmp/nm-local-dir/usercache/scaino/appcache/application_1386585879247_0001
>>>>> 2013-12-10 08:26:02,103 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event APPLICATION_STOP for appId application_1386585879247_0001
>>>>> 2013-12-10 08:26:02,110 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1386585879247_0001 transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED
>>>>> 2013-12-10 08:26:02,157 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler: Scheduling Log Deletion for application: application_1386585879247_0001, with delay of 10800 seconds
>>>>> 2013-12-10 08:26:04,688 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Stopping resource-monitoring for container_1386585879247_0001_01_000001
>>>>> 2013-12-10 08:26:05,838 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Done waiting for Applications to be Finished. Still alive: [application_1386585879247_0001]
>>>>> 2013-12-10 08:26:05,839 INFO org.apache.hadoop.ipc.Server: Stopping server on 8010
>>>>> 2013-12-10 08:26:05,846 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8010
>>>>> 2013-12-10 08:26:05,847 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
>>>>>
>>>>> I tried the pi and wordcount examples with the same results. Any ideas on how to debug this?
>>>>>
>>>>> Thanks in advance.
>>>>>
>>>>> Regards,
>>>>> Silvina Caíno