Greetings to all,

I have installed and configured Hadoop 2.7.0 on a Linux VM, and I successfully ran the pre-compiled/packaged examples (e.g. Pi, WordCount, etc.). I also downloaded the Hadoop 2.7.0 source code and created an Eclipse project from it. I exported the WordCount jar file and tried to run the example from the command line as follows:
> yarn jar /opt/yarn/my_examples/WordCount.jar /user/yarn/input/wordcount.txt output

Q1: When I used the default WordCount implementation (shown in listing1), it failed with a list of exceptions and a suggestion to implement the Tool interface and run the application with ToolRunner (see errorListing1). I updated the code accordingly (see listing2) and it ran successfully. Could you please explain why the application failed on the first attempt, and why the Tool/ToolRunner utilities are necessary?

Q2: Does this example implicitly create a YARN client and interact with the YARN layer? If not, could you please explain how the application interacted with the HDFS layer, given that the YARN layer sits in between?

Thx and BR
Ista
errorListing1:

[hdfs@caotclc04881 ~]$ yarn jar /opt/yarn/my_examples/WordCount.jar /user/yarn/input/wordcount.txt output
15/08/05 13:59:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/08/05 13:59:33 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/08/05 13:59:33 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/08/05 13:59:33 WARN mapreduce.JobResourceUploader: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
15/08/05 13:59:33 INFO input.FileInputFormat: Total input paths to process : 1
15/08/05 13:59:33 INFO mapreduce.JobSubmitter: number of splits:1
15/08/05 13:59:34 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1437148602144_0005
15/08/05 13:59:34 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
15/08/05 13:59:34 INFO impl.YarnClientImpl: Submitted application application_1437148602144_0005
15/08/05 13:59:34 INFO mapreduce.Job: The url to track the job: http://caotclc04881:8088/proxy/application_1437148602144_0005/
15/08/05 13:59:34 INFO mapreduce.Job: Running job: job_1437148602144_0005
15/08/05 13:59:40 INFO mapreduce.Job: Job job_1437148602144_0005 running in uber mode : false
15/08/05 13:59:40 INFO mapreduce.Job:  map 0% reduce 0%
15/08/05 13:59:43 INFO mapreduce.Job: Task Id : attempt_1437148602144_0005_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class WordCount$Map not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class WordCount$Map not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
    ... 8 more
15/08/05 13:59:47 INFO mapreduce.Job: Task Id : attempt_1437148602144_0005_m_000000_1, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class WordCount$Map not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class WordCount$Map not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
    ... 8 more
15/08/05 13:59:51 INFO mapreduce.Job: Task Id : attempt_1437148602144_0005_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class WordCount$Map not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class WordCount$Map not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
    ... 8 more
15/08/05 13:59:57 INFO mapreduce.Job:  map 100% reduce 100%
15/08/05 13:59:57 INFO mapreduce.Job: Job job_1437148602144_0005 failed with state FAILED due to: Task failed task_1437148602144_0005_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
15/08/05 13:59:57 INFO mapreduce.Job: Counters: 9
    Job Counters
        Failed map tasks=4
        Launched map tasks=4
        Other local map tasks=3
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=8893
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=8893
        Total vcore-seconds taken by all map tasks=8893
        Total megabyte-seconds taken by all map tasks=9106432
[hdfs@caotclc04881 ~]$
[hdfs@caotclc04881 ~]$ hadoop fs -ls -R /user
15/08/05 14:00:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
drwxr-xr-x   - hdfs supergroup          0 2015-08-05 13:59 /user/hdfs
drwxr-xr-x   - hdfs supergroup          0 2015-08-05 13:59 /user/hdfs/output
drwxr-xr-x   - yarn hadoop              0 2015-08-05 13:56 /user/yarn
drwxr-xr-x   - hdfs hadoop              0 2015-08-05 11:22 /user/yarn/input
-rw-r--r--   1 hdfs hadoop             31 2015-08-05 11:22 /user/yarn/input/wordcount.txt
[hdfs@caotclc04881 ~]$ hadoop fs -rmdir /user/hdfs/output
15/08/05 14:00:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hdfs@caotclc04881 ~]$ yarn jar /opt/yarn/my_examples/WordCount.jar /user/yarn/input/wordcount.txt output
15/08/05 14:49:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/08/05 14:49:30 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/08/05 14:49:30 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/08/05 14:49:30 WARN mapreduce.JobResourceUploader: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
15/08/05 14:49:30 INFO input.FileInputFormat: Total input paths to process : 1
15/08/05 14:49:30 INFO mapreduce.JobSubmitter: number of splits:1
15/08/05 14:49:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1437148602144_0007
15/08/05 14:49:31 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
15/08/05 14:49:31 INFO impl.YarnClientImpl: Submitted application application_1437148602144_0007
15/08/05 14:49:31 INFO mapreduce.Job: The url to track the job: http://caotclc04881:8088/proxy/application_1437148602144_0007/
15/08/05 14:49:31 INFO mapreduce.Job: Running job: job_1437148602144_0007
15/08/05 14:49:37 INFO mapreduce.Job: Job job_1437148602144_0007 running in uber mode : false
15/08/05 14:49:37 INFO mapreduce.Job:  map 0% reduce 0%
15/08/05 14:49:40 INFO mapreduce.Job: Task Id : attempt_1437148602144_0007_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class WordCount$Map not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class WordCount$Map not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
    ... 8 more
15/08/05 14:49:44 INFO mapreduce.Job: Task Id : attempt_1437148602144_0007_m_000000_1, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class WordCount$Map not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class WordCount$Map not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
    ... 8 more
15/08/05 14:49:48 INFO mapreduce.Job: Task Id : attempt_1437148602144_0007_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class WordCount$Map not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class WordCount$Map not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
    ... 8 more
15/08/05 14:49:54 INFO mapreduce.Job:  map 100% reduce 100%
15/08/05 14:49:54 INFO mapreduce.Job: Job job_1437148602144_0007 failed with state FAILED due to: Task failed task_1437148602144_0007_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
15/08/05 14:49:54 INFO mapreduce.Job: Counters: 9
    Job Counters
        Failed map tasks=4
        Launched map tasks=4
        Other local map tasks=3
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=8980
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=8980
        Total vcore-seconds taken by all map tasks=8980
        Total megabyte-seconds taken by all map tasks=9195520
[hdfs@caotclc04881 ~]$ yarn jar /opt/yarn/my_examples/WordCount.jar /user/yarn/input/wordcount.txt output
15/08/05 15:07:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/08/05 15:07:51 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9000/user/hdfs/output already exists
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:269)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:142)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
    at WordCount.run(WordCount.java:56)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at WordCount.main(WordCount.java:23)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
listing1:

//package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount2 {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        //Job job = new Job(conf, "wordcount");
        Job job = Job.getInstance(conf, "wordcount");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
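For intuition, the map step in listing1 only whitespace-tokenizes each input line and emits a (word, 1) pair per token. Its core logic can be exercised with plain JDK classes, outside Hadoop; this is an illustrative local sketch (the class name `MapStepSketch` and the String-pair representation are my own, not Hadoop types):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class MapStepSketch {

    // Mimics Map.map() from listing1: whitespace-tokenize one input line
    // and emit a (token, "1") pair for every token, duplicates included.
    static List<String[]> mapLine(String line) {
        List<String[]> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(new String[] { tokenizer.nextToken(), "1" });
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Each pair is printed tab-separated, the way a mapper would emit it
        for (String[] p : mapLine("the quick brown the")) {
            System.out.println(p[0] + "\t" + p[1]);
        }
    }
}
```

In a real run this per-token emission happens inside a YarnChild JVM on the cluster, which is why the task JVM must be able to load `WordCount$Map` from the job jar.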
listing2:

//package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new WordCount(), args);
        System.exit(res);
    }

    @Override
    public int run(String[] args) throws Exception {
        // When implementing Tool, take the configuration prepared by ToolRunner
        Configuration conf = this.getConf();

        // Create the job
        //Job job = new Job(conf, "Tool Job");
        Job job = Job.getInstance(conf, "wordcount");
        job.setJarByClass(WordCount.class);

        // Set up the MapReduce job
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}
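To round out the picture: the shuffle groups the mapper's (word, 1) pairs by key, and the reducer in listing2 sums each group. The end-to-end effect can be sketched locally with a plain TreeMap; this is an illustration only, using no Hadoop types (the class name `WordCountSketch` is my own):

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountSketch {

    // Mimics map (tokenize, emit 1 per token) followed by shuffle + reduce
    // (group by word, sum the ones). TreeMap keeps keys sorted, like the
    // key-ordered output a reducer writes.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            counts.merge(tokenizer.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints each word and its count, tab-separated and key-sorted
        count("hello world hello yarn").forEach((w, c) -> System.out.println(w + "\t" + c));
    }
}
```

On a cluster the grouping is done by the MapReduce framework between the map and reduce tasks; only the per-record logic lives in the user's Map and Reduce classes.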