Greetings to all,
I have installed and configured Hadoop 2.7.0 on a Linux VM. I then
successfully ran the pre-compiled/packaged examples (e.g. Pi, WordCount,
etc.).
I also downloaded the Hadoop 2.7.0 source code and created an Eclipse
project from it.
I exported the WordCount jar file and tried to run the example from the
command line as follows:

> yarn jar /opt/yarn/my_examples/WordCount.jar
/user/yarn/input/wordcount.txt output


Q1: When I used the default WordCount implementation (shown in listing 1
below), it failed with a list of exceptions and a suggestion to implement
the Tool interface and run the application with ToolRunner (see
errorListing1 below).
      I updated the code to include these suggested utilities (see
listing 2) and it ran successfully.
     Could you please explain why the application failed to run on the
first attempt, and why the Tool/ToolRunner utilities are necessary?
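
For context, here is my mental model of the suggested pattern, boiled
down to a minimal skeleton (a sketch only; MinimalToolJob is a throwaway
name of mine and the identity Mapper/Reducer are just placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MinimalToolJob extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options (-D, -files, -libjars, ...)
        // and applies them to the Configuration before run() is called.
        System.exit(ToolRunner.run(new Configuration(), new MinimalToolJob(), args));
    }

    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "minimal tool job");
        // Tells the framework which jar to ship to the cluster; without
        // this the tasks apparently cannot load the user classes.
        job.setJarByClass(MinimalToolJob.class);
        job.setMapperClass(Mapper.class);    // identity mapper (placeholder)
        job.setReducerClass(Reducer.class);  // identity reducer (placeholder)
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }
}

As far as I can tell, the changes that matter relative to listing 1 are
the ToolRunner.run(...) call and job.setJarByClass(...); I am not sure
which of the two actually fixed my run.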


Q2: Does this example implicitly create a YARN client and interact with
the YARN layer? If not, could you please explain how the application
interacted with the HDFS layer, given that the YARN layer sits in
between?
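
(For Q2, my current and possibly wrong mental model is that HDFS access
does not go through YARN at all: the client resolves fs.defaultFS from
the configuration and talks to the NameNode directly, while YARN is only
involved in scheduling the containers. A minimal probe along these lines
should confirm or refute that - HdfsProbe is a hypothetical name of
mine:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsProbe {
    public static void main(String[] args) throws Exception {
        // Resolves fs.defaultFS (hdfs://localhost:9000 on my VM) and
        // opens a client connection to the NameNode; no YARN daemons
        // are involved at this point.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path input = new Path("/user/yarn/input/wordcount.txt");
        System.out.println(input + " exists: " + fs.exists(input));
    }
}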


Thx and BR

Ista

errorListing1 (full console transcript of all three attempts):

[hdfs@caotclc04881 ~]$ yarn jar /opt/yarn/my_examples/WordCount.jar 
/user/yarn/input/wordcount.txt output
15/08/05 13:59:32 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
15/08/05 13:59:33 INFO client.RMProxy: Connecting to ResourceManager at 
/0.0.0.0:8032
15/08/05 13:59:33 WARN mapreduce.JobResourceUploader: Hadoop command-line 
option parsing not performed. Implement the Tool interface and execute your 
application with ToolRunner to remedy this.
15/08/05 13:59:33 WARN mapreduce.JobResourceUploader: No job jar file set.  
User classes may not be found. See Job or Job#setJar(String).
15/08/05 13:59:33 INFO input.FileInputFormat: Total input paths to process : 1
15/08/05 13:59:33 INFO mapreduce.JobSubmitter: number of splits:1
15/08/05 13:59:34 INFO mapreduce.JobSubmitter: Submitting tokens for job: 
job_1437148602144_0005
15/08/05 13:59:34 INFO mapred.YARNRunner: Job jar is not present. Not adding 
any jar to the list of resources.
15/08/05 13:59:34 INFO impl.YarnClientImpl: Submitted application 
application_1437148602144_0005
15/08/05 13:59:34 INFO mapreduce.Job: The url to track the job: 
http://caotclc04881:8088/proxy/application_1437148602144_0005/
15/08/05 13:59:34 INFO mapreduce.Job: Running job: job_1437148602144_0005
15/08/05 13:59:40 INFO mapreduce.Job: Job job_1437148602144_0005 running in 
uber mode : false
15/08/05 13:59:40 INFO mapreduce.Job:  map 0% reduce 0%
15/08/05 13:59:43 INFO mapreduce.Job: Task Id : 
attempt_1437148602144_0005_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
WordCount$Map not found
        at 
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
        at 
org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class WordCount$Map not found
        at 
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
        at 
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
        ... 8 more

15/08/05 13:59:47 INFO mapreduce.Job: Task Id : 
attempt_1437148602144_0005_m_000000_1, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
WordCount$Map not found
        at 
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
        at 
org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class WordCount$Map not found
        at 
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
        at 
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
        ... 8 more

15/08/05 13:59:51 INFO mapreduce.Job: Task Id : 
attempt_1437148602144_0005_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
WordCount$Map not found
        at 
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
        at 
org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class WordCount$Map not found
        at 
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
        at 
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
        ... 8 more

15/08/05 13:59:57 INFO mapreduce.Job:  map 100% reduce 100%
15/08/05 13:59:57 INFO mapreduce.Job: Job job_1437148602144_0005 failed with 
state FAILED due to: Task failed task_1437148602144_0005_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

15/08/05 13:59:57 INFO mapreduce.Job: Counters: 9
        Job Counters 
                Failed map tasks=4
                Launched map tasks=4
                Other local map tasks=3
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=8893
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=8893
                Total vcore-seconds taken by all map tasks=8893
                Total megabyte-seconds taken by all map tasks=9106432
[hdfs@caotclc04881 ~]$ 
[hdfs@caotclc04881 ~]$ hadoop fs -ls -R /user
15/08/05 14:00:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
drwxr-xr-x   - hdfs supergroup          0 2015-08-05 13:59 /user/hdfs
drwxr-xr-x   - hdfs supergroup          0 2015-08-05 13:59 /user/hdfs/output
drwxr-xr-x   - yarn hadoop              0 2015-08-05 13:56 /user/yarn
drwxr-xr-x   - hdfs hadoop              0 2015-08-05 11:22 /user/yarn/input
-rw-r--r--   1 hdfs hadoop             31 2015-08-05 11:22 
/user/yarn/input/wordcount.txt
[hdfs@caotclc04881 ~]$ hadoop fs -rmdir /user/hdfs/output
15/08/05 14:00:43 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
[hdfs@caotclc04881 ~]$ yarn jar /opt/yarn/my_examples/WordCount.jar 
/user/yarn/input/wordcount.txt output
15/08/05 14:49:29 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
15/08/05 14:49:30 INFO client.RMProxy: Connecting to ResourceManager at 
/0.0.0.0:8032
15/08/05 14:49:30 WARN mapreduce.JobResourceUploader: Hadoop command-line 
option parsing not performed. Implement the Tool interface and execute your 
application with ToolRunner to remedy this.
15/08/05 14:49:30 WARN mapreduce.JobResourceUploader: No job jar file set.  
User classes may not be found. See Job or Job#setJar(String).
15/08/05 14:49:30 INFO input.FileInputFormat: Total input paths to process : 1
15/08/05 14:49:30 INFO mapreduce.JobSubmitter: number of splits:1
15/08/05 14:49:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: 
job_1437148602144_0007
15/08/05 14:49:31 INFO mapred.YARNRunner: Job jar is not present. Not adding 
any jar to the list of resources.
15/08/05 14:49:31 INFO impl.YarnClientImpl: Submitted application 
application_1437148602144_0007
15/08/05 14:49:31 INFO mapreduce.Job: The url to track the job: 
http://caotclc04881:8088/proxy/application_1437148602144_0007/
15/08/05 14:49:31 INFO mapreduce.Job: Running job: job_1437148602144_0007
15/08/05 14:49:37 INFO mapreduce.Job: Job job_1437148602144_0007 running in 
uber mode : false
15/08/05 14:49:37 INFO mapreduce.Job:  map 0% reduce 0%
15/08/05 14:49:40 INFO mapreduce.Job: Task Id : 
attempt_1437148602144_0007_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
WordCount$Map not found
        at 
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
        at 
org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class WordCount$Map not found
        at 
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
        at 
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
        ... 8 more

15/08/05 14:49:44 INFO mapreduce.Job: Task Id : 
attempt_1437148602144_0007_m_000000_1, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
WordCount$Map not found
        at 
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
        at 
org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class WordCount$Map not found
        at 
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
        at 
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
        ... 8 more

15/08/05 14:49:48 INFO mapreduce.Job: Task Id : 
attempt_1437148602144_0007_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
WordCount$Map not found
        at 
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
        at 
org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class WordCount$Map not found
        at 
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
        at 
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
        ... 8 more

15/08/05 14:49:54 INFO mapreduce.Job:  map 100% reduce 100%
15/08/05 14:49:54 INFO mapreduce.Job: Job job_1437148602144_0007 failed with 
state FAILED due to: Task failed task_1437148602144_0007_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

15/08/05 14:49:54 INFO mapreduce.Job: Counters: 9
        Job Counters 
                Failed map tasks=4
                Launched map tasks=4
                Other local map tasks=3
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=8980
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=8980
                Total vcore-seconds taken by all map tasks=8980
                Total megabyte-seconds taken by all map tasks=9195520
[hdfs@caotclc04881 ~]$ yarn jar /opt/yarn/my_examples/WordCount.jar 
/user/yarn/input/wordcount.txt output
15/08/05 15:07:51 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
15/08/05 15:07:51 INFO client.RMProxy: Connecting to ResourceManager at 
/0.0.0.0:8032
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: 
Output directory hdfs://localhost:9000/user/hdfs/output already exists
        at 
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
        at 
org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:269)
        at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:142)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
        at WordCount.run(WordCount.java:56)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at WordCount.main(WordCount.java:23)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
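
(Side note on the last trace above: FileOutputFormat refuses to reuse an
existing output directory, and a failed job can still leave the
directory behind, so each resubmission needs a cleanup first. Note also
that "hadoop fs -rmdir" only removes empty directories; "hadoop fs -rm
-r" is the safer cleanup. Below is a small helper I am considering for
the driver - OutputCleaner is a hypothetical name, and the output path
is assumed to be args[1]:)

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class OutputCleaner {
    // Recursively deletes a stale job output directory so that
    // FileOutputFormat.checkOutputSpecs() does not abort the submission.
    public static void deleteIfExists(Configuration conf, String dir)
            throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path(dir);
        if (fs.exists(out)) {
            fs.delete(out, true); // true = recursive
        }
    }
}

It would be called as OutputCleaner.deleteIfExists(getConf(), args[1])
at the top of run(), before the job is submitted.
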
Listing 1 - the default WordCount implementation (failed; see errorListing1):

//package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount2 {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        //Job job = new Job(conf, "wordcount");
        Job job = Job.getInstance(conf, "wordcount");

        // Note: no job.setJarByClass(...) call here, which matches the
        // "No job jar file set" warning in errorListing1.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }
}
 


Listing 2 - the updated WordCount using Tool/ToolRunner (ran successfully):

//package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new WordCount(), args);
        System.exit(res);
    }

    @Override
    public int run(String[] args) throws Exception {

        // When implementing Tool, use the configuration that ToolRunner
        // has already populated from the command line.
        Configuration conf = this.getConf();

        // Create the job
        //Job job = new Job(conf, "Tool Job");
        Job job = Job.getInstance(conf, "wordcount");
        job.setJarByClass(WordCount.class);

        // Set up the MapReduce job
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}
