This question pertains to Hadoop (core version 1.2.1) and HBase 1.2.3.

I wrote a simple map/reduce job that looks like this:

- The mapper's input is one whole HDFS file at a time, read via a custom
  InputFormat (WholeFileInputFormat)
- The mapper's output is <LongWritable, Text>

The job is configured like this:

    Configuration conf = getConf();

    JobConf cfg = new JobConf(conf, FindMissionTimeJob.class);

    /* Set up mapper */
    Path inputPath = new Path(args[0]);
    WholeFileInputFormat.setInputPaths(cfg, inputPath);
    cfg.setNumMapTasks(1);
    cfg.setMapperClass(TimeMapper.class);
    cfg.setInputFormat(WholeFileInputFormat.class);
    cfg.setMapOutputKeyClass(LongWritable.class);
    cfg.setMapOutputValueClass(Text.class);

    /* Set up reducer; initTableReduceJob() also sets the output format */
    cfg.setNumReduceTasks(1);
    TableMapReduceUtil.initTableReduceJob(tableName, TimeReducer.class,
        cfg);


When I run it, I get an NPE here:

    16/10/07 15:33:55 INFO mapreduce.Job: Task Id : attempt_1475608557171_0021_r_000000_2, Status : FAILED

    Error: java.lang.NullPointerException
            at org.apache.hadoop.mapred.Task.getFsStatistics(Task.java:373)
            at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.<init>(ReduceTask.java:478)
            at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:414)


Tracing it back, the failure comes down to this bit of code in
ReduceTask.java:

    if (job.getOutputFormat() instanceof FileOutputFormat) {
      matchedStats = getFsStatistics(FileOutputFormat.getOutputPath(job), job);
    }


getFsStatistics() throws an NPE because FileOutputFormat.getOutputPath(job)
returns null: no output path was ever set. The problem is that we should not
even get there, because initTableReduceJob() already set the output format
to this:

    job.setOutputFormat(TableOutputFormat.class);


Looking at the source for that guy:

    public class TableOutputFormat
        extends FileOutputFormat<ImmutableBytesWritable, Put>

Aha, there's the problem. TableOutputFormat extends FileOutputFormat, so the
instanceof check in ReduceTask succeeds even though this is a table reducer.
It's not really a file output, but getFsStatistics() is invoked anyway with a
null output path and blows up. It seems likely that getFsStatistics() should
check for null here, and perhaps TableOutputFormat should not extend
FileOutputFormat at all; I'm not sure.
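To make the failure path concrete, here is a minimal, self-contained model of
the ReduceTask branch quoted above. All class names below are hypothetical
stand-ins, not the real Hadoop/HBase types: outputPath() returning null models
FileOutputFormat.getOutputPath(job) when no output path is configured, and
fsStatistics() models getFsStatistics() dereferencing that path. The
guardedBehavior() method is only a sketch of the null check suggested above,
not what Hadoop actually does.

```java
// Hypothetical stand-ins mirroring the class hierarchy; these are NOT the
// real org.apache.hadoop / org.apache.hadoop.hbase types.
public class OutputFormatDispatch {

    interface OutputFormatStandIn {}

    // Models FileOutputFormat: the output path is null when none was set.
    static class FileOutputFormatStandIn implements OutputFormatStandIn {
        String outputPath() {
            return null;
        }
    }

    // Models the mapred TableOutputFormat, which extends FileOutputFormat.
    static class TableOutputFormatStandIn extends FileOutputFormatStandIn {}

    // Models getFsStatistics(): it dereferences the path unconditionally.
    static String fsStatistics(String path) {
        return "stats for " + path.toLowerCase(); // NPE when path is null
    }

    // Models the current logic: instanceof is the only check performed.
    static String currentBehavior(OutputFormatStandIn format) {
        if (format instanceof FileOutputFormatStandIn) {
            return fsStatistics(((FileOutputFormatStandIn) format).outputPath());
        }
        return null; // no file-system statistics tracked
    }

    // Sketch of the suggested fix: skip statistics when there is no path.
    static String guardedBehavior(OutputFormatStandIn format) {
        if (format instanceof FileOutputFormatStandIn) {
            String path = ((FileOutputFormatStandIn) format).outputPath();
            if (path != null) {
                return fsStatistics(path);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        OutputFormatStandIn table = new TableOutputFormatStandIn();
        try {
            currentBehavior(table);
            System.out.println("current: no exception");
        } catch (NullPointerException e) {
            System.out.println("current: NullPointerException");
        }
        System.out.println("guarded: " + guardedBehavior(table));
    }
}
```

Running main() prints "current: NullPointerException" followed by
"guarded: null": the subclass passes the instanceof test, and only the
explicit null check keeps the missing path from being dereferenced.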


The way I got around this was to create my own OutputFormat,
MyTableOutputFormat, which is essentially a copy of the real
TableOutputFormat, except that instead of extending FileOutputFormat it does
this:

    public class MyTableOutputFormat
        implements OutputFormat<ImmutableBytesWritable, Put> { ... }


I don't know whether that's a correct solution, only that it lets an empty
HBase-outputting reducer run without hitting the NPE.
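Why the workaround avoids the NPE can be shown with another small,
self-contained model. Again, the class names are hypothetical stand-ins rather
than the real Hadoop/HBase types, and takesFileStatsBranch() models only the
instanceof test in ReduceTask, nothing else about the reduce task.

```java
// Hypothetical stand-ins; these are NOT the real Hadoop/HBase types.
public class WorkaroundDispatch {

    interface OutputFormatStandIn {}

    static class FileOutputFormatStandIn implements OutputFormatStandIn {}

    // Like the stock mapred TableOutputFormat: inherits from the file format.
    static class TableOutputFormatStandIn extends FileOutputFormatStandIn {}

    // Like MyTableOutputFormat: implements the interface directly.
    static class MyTableOutputFormatStandIn implements OutputFormatStandIn {}

    // Models the instanceof check that decides whether file-system
    // statistics (and hence the output path) are looked up.
    static boolean takesFileStatsBranch(OutputFormatStandIn format) {
        return format instanceof FileOutputFormatStandIn;
    }

    public static void main(String[] args) {
        System.out.println("TableOutputFormat: "
            + takesFileStatsBranch(new TableOutputFormatStandIn()));
        System.out.println("MyTableOutputFormat: "
            + takesFileStatsBranch(new MyTableOutputFormatStandIn()));
    }
}
```

This prints true for the stock class and false for the copy, so the copy never
reaches the output-path lookup. One consequence of implementing the mapred
OutputFormat interface directly is that the class has to supply both
getRecordWriter() and checkOutputSpecs() itself instead of inheriting anything
from FileOutputFormat.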

I would appreciate any comments or feedback on the problem and on my
workaround, and would like to know whether anybody else has encountered this.

--Chris
