Hi all,
I have a perhaps naive question about providing input to a MapReduce program:
how do I specify the input with respect to the HDFS path?
Right now I can specify an input file from my local directory, say, the
hadoop trunk directory. I can also specify an absolute path for a DFS file
by giving its full location.
First, you need to point a MapReduce job at a directory, not an individual
file. Second, when you specify a path in your job conf using the Path
object, the path you supply is an HDFS path, not a local path.
Yes, you can use the output files of one MapReduce job as the input to a
second job.
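To see why a scheme-less path lands on HDFS rather than the local disk: Hadoop resolves such paths against the configured default filesystem (fs.defaultFS), much like ordinary URI resolution. A plain-Java illustration of that resolution rule (the namenode host/port here is hypothetical):

```java
import java.net.URI;

public class PathResolution {
    public static void main(String[] args) {
        // Stand-in for fs.defaultFS from core-site.xml (hypothetical host/port).
        URI defaultFs = URI.create("hdfs://namenode:9000/");

        // A scheme-less absolute path resolves against the default filesystem,
        // so "/user/alice/input" becomes an HDFS location, not a local one.
        URI input = defaultFs.resolve("/user/alice/input");
        System.out.println(input); // hdfs://namenode:9000/user/alice/input

        // A fully qualified local path keeps its own scheme and stays local.
        URI local = URI.create("file:///home/alice/input");
        System.out.println(local.getScheme()); // file
    }
}
```

So new Path("input") or new Path("/user/alice/input") in a job conf names an HDFS location unless you qualify it with file://.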
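A minimal sketch of that chaining, assuming the classic org.apache.hadoop.mapred API (the FirstJob/SecondJob classes and the directory names are hypothetical):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.*;

JobConf first = new JobConf(FirstJob.class);
FileInputFormat.setInputPaths(first, new Path("input"));          // HDFS path
FileOutputFormat.setOutputPath(first, new Path("intermediate"));  // must not exist yet
JobClient.runJob(first);                                          // blocks until done

JobConf second = new JobConf(SecondJob.class);
// The first job's output directory becomes the second job's input directory.
FileInputFormat.setInputPaths(second, new Path("intermediate"));
FileOutputFormat.setOutputPath(second, new Path("output"));
JobClient.runJob(second);
```

Note that because the input is a directory, the second job will pick up all the part-* files the first job wrote there.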
I wonder if I am missing something.
I have a .txt file for input, and I placed it under the input directory of
hdfs.
Then I called
FileInputFormat.setInputPaths(c, new Path(input));
and I got an error:
Exception in thread "main"
org.apache.hadoop.mapred.InvalidInputException: Input path
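One common cause of InvalidInputException is that a relative path like "input" resolves under your HDFS home directory (/user/<username>), which may not contain what you staged. Assuming the standard hadoop fs shell, checking and staging the input might look like this (mydata.txt is a hypothetical local file):

```shell
# List what the job will actually see; a relative path resolves
# under /user/<username> on HDFS, not your local working directory.
hadoop fs -ls input

# If it is missing, create the directory and copy the local file in.
hadoop fs -mkdir input
hadoop fs -put mydata.txt input/
```

If -ls on the exact path from your job conf fails, the job will fail with the same InvalidInputException.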