You can configure your third MapReduce job using MultipleFileInput and read
those files into your job. If the files are small, you can consider the
DistributedCache, which will give you optimal performance when joining the
datasets of file1 and file2. I will also recommend you to use
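For illustration, here is a minimal sketch of that cache-based join in the
third job, assuming file1 is small enough to cache and that both files hold
tab-separated key/value lines (the class name, cached-file alias, and record
layout are placeholders, not something from this thread):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: loads the cached copy of file1 into memory in setup(),
// then joins each file2 record against it by key.
public class JoinMapper extends Mapper<LongWritable, Text, Text, Text> {

  private final Map<String, String> file1Lookup = new HashMap<>();

  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    // Assumes the driver cached file1 with a "#file1" symlink fragment, so a
    // local link named "file1" exists in the task working directory.
    try (BufferedReader reader = new BufferedReader(new FileReader("file1"))) {
      String line;
      while ((line = reader.readLine()) != null) {
        String[] parts = line.split("\t", 2);
        if (parts.length == 2) {
          file1Lookup.put(parts[0], parts[1]);
        }
      }
    }
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String[] parts = value.toString().split("\t", 2);
    if (parts.length < 2) {
      return; // skip malformed lines
    }
    String file1Value = file1Lookup.get(parts[0]);
    if (file1Value != null) {
      // Emit the joined file1 and file2 values for this key.
      context.write(new Text(parts[0]), new Text(file1Value + "\t" + parts[1]));
    }
  }
}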
DistributedCache has been deprecated for a while. You can use the new
mechanism, which is functionally the same thing, discussed in this thread:
http://stackoverflow.com/questions/21239722/hadoop-distributedcache-is-deprecated-what-is-the-preferred-api
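For reference, a minimal driver sketch using that non-deprecated API; the job
name, paths, and mapper class are placeholders:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ThirdJobDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "join-file1-file2");
    job.setJarByClass(ThirdJobDriver.class);
    job.setMapperClass(JoinMapper.class);   // hypothetical mapper that reads the cached file in setup()
    job.setNumReduceTasks(0);               // map-side join, no reducer needed
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    // Replaces the deprecated DistributedCache.addCacheFile(); the "#file1"
    // fragment creates a local symlink named "file1" for each task.
    job.addCacheFile(new URI("/output/job1/part-r-00000#file1"));

    FileInputFormat.addInputPath(job, new Path("/output/job2"));   // file2 is the map input
    FileOutputFormat.setOutputPath(job, new Path("/output/job3"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Inside the mapper, context.getCacheFiles() returns the cached URIs if you need
to inspect them.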
Regards,
Shahab
On Mon, Jan 5, 2015 at 6:14 PM, Corey Nolet cjno...@gmail.com wrote:
Hitarth,
I don't know how much direction you are looking for with regards to the
formats of the files, but you can certainly read both files into the third
MapReduce job using FileInputFormat by comma-separating the paths to the
files. The blocks for both files will essentially be unioned.
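For example, in the third job's driver it could look roughly like this (the
paths are placeholders):

// Uses org.apache.hadoop.mapreduce.lib.input.FileInputFormat and
// org.apache.hadoop.fs.Path; "job" is the third job's Job instance.
FileInputFormat.setInputPaths(job, "/output/job1/part-r-00000,/output/job2/part-r-00000");

// Equivalent, one path at a time:
FileInputFormat.addInputPath(job, new Path("/output/job1/part-r-00000"));
FileInputFormat.addInputPath(job, new Path("/output/job2/part-r-00000"));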
Hitarth:
You can also consider MultiFileInputFormat (and its concrete
implementations).
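A bare-bones sketch of what a concrete implementation can look like (old
mapred API); MyMultiFileRecordReader is a hypothetical user-supplied reader,
not a Hadoop class:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MultiFileInputFormat;
import org.apache.hadoop.mapred.MultiFileSplit;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

// MultiFileInputFormat packs many small files into each split; a subclass
// only has to say how the packed files are read.
public class MyMultiFileInputFormat extends MultiFileInputFormat<LongWritable, Text> {
  @Override
  public RecordReader<LongWritable, Text> getRecordReader(InputSplit split, JobConf conf, Reporter reporter)
      throws IOException {
    // Hypothetical reader that iterates the files in the MultiFileSplit line by line.
    return new MyMultiFileRecordReader(conf, (MultiFileSplit) split);
  }
}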
Cheers
Hi Hitarth,
If your file1 and file2 are small, you can go with the Distributed Cache,
as mentioned here:
http://unmeshasreeveni.blogspot.in/2014/10/how-to-load-file-in-distributedcache-in.html
Or you can go with MultipleInputFormat, as mentioned here
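If the intended class is the MultipleInputs helper (an assumption on my part),
the third job can give each path its own mapper and join the two streams in a
common reducer; the paths and mapper classes below are placeholders:

// Uses org.apache.hadoop.mapreduce.lib.input.MultipleInputs and
// TextInputFormat; "job" is the third job's Job instance.
MultipleInputs.addInputPath(job, new Path("/output/job1"),
    TextInputFormat.class, File1Mapper.class);   // File1Mapper: hypothetical
MultipleInputs.addInputPath(job, new Path("/output/job2"),
    TextInputFormat.class, File2Mapper.class);   // File2Mapper: hypothetical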
Hi,
I have a 6-node cluster, and the scenario is as follows:
I have one MapReduce job which will write file1 to HDFS.
I have another MapReduce job which will write file2 to HDFS.
In the third MapReduce job I need to use file1 and file2 to do some
computation and output the value.
What is