Is it possible to connect the output of one MapReduce job so that it is the
input to another MapReduce job?
Basically, reduce() would output keys that are passed directly to another map()
function without having to store the intermediate data to the filesystem.
Kevin
--
Founder/CEO Spinn3r.com
Loc
Hi,
I am not sure how you can avoid the filesystem; however, I did it as follows:
// For Job 1
FileInputFormat.addInputPath(job1, new Path(args[0]));
FileOutputFormat.setOutputPath(job1, new Path(args[1]));
// For job 2
FileInputFormat.addInputPath(job2, new Path(args[1]));
FileOutputFormat.setOutputPath(job2, new Path(args[2]));
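As an aside, the data flow Kevin describes can be sketched in plain Java without Hadoop at all. The hypothetical sketch below (made-up class and method names, word-count-style jobs) just shows the shape of the chaining: job2 consumes job1's output in memory, whereas in real Hadoop the hand-off between the two jobs goes through HDFS, as in the path snippet above.

```java
import java.util.*;

// Hypothetical in-memory sketch (no Hadoop) of two chained MapReduce passes:
// "job1" counts words, "job2" groups words by their count. In real Hadoop
// the hand-off between the jobs is written to and read from the filesystem.
public class ChainSketch {

    // "Job 1": word count. The map phase emits (word, 1); reduce sums.
    static Map<String, Integer> job1(List<String> lines) {
        Map<String, List<Integer>> shuffled = new TreeMap<>();
        for (String line : lines) {                      // map phase
            for (String word : line.split("\\s+")) {
                shuffled.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }
        Map<String, Integer> out = new TreeMap<>();      // reduce phase
        shuffled.forEach((word, ones) ->
            out.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return out;
    }

    // "Job 2": consumes job1's output directly; emits (count, words-with-count).
    static Map<Integer, List<String>> job2(Map<String, Integer> counts) {
        Map<Integer, List<String>> out = new TreeMap<>();
        counts.forEach((word, n) ->
            out.computeIfAbsent(n, k -> new ArrayList<>()).add(word));
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = job1(List.of("a b a", "b a"));
        Map<Integer, List<String>> byCount = job2(counts); // chained: no files
        System.out.println(counts);   // {a=3, b=2}
        System.out.println(byCount);  // {2=[b], 3=[a]}
    }
}
```

In Hadoop itself the equivalent sequencing is done in the driver: run the first Job to completion, then submit the second with the first job's output path as its input path.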
Have you considered Oozie for this? It's a workflow engine developed by
Yahoo! engineers.
Yahoo/oozie at GitHub
https://github.com/yahoo/oozie
Oozie at InfoQ
http://www.infoq.com/articles/introductionOozie
Oozie's examples:
http://www.infoq.com/articles/oozieexample
http://yahoo.github.com/oo
On Sep 27, 2011, at 12:09 PM, Kevin Burton wrote:
> Is it possible to connect the output of one map reduce job so that it is the
> input to another map reduce job.
>
> Basically… then reduce() outputs a key, that will be passed to another map()
> function without having to store intermediate data to the filesystem.
It looks to me like Oozie will not do what was asked. In
http://yahoo.github.com/oozie/releases/3.0.0/WorkflowFunctionalSpec.html#a0_Definitions
I see:
3.2.2 Map-Reduce Action
...
The workflow job will wait until the Hadoop map/reduce job completes
before continuing to the next action in the workflow.
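For concreteness, chaining two map-reduce actions in an Oozie workflow looks roughly like the sketch below (a hypothetical workflow.xml; the action names, parameters, and paths are made up). Note that job2 reads the directory job1 wrote, so the intermediate data still lands in HDFS between the actions, which is exactly the limitation being pointed out:

```xml
<!-- Hypothetical workflow.xml sketch: two chained map-reduce actions.
     The intermediate data is still written to HDFS between them. -->
<workflow-app name="chain-example" xmlns="uri:oozie:workflow:0.2">
  <start to="job1"/>
  <action name="job1">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.input.dir</name>
          <value>${inputDir}</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>${intermediateDir}</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="job2"/>
    <error to="fail"/>
  </action>
  <action name="job2">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.input.dir</name>
          <value>${intermediateDir}</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>${outputDir}</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Workflow failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```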
To me it sounds like the asker should check out tools like Storm and S4
instead of Hadoop.
http://www.infoq.com/news/2011/09/twitter-storm-real-time-hadoop
--
Kind regards,
Niels Basjes
On 27 Sep 2011 at 22:38, "Mike Spreitzer" wrote the
following:
> It looks to me like Oozie will not do what was asked.