As Ted said, my first choice would be Cascading. Second choice would be
ChainMapper. As you'll see in those search results [0], it's not available
in the "modern" mapreduce API consistently across Hadoop releases. If
you've already implemented this against the mapred API, go doe
ChainReducer. If yo
Based on earlier discussions, I was considering using a JobController or
ChainMapper to do this. But as a few of you mentioned, Pig, Cascading or
Oozie might be better. So what are the use cases for them? How do I decide
which one works best for what?
Thank you all for your feedback.
On Mon, Mar
Chaining the jobs is a fantastically inefficient solution. If you use Pig
or Cascading, the optimizer will glue all of your map functions into a
single mapper. The result is something like:
(mapper1 -> mapper2 -> mapper3) => reducer
Here the parentheses indicate that all of the map function
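
To make the diagram concrete, here is a minimal sketch in plain Python of what "gluing" map functions into a single mapper means. It is only an illustration of function fusion, not how Pig or Cascading actually implement their planners. The names fuse, mapper1, mapper2, mapper3, reducer and run are all hypothetical.

```python
from collections import defaultdict


def fuse(*map_fns):
    """Compose per-record map functions into one function (a single pass)."""
    def fused(record):
        for fn in map_fns:
            record = fn(record)
        return record
    return fused


# Hypothetical map steps, named after the diagram.
def mapper1(line):
    return line.strip().lower()      # normalize the raw line


def mapper2(line):
    return line.split()              # tokenize into words


def mapper3(words):
    return [(w, 1) for w in words]   # emit (key, value) pairs


def reducer(pairs):
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def run(lines):
    # (mapper1 -> mapper2 -> mapper3) => reducer
    single_mapper = fuse(mapper1, mapper2, mapper3)
    emitted = []
    for line in lines:
        emitted.extend(single_mapper(line))
    return reducer(emitted)
```

Each record passes through all three map steps in memory before anything is handed to the reducer, so no intermediate output is written between the steps.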
You can chain MR jobs with Oozie, but I would suggest using Cascading, Pig or
Hive. You can do this in a couple of lines of code, I suspect. Two MapReduce
jobs should not pose any kind of challenge with the right tools.
On Monday, March 4, 2013, Sandy Ryza wrote:
Hi Aji,
Oozie is a mature project for managing MapReduce workflows.
http://oozie.apache.org/
-Sandy
On Mon, Mar 4, 2013 at 8:17 AM, Justin Woody wrote:
Aji,
Why don't you just chain the jobs together?
http://developer.yahoo.com/hadoop/tutorial/module4.html#chaining
Justin
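
For readers unfamiliar with the idea, chaining simply means the output records of one job become the input records of the next. Below is a rough in-memory sketch in plain Python, not Hadoop code. The helper run_job and the two example jobs are hypothetical. On a real cluster the intermediate result would be written to storage between the two jobs rather than held in a list.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Run one map / shuffle / reduce pass in memory and return its output."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)      # "shuffle": group values by key
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Job 1 (hypothetical): count words.
def count_mapper(line):
    for word in line.split():
        yield word, 1


def count_reducer(word, ones):
    yield word, sum(ones)


# Job 2 (hypothetical): group words by how often they occurred.
def invert_mapper(pair):
    word, count = pair
    yield count, word


def invert_reducer(count, words):
    yield count, sorted(words)


def run_chain(lines):
    intermediate = run_job(lines, count_mapper, count_reducer)
    # The second job reads exactly what the first job wrote.
    return run_job(intermediate, invert_mapper, invert_reducer)
```

The second job's mapper has to accept whatever record type the first job's reducer emits, which is the same constraint a custom object passed between two jobs has to satisfy.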
On Mon, Mar 4, 2013 at 11:11 AM, Aji Janis wrote:
Russell, thanks for the link.
I am interested in finding a solution (if one is out there) where Mapper1
outputs a custom object and Mapper2 can use that as input. One way to do this
is obviously by writing to Accumulo, in my case. But is there another
solution for this:
List > Input to Job
MyObject -
http://svn.apache.org/repos/asf/accumulo/contrib/pig/trunk/src/main/java/org/apache/accumulo/pig/AccumuloStorage.java
AccumuloStorage for Pig comes with Accumulo. Easiest way would be to try it.
Russell Jurney http://datasyndrome.com
On Mar 4, 2013, at 5:30 AM, Aji Janis wrote:
Hello,
I have