Re: Accumulo and Mapreduce

2013-03-04 Thread Nick Dimiduk
As Ted said, my first choice would be Cascading. Second choice would be ChainMapper. As you'll see in those search results [0], it's not available in the "modern" mapreduce API consistently across Hadoop releases. If you've already implemented this against the mapred API, go for ChainReducer. If yo

Re: Accumulo and Mapreduce

2013-03-04 Thread Aji Janis
Based on earlier discussions, I was considering using a JobController or ChainMapper to do this. But, as a few of you mentioned, Pig, Cascading or Oozie might be better. So what are the use cases for them? How do I decide which one works best for what? Thank you all for your feedback. On Mon, Mar

Re: Accumulo and Mapreduce

2013-03-04 Thread Ted Dunning
Chaining the jobs is a fantastically inefficient solution. If you use Pig or Cascading, the optimizer will glue all of your map functions into a single mapper. The result is something like: (mapper1 -> mapper2 -> mapper3) => reducer Here the parentheses indicate that all of the map function
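To make the (mapper1 -> mapper2 -> mapper3) => reducer shape concrete, here is a minimal in-memory sketch in plain Python. It is not how Pig or Cascading implement their optimizers, and it does not use the Hadoop API. The names fuse, run_job, mapper1, mapper2, mapper3 and count_reducer are all hypothetical, chosen only for illustration. The point is that the three map functions run back to back on each record inside one mapper, so no intermediate output is written between them.

```python
from collections import defaultdict


def fuse(*map_fns):
    """Compose record-level map functions into one mapper (hypothetical helper).

    Each map function takes one record and yields zero or more records.
    """
    def fused(record):
        records = [record]
        for fn in map_fns:
            # Feed every output of the previous stage into the next stage.
            records = [out for rec in records for out in fn(rec)]
        return records
    return fused


def run_job(inputs, mapper, reducer):
    """Toy single-process stand-in for one map/shuffle/reduce pass."""
    groups = defaultdict(list)
    for record in inputs:
        for key, value in mapper(record):
            groups[key].append(value)  # "shuffle": group values by key
    return {key: reducer(key, values) for key, values in groups.items()}


# Three hypothetical map stages.
def mapper1(line):
    for word in line.split():
        yield word


def mapper2(word):
    yield word.lower()


def mapper3(word):
    yield (word, 1)


def count_reducer(key, values):
    return sum(values)


# (mapper1 -> mapper2 -> mapper3) => reducer, all within a single job.
fused_mapper = fuse(mapper1, mapper2, mapper3)
result = run_job(["Pig Cascading", "pig Oozie"], fused_mapper, count_reducer)
```

Chaining separate jobs would instead write each stage's output to storage and read it back in the next job, which is the overhead the fused form avoids.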

Re: Accumulo and Mapreduce

2013-03-04 Thread Russell Jurney
You can chain MR jobs with Oozie, but I would suggest using Cascading, Pig or Hive. You can do this in a couple of lines of code, I suspect. Two map reduce jobs should not pose any kind of challenge with the right tools. On Monday, March 4, 2013, Sandy Ryza wrote: > Hi Aji, > > Oozie is a mature proje

Re: Accumulo and Mapreduce

2013-03-04 Thread Sandy Ryza
Hi Aji, Oozie is a mature project for managing MapReduce workflows. http://oozie.apache.org/ -Sandy On Mon, Mar 4, 2013 at 8:17 AM, Justin Woody wrote: > Aji, > > Why don't you just chain the jobs together? > http://developer.yahoo.com/hadoop/tutorial/module4.html#chaining > > Justin > > On M

Re: Accumulo and Mapreduce

2013-03-04 Thread Justin Woody
Aji, Why don't you just chain the jobs together? http://developer.yahoo.com/hadoop/tutorial/module4.html#chaining Justin On Mon, Mar 4, 2013 at 11:11 AM, Aji Janis wrote: > Russell thanks for the link. > > I am interested in finding a solution (if out there) where Mapper1 outputs a > custom obj

Re: Accumulo and Mapreduce

2013-03-04 Thread Aji Janis
Russell, thanks for the link. I am interested in finding a solution (if one is out there) where Mapper1 outputs a custom object and Mapper 2 can use that as input. One way to do this is obviously by writing to Accumulo, in my case. But, is there another solution for this: List > Input to Job MyObject -

Re: Accumulo and Mapreduce

2013-03-04 Thread Russell Jurney
http://svn.apache.org/repos/asf/accumulo/contrib/pig/trunk/src/main/java/org/apache/accumulo/pig/AccumuloStorage.java AccumuloStorage for Pig comes with Accumulo. Easiest way would be to try it. Russell Jurney http://datasyndrome.com On Mar 4, 2013, at 5:30 AM, Aji Janis wrote: Hello, I have