Re: Real Multiple Outputs for Hadoop -- is this implementation correct?

2013-09-13 Thread Harsh J
I took a very brief look, and the approach to use multiple OCs, one per unique parent path from a task, seems the right thing to do. Nice work! Do consider contributing this if it's working well for you :) On Sat, Sep 14, 2013 at 12:53 AM, Paul Houle ontolo...@gmail.com wrote: Hey guys I spent

Real Multiple Outputs for Hadoop -- is this implementation correct?

2013-09-13 Thread Paul Houle
Hey guys, I spent some time last week thinking about Hadoop before I wrote my own class, RealMultipleOutputs, that does something like what MultipleOutputs does, except that you can specify different HDFS paths for the different output streams. My pals were telling me to use Cascading or Pig