Re: HBase MapReduce Job with Multiple Scans

Ted Yu Tue, 03 Apr 2012 07:46:13 -0700

Stack said he might help implement his suggestions if Eran is busy.

The patch doesn't depend on recent changes to Hadoop/MapReduce.


Give it a try. Feedback would help us refine the patch.

Thanks

On Tue, Apr 3, 2012 at 7:43 AM, Shawn Quinn <squ...@moxiegroup.com> wrote:

> Thanks for the quick reply Ted!  That's exactly what I'm looking for.
> Reading through the Jira comments I'm a bit confused about what the
> status/plan is with that patch.  Do you expect that it will be included in
> the next HBase release, or has it been postponed?  Also, does that change
> depend on any recent changes to Hadoop/MapReduce, or will it work
> as-is?
>
> In the meantime, I'll give that patch a closer look and set up some custom
> classes in my own project to try and pull off something similar.
>
>     -Shawn
>
> On Tue, Apr 3, 2012 at 9:42 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > Take a look at HBASE-3996 where Stack has some comments outstanding.
> >
> > Cheers
> >
> > On Tue, Apr 3, 2012 at 5:52 AM, Shawn Quinn <squ...@moxiegroup.com> wrote:
> >
> > > Hello,
> > >
> > > I have a table whose key is structured as "eventType + time", and I
> > > need to periodically run a map reduce job on the table which will
> > > process each event type within a specific time range.  So, the map
> > > reduce job needs to process multiple segments of the table as input,
> > > and therefore can't be set up with a single scan.  (Using a filter on
> > > the scan would theoretically work, but doesn't scale well as the data
> > > size increases.)
> > >
> > > Given that the HBase-provided "TableMapReduceUtil.initTableMapperJob"
> > > only supports a single scan, there doesn't appear to be a "built in"
> > > way to run a mapreduce job that has multiple scans as input.  I found
> > > the following related post which points me to creating my own map
> > > reduce "InputFormat" type by extending HBase's "TableInputFormatBase"
> > > and overriding the "getSplits()" method:
> > >
> > > http://stackoverflow.com/questions/4821455/hbase-mapreduce-on-multiple-scan-objects
> > >
> > > So, that's currently the direction I'm heading.  However, before I got
> > > too far in the weeds I thought I'd ask:
> > >
> > > 1. Is this still the best/right way to handle this situation?
> > >
> > > 2. Does anyone have an example of a custom InputFormat that sets up
> > > multiple scans against an HBase input table (something like the
> > > "MultiSegmentTableInputFormat" referred to in the post) that they'd be
> > > willing to share?
> > >
> > > Thanks,
> > >
> > >       -Shawn
> > >
> >
>
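
The "multiple segments" in the original question come down to one
[start, stop) row range per event type, each of which could back its own
scan. A minimal sketch of computing those ranges follows. The key encoding
(UTF-8 event type followed by an 8-byte big-endian, non-negative timestamp)
and the names SegmentRanges, Range, rowKey and rangesFor are assumptions
made for illustration; they are not taken from the thread or from the
HBASE-3996 patch, and the code deliberately avoids any HBase classes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SegmentRanges {

    /** One contiguous key range: start row inclusive, stop row exclusive. */
    public static final class Range {
        public final byte[] startRow;
        public final byte[] stopRow;

        Range(byte[] startRow, byte[] stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }
    }

    /**
     * Hypothetical key layout: event type bytes, then the time as a
     * big-endian long, so keys of one event type sort by time.
     */
    public static byte[] rowKey(String eventType, long time) {
        byte[] prefix = eventType.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(prefix.length + Long.BYTES)
                .put(prefix)
                .putLong(time) // ByteBuffer writes big-endian by default
                .array();
    }

    /** Builds one range per event type covering [startTime, endTime). */
    public static List<Range> rangesFor(List<String> eventTypes,
                                        long startTime, long endTime) {
        List<Range> ranges = new ArrayList<>();
        for (String eventType : eventTypes) {
            ranges.add(new Range(rowKey(eventType, startTime),
                                 rowKey(eventType, endTime)));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Example event types and times are made up.
        for (Range r : rangesFor(Arrays.asList("click", "view"), 1000L, 2000L)) {
            System.out.println(Arrays.toString(r.startRow)
                    + " .. " + Arrays.toString(r.stopRow));
        }
    }
}
```

Each range would then presumably become the start and stop row of one scan,
with a custom InputFormat's getSplits() collecting the splits for every
range. That wiring depends on the HBase version in use and is not shown
here.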
