Stack said he might help implement his suggestions if Eran is busy. The patch doesn't depend on recent changes to Hadoop/MapReduce.
Give it a try. Feedback would help us refine the patch.

Thanks

On Tue, Apr 3, 2012 at 7:43 AM, Shawn Quinn <squ...@moxiegroup.com> wrote:

> Thanks for the quick reply, Ted! That's exactly what I'm looking for.
> Reading through the Jira comments, I'm a bit confused about the
> status/plan for that patch. Do you expect it will be included in the
> next HBase release, or has it been postponed? Also, does that change
> depend on any recent changes to Hadoop/MapReduce, or will it work as-is?
>
> In the meantime, I'll give that patch a closer look and set up some
> custom classes in my own project to try to pull off something similar.
>
> -Shawn
>
> On Tue, Apr 3, 2012 at 9:42 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > Take a look at HBASE-3996, where Stack has some comments outstanding.
> >
> > Cheers
> >
> > On Tue, Apr 3, 2012 at 5:52 AM, Shawn Quinn <squ...@moxiegroup.com> wrote:
> >
> > > Hello,
> > >
> > > I have a table whose key is structured as "eventType + time", and I
> > > need to periodically run a MapReduce job on the table which will
> > > process each event type within a specific time range. So, the
> > > MapReduce job needs to process multiple segments of the table as
> > > input, and therefore can't be set up with a single scan. (Using a
> > > filter on the scan would theoretically work, but doesn't scale well
> > > as the data size increases.)
> > >
> > > Given that the HBase-provided "TableMapReduceUtil.initTableMapperJob"
> > > only supports a single scan, there doesn't appear to be a "built in"
> > > way to run a MapReduce job that has multiple scans as input. I found
> > > the following related post, which points me to creating my own
> > > MapReduce "InputFormat" type by extending HBase's
> > > "TableInputFormatBase" and overriding the "getSplits()" method:
> > >
> > > http://stackoverflow.com/questions/4821455/hbase-mapreduce-on-multiple-scan-objects
> > >
> > > So, that's currently the direction I'm heading. However, before I got
> > > too far into the weeds I thought I'd ask:
> > >
> > > 1. Is this still the best/right way to handle this situation?
> > >
> > > 2. Does anyone have an example of a custom InputFormat that sets up
> > > multiple scans against an HBase input table (something like the
> > > "MultiSegmentTableInputFormat" referred to in the post) that they'd
> > > be willing to share?
> > >
> > > Thanks,
> > >
> > > -Shawn
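
For anyone following along, below is a rough sketch of the getSplits() override approach from the StackOverflow link above. The class name MultiSegmentTableInputFormat and the hbase.mapreduce.scan.segments key are made up for illustration, and the sketch assumes the 0.92-era org.apache.hadoop.hbase.mapreduce API, where TableInputFormatBase exposes getScan() to subclasses:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;

/** Hypothetical example class; not part of HBase itself. */
public class MultiSegmentTableInputFormat extends TableInputFormat {

  /** Hypothetical key holding segments as "eventType,startTime,stopTime;..." */
  public static final String SCAN_SEGMENTS = "hbase.mapreduce.scan.segments";

  @Override
  public List<InputSplit> getSplits(JobContext context) throws IOException {
    String segments = context.getConfiguration().get(SCAN_SEGMENTS);
    if (segments == null) {
      throw new IOException(SCAN_SEGMENTS + " is not set");
    }
    List<InputSplit> splits = new ArrayList<InputSplit>();
    // Run the parent's region-aligned split calculation once per segment,
    // narrowing the shared Scan to that segment's key range each time.
    for (String segment : segments.split(";")) {
      String[] parts = segment.split(",");
      getScan().setStartRow(Bytes.toBytes(parts[0] + parts[1]));
      getScan().setStopRow(Bytes.toBytes(parts[0] + parts[2]));
      splits.addAll(super.getSplits(context));
    }
    return splits;
  }
}

One way to wire this in would be to call TableMapReduceUtil.initTableMapperJob(...) in the driver as usual (which also sets the input table in the job configuration), add the segment list under the key above, and then swap the input format with job.setInputFormatClass(MultiSegmentTableInputFormat.class). Treat this as a sketch under those assumptions rather than a drop-in solution.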