Hello!
  Thank you for your responses. For our needs we have implemented our own
custom TableInputFormat, overriding the method getSplits().
  By the way, how can you create a number of regions (or, in your words,
"pre-splits") via the Java API? I have read about them, but I would like to
see an example of such usage.
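To make the question concrete, this is roughly what I imagine the key-generation side would look like. It is only a sketch of mine (class and method names are mine, and it assumes row keys begin with a uniformly distributed one-byte prefix); the admin call itself is left in a comment because it needs a live cluster:

```java
// Sketch only: generate evenly spaced split keys for pre-splitting a table,
// assuming row keys start with a uniformly distributed one-byte prefix.
public class PreSplitKeys {

    // For numRegions regions, we need numRegions - 1 split keys.
    static byte[][] prefixSplits(int numRegions) {
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            splits[i - 1] = new byte[] { (byte) (i * 256 / numRegions) };
        }
        return splits;
    }

    public static void main(String[] args) {
        // 32 regions -> 31 split keys: 0x08, 0x10, ..., 0xF8
        byte[][] splits = prefixSplits(32);
        System.out.println(splits.length); // prints 31

        // With the HBase client on the classpath, the keys would be handed
        // to the admin API when creating the table, along the lines of:
        //   HBaseAdmin admin = new HBaseAdmin(conf);
        //   admin.createTable(tableDescriptor, prefixSplits(32));
    }
}
```

If I read the HBaseAdmin javadoc correctly, createTable() has an overload taking a byte[][] of split keys, and another taking a start key, an end key, and a region count.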
  Thank you. 
 Florin


--- On Mon, 6/27/11, Michel Segel <[email protected]> wrote:

> From: Michel Segel <[email protected]>
> Subject: Re: Obtain many mappers (or regions)
> To: "[email protected]" <[email protected]>
> Cc: "[email protected]" <[email protected]>
> Date: Monday, June 27, 2011, 12:22 PM
> Just a simple suggestion that will
> make your life a bit easier...
> 
> If your data is relatively small (small enough that you can
> easily fit the result set into memory), you may want to do
> the following:
> Oozie calls your map/reduce job.
> At the start of your m/r job, you connect from the client
> to HBase and read the result set into a list object (or
> something similar). You then write a custom input format
> class that uses a list object as its input. You can then
> split the input as you need it.
> 
> Much easier than trying to pre-split temporary tables, and a
> lot less work and overhead.
> 
> This is something that could be part of an indexing
> solution. ;-P
> (meaning that the classes are reusable for other
> solutions...)
> 
> HTH -Mike
> 
> Sent from a remote device. Please excuse any typos...
> 
> Mike Segel
> 
> On Jun 27, 2011, at 7:46 AM, Florin P <[email protected]>
> wrote:
> 
> > Hi!
> >  Thank you for your response. As I said, it is a
> > temporary table. This table acts as metadata for the
> > long-running tasks that we would like to trigger on the
> > cluster (as map/reduce jobs), so that all the machines
> > take on some of those tasks.
> >  I have read the indicated chapter, and then
> > followed this scenario:
> >   1. We loaded the small data set into
> > the HBase table
> >   2. From the HBase admin interface we
> > triggered the split action
> >   3. We saw that 32 new regions
> > were created for that table
> >   4. We ran a map/reduce job that
> > counts the number of rows
> >   5. Only two mappers were created
> > What puzzles me is that only 2 mapper tasks were
> > created, even though the indicated book states:
> > (cite) "When TableInputFormat is used to source an HBase
> > table in a MapReduce job, its splitter will make a map task
> > for each region of the table. Thus, if there are 100 regions
> > in the table, there will be 100 map-tasks for the job,
> > regardless of how many column families are selected in the
> > Scan."
> > 
> > Can you please explain why this happens? Did we
> > miss some configuration property?
> > 
> > Thank you.
> > regards,
> >  Florin
> > --- On Mon, 6/27/11, Doug Meil <[email protected]>
> wrote:
> > 
> >> From: Doug Meil <[email protected]>
> >> Subject: RE: Obtain many mappers (or regions)
> >> To: "[email protected]"
> <[email protected]>
> >> Date: Monday, June 27, 2011, 8:01 AM
> >> Hi there-
> >> 
> >> If you only have 100 rows, I think that HBase
> >> might be overkill.
> >> 
> >> You probably want to start with this to get a
> background on
> >> what HBase can do...
> >> http://hbase.apache.org/book.html
> >> .. there is a section on MapReduce with HBase as
> well.
> >> 
> >> -----Original Message-----
> >> From: Florin P [mailto:[email protected]]
> >> 
> >> Sent: Monday, June 27, 2011 4:53 AM
> >> To: [email protected]
> >> Subject: Obtain many mappers (or regions)
> >> 
> >> Hello!
> >> I have the following scenario:
> >> 1. A temporary HBase table with a small number of
> >> rows (approx. 100)
> >> 2. A cluster with 2 machines on which I would like
> >> to crunch the data contained in the rows
> >> 
> >> I would like to create two mappers that will
> >> crunch the data from the rows.
> >> How can I achieve this?
> >> A more general question:
> >>   how can we obtain many mappers to crunch
> >> a small quantity of data?
> >> 
> >> Thank you.
> >>   Regards,
> >>   Florin  
> >> 
> > 
>
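P.S. Mike's list-splitting suggestion, as I understand it, boils down to partitioning the in-memory list into one chunk per mapper. A plain-Java sketch of just that step (the custom InputFormat/InputSplit wiring around it is omitted, and all the names are mine):

```java
import java.util.ArrayList;
import java.util.List;

public class ListSplitter {

    // Partition rows into at most numSplits contiguous chunks,
    // one chunk per intended mapper.
    static <T> List<List<T>> split(List<T> rows, int numSplits) {
        List<List<T>> chunks = new ArrayList<List<T>>();
        int n = rows.size();
        for (int i = 0; i < numSplits; i++) {
            int from = i * n / numSplits;
            int to = (i + 1) * n / numSplits;
            if (from < to) {
                chunks.add(new ArrayList<T>(rows.subList(from, to)));
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<Integer>();
        for (int i = 0; i < 100; i++) {
            rows.add(i);
        }
        // 100 rows over 2 mappers -> two chunks of 50 rows each
        List<List<Integer>> chunks = split(rows, 2);
        System.out.println(chunks.size());        // prints 2
        System.out.println(chunks.get(0).size()); // prints 50
    }
}
```

Each chunk would then back one InputSplit, so the job gets exactly as many mappers as chunks.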
