Hello!
  Thank you for your responses. For our needs we have implemented our own
custom TableInputFormat, overriding the method getSplits().
  By the way, how can you create a number of regions (or, in your words,
"pre-splits") via the Java API? I have read about them, but I would like to
see an example of such usage.
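To make the question concrete, this is roughly what I imagine the key-generation side would look like. It is only a sketch of mine (class and method names are mine, and it assumes row keys begin with a uniformly distributed one-byte prefix); the admin call itself is left in a comment because it needs a live cluster:

```java
// Sketch only: generate evenly spaced split keys for pre-splitting a table,
// assuming row keys start with a uniformly distributed one-byte prefix.
public class PreSplitKeys {

    // For numRegions regions, we need numRegions - 1 split keys.
    static byte[][] prefixSplits(int numRegions) {
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            splits[i - 1] = new byte[] { (byte) (i * 256 / numRegions) };
        }
        return splits;
    }

    public static void main(String[] args) {
        // 32 regions -> 31 split keys: 0x08, 0x10, ..., 0xF8
        byte[][] splits = prefixSplits(32);
        System.out.println(splits.length); // prints 31

        // With the HBase client on the classpath, the keys would be handed
        // to the admin API when creating the table, along the lines of:
        //   HBaseAdmin admin = new HBaseAdmin(conf);
        //   admin.createTable(tableDescriptor, prefixSplits(32));
    }
}
```

If I read the HBaseAdmin javadoc correctly, createTable() has an overload taking a byte[][] of split keys, and another taking a start key, an end key, and a region count.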
  Thank you. 
 Florin


--- On Mon, 6/27/11, Michel Segel <[email protected]> wrote:

> From: Michel Segel <[email protected]>
> Subject: Re: Obtain many mappers (or regions)
> To: "[email protected]" <[email protected]>
> Cc: "[email protected]" <[email protected]>
> Date: Monday, June 27, 2011, 12:22 PM
> Just a simple suggestion that will
> make your life a bit easier...
> 
> If your data is relatively small (small enough that you can
> easily fit the result set into memory), you may want to do
> the following:
> Oozie calls your map/reduce job.
> At the start of your m/r job, you connect from the client
> to HBase and read the result set into a list object (or
> something similar). You then write a custom input format
> class that uses a list object as its input. You can then
> split the input as you need it.
> 
> Much easier than trying to pre-split temporary tables, and a
> lot less work and overhead.
> 
> This is something that could be part of an indexing
> solution. ;-P
> (meaning that the classes are reusable for other
> solutions...)
> 
> HTH -Mike
> 
> Sent from a remote device. Please excuse any typos...
> 
> Mike Segel
> 
> On Jun 27, 2011, at 7:46 AM, Florin P <[email protected]>
> wrote:
> 
> > Hi!
> >  Thank you for your response. As I said, it is a
> > temporary table. This table acts as metadata for the
> > long-running tasks that we would like to trigger on the
> > cluster (as map/reduce jobs), so that all the machines
> > take on some of those tasks.
> >  I have read the indicated chapter, and then
> > followed this scenario:
> >   1. We loaded the small data set into
> > the HBase table
> >   2. From the HBase admin interface we
> > triggered the split action
> >   3. We saw that 32 new regions
> > were created for that table
> >   4. We ran a map/reduce job that
> > counts the number of rows
> >   5. Only two mappers were created
> > What puzzles me is that only 2 mapper tasks were
> > created, even though the indicated book states:
> > (cite) "When TableInputFormat is used to source an HBase
> > table in a MapReduce job, its splitter will make a map task
> > for each region of the table. Thus, if there are 100 regions
> > in the table, there will be 100 map-tasks for the job,
> > regardless of how many column families are selected in the
> > Scan."
> > 
> > Can you please explain why this happens? Did we
> > miss some configuration property?
> > 
> > Thank you.
> > regards,
> >  Florin
> > --- On Mon, 6/27/11, Doug Meil <[email protected]>
> wrote:
> > 
> >> From: Doug Meil <[email protected]>
> >> Subject: RE: Obtain many mappers (or regions)
> >> To: "[email protected]"
> <[email protected]>
> >> Date: Monday, June 27, 2011, 8:01 AM
> >> Hi there-
> >> 
> >> If you only have 100 rows, I think that HBase
> >> might be overkill.
> >> 
> >> You probably want to start with this to get a
> background on
> >> what HBase can do...
> >> http://hbase.apache.org/book.html
> >> .. there is a section on MapReduce with HBase as
> well.
> >> 
> >> -----Original Message-----
> >> From: Florin P [mailto:[email protected]]
> >> 
> >> Sent: Monday, June 27, 2011 4:53 AM
> >> To: [email protected]
> >> Subject: Obtain many mappers (or regions)
> >> 
> >> Hello!
> >> I have the following scenario:
> >> 1. A temporary HBase table with a small number of
> >> rows (approx. 100)
> >> 2. A cluster with 2 machines on which I would like
> >> to crunch the data contained in the rows
> >> 
> >> I would like to create two mappers that will
> >> crunch the data from the rows.
> >> How can I achieve this?
> >> A more general question:
> >>   how can we obtain many mappers to crunch
> >> a small quantity of data?
> >> 
> >> Thank you.
> >>   Regards,
> >>   Florin  
> >> 
> > 
>
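P.S. Mike's list-splitting suggestion, as I understand it, boils down to partitioning the in-memory list into one chunk per mapper. A plain-Java sketch of just that step (the custom InputFormat/InputSplit wiring around it is omitted, and all the names are mine):

```java
import java.util.ArrayList;
import java.util.List;

public class ListSplitter {

    // Partition rows into at most numSplits contiguous chunks,
    // one chunk per intended mapper.
    static <T> List<List<T>> split(List<T> rows, int numSplits) {
        List<List<T>> chunks = new ArrayList<List<T>>();
        int n = rows.size();
        for (int i = 0; i < numSplits; i++) {
            int from = i * n / numSplits;
            int to = (i + 1) * n / numSplits;
            if (from < to) {
                chunks.add(new ArrayList<T>(rows.subList(from, to)));
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<Integer>();
        for (int i = 0; i < 100; i++) {
            rows.add(i);
        }
        // 100 rows over 2 mappers -> two chunks of 50 rows each
        List<List<Integer>> chunks = split(rows, 2);
        System.out.println(chunks.size());        // prints 2
        System.out.println(chunks.get(0).size()); // prints 50
    }
}
```

Each chunk would then back one InputSplit, so the job gets exactly as many mappers as chunks.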
