Thanks for updating the list w/ your fix, Florin. St.Ack
On Mon, Jun 27, 2011 at 7:44 AM, Florin P <[email protected]> wrote:
> Hello!
> I've found the problem with the number of mappers. We are running the M/R
> jobs with Oozie, which apparently ignores the mapred.map.tasks property
> that is used as a hint for computing the number of splits. Quoting the
> TableInputFormatBase#getSplits (old API) javadoc:
> "Splits are created in number equal to the smallest between numSplits and
> the number of {@link HRegion}s in the table. If the number of splits is
> smaller than the number of {@link HRegion}s then splits are spanned across
> multiple {@link HRegion}s and are grouped the most evenly possible. In the
> case splits are uneven the bigger splits are placed first in the
> {@link InputSplit} array."
>
> By default, mapred.map.tasks is set to 2. Applying the above algorithm to
> my scenario (together with the Oozie observation), computing
> min(mapred.map.tasks=2, number_of_my_regions=32) gives exactly the "magic"
> number of 2 mappers.
> We confirmed this behavior by implementing a Driver for the MR job and
> setting mapred.map.tasks to, say, 40. The number of mappers was then
> correctly computed as 32.
> Regards,
> Florin
>
> --- On Mon, 6/27/11, Florin P <[email protected]> wrote:
>
>> From: Florin P <[email protected]>
>> Subject: RE: Obtain many mappers (or regions)
>> To: [email protected]
>> Date: Monday, June 27, 2011, 8:46 AM
>> Hi!
>> Thank you for your response. As I said, it is a temporary table. It acts
>> as metadata for long-running processing tasks that we would like to
>> trigger from the cluster (as map/reduce jobs) so that all machines take
>> on some of those tasks.
>> I have read the indicated chapter, and then I followed this scenario:
>> 1. We loaded the small data set into the HBase table.
>> 2. From the HBase admin interface we triggered the split action.
>> 3. We saw that 32 new regions were created for that table.
>> 4. We ran a map/reduce job that counts the number of rows.
>> 5. Only two mappers were created.
>> What puzzles me is that only 2 mapper tasks were created, even though
>> the indicated book states (quote):
>> "When TableInputFormat, is used to source an HBase table in a MapReduce
>> job, its splitter will make a map task for each region of the table.
>> Thus, if there are 100 regions in the table, there will be 100 map-tasks
>> for the job - regardless of how many column families are selected in
>> the Scan."
>>
>> Can you please explain why this happens? Did we miss some property
>> configuration?
>>
>> Thank you.
>> Regards,
>> Florin
>>
>> --- On Mon, 6/27/11, Doug Meil <[email protected]> wrote:
>>
>> > From: Doug Meil <[email protected]>
>> > Subject: RE: Obtain many mappers (or regions)
>> > To: "[email protected]" <[email protected]>
>> > Date: Monday, June 27, 2011, 8:01 AM
>> > Hi there-
>> >
>> > If you only have 100 rows I think that HBase might be overkill.
>> >
>> > You probably want to start with this to get a background on what HBase
>> > can do...
>> > http://hbase.apache.org/book.html
>> > ... there is a section on MapReduce with HBase as well.
>> >
>> > -----Original Message-----
>> > From: Florin P [mailto:[email protected]]
>> > Sent: Monday, June 27, 2011 4:53 AM
>> > To: [email protected]
>> > Subject: Obtain many mappers (or regions)
>> >
>> > Hello!
>> > I have the following scenario:
>> > 1. A temporary HBase table with a small number of rows (approx. 100)
>> > 2. A cluster with 2 machines that I would like to use to crunch the
>> > data contained in the rows
>> >
>> > I would like to create two mappers that will crunch the data from the
>> > rows. How can I achieve this?
>> > A general question: how can we obtain many mappers to crunch a small
>> > quantity of data?
>> >
>> > Thank you.
>> > Regards,
>> > Florin
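The split-count rule quoted above from the TableInputFormatBase#getSplits (old API) javadoc can be sketched as follows. This is a minimal illustration under the min(numSplits, regions) rule described in the thread, with hypothetical class and method names, not HBase's actual implementation:

```java
import java.util.Arrays;

// Sketch (hypothetical names) of the old-API split-count rule:
// splits = min(numSplits hint, number of regions).
public class SplitCountSketch {

    static int computeSplitCount(int numSplits, int regionCount) {
        return Math.min(numSplits, regionCount);
    }

    // When there are fewer splits than regions, regions are grouped as
    // evenly as possible, with the bigger splits placed first.
    static int[] regionsPerSplit(int splitCount, int regionCount) {
        int[] sizes = new int[splitCount];
        int base = regionCount / splitCount;
        int remainder = regionCount % splitCount;
        for (int i = 0; i < splitCount; i++) {
            sizes[i] = base + (i < remainder ? 1 : 0); // bigger splits first
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Oozie leaving mapred.map.tasks at its default of 2 caps a
        // 32-region table at min(2, 32) == 2 mappers.
        System.out.println(computeSplitCount(2, 32));  // 2
        // With the hint raised to 40, all 32 regions map 1:1.
        System.out.println(computeSplitCount(40, 32)); // 32
        // 32 regions spread over 2 splits -> 16 regions each.
        System.out.println(Arrays.toString(regionsPerSplit(2, 32))); // [16, 16]
    }
}
```

In an old-API Driver, the numSplits hint corresponds to JobConf.setNumMapTasks(n) (equivalently the mapred.map.tasks property); as Florin observes, a workflow engine such as Oozie may override or ignore that setting, which is why the driver-side value of 40 behaved differently from the Oozie-launched run.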
