A question about HBase MapReduce

Florin P Thu, 24 May 2012 23:35:39 -0700

Hello!

I've read Lars George's blog post
http://www.larsgeorge.com/2009/05/hbase-mapreduce-101-part-i.html. At the
end of the article he says: "In the next post I will show you how to
import data from a raw data file into a HBase table and how you eventually
process the data in the HBase table. We will address questions like how many
mappers and/or reducers are needed and how can I improve import and processing
performance." I searched the blog for these topics, but there seems to be no
related article. Do you know if he covered these subjects in a different post
or in a book? In particular, I am interested in the following:


1. How can you set the number of mappers?
2. Can the number of mappers be set per region server? If so, how?
3. How can a large number of mappers affect data locality?
4. Is this algorithm for computing the number of mappers
(https://issues.apache.org/jira/browse/HBASE-1172) still in use?
"Currently, the number of mappers specified when using TableInputFormat is
strictly followed if less than total regions on the input table. If greater,
the number of regions is used.
This will modify the splitting algorithm to do the following:
        * Specify 0 mappers when you want # mappers = # regions
        * If you specify fewer mappers than regions, will use exactly the
          number you specify based on the current algorithm
        * If you specify more mappers than regions, will divide regions up by
          determining [start,X) [X,end). The number of mappers will always be
          a multiple of number of regions. This is so we do not have scanners
          spanning multiple regions.
There is an additional issue in that the default number of mappers in JobConf
is set to 1. That means if a user does not explicitly set number of map tasks,
a single mapper will be used."
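
For reference, the rules quoted from HBASE-1172 can be sketched roughly as below. This is a minimal, hypothetical Java illustration, not HBase's TableInputFormat code: the class SplitSketch and the methods effectiveMappers and splitRange are made-up names. Two points are assumptions not stated in the quote: a request for more mappers than regions is rounded down to a multiple of the region count, and the split point X is chosen by interpolating the region's start and end keys as unsigned big-endian integers.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the HBASE-1172 rules; not HBase's TableInputFormat code.
final class SplitSketch {

    // Number of map tasks implied by the quoted rules. ASSUMPTION: a request larger
    // than the region count is rounded DOWN to a multiple of it.
    static int effectiveMappers(int requested, int regions) {
        if (regions <= 0) throw new IllegalArgumentException("table has no regions");
        if (requested < 0) throw new IllegalArgumentException("negative mapper count");
        if (requested == 0) return regions;          // 0 means one mapper per region
        if (requested <= regions) return requested;  // fewer: used exactly as given
        return (requested / regions) * regions;      // more: a multiple of #regions
    }

    // Splits one region's key range [start, end) into n sub-ranges and returns the
    // n + 1 boundary keys, so no sub-range (and no scanner) crosses a region boundary.
    // ASSUMPTION: keys are right-padded with zero bytes to a common length and
    // interpolated as unsigned big-endian integers. An empty end key (last region)
    // is treated as all 0xFF bytes for the arithmetic but returned unchanged.
    static List<byte[]> splitRange(byte[] start, byte[] end, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        int len = Math.max(1, Math.max(start.length, end.length));
        byte[] e;
        if (end.length == 0) {
            e = new byte[len];
            Arrays.fill(e, (byte) 0xFF);
        } else {
            e = Arrays.copyOf(end, len);
        }
        BigInteger lo = new BigInteger(1, Arrays.copyOf(start, len));
        BigInteger hi = new BigInteger(1, e);
        BigInteger width = hi.subtract(lo);

        List<byte[]> bounds = new ArrayList<>();
        bounds.add(start);
        for (int i = 1; i < n; i++) {
            // X_i = lo + width * i / n (integer division)
            BigInteger x = lo.add(width.multiply(BigInteger.valueOf(i))
                                       .divide(BigInteger.valueOf(n)));
            bounds.add(toFixedBytes(x, len));
        }
        bounds.add(end);
        return bounds;
    }

    // Big-endian encoding of v in exactly len bytes (drops BigInteger's sign byte).
    private static byte[] toFixedBytes(BigInteger v, int len) {
        byte[] raw = v.toByteArray();
        byte[] out = new byte[len];
        int copy = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - copy, out, len - copy, copy);
        return out;
    }
}
```

Under these assumptions, requesting 25 mappers on a 10-region table gives 20 map tasks (two per region), and splitting the range [0x00, 0x80) in two yields the boundary key 0x40.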

I look forward to your answers. Thank you.

Kind regards, Florin
