Hello! I've read Lars George's blog post http://www.larsgeorge.com/2009/05/hbase-mapreduce-101-part-i.html where, at the end of the article, he wrote: "In the next post I will show you how to import data from a raw data file into a HBase table and how you eventually process the data in the HBase table. We will address questions like how many mappers and/or reducers are needed and how can I improve import and processing performance.". I searched the blog for these topics, but there seems to be no follow-up article. Do you know if he covered these subjects in a different post or book? In particular, I am interested in the following:
1. How can you set the number of mappers?
2. Can the number of mappers be set per region server? If so, how?
3. How can a large number of configured mappers affect data locality?
4. Is this algorithm for computing the number of mappers (https://issues.apache.org/jira/browse/HBASE-1172) still in use?

"Currently, the number of mappers specified when using TableInputFormat is strictly followed if less than total regions on the input table. If greater, the number of regions is used. This will modify the splitting algorithm to do the following:
* Specify 0 mappers when you want # mappers = # regions
* If you specify fewer mappers than regions, will use exactly the number you specify based on the current algorithm
* If you specify more mappers than regions, will divide regions up by determining [start,X) [X,end). The number of mappers will always be a multiple of number of regions. This is so we do not have scanners spanning multiple regions.
There is an additional issue in that the default number of mappers in JobConf is set to 1. That means if a user does not explicitly set number of map tasks, a single mapper will be used."

I look forward to your answers. Thank you.

Kind regards,
Florin
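
P.S. To check that I am reading the quoted HBASE-1172 description correctly, here is a minimal sketch of how I understand the three cases. This is not HBase code: the class and method names are hypothetical, and the quoted text does not say which multiple of the region count is chosen when more mappers than regions are requested, so rounding down to the nearest multiple is purely my assumption.

```java
// Hypothetical illustration of the rules quoted from HBASE-1172.
// Nothing here is taken from the HBase source.
public class MapperCountSketch {

    /**
     * Returns the number of map tasks I would expect for a table with
     * `regions` regions when the job requests `requested` mappers.
     */
    static int effectiveMappers(int requested, int regions) {
        if (regions <= 0) {
            throw new IllegalArgumentException("regions must be positive");
        }
        if (requested < 0) {
            throw new IllegalArgumentException("requested must not be negative");
        }
        if (requested == 0) {
            // "Specify 0 mappers when you want # mappers = # regions"
            return regions;
        }
        if (requested <= regions) {
            // Fewer mappers than regions: the requested number is used as is.
            return requested;
        }
        // More mappers than regions: the result is a multiple of the region
        // count. ASSUMPTION: round down to the nearest multiple.
        return (requested / regions) * regions;
    }

    public static void main(String[] args) {
        int regions = 10;
        for (int requested : new int[] {0, 1, 4, 10, 25, 30}) {
            System.out.println("requested=" + requested
                    + " regions=" + regions
                    + " -> mappers=" + effectiveMappers(requested, regions));
        }
    }
}
```

If this reading is wrong, in particular the rounding in the last case, I would appreciate a correction.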