Thanks for updating the list w/ your fix, Florin. St.Ack
On Mon, Jun 27, 2011 at 7:44 AM, Florin P <[email protected]> wrote:
> Hello!
> I've found the problem with the number of mappers. We are running the M/R
> jobs with Oozie, which apparently ignores the mapred.map.tasks property
> that is used as a hint for computing the number of splits. Quoting the
> TableInputFormatBase#getSplits (old API) javadoc:
> "Splits are created in number equal to the smallest between numSplits and
> the number of {@link HRegion}s in the table. If the number of splits is
> smaller than the number of {@link HRegion}s then splits are spanned across
> multiple {@link HRegion}s and are grouped the most evenly possible. In the
> case splits are uneven the bigger splits are placed first in the
> {@link InputSplit} array."
>
> By default, mapred.map.tasks is set to 2. Applying the above algorithm to
> my scenario (together with the Oozie observation), computing
> min(mapred.map.tasks=2, number_of_my_regions=32) gives exactly the "magic"
> number of 2 mappers.
> We confirmed this behavior by implementing a Driver for the MR job and
> setting mapred.map.tasks to, say, 40. The number of mappers was then
> correctly computed as 32.
> Regards,
> Florin
>
> --- On Mon, 6/27/11, Florin P <[email protected]> wrote:
>
>> From: Florin P <[email protected]>
>> Subject: RE: Obtain many mappers (or regions)
>> To: [email protected]
>> Date: Monday, June 27, 2011, 8:46 AM
>> Hi!
>> Thank you for your response. As I said, it is a temporary table. It acts
>> as metadata for long-running processing tasks that we would like to
>> trigger from the cluster (as map/reduce jobs) so that all machines take
>> on some of those tasks.
>> I have read the indicated chapter, and then I followed this scenario:
>> 1. We loaded the small data set into the HBase table.
>> 2. From the HBase admin interface we triggered the split action.
>> 3. We saw that 32 new regions were created for that table.
>> 4. We ran a map/reduce job that counts the number of rows.
>> 5. Only two mappers were created.
>> What puzzles me is that only 2 mapper tasks were created, even though
>> the indicated book states (quote):
>> "When TableInputFormat, is used to source an HBase table in a MapReduce
>> job, its splitter will make a map task for each region of the table.
>> Thus, if there are 100 regions in the table, there will be 100 map-tasks
>> for the job - regardless of how many column families are selected in
>> the Scan."
>>
>> Can you please explain why this happens? Did we miss some property
>> configuration?
>>
>> Thank you.
>> Regards,
>> Florin
>>
>> --- On Mon, 6/27/11, Doug Meil <[email protected]> wrote:
>>
>> > From: Doug Meil <[email protected]>
>> > Subject: RE: Obtain many mappers (or regions)
>> > To: "[email protected]" <[email protected]>
>> > Date: Monday, June 27, 2011, 8:01 AM
>> > Hi there-
>> >
>> > If you only have 100 rows I think that HBase might be overkill.
>> >
>> > You probably want to start with this to get a background on what HBase
>> > can do...
>> > http://hbase.apache.org/book.html
>> > ... there is a section on MapReduce with HBase as well.
>> >
>> > -----Original Message-----
>> > From: Florin P [mailto:[email protected]]
>> > Sent: Monday, June 27, 2011 4:53 AM
>> > To: [email protected]
>> > Subject: Obtain many mappers (or regions)
>> >
>> > Hello!
>> > I have the following scenario:
>> > 1. A temporary HBase table with a small number of rows (approx. 100)
>> > 2. A cluster with 2 machines that I would like to use to crunch the
>> > data contained in the rows
>> >
>> > I would like to create two mappers that will crunch the data from the
>> > rows. How can I achieve this?
>> > A general question: how can we obtain many mappers to crunch a small
>> > quantity of data?
>> >
>> > Thank you.
>> > Regards,
>> > Florin
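The split-count rule quoted above from the TableInputFormatBase#getSplits (old API) javadoc can be sketched as follows. This is a minimal illustration under the min(numSplits, regions) rule described in the thread, with hypothetical class and method names, not HBase's actual implementation:

```java
import java.util.Arrays;

// Sketch (hypothetical names) of the old-API split-count rule:
// splits = min(numSplits hint, number of regions).
public class SplitCountSketch {

    static int computeSplitCount(int numSplits, int regionCount) {
        return Math.min(numSplits, regionCount);
    }

    // When there are fewer splits than regions, regions are grouped as
    // evenly as possible, with the bigger splits placed first.
    static int[] regionsPerSplit(int splitCount, int regionCount) {
        int[] sizes = new int[splitCount];
        int base = regionCount / splitCount;
        int remainder = regionCount % splitCount;
        for (int i = 0; i < splitCount; i++) {
            sizes[i] = base + (i < remainder ? 1 : 0); // bigger splits first
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Oozie leaving mapred.map.tasks at its default of 2 caps a
        // 32-region table at min(2, 32) == 2 mappers.
        System.out.println(computeSplitCount(2, 32));  // 2
        // With the hint raised to 40, all 32 regions map 1:1.
        System.out.println(computeSplitCount(40, 32)); // 32
        // 32 regions spread over 2 splits -> 16 regions each.
        System.out.println(Arrays.toString(regionsPerSplit(2, 32))); // [16, 16]
    }
}
```

In an old-API Driver, the numSplits hint corresponds to JobConf.setNumMapTasks(n) (equivalently the mapred.map.tasks property); as Florin observes, a workflow engine such as Oozie may override or ignore that setting, which is why the driver-side value of 40 behaved differently from the Oozie-launched run.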
