Processing rows in parallel with MapReduce jobs.

Ivan Cores gonzalez Mon, 11 Apr 2016 05:50:00 -0700

Hi all, 

I have a small question regarding the behaviour of MapReduce jobs with HBase.


I have an HBase test table with only 8 rows. I split the table into 2 splits
with the hbase shell split command, so now there are 4 rows in every split.

I created a MapReduce job that only prints the row key to the log files.
When I run the job, every row is processed by 1 mapper, but the mappers in
the same split are executed sequentially (inside the same container). That
means the first four rows are processed sequentially by 4 mappers. The system
has free cores, so is it possible to process rows in parallel if they are
located in the same split?

The only way I have found to get 8 mappers executing in parallel is to split
the table into 8 splits (1 split per row). But obviously this is not the best
solution for big tables ...

Thanks, 
Ivan.
