Hi all,

My mapper function processes and aggregates data from 3 HBase tables and writes it to the reducer for further operations.
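For context, here is a rough sketch of how such a job can be wired up. This is not my actual driver; the table names, class names, and caching values are placeholders, and it assumes an HBase version whose TableMapReduceUtil accepts a List of Scans (one Scan per source table). The real mapper does more work, but the overall shape is the same:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class MultiTableAggregateJob {

  // Mapper receives rows from all three tables; here it just forwards
  // the row key and the number of cells as a stand-in for the real aggregation.
  public static class AggregatingMapper extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable rowKey, Result row, Context context)
        throws IOException, InterruptedException {
      context.write(new Text(rowKey.get()), new Text(String.valueOf(row.size())));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "multi-table-aggregate");
    job.setJarByClass(MultiTableAggregateJob.class);

    // One Scan per source table; the table name is carried as a scan attribute.
    List<Scan> scans = new ArrayList<Scan>();
    for (String tableName : new String[] { "table1", "table2", "table3" }) { // placeholders
      Scan scan = new Scan();
      scan.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes(tableName));
      scan.setCaching(500);        // fetch more rows per RPC
      scan.setCacheBlocks(false);  // avoid polluting the block cache with scan data
      scans.add(scan);
    }

    TableMapReduceUtil.initTableMapperJob(
        scans, AggregatingMapper.class, Text.class, Text.class, job);

    // The real reducer and output are omitted in this sketch.
    job.setNumReduceTasks(1);
    job.setOutputFormatClass(NullOutputFormat.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}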
However, all three tables have only a small number of rows, nowhere near millions. Still, the map phase alone takes about 35 minutes and the whole job close to an hour:

16:07:29,632 INFO JobClient:1435 - Running job: job_201308231255_0057
16:07:30,640 INFO JobClient:1448 - map 0% reduce 0%
16:42:02,778 INFO JobClient:1448 - map 100% reduce 0%
16:42:11,793 INFO JobClient:1448 - map 100% reduce 67%
16:43:51,959 INFO JobClient:1448 - map 100% reduce 68%
16:46:28,278 INFO JobClient:1448 - map 100% reduce 69%
16:48:44,497 INFO JobClient:1448 - map 100% reduce 70%
16:50:51,698 INFO JobClient:1448 - map 100% reduce 71%
16:52:55,885 INFO JobClient:1448 - map 100% reduce 72%
16:55:42,141 INFO JobClient:1448 - map 100% reduce 73%
16:58:24,384 INFO JobClient:1448 - map 100% reduce 74%
17:00:58,614 INFO JobClient:1448 - map 100% reduce 75%
17:03:36,849 INFO JobClient:1448 - map 100% reduce 100%
17:03:38,853 INFO JobClient:1503 - Job complete: job_201308231255_0057
17:03:38,869 INFO JobClient:566 - Counters: 32
17:03:38,873 INFO JobClient:568 - File System Counters
17:03:38,876 INFO JobClient:570 - FILE: Number of bytes read=2253157
17:03:38,876 INFO JobClient:570 - FILE: Number of bytes written=4936116
17:03:38,877 INFO JobClient:570 - FILE: Number of read operations=0
17:03:38,877 INFO JobClient:570 - FILE: Number of large read operations=0
17:03:38,877 INFO JobClient:570 - FILE: Number of write operations=0
17:03:38,877 INFO JobClient:570 - HDFS: Number of bytes read=116
17:03:38,877 INFO JobClient:570 - HDFS: Number of bytes written=0
17:03:38,878 INFO JobClient:570 - HDFS: Number of read operations=1
17:03:38,878 INFO JobClient:570 - HDFS: Number of large read operations=0
17:03:38,878 INFO JobClient:570 - HDFS: Number of write operations=0
17:03:38,881 INFO JobClient:568 - Job Counters
17:03:38,882 INFO JobClient:570 - Launched map tasks=1
17:03:38,882 INFO JobClient:570 - Launched reduce tasks=1
17:03:38,882 INFO JobClient:570 - Data-local map tasks=1
17:03:38,882 INFO JobClient:570 - Total time spent by all maps in occupied slots (ms)=2066262
17:03:38,882 INFO JobClient:570 - Total time spent by all reduces in occupied slots (ms)=1293243
17:03:38,883 INFO JobClient:570 - Total time spent by all maps waiting after reserving slots (ms)=0
17:03:38,883 INFO JobClient:570 - Total time spent by all reduces waiting after reserving slots (ms)=0
17:03:38,886 INFO JobClient:568 - Map-Reduce Framework
17:03:38,886 INFO JobClient:570 - Map input records=82818
17:03:38,886 INFO JobClient:570 - Map output records=82818
17:03:38,886 INFO JobClient:570 - Map output bytes=8504915
17:03:38,886 INFO JobClient:570 - Input split bytes=116
17:03:38,887 INFO JobClient:570 - Combine input records=0
17:03:38,887 INFO JobClient:570 - Combine output records=0
17:03:38,887 INFO JobClient:570 - Reduce input groups=82706
17:03:38,887 INFO JobClient:570 - Reduce shuffle bytes=2253153
17:03:38,887 INFO JobClient:570 - Reduce input records=82818
17:03:38,888 INFO JobClient:570 - Reduce output records=82706
17:03:38,888 INFO JobClient:570 - Spilled Records=165636
17:03:38,888 INFO JobClient:570 - CPU time spent (ms)=3201360
17:03:38,888 INFO JobClient:570 - Physical memory (bytes) snapshot=1090387968
17:03:38,888 INFO JobClient:570 - Virtual memory (bytes) snapshot=6683607040
17:03:38,889 INFO JobClient:570 - Total committed heap usage (bytes)=487325696
17:03:38,890 INFO ActionDataInterpret:595 - Map Job is Completed

This is a lot longer than I expected; an hour is just too slow. Can I improve it? We have a 6-node cluster running on EC2 at the moment.

Two more questions:

1.) Why does the job report only 1 launched map task?
Can I change this so that multiple mappers perform the computation?
2.) If my table has rows in the order of millions, the number of mappers increases to 5. How does Hadoop decide how many mappers to run for a specific job?

Regards,
Pavan