Hi all,

My mapper function processes and aggregates data from 3 HBase tables and writes it to the reducer for further operations.
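For context, here is a rough sketch of how such a job can be wired up. This is not my actual driver; the table names, class names, and caching values are placeholders, and it assumes an HBase version whose TableMapReduceUtil accepts a List of Scans (one Scan per source table). The real mapper does more work, but the overall shape is the same:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class MultiTableAggregateJob {

  // Mapper receives rows from all three tables; here it just forwards
  // the row key and the number of cells as a stand-in for the real aggregation.
  public static class AggregatingMapper extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable rowKey, Result row, Context context)
        throws IOException, InterruptedException {
      context.write(new Text(rowKey.get()), new Text(String.valueOf(row.size())));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "multi-table-aggregate");
    job.setJarByClass(MultiTableAggregateJob.class);

    // One Scan per source table; the table name is carried as a scan attribute.
    List<Scan> scans = new ArrayList<Scan>();
    for (String tableName : new String[] { "table1", "table2", "table3" }) { // placeholders
      Scan scan = new Scan();
      scan.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes(tableName));
      scan.setCaching(500);        // fetch more rows per RPC
      scan.setCacheBlocks(false);  // avoid polluting the block cache with scan data
      scans.add(scan);
    }

    TableMapReduceUtil.initTableMapperJob(
        scans, AggregatingMapper.class, Text.class, Text.class, job);

    // The real reducer and output are omitted in this sketch.
    job.setNumReduceTasks(1);
    job.setOutputFormatClass(NullOutputFormat.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}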
However, all three tables have only a small number of rows, nowhere near millions. Still, the map phase alone takes about 35 minutes and the whole job close to an hour:

16:07:29,632 INFO JobClient:1435 - Running job: job_201308231255_0057
16:07:30,640 INFO JobClient:1448 - map 0% reduce 0%
16:42:02,778 INFO JobClient:1448 - map 100% reduce 0%
16:42:11,793 INFO JobClient:1448 - map 100% reduce 67%
16:43:51,959 INFO JobClient:1448 - map 100% reduce 68%
16:46:28,278 INFO JobClient:1448 - map 100% reduce 69%
16:48:44,497 INFO JobClient:1448 - map 100% reduce 70%
16:50:51,698 INFO JobClient:1448 - map 100% reduce 71%
16:52:55,885 INFO JobClient:1448 - map 100% reduce 72%
16:55:42,141 INFO JobClient:1448 - map 100% reduce 73%
16:58:24,384 INFO JobClient:1448 - map 100% reduce 74%
17:00:58,614 INFO JobClient:1448 - map 100% reduce 75%
17:03:36,849 INFO JobClient:1448 - map 100% reduce 100%
17:03:38,853 INFO JobClient:1503 - Job complete: job_201308231255_0057
17:03:38,869 INFO JobClient:566 - Counters: 32
17:03:38,873 INFO JobClient:568 - File System Counters
17:03:38,876 INFO JobClient:570 - FILE: Number of bytes read=2253157
17:03:38,876 INFO JobClient:570 - FILE: Number of bytes written=4936116
17:03:38,877 INFO JobClient:570 - FILE: Number of read operations=0
17:03:38,877 INFO JobClient:570 - FILE: Number of large read operations=0
17:03:38,877 INFO JobClient:570 - FILE: Number of write operations=0
17:03:38,877 INFO JobClient:570 - HDFS: Number of bytes read=116
17:03:38,877 INFO JobClient:570 - HDFS: Number of bytes written=0
17:03:38,878 INFO JobClient:570 - HDFS: Number of read operations=1
17:03:38,878 INFO JobClient:570 - HDFS: Number of large read operations=0
17:03:38,878 INFO JobClient:570 - HDFS: Number of write operations=0
17:03:38,881 INFO JobClient:568 - Job Counters
17:03:38,882 INFO JobClient:570 - Launched map tasks=1
17:03:38,882 INFO JobClient:570 - Launched reduce tasks=1
17:03:38,882 INFO JobClient:570 - Data-local map tasks=1
17:03:38,882 INFO JobClient:570 - Total time spent by all maps in occupied slots (ms)=2066262
17:03:38,882 INFO JobClient:570 - Total time spent by all reduces in occupied slots (ms)=1293243
17:03:38,883 INFO JobClient:570 - Total time spent by all maps waiting after reserving slots (ms)=0
17:03:38,883 INFO JobClient:570 - Total time spent by all reduces waiting after reserving slots (ms)=0
17:03:38,886 INFO JobClient:568 - Map-Reduce Framework
17:03:38,886 INFO JobClient:570 - Map input records=82818
17:03:38,886 INFO JobClient:570 - Map output records=82818
17:03:38,886 INFO JobClient:570 - Map output bytes=8504915
17:03:38,886 INFO JobClient:570 - Input split bytes=116
17:03:38,887 INFO JobClient:570 - Combine input records=0
17:03:38,887 INFO JobClient:570 - Combine output records=0
17:03:38,887 INFO JobClient:570 - Reduce input groups=82706
17:03:38,887 INFO JobClient:570 - Reduce shuffle bytes=2253153
17:03:38,887 INFO JobClient:570 - Reduce input records=82818
17:03:38,888 INFO JobClient:570 - Reduce output records=82706
17:03:38,888 INFO JobClient:570 - Spilled Records=165636
17:03:38,888 INFO JobClient:570 - CPU time spent (ms)=3201360
17:03:38,888 INFO JobClient:570 - Physical memory (bytes) snapshot=1090387968
17:03:38,888 INFO JobClient:570 - Virtual memory (bytes) snapshot=6683607040
17:03:38,889 INFO JobClient:570 - Total committed heap usage (bytes)=487325696
17:03:38,890 INFO ActionDataInterpret:595 - Map Job is Completed

This is a lot longer than I expected; an hour is just too slow. Can I improve it? We have a 6-node cluster running on EC2 at the moment.

Two more questions:

1.) Why does the job report only 1 launched map task?
Can I change this so that multiple mappers perform the computation?
2.) If my table has rows in the order of millions, the number of mappers increases to 5. How does Hadoop decide how many mappers to run for a specific job?

Regards,
Pavan