Hi all,

I am using NLineInputFormat as the text input format, configured with the following code:

    NLineInputFormat.setNumLinesPerSplit(job, 10);
    NLineInputFormat.addInputPath(job, new Path(args[0].toString()));

My input file contains 1000 rows, so I expected the job to launch 100 (1000/10) map tasks. However, I got only 4. I'm confused by the number of map tasks launched according to the running log [1]. How does NLineInputFormat distribute map tasks?

Regards

[1] =======================================================
....
....
2013-04-19 23:56:20,377 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1286)) - Job job_local_0001 running in uber mode : false
2013-04-19 23:56:20,377 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1293)) - map 25% reduce 0%
2013-04-19 23:56:20,381 INFO mapred.MapTask (MapTask.java:sortAndSpill(1597)) - Finished spill 0
2013-04-19 23:56:20,384 INFO mapred.Task (Task.java:done(979)) - Task:attempt_local_0001_m_000001_0 is done. And is in the process of committing
2013-04-19 23:56:20,388 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) - map
2013-04-19 23:56:20,389 INFO mapred.Task (Task.java:sendDone(1099)) - Task 'attempt_local_0001_m_000001_0' done.
2013-04-19 23:56:20,389 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(238)) - Finishing task: attempt_local_0001_m_000001_0
2013-04-19 23:56:20,389 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(213)) - Starting task: attempt_local_0001_m_000002_0
2013-04-19 23:56:20,391 INFO mapred.Task (Task.java:initialize(565)) - Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@36bf7916
2013-04-19 23:56:20,486 INFO mapred.MapTask (MapTask.java:setEquator(1127)) - (EQUATOR) 0 kvi 26214396(104857584)
2013-04-19 23:56:20,486 INFO mapred.MapTask (MapTask.java:<init>(923)) - mapreduce.task.io.sort.mb: 100
2013-04-19 23:56:20,486 INFO mapred.MapTask (MapTask.java:<init>(924)) - soft limit at 83886080
2013-04-19 23:56:20,486 INFO mapred.MapTask (MapTask.java:<init>(925)) - bufstart = 0; bufvoid = 104857600
2013-04-19 23:56:20,487 INFO mapred.MapTask (MapTask.java:<init>(926)) - kvstart = 26214396; length = 6553600
2013-04-19 23:56:20,515 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) -
2013-04-19 23:56:20,515 INFO mapred.MapTask (MapTask.java:flush(1389)) - Starting flush of map output
2013-04-19 23:56:20,516 INFO mapred.MapTask (MapTask.java:flush(1408)) - Spilling map output
2013-04-19 23:56:20,516 INFO mapred.MapTask (MapTask.java:flush(1409)) - bufstart = 0; bufend = 336; bufvoid = 104857600
2013-04-19 23:56:20,516 INFO mapred.MapTask (MapTask.java:flush(1411)) - kvstart = 26214396(104857584); kvend = 26214208(104856832); length = 189/6553600
2013-04-19 23:56:20,523 INFO mapred.MapTask (MapTask.java:sortAndSpill(1597)) - Finished spill 0
2013-04-19 23:56:20,552 INFO mapred.Task (Task.java:done(979)) - Task:attempt_local_0001_m_000002_0 is done. And is in the process of committing
2013-04-19 23:56:20,555 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) - map
2013-04-19 23:56:20,556 INFO mapred.Task (Task.java:sendDone(1099)) - Task 'attempt_local_0001_m_000002_0' done.
2013-04-19 23:56:20,556 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(238)) - Finishing task: attempt_local_0001_m_000002_0
2013-04-19 23:56:20,556 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(213)) - Starting task: attempt_local_0001_m_000003_0
2013-04-19 23:56:20,558 INFO mapred.Task (Task.java:initialize(565)) - Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@746a63d3
2013-04-19 23:56:20,666 INFO mapred.MapTask (MapTask.java:setEquator(1127)) - (EQUATOR) 0 kvi 26214396(104857584)
2013-04-19 23:56:20,666 INFO mapred.MapTask (MapTask.java:<init>(923)) - mapreduce.task.io.sort.mb: 100
2013-04-19 23:56:20,666 INFO mapred.MapTask (MapTask.java:<init>(924)) - soft limit at 83886080
2013-04-19 23:56:20,666 INFO mapred.MapTask (MapTask.java:<init>(925)) - bufstart = 0; bufvoid = 104857600
2013-04-19 23:56:20,667 INFO mapred.MapTask (MapTask.java:<init>(926)) - kvstart = 26214396; length = 6553600
2013-04-19 23:56:20,690 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) -
2013-04-19 23:56:20,690 INFO mapred.MapTask (MapTask.java:flush(1389)) - Starting flush of map output
2013-04-19 23:56:20,690 INFO mapred.MapTask (MapTask.java:flush(1408)) - Spilling map output
2013-04-19 23:56:20,690 INFO mapred.MapTask (MapTask.java:flush(1409)) - bufstart = 0; bufend = 329; bufvoid = 104857600
2013-04-19 23:56:20,690 INFO mapred.MapTask (MapTask.java:flush(1411)) - kvstart = 26214396(104857584); kvend = 26214212(104856848); length = 185/6553600
2013-04-19 23:56:20,695 INFO mapred.MapTask (MapTask.java:sortAndSpill(1597)) - Finished spill 0
2013-04-19 23:56:20,697 INFO mapred.Task (Task.java:done(979)) - Task:attempt_local_0001_m_000003_0 is done. And is in the process of committing
2013-04-19 23:56:20,717 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) - map
2013-04-19 23:56:20,718 INFO mapred.Task (Task.java:sendDone(1099)) - Task 'attempt_local_0001_m_000003_0' done.
2013-04-19 23:56:20,718 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(238)) - Finishing task: attempt_local_0001_m_000003_0
2013-04-19 23:56:20,718 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(394)) - Map task executor complete.
2013-04-19 23:56:20,752 INFO mapred.Task (Task.java:initialize(565)) - Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@52cd19d
2013-04-19 23:56:20,760 INFO mapred.Merger (Merger.java:merge(549)) - Merging 4 sorted segments
2013-04-19 23:56:20,767 INFO mapred.Merger (Merger.java:merge(648)) - Down to the last merge-pass, with 4 segments left of total size: 8532 bytes
2013-04-19 23:56:20,768 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) -
2013-04-19 23:56:20,807 WARN conf.Configuration (Configuration.java:warnOnceIfDeprecated(808)) - mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2013-04-19 23:56:21,129 INFO mapred.Task (Task.java:done(979)) - Task:attempt_local_0001_r_000000_0 is done. And is in the process of committing
2013-04-19 23:56:21,131 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) -
2013-04-19 23:56:21,131 INFO mapred.Task (Task.java:commit(1140)) - Task attempt_local_0001_r_000000_0 is allowed to commit now
2013-04-19 23:56:21,138 INFO output.FileOutputCommitter (FileOutputCommitter.java:commitTask(432)) - Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://Hadoop01:8040/user/hadoop/d/multi9/_temporary/0/task_local_0001_r_000000
2013-04-19 23:56:21,139 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) - reduce > reduce
2013-04-19 23:56:21,139 INFO mapred.Task (Task.java:sendDone(1099)) - Task 'attempt_local_0001_r_000000_0' done.
2013-04-19 23:56:21,381 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1293)) - map 100% reduce 100%
2013-04-19 23:56:21,381 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1304)) - Job job_local_0001 completed successfully
2013-04-19 23:56:21,427 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1311)) - Counters: 32
    File System Counters
        FILE: Number of bytes read=483553
        FILE: Number of bytes written=1313962
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=296769
        HDFS: Number of bytes written=284
        HDFS: Number of read operations=66
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=8
    Map-Reduce Framework
        Map input records=1000
        Map output records=1000
        Map output bytes=6543
        Map output materialized bytes=8567
        Input split bytes=516
        Combine input records=0
        Combine output records=0
        Reduce input groups=12
        Reduce shuffle bytes=0
        Reduce input records=1000
        Reduce output records=0
        Spilled Records=2000
        Shuffled Maps =0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=7
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=1773993984
    File Input Format Counters
        Bytes Read=68723
    File Output Format Counters
        Bytes Written=0
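
To make the expectation in the question concrete, here is a minimal sketch of the arithmetic behind the "100 maps" figure: one split per group of N lines, rounded up. This is only an illustration of the assumption stated in the question, not Hadoop's actual split logic; the class `ExpectedSplits` and the method `expectedSplits` are hypothetical names introduced for this example.

```java
public class ExpectedSplits {

    // Hypothetical helper: number of splits if each split holds at most
    // linesPerSplit lines (ceiling division). This is the assumption made
    // in the question, not code taken from Hadoop.
    static int expectedSplits(int totalLines, int linesPerSplit) {
        if (totalLines < 0 || linesPerSplit <= 0) {
            throw new IllegalArgumentException(
                "totalLines must be >= 0 and linesPerSplit must be > 0");
        }
        return (totalLines + linesPerSplit - 1) / linesPerSplit;
    }

    public static void main(String[] args) {
        // 1000 input rows with 10 lines per split, as in the question.
        System.out.println(expectedSplits(1000, 10)); // prints 100
    }
}
```

Under this assumption, 1000 rows at 10 lines per split would give 100 map tasks, whereas the log above shows "Merging 4 sorted segments", which is the mismatch the question is about.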