The number of map tasks is determined by the block size and the size of your raw data.
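For what it's worth, here is a rough sketch of the arithmetic behind the two ways splits can be counted. It is plain Java, not Hadoop code: the class and method names are made up for illustration, the 128 MB block size is an assumption (check dfs.blocksize on your cluster), and the 68723-byte figure is taken from the "Bytes Read" counter in the quoted log on the assumption that it approximates the input size. The size-based formula follows the way FileInputFormat picks a split size, max(minSize, min(maxSize, blockSize)), but ignores the 10% slop it allows on the last split.

```java
// Illustrative only: hypothetical helper class, not part of Hadoop.
class SplitCountSketch {

    // Split size as FileInputFormat derives it from the block size
    // and the configured min/max split sizes.
    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Size-based splitting: roughly ceil(fileBytes / splitSize).
    // Simplification: the real code tolerates a last split up to 10% larger.
    static long sizeBasedSplits(long fileBytes, long splitSize) {
        return fileBytes / splitSize + (fileBytes % splitSize == 0 ? 0 : 1);
    }

    // Line-based splitting (the NLineInputFormat idea): one split per N lines.
    static long lineBasedSplits(long totalLines, long linesPerSplit) {
        return totalLines / linesPerSplit + (totalLines % linesPerSplit == 0 ? 0 : 1);
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // ASSUMPTION: 128 MB block size
        long fileBytes = 68723;              // ASSUMPTION: "Bytes Read" counter ~ input size
        long size = splitSize(blockSize, 1, Long.MAX_VALUE);

        System.out.println("size-based splits: " + sizeBasedSplits(fileBytes, size)); // 1
        System.out.println("line-based splits: " + lineBasedSplits(1000, 10));        // 100
    }
}
```

Neither figure is 4 for a single small file, so it may be worth checking that the job really uses NLineInputFormat as its input format class, and how many input files it reads.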
On Sat, Apr 20, 2013 at 12:30 AM, YouPeng Yang <yypvsxf19870...@gmail.com> wrote:
> Hi All
> I use NLineInputFormat as the text input format with the following code:
>   NLineInputFormat.setNumLinesPerSplit(job, 10);
>   NLineInputFormat.addInputPath(job, new Path(args[0].toString()));
> My input file contains 1000 rows, so I thought it would distribute 100 (1000/10) maps. However, I got 4 maps.
> I'm confused by the number of maps that were distributed according to the running log [1].
> How does it distribute maps when using NLineInputFormat?
> Regards
> [1]=======================================================
> ....
> ....
> 2013-04-19 23:56:20,377 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1286)) - Job job_local_0001 running in uber mode : false
> 2013-04-19 23:56:20,377 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1293)) - map 25% reduce 0%
> 2013-04-19 23:56:20,381 INFO mapred.MapTask (MapTask.java:sortAndSpill(1597)) - Finished spill 0
> 2013-04-19 23:56:20,384 INFO mapred.Task (Task.java:done(979)) - Task:attempt_local_0001_m_000001_0 is done. And is in the process of committing
> 2013-04-19 23:56:20,388 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) - map
> 2013-04-19 23:56:20,389 INFO mapred.Task (Task.java:sendDone(1099)) - Task 'attempt_local_0001_m_000001_0' done.
> 2013-04-19 23:56:20,389 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(238)) - Finishing task: attempt_local_0001_m_000001_0
> 2013-04-19 23:56:20,389 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(213)) - Starting task: attempt_local_0001_m_000002_0
> 2013-04-19 23:56:20,391 INFO mapred.Task (Task.java:initialize(565)) - Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@36bf7916
> 2013-04-19 23:56:20,486 INFO mapred.MapTask (MapTask.java:setEquator(1127)) - (EQUATOR) 0 kvi 26214396(104857584)
> 2013-04-19 23:56:20,486 INFO mapred.MapTask (MapTask.java:<init>(923)) - mapreduce.task.io.sort.mb: 100
> 2013-04-19 23:56:20,486 INFO mapred.MapTask (MapTask.java:<init>(924)) - soft limit at 83886080
> 2013-04-19 23:56:20,486 INFO mapred.MapTask (MapTask.java:<init>(925)) - bufstart = 0; bufvoid = 104857600
> 2013-04-19 23:56:20,487 INFO mapred.MapTask (MapTask.java:<init>(926)) - kvstart = 26214396; length = 6553600
> 2013-04-19 23:56:20,515 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) -
> 2013-04-19 23:56:20,515 INFO mapred.MapTask (MapTask.java:flush(1389)) - Starting flush of map output
> 2013-04-19 23:56:20,516 INFO mapred.MapTask (MapTask.java:flush(1408)) - Spilling map output
> 2013-04-19 23:56:20,516 INFO mapred.MapTask (MapTask.java:flush(1409)) - bufstart = 0; bufend = 336; bufvoid = 104857600
> 2013-04-19 23:56:20,516 INFO mapred.MapTask (MapTask.java:flush(1411)) - kvstart = 26214396(104857584); kvend = 26214208(104856832); length = 189/6553600
> 2013-04-19 23:56:20,523 INFO mapred.MapTask (MapTask.java:sortAndSpill(1597)) - Finished spill 0
> 2013-04-19 23:56:20,552 INFO mapred.Task (Task.java:done(979)) - Task:attempt_local_0001_m_000002_0 is done. And is in the process of committing
> 2013-04-19 23:56:20,555 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) - map
> 2013-04-19 23:56:20,556 INFO mapred.Task (Task.java:sendDone(1099)) - Task 'attempt_local_0001_m_000002_0' done.
> 2013-04-19 23:56:20,556 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(238)) - Finishing task: attempt_local_0001_m_000002_0
> 2013-04-19 23:56:20,556 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(213)) - Starting task: attempt_local_0001_m_000003_0
> 2013-04-19 23:56:20,558 INFO mapred.Task (Task.java:initialize(565)) - Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@746a63d3
> 2013-04-19 23:56:20,666 INFO mapred.MapTask (MapTask.java:setEquator(1127)) - (EQUATOR) 0 kvi 26214396(104857584)
> 2013-04-19 23:56:20,666 INFO mapred.MapTask (MapTask.java:<init>(923)) - mapreduce.task.io.sort.mb: 100
> 2013-04-19 23:56:20,666 INFO mapred.MapTask (MapTask.java:<init>(924)) - soft limit at 83886080
> 2013-04-19 23:56:20,666 INFO mapred.MapTask (MapTask.java:<init>(925)) - bufstart = 0; bufvoid = 104857600
> 2013-04-19 23:56:20,667 INFO mapred.MapTask (MapTask.java:<init>(926)) - kvstart = 26214396; length = 6553600
> 2013-04-19 23:56:20,690 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) -
> 2013-04-19 23:56:20,690 INFO mapred.MapTask (MapTask.java:flush(1389)) - Starting flush of map output
> 2013-04-19 23:56:20,690 INFO mapred.MapTask (MapTask.java:flush(1408)) - Spilling map output
> 2013-04-19 23:56:20,690 INFO mapred.MapTask (MapTask.java:flush(1409)) - bufstart = 0; bufend = 329; bufvoid = 104857600
> 2013-04-19 23:56:20,690 INFO mapred.MapTask (MapTask.java:flush(1411)) - kvstart = 26214396(104857584); kvend = 26214212(104856848); length = 185/6553600
> 2013-04-19 23:56:20,695 INFO mapred.MapTask (MapTask.java:sortAndSpill(1597)) - Finished spill 0
> 2013-04-19 23:56:20,697 INFO mapred.Task (Task.java:done(979)) - Task:attempt_local_0001_m_000003_0 is done. And is in the process of committing
> 2013-04-19 23:56:20,717 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) - map
> 2013-04-19 23:56:20,718 INFO mapred.Task (Task.java:sendDone(1099)) - Task 'attempt_local_0001_m_000003_0' done.
> 2013-04-19 23:56:20,718 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(238)) - Finishing task: attempt_local_0001_m_000003_0
> 2013-04-19 23:56:20,718 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(394)) - Map task executor complete.
> 2013-04-19 23:56:20,752 INFO mapred.Task (Task.java:initialize(565)) - Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@52cd19d
> 2013-04-19 23:56:20,760 INFO mapred.Merger (Merger.java:merge(549)) - Merging 4 sorted segments
> 2013-04-19 23:56:20,767 INFO mapred.Merger (Merger.java:merge(648)) - Down to the last merge-pass, with 4 segments left of total size: 8532 bytes
> 2013-04-19 23:56:20,768 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) -
> 2013-04-19 23:56:20,807 WARN conf.Configuration (Configuration.java:warnOnceIfDeprecated(808)) - mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
> 2013-04-19 23:56:21,129 INFO mapred.Task (Task.java:done(979)) - Task:attempt_local_0001_r_000000_0 is done. And is in the process of committing
> 2013-04-19 23:56:21,131 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) -
> 2013-04-19 23:56:21,131 INFO mapred.Task (Task.java:commit(1140)) - Task attempt_local_0001_r_000000_0 is allowed to commit now
> 2013-04-19 23:56:21,138 INFO output.FileOutputCommitter (FileOutputCommitter.java:commitTask(432)) - Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://Hadoop01:8040/user/hadoop/d/multi9/_temporary/0/task_local_0001_r_000000
> 2013-04-19 23:56:21,139 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) - reduce > reduce
> 2013-04-19 23:56:21,139 INFO mapred.Task (Task.java:sendDone(1099)) - Task 'attempt_local_0001_r_000000_0' done.
> 2013-04-19 23:56:21,381 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1293)) - map 100% reduce 100%
> 2013-04-19 23:56:21,381 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1304)) - Job job_local_0001 completed successfully
> 2013-04-19 23:56:21,427 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1311)) - Counters: 32
>     File System Counters
>         FILE: Number of bytes read=483553
>         FILE: Number of bytes written=1313962
>         FILE: Number of read operations=0
>         FILE: Number of large read operations=0
>         FILE: Number of write operations=0
>         HDFS: Number of bytes read=296769
>         HDFS: Number of bytes written=284
>         HDFS: Number of read operations=66
>         HDFS: Number of large read operations=0
>         HDFS: Number of write operations=8
>     Map-Reduce Framework
>         Map input records=1000
>         Map output records=1000
>         Map output bytes=6543
>         Map output materialized bytes=8567
>         Input split bytes=516
>         Combine input records=0
>         Combine output records=0
>         Reduce input groups=12
>         Reduce shuffle bytes=0
>         Reduce input records=1000
>         Reduce output records=0
>         Spilled Records=2000
>         Shuffled Maps =0
>         Failed Shuffles=0
>         Merged Map outputs=0
>         GC time elapsed (ms)=7
>         CPU time spent (ms)=0
>         Physical memory (bytes) snapshot=0
>         Virtual memory (bytes) snapshot=0
>         Total committed heap usage (bytes)=1773993984
>     File Input Format Counters
>         Bytes Read=68723
>     File Output Format Counters
>         Bytes Written=0