On Fri, Jul 17, 2009 at 10:37 AM, bonito perdo <bonito.pe...@googlemail.com> wrote:
> Thank you Todd.
> I think I understand it.
> However, I don't know how to interpret the following:
>
> File Systems
> HDFS bytes read       4,300,800
> Local bytes read      6,951,138
> Local bytes written  11,349,729
>
> Map-Reduce Framework
> Combine output records       0
> Map input records       50,069
> Map output bytes     4,298,419
> Map input bytes      4,298,419
> Combine input records        0
> Map output records      50,069
>
> I would like to make sure that I have understood this correctly.
> The number of local bytes read is greater than the HDFS bytes read since it
> includes the additional reads during the merge phase. Am I right?
> For example, for the current job there are 20 spill files:
> a) 2 spills are merged ---> one new file - file_a
> b) 10 spills are merged ---> new file - file_b
> c) file_a + file_b + 8 spills are merged ---> final output file
>
> During c, the reading cost equals the whole input split.
> However, there are additional reading and writing costs during a) and b).
> That is, there is the cost of writing (phase a) and reading (phase c)
> file_a, and writing (phase b) and reading (phase c) file_b.
> I suppose that due to these additional reads the total local bytes read
> are greater than the HDFS bytes read.

Yep, that sounds exactly right. Roughly looking at your numbers, you have a
total of 4197K of map output. In 20 spills, that means each spill is around
209K. The total read is 2*209 + 10*209 + (2*209 + 10*209 + 8*209) = 6688K.
This is just a little bit under the amount your counters show, probably due
to overhead from reading some checksums, rounding in my calculations, etc.

-Todd

> On Fri, Jul 17, 2009 at 12:54 AM, Todd Lipcon <t...@cloudera.com> wrote:
>
> > Hi,
> >
> > Your understanding of the merge sort process seems correct, but I'm not
> > quite sure what your question is.
> >
> > The merge process here is on the output side of the map task, so input
> > splits don't factor in.
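The back-of-the-envelope arithmetic above can be sketched as a quick calculation. The segment size is approximated as total map output divided by the 20 spills, and checksum/index overheads are ignored, which is why the `Local bytes read` counter comes out slightly higher:

```python
# Back-of-the-envelope merge I/O from this thread's counters.
# 4,298,419 bytes of map output across 20 spills -> ~209 KiB per spill.
map_output_k = 4298419 // 1024   # ~4197 KiB of map output
spills = 20
seg = map_output_k // spills     # ~209 KiB per spill segment

# Merge passes observed in the log: a 2-way pass, a 10-way pass,
# then a final pass over 2 + 10 + 8 = 20 segments' worth of data.
passes = [2, 10, 2 + 10 + 8]
total_read_k = sum(n * seg for n in passes)

print(seg)           # 209  (KiB per segment)
print(total_read_k)  # 6688 (KiB read by the merge passes)
```

This lands a little under the 6,951,138 `Local bytes read` counter, consistent with Todd's note about checksum overhead and rounding.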
Here's an overview of the process:
> >
> > When your mapper writes output records, they go into a buffer in RAM.
> > When you fill up some percentage of that buffer, it sorts and spills
> > to disk ("Spilling map output" in the log).
> >
> > After the mapper has completely finished, it has some number of files on
> > disk (in your case 20). It then tries to perform the minimum amount of
> > work in order to end up with a single sorted output file in the end. To
> > do this, it goes through some number of passes of merges of the spill
> > files. Each merge pass can, at maximum, merge io.sort.factor (default 10)
> > files. This is to reduce random IO - reading a lot of files concurrently
> > can cause the disk head to bounce around and reduce throughput.
> >
> > Given that you have 20 map spills, it decided to do one pass of 2->1,
> > resulting in 19, then one pass of 10->1, resulting in 10, and then the
> > final merge. If you ignore combiners, the IO cost for a single merge pass
> > is equal to reading and writing the total amount of data in that pass. If
> > each of your segments is 1M, then you've got one pass which reads/writes
> > 2MB, one pass that reads/writes 10MB, then one pass that reads and writes
> > 2+10+8=20MB. Total being 32MB.
> >
> > Note that, even though your io.sort.mb is 1MB, the map spills actually
> > happen before the buffer is full. This is so that the mapper can continue
> > to run while the spill happens in a background thread. io.sort.*.percent
> > are the relevant variables for tuning this behavior.
> >
> > Hope that helps explain things.
> >
> > -Todd
> >
> > On Thu, Jul 16, 2009 at 3:18 PM, bonito <bonito.pe...@gmail.com> wrote:
> >
> > >
> > > Hi all!
> > > I have a question regarding the merge process during a map task.
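The pass-sizing logic described above can be sketched as follows. This is a simplified model patterned on the first-pass rule in Hadoop's Merger, `(n - 1) mod (factor - 1) + 1`, which merges just enough segments up front so that every later pass can merge exactly `io.sort.factor` segments. The real code also weighs segment sizes, so treat this as an approximation, not the actual implementation:

```python
# Simplified sketch of how the map-side merger sizes its passes.
def first_pass_factor(num_segments, factor):
    if num_segments <= factor:
        return num_segments            # a single final merge suffices
    # Merge just enough in the first pass so all later passes are full.
    mod = (num_segments - 1) % (factor - 1)
    return factor if mod == 0 else mod + 1

def merge_passes(num_segments, factor=10):
    """Return the number of segments merged in each pass."""
    passes = []
    first = True
    while num_segments > 1:
        n = first_pass_factor(num_segments, factor) if first \
            else min(factor, num_segments)
        passes.append(n)
        num_segments -= n - 1          # n segments collapse into 1
        first = False
    return passes

print(merge_passes(20))  # [2, 10, 10] -- matches the log: 2-way, 10-way, final 10-way
```

With 20 segments and factor 10, this reproduces the sequence in the thread's log: a 2-way merge (20 -> 19), a 10-way merge (19 -> 10), then the final 10-way merge.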
> > > I have the following results:
> > >
> > > 2009-07-16 22:05:39,977 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> > > Initializing JVM Metrics with processName=MAP, sessionId=
> > > 2009-07-16 22:05:40,028 INFO org.apache.hadoop.mapred.MapTask:
> > > numReduceTasks: 1
> > > 2009-07-16 22:05:40,037 INFO org.apache.hadoop.mapred.MapTask:
> > > io.sort.mb = 1
> > > 2009-07-16 22:05:40,038 INFO org.apache.hadoop.mapred.MapTask:
> > > data buffer = 796928/996160
> > > 2009-07-16 22:05:40,038 INFO org.apache.hadoop.mapred.MapTask:
> > > record buffer = 2620/3276
> > > 2009-07-16 22:05:40,110 INFO org.apache.hadoop.mapred.MapTask:
> > > Spilling map output: record full = true
> > > 2009-07-16 22:05:40,110 INFO org.apache.hadoop.mapred.MapTask:
> > > bufstart = 0; bufend = 221562; bufvoid = 996160
> > > 2009-07-16 22:05:40,110 INFO org.apache.hadoop.mapred.MapTask:
> > > kvstart = 0; kvend = 2620; length = 3276
> > > 2009-07-16 22:05:40,164 INFO org.apache.hadoop.mapred.MapTask:
> > > Finished spill 0
> > > 2009-07-16 22:05:40,174 INFO org.apache.hadoop.mapred.MapTask:
> > > Spilling map output: record full = true
> > > 2009-07-16 22:05:40,174 INFO org.apache.hadoop.mapred.MapTask:
> > > bufstart = 221562; bufend = 444315; bufvoid = 996160
> > > 2009-07-16 22:05:40,174 INFO org.apache.hadoop.mapred.MapTask:
> > > kvstart = 2620; kvend = 1963; length = 3276
> > > 2009-07-16 22:05:40,185 INFO org.apache.hadoop.mapred.MapTask:
> > > Finished spill 1
> > > 2009-07-16 22:05:40,194 INFO org.apache.hadoop.mapred.MapTask:
> > > Spilling map output: record full = true
> > > 2009-07-16 22:05:40,194 INFO org.apache.hadoop.mapred.MapTask:
> > > bufstart = 444315; bufend = 667162; bufvoid = 996160
> > > 2009-07-16 22:05:40,194 INFO org.apache.hadoop.mapred.MapTask:
> > > kvstart = 1963; kvend = 1306; length = 3276
> > > 2009-07-16 22:05:40,205 INFO org.apache.hadoop.mapred.MapTask:
> > > Finished spill 2
> > > 2009-07-16 22:05:40,218 INFO org.apache.hadoop.mapred.MapTask:
> > > Spilling map output: record full = true
> > > 2009-07-16 22:05:40,218 INFO org.apache.hadoop.mapred.MapTask:
> > > bufstart = 667162; bufend = 890254; bufvoid = 996160
> > > 2009-07-16 22:05:40,218 INFO org.apache.hadoop.mapred.MapTask:
> > > kvstart = 1306; kvend = 649; length = 3276
> > > 2009-07-16 22:05:40,229 INFO org.apache.hadoop.mapred.MapTask:
> > > Finished spill 3
> > > ...
> > > 2009-07-16 22:05:40,789 INFO org.apache.hadoop.mapred.MapTask:
> > > Finished spill 19
> > > 2009-07-16 22:05:40,823 INFO org.apache.hadoop.mapred.Merger:
> > > Merging 20 sorted segments
> > > 2009-07-16 22:05:40,829 INFO org.apache.hadoop.mapred.Merger:
> > > Merging 2 intermediate segments out of a total of 20
> > > 2009-07-16 22:05:40,918 INFO org.apache.hadoop.mapred.Merger:
> > > Merging 10 intermediate segments out of a total of 19
> > > 2009-07-16 22:05:41,145 INFO org.apache.hadoop.mapred.Merger:
> > > Down to the last merge-pass, with 10 segments left of total size:
> > > 4398577 bytes
> > > 2009-07-16 22:05:41,353 INFO org.apache.hadoop.mapred.MapTask:
> > > Index: (0, 4398559, 4398563)
> > > 2009-07-16 22:05:41,420 INFO org.apache.hadoop.mapred.TaskRunner:
> > > Task:attempt_200907161728_0014_m_000000_0 is done. And is in the
> > > process of commiting
> > > 2009-07-16 22:05:41,781 INFO org.apache.hadoop.mapred.TaskRunner:
> > > Task 'attempt_200907161728_0014_m_000000_0' done.
> > >
> > > If I understood well, the map task applies an external merge sort. Right?
> > > I want to calculate the I/O cost of the merge procedure.
> > > Having read the relevant code, I am a bit confused though.
> > > In the above results, there are 20 spill segments of approximately 1 MB
> > > each (since I have set io.sort.mb=1).
> > > These 20 spills are merged as follows:
> > > -> First, 2 spills are merged, resulting in a newly created segment/file
> > > of ~2 MB.
> > > (This created file is written to the local disk.)
> > > --> Now there are 18 (initial) segments plus the newly created one, that
> > > is, 19 segments in total. 10 segments are merged out of the 19. Result:
> > > one new segment (from the 10 merged) plus another 9 segments --> total:
> > > 10 segments/files.
> > > ---> Finally, the remaining 10 segments are merged, yielding the final
> > > output file.
> > >
> > > In my attempt to evaluate the I/O cost I got rather confused.
> > > If I am not mistaken, in *each* of the above phases *NOT* the whole
> > > input split is processed. If that were the case, then in each one of the
> > > 3 passes there would be one read and one write of the input split.
> > >
> > > Could you please give me some details about it? Is there any way I
> > > could determine/approximate the I/O cost?
> > > For example, I could assume that each segment equals one page of the
> > > input, so I would have 20 pages.
> > > Would taking io.file.buffer.size into account be of any help to my
> > > attempt?
> > > Any help would be appreciated.
> > > Thank you.
> > >
> > > --
> > > View this message in context:
> > > http://www.nabble.com/question-on-map-merge-process-tp24525493p24525493.html
> > > Sent from the Hadoop core-user mailing list archive at Nabble.com.
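The I/O cost asked about in this thread can be approximated directly from the counters, under the model Todd describes: each merge pass reads and writes every byte that participates in it, plus the 20 initial spill writes. This sketch ignores checksums and index files and assumes the final merged output is also written to local disk, which is why both counters in the thread come out a few percent higher than the estimate:

```python
# Rough local-disk I/O estimate for the job in this thread.
map_output_bytes = 4298419            # "Map output bytes" counter
spills = 20
seg = map_output_bytes / spills       # approximate spill segment size

# Passes observed in the log: 2-way, 10-way, then a final pass over
# all 20 segments' worth of data (2 + 10 + 8).
pass_sizes = (2, 10, 20)

merge_reads = sum(n * seg for n in pass_sizes)   # every pass re-reads its input
merge_writes = merge_reads                       # and writes the same amount back
spill_writes = map_output_bytes                  # the 20 initial spill files

print(round(merge_reads))                  # ~6877470 bytes read by merges
print(round(spill_writes + merge_writes))  # ~11175889 bytes written locally
```

Both estimates sit just under the actual counters (6,951,138 local bytes read; 11,349,729 local bytes written), with the gap plausibly explained by the checksum and index overhead mentioned in the thread.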