On Fri, Jul 17, 2009 at 10:37 AM, bonito perdo <bonito.pe...@googlemail.com> wrote:
> Thank you Todd.
> I think I understand it.
> However, I don't know how to interpret the following:
>
> File Systems
> HDFS bytes read       4,300,800
> Local bytes read      6,951,138
> Local bytes written  11,349,729
>
> Map-Reduce Framework
> Combine output records       0
> Map input records       50,069
> Map output bytes     4,298,419
> Map input bytes      4,298,419
> Combine input records        0
> Map output records      50,069
>
> I would like to make sure that I have understood this correctly.
> The number of local bytes read is greater than the HDFS bytes read since it
> includes the additional reads during the merge phase. Am I right?
> For example, for the current job there are 20 spill files:
> a) 2 spills are merged ---> one new file - file_a
> b) 10 spills are merged ---> new file - file_b
> c) file_a + file_b + 8 spills are merged ---> final output file
>
> During c, the reading cost equals the whole input split.
> However, there are additional reading and writing costs during a) and b).
> That is, there is the cost of writing (phase a) and reading (phase c)
> file_a, and writing (phase b) and reading (phase c) file_b.
> I suppose that due to these additional reads the total local bytes read
> are greater than the HDFS bytes read.

Yep, that sounds exactly right. Roughly looking at your numbers, you have a
total of 4197K of map output. In 20 spills, that means each spill is around
209K. The total read is 2*209 + 10*209 + (2*209 + 10*209 + 8*209) = 6688K.
This is just a little bit under the amount your counters show, probably due
to overhead from reading some checksums, rounding in my calculations, etc.

-Todd

> On Fri, Jul 17, 2009 at 12:54 AM, Todd Lipcon <t...@cloudera.com> wrote:
>
> > Hi,
> >
> > Your understanding of the merge sort process seems correct, but I'm not
> > quite sure what your question is.
> >
> > The merge process here is on the output side of the map task, so input
> > splits don't factor in.
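The back-of-the-envelope arithmetic above can be sketched as a quick calculation. The segment size is approximated as total map output divided by the 20 spills, and checksum/index overheads are ignored, which is why the `Local bytes read` counter comes out slightly higher:

```python
# Back-of-the-envelope merge I/O from this thread's counters.
# 4,298,419 bytes of map output across 20 spills -> ~209 KiB per spill.
map_output_k = 4298419 // 1024   # ~4197 KiB of map output
spills = 20
seg = map_output_k // spills     # ~209 KiB per spill segment

# Merge passes observed in the log: a 2-way pass, a 10-way pass,
# then a final pass over 2 + 10 + 8 = 20 segments' worth of data.
passes = [2, 10, 2 + 10 + 8]
total_read_k = sum(n * seg for n in passes)

print(seg)           # 209  (KiB per segment)
print(total_read_k)  # 6688 (KiB read by the merge passes)
```

This lands a little under the 6,951,138 `Local bytes read` counter, consistent with Todd's note about checksum overhead and rounding.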
Here's an overview of the process:
> >
> > When your mapper writes output records, they go into a buffer in RAM.
> > When you fill up some percentage of that buffer, it sorts and spills
> > to disk ("Spilling map output" in the log).
> >
> > After the mapper has completely finished, it has some number of files on
> > disk (in your case 20). It then tries to perform the minimum amount of
> > work in order to end up with a single sorted output file in the end. To
> > do this, it goes through some number of passes of merges of the spill
> > files. Each merge pass can, at maximum, merge io.sort.factor (default 10)
> > files. This is to reduce random IO - reading a lot of files concurrently
> > can cause the disk head to bounce around and reduce throughput.
> >
> > Given that you have 20 map spills, it decided to do one pass of 2->1,
> > resulting in 19, then one pass of 10->1, resulting in 10, and then the
> > final merge. If you ignore combiners, the IO cost for a single merge pass
> > is equal to reading and writing the total amount of data in that pass. If
> > each of your segments is 1M, then you've got one pass which reads/writes
> > 2MB, one pass that reads/writes 10MB, then one pass that reads and writes
> > 2+10+8=20MB. Total being 32MB.
> >
> > Note that, even though your io.sort.mb is 1MB, the map spills actually
> > happen before the buffer is full. This is so that the mapper can continue
> > to run while the spill happens in a background thread. io.sort.*.percent
> > are the relevant variables for tuning this behavior.
> >
> > Hope that helps explain things.
> >
> > -Todd
> >
> > On Thu, Jul 16, 2009 at 3:18 PM, bonito <bonito.pe...@gmail.com> wrote:
> >
> > >
> > > Hi all!
> > > I have a question regarding the merge process during a map task.
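The pass-sizing logic described above can be sketched as follows. This is a simplified model patterned on the first-pass rule in Hadoop's Merger, `(n - 1) mod (factor - 1) + 1`, which merges just enough segments up front so that every later pass can merge exactly `io.sort.factor` segments. The real code also weighs segment sizes, so treat this as an approximation, not the actual implementation:

```python
# Simplified sketch of how the map-side merger sizes its passes.
def first_pass_factor(num_segments, factor):
    if num_segments <= factor:
        return num_segments            # a single final merge suffices
    # Merge just enough in the first pass so all later passes are full.
    mod = (num_segments - 1) % (factor - 1)
    return factor if mod == 0 else mod + 1

def merge_passes(num_segments, factor=10):
    """Return the number of segments merged in each pass."""
    passes = []
    first = True
    while num_segments > 1:
        n = first_pass_factor(num_segments, factor) if first \
            else min(factor, num_segments)
        passes.append(n)
        num_segments -= n - 1          # n segments collapse into 1
        first = False
    return passes

print(merge_passes(20))  # [2, 10, 10] -- matches the log: 2-way, 10-way, final 10-way
```

With 20 segments and factor 10, this reproduces the sequence in the thread's log: a 2-way merge (20 -> 19), a 10-way merge (19 -> 10), then the final 10-way merge.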
> > > I have the following results:
> > >
> > > 2009-07-16 22:05:39,977 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> > > Initializing JVM Metrics with processName=MAP, sessionId=
> > > 2009-07-16 22:05:40,028 INFO org.apache.hadoop.mapred.MapTask:
> > > numReduceTasks: 1
> > > 2009-07-16 22:05:40,037 INFO org.apache.hadoop.mapred.MapTask:
> > > io.sort.mb = 1
> > > 2009-07-16 22:05:40,038 INFO org.apache.hadoop.mapred.MapTask:
> > > data buffer = 796928/996160
> > > 2009-07-16 22:05:40,038 INFO org.apache.hadoop.mapred.MapTask:
> > > record buffer = 2620/3276
> > > 2009-07-16 22:05:40,110 INFO org.apache.hadoop.mapred.MapTask:
> > > Spilling map output: record full = true
> > > 2009-07-16 22:05:40,110 INFO org.apache.hadoop.mapred.MapTask:
> > > bufstart = 0; bufend = 221562; bufvoid = 996160
> > > 2009-07-16 22:05:40,110 INFO org.apache.hadoop.mapred.MapTask:
> > > kvstart = 0; kvend = 2620; length = 3276
> > > 2009-07-16 22:05:40,164 INFO org.apache.hadoop.mapred.MapTask:
> > > Finished spill 0
> > > 2009-07-16 22:05:40,174 INFO org.apache.hadoop.mapred.MapTask:
> > > Spilling map output: record full = true
> > > 2009-07-16 22:05:40,174 INFO org.apache.hadoop.mapred.MapTask:
> > > bufstart = 221562; bufend = 444315; bufvoid = 996160
> > > 2009-07-16 22:05:40,174 INFO org.apache.hadoop.mapred.MapTask:
> > > kvstart = 2620; kvend = 1963; length = 3276
> > > 2009-07-16 22:05:40,185 INFO org.apache.hadoop.mapred.MapTask:
> > > Finished spill 1
> > > 2009-07-16 22:05:40,194 INFO org.apache.hadoop.mapred.MapTask:
> > > Spilling map output: record full = true
> > > 2009-07-16 22:05:40,194 INFO org.apache.hadoop.mapred.MapTask:
> > > bufstart = 444315; bufend = 667162; bufvoid = 996160
> > > 2009-07-16 22:05:40,194 INFO org.apache.hadoop.mapred.MapTask:
> > > kvstart = 1963; kvend = 1306; length = 3276
> > > 2009-07-16 22:05:40,205 INFO org.apache.hadoop.mapred.MapTask:
> > > Finished spill 2
> > > 2009-07-16 22:05:40,218 INFO org.apache.hadoop.mapred.MapTask:
> > > Spilling map output: record full = true
> > > 2009-07-16 22:05:40,218 INFO org.apache.hadoop.mapred.MapTask:
> > > bufstart = 667162; bufend = 890254; bufvoid = 996160
> > > 2009-07-16 22:05:40,218 INFO org.apache.hadoop.mapred.MapTask:
> > > kvstart = 1306; kvend = 649; length = 3276
> > > 2009-07-16 22:05:40,229 INFO org.apache.hadoop.mapred.MapTask:
> > > Finished spill 3
> > > ...
> > > 2009-07-16 22:05:40,789 INFO org.apache.hadoop.mapred.MapTask:
> > > Finished spill 19
> > > 2009-07-16 22:05:40,823 INFO org.apache.hadoop.mapred.Merger:
> > > Merging 20 sorted segments
> > > 2009-07-16 22:05:40,829 INFO org.apache.hadoop.mapred.Merger:
> > > Merging 2 intermediate segments out of a total of 20
> > > 2009-07-16 22:05:40,918 INFO org.apache.hadoop.mapred.Merger:
> > > Merging 10 intermediate segments out of a total of 19
> > > 2009-07-16 22:05:41,145 INFO org.apache.hadoop.mapred.Merger:
> > > Down to the last merge-pass, with 10 segments left of total size:
> > > 4398577 bytes
> > > 2009-07-16 22:05:41,353 INFO org.apache.hadoop.mapred.MapTask:
> > > Index: (0, 4398559, 4398563)
> > > 2009-07-16 22:05:41,420 INFO org.apache.hadoop.mapred.TaskRunner:
> > > Task:attempt_200907161728_0014_m_000000_0 is done. And is in the
> > > process of commiting
> > > 2009-07-16 22:05:41,781 INFO org.apache.hadoop.mapred.TaskRunner:
> > > Task 'attempt_200907161728_0014_m_000000_0' done.
> > >
> > > If I understood well, the map task applies an external merge sort. Right?
> > > I want to calculate the I/O cost of the merge procedure.
> > > Having read the relevant code, I am a bit confused though.
> > > In the above results, there are 20 spill segments of approximately 1 MB
> > > each (since I have set io.sort.mb=1).
> > > These 20 spills are merged as follows:
> > > -> First, 2 spills are merged, resulting in a newly created segment/file
> > > of ~2 MB.
> > > (This created file is written to the local disk.)
> > > --> Now there are 18 (initial) segments plus the newly created one, that
> > > is, 19 segments in total. 10 segments are merged out of the 19. Result:
> > > one new segment (from the 10 merged) plus another 9 segments --> total:
> > > 10 segments/files.
> > > ---> Finally, the remaining 10 segments are merged, yielding the final
> > > output file.
> > >
> > > In my attempt to evaluate the I/O cost I got rather confused.
> > > If I am not mistaken, in *each* of the above phases *NOT* the whole
> > > input split is processed. If that were the case, then in each one of the
> > > 3 passes there would be one read and one write of the input split.
> > >
> > > Could you please give me some details about it? Is there any way I
> > > could determine/approximate the I/O cost?
> > > For example, I could assume that each segment equals one page of the
> > > input, so I would have 20 pages.
> > > Would taking io.file.buffer.size into account be of any help to my
> > > attempt?
> > > Any help would be appreciated.
> > > Thank you.
> > >
> > > --
> > > View this message in context:
> > > http://www.nabble.com/question-on-map-merge-process-tp24525493p24525493.html
> > > Sent from the Hadoop core-user mailing list archive at Nabble.com.
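The I/O cost asked about in this thread can be approximated directly from the counters, under the model Todd describes: each merge pass reads and writes every byte that participates in it, plus the 20 initial spill writes. This sketch ignores checksums and index files and assumes the final merged output is also written to local disk, which is why both counters in the thread come out a few percent higher than the estimate:

```python
# Rough local-disk I/O estimate for the job in this thread.
map_output_bytes = 4298419            # "Map output bytes" counter
spills = 20
seg = map_output_bytes / spills       # approximate spill segment size

# Passes observed in the log: 2-way, 10-way, then a final pass over
# all 20 segments' worth of data (2 + 10 + 8).
pass_sizes = (2, 10, 20)

merge_reads = sum(n * seg for n in pass_sizes)   # every pass re-reads its input
merge_writes = merge_reads                       # and writes the same amount back
spill_writes = map_output_bytes                  # the 20 initial spill files

print(round(merge_reads))                  # ~6877470 bytes read by merges
print(round(spill_writes + merge_writes))  # ~11175889 bytes written locally
```

Both estimates sit just under the actual counters (6,951,138 local bytes read; 11,349,729 local bytes written), with the gap plausibly explained by the checksum and index overhead mentioned in the thread.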