Re: Hadoop shuffling traffic

Abdul Navaz Wed, 08 Oct 2014 22:18:00 -0700

Hello,

First of all, thank you very much for your help. :)


I still have a doubt about this.

Is the highlighted metric “Reduce shuffle bytes=3059”:

1. The total bytes after the reduce phase (that is, the output file
which is written into HDFS)?
Or

2. The actual shuffle traffic which is exchanged between
mappers and reducers before the reduce is performed?

Please clarify!

Thanks & Regards,

Abdul Navaz



From:  Pramod Biligiri <pramodbilig...@gmail.com>
Reply-To:  <user@hadoop.apache.org>
Date:  Thursday, October 2, 2014 at 12:44 AM
To:  "zookeeper-u...@hadoop.apache.org" <user@hadoop.apache.org>
Subject:  Re: Hadoop shuffling traffic

Hi Abdul,
That is the right metric. You can take a look at this report we made on this
earlier: 
http://www.slideshare.net/pramodbiligiri/shuffle-phase-as-the-bottleneck-in-hadoop-terasort

Pramod
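
For anyone who wants to check this against their own run, here is a minimal
Python sketch that pulls the name=value counters out of JobClient console
output like the one quoted below. The function name parse_counters and the
regular expression are illustrative assumptions, not part of Hadoop, and the
sketch assumes each counter sits on a single line (as on the real console)
rather than wrapped as in this mail. In the quoted run, "Reduce shuffle bytes"
(3059) equals "Map output materialized bytes" (3059) and differs from
HDFS_BYTES_WRITTEN (1106), which is consistent with it counting map output
fetched by the reducer rather than the final job output.

```python
import re

# Matches counter lines such as
# "14/10/01 18:38:12 INFO mapred.JobClient:     Reduce shuffle bytes=3059"
# (illustrative pattern, assumed from the console output in this thread).
COUNTER_RE = re.compile(r"mapred\.JobClient:\s+(?P<name>.+?)=(?P<value>\d+)\s*$")


def parse_counters(console_text):
    """Return {counter name: int value} for every 'name=value' JobClient line."""
    counters = {}
    for line in console_text.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group("name").strip()] = int(match.group("value"))
    return counters


# Counter lines copied from the run quoted in this thread (unwrapped).
SAMPLE = """\
14/10/01 18:38:12 INFO mapred.JobClient: Counters: 29
14/10/01 18:38:12 INFO mapred.JobClient:     Bytes Written=1106
14/10/01 18:38:12 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1106
14/10/01 18:38:12 INFO mapred.JobClient:     Map output materialized bytes=3059
14/10/01 18:38:12 INFO mapred.JobClient:     Reduce shuffle bytes=3059
14/10/01 18:38:12 INFO mapred.JobClient:     Map output bytes=2509
"""

if __name__ == "__main__":
    counters = parse_counters(SAMPLE)
    shuffle = counters["Reduce shuffle bytes"]
    print("Reduce shuffle bytes:", shuffle)
    print("Same as map output materialized bytes:",
          shuffle == counters["Map output materialized bytes"])
    print("Same as HDFS bytes written:",
          shuffle == counters["HDFS_BYTES_WRITTEN"])
```

To use it on a real job, save the console output to a file and pass the file
contents to parse_counters instead of SAMPLE.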

On Wed, Oct 1, 2014 at 6:06 PM, Abdul Navaz <navaz....@gmail.com> wrote:
> Hello,
> 
> This is the portion of the output which is displayed on the console when I run
> the sample word count job.
> 
> map 0% reduce 0%
> 
> 14/10/01 18:37:52 INFO mapred.JobClient:  map 100% reduce 0%
> 
> 14/10/01 18:38:10 INFO mapred.JobClient:  map 100% reduce 100%
> 
> 14/10/01 18:38:12 INFO mapred.JobClient: Job complete: job_201409262002_0003
> 
> 14/10/01 18:38:12 INFO mapred.JobClient: Counters: 29
> 
> 14/10/01 18:38:12 INFO mapred.JobClient:   Job Counters
> 
> 14/10/01 18:38:12 INFO mapred.JobClient:     Launched reduce tasks=1
> 
> 14/10/01 18:38:12 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=23511
> 
> 14/10/01 18:38:12 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> 
> 14/10/01 18:38:12 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> 
> 14/10/01 18:38:12 INFO mapred.JobClient:     Launched map tasks=1
> 
> 14/10/01 18:38:12 INFO mapred.JobClient:     Data-local map tasks=1
> 
> 14/10/01 18:38:12 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=14193
> 
> 14/10/01 18:38:12 INFO mapred.JobClient:   File Output Format Counters
> 
> 14/10/01 18:38:12 INFO mapred.JobClient:     Bytes Written=1106
> 
> 14/10/01 18:38:12 INFO mapred.JobClient:   FileSystemCounters
> 
> 14/10/01 18:38:12 INFO mapred.JobClient:     FILE_BYTES_READ=3059
> 
> 14/10/01 18:38:12 INFO mapred.JobClient:     HDFS_BYTES_READ=1601
> 
> 14/10/01 18:38:12 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=108400
> 
> 14/10/01 18:38:12 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1106
> 
> 14/10/01 18:38:12 INFO mapred.JobClient:   File Input Format Counters
> 
> 14/10/01 18:38:12 INFO mapred.JobClient:     Bytes Read=1486
> 
> 14/10/01 18:38:12 INFO mapred.JobClient:   Map-Reduce Framework
> 
> 14/10/01 18:38:12 INFO mapred.JobClient:     Map output materialized bytes=3059
> 
> 14/10/01 18:38:12 INFO mapred.JobClient:     Map input records=6
> 
> 14/10/01 18:38:12 INFO mapred.JobClient:     Reduce shuffle bytes=3059
> 
> 14/10/01 18:38:12 INFO mapred.JobClient:     Spilled Records=544
> 
> 14/10/01 18:38:12 INFO mapred.JobClient:     Map output bytes=2509
> 
> 14/10/01 18:38:12 INFO mapred.JobClient:     Total committed heap usage
> 
> 
> 
> I am trying to find the shuffling traffic, that is, the total traffic generated when
> mappers exchange their key-value pairs with the reducer. Does the highlighted
> portion give the shuffling traffic?
> 
> 
> Thanks & Regards,
> 
> Abdul Navaz
> Research Assistant
> University of Houston Main Campus, Houston TX
> Ph: 281-685-0388 <tel:281-685-0388>
> 
> 
> 
> 
> On 9/26/14, 12:00 AM, "karthikeyan S" <karthispe...@gmail.com> wrote:
> 
>> The reducer starts as soon as it has data available from any one of the
>> mappers.
>> The reducer keeps polling the AM and asks if any mapper has completed
>> processing. If so it fetches data from that mapper.
>> So it's not necessary for all the mappers of a job to complete for
>> the reducer to start fetching data.
>> 
>> When the reducers start fetching the data from the mappers, they print
>> that info in their syslog, from what I have seen.
>> 
>> Thanks,
>> Karthik
>> 
>> On Thu, Sep 25, 2014 at 8:27 PM, Bing Jiang <jiangbinglo...@gmail.com> wrote:
>>>  See mapreduce.job.reduce.slowstart.completedmaps
>>>  It gives a hint of when reduce tasks can kick off.
>>> 
>>>  2014-09-26 8:36 GMT+08:00 Abdul Navaz <navaz....@gmail.com>:
>>>> 
>>>>  Hello,
>>>> 
>>>>  I have a Hadoop cluster with 1 name node and 3 data nodes. I am running the
>>>>  sample word count job on a 1 GB file which is distributed across HDFS.
>>>> 
>>>>  When I run the MapReduce job, the reduce starts even before the map phase
>>>>  reaches 100%, e.g. map 40% reduce 10%.
>>>> 
>>>>  I would like to know when the shuffling traffic starts.
>>>> 
>>>>  ->  Is there any way to find out exactly when shuffling started? Does it
>>>>  generate any syslog entry in the logs?
>>>>  -> How do I find the total amount of shuffling traffic?
>>>> 
>>>> 
>>>> 
>>>>  Thanks & Regards,
>>>> 
>>>>  Abdul Navaz
>>>>  Research Assistant
>>>>  University of Houston Main Campus, Houston TX
>>>>  Ph: 281-685-0388 <tel:281-685-0388>
>>>> 
>>> 
>>> 
>>> 
>>>  --
>>>  Bing Jiang
>>>  Tel：(86)134-2619-1361
>>>  weibo: http://weibo.com/jiangbinglover
>>>  BLOG: www.binospace.com <http://www.binospace.com>
>>>  BLOG: http://blog.sina.com.cn/jiangbinglover
>>>  Focus on distributed computing, HDFS/HBase
>>
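
The mapreduce.job.reduce.slowstart.completedmaps setting mentioned by Bing
Jiang in this thread is a fraction between 0.0 and 1.0. The Python sketch below
is a simplified model of how such a threshold behaves, not Hadoop's actual
scheduling code: the function names are hypothetical, the rounding (ceiling of
fraction times number of maps) is an assumption, and 0.05 is used as the
default on the assumption that it matches the stock configuration. The map
count of 16 in the usage lines is also just an illustrative number.

```python
import math


def maps_needed_before_reduces(total_maps, slowstart=0.05):
    """Simplified model: completed maps required before reduces may be scheduled.

    slowstart stands in for mapreduce.job.reduce.slowstart.completedmaps;
    the ceiling-based rounding is an assumption of this sketch.
    """
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be between 0.0 and 1.0")
    return math.ceil(slowstart * total_maps)


def reduces_may_start(completed_maps, total_maps, slowstart=0.05):
    """True once enough maps have completed for reduces (and shuffle) to begin."""
    return completed_maps >= maps_needed_before_reduces(total_maps, slowstart)


if __name__ == "__main__":
    # Hypothetical job with 16 map tasks.
    print(maps_needed_before_reduces(16, 0.05))  # low threshold: early shuffle
    print(maps_needed_before_reduces(16, 1.0))   # shuffle only after all maps
    print(reduces_may_start(4, 16, 0.5))
```

Under this model, a low fraction explains output such as "map 40% reduce 10%",
while setting the fraction to 1.0 would hold reducers back until every map has
finished.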
