Hi Seonyoung! Please take a look at this file: https://github.com/apache/hadoop/blob/branch-2.7.1/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java#L208
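(A sketch only, based on my reading of that class: the ShuffleHandler exposes a `mapreduce.shuffle.max.threads` property, which replaced the old TaskTracker-era `mapreduce.tasktracker.http.threads`; the default of 0 means it sizes the thread pool to 2 × the number of available processors. If you wanted to raise it explicitly, something like the following in `yarn-site.xml` on each NodeManager should be the shape of it; the value 128 is just an illustrative placeholder, not a tested recommendation:)

```xml
<!-- yarn-site.xml on each NodeManager.
     Illustrative sketch only: 128 is a placeholder value, not a
     recommendation. The default of 0 means the ShuffleHandler uses
     2 * (number of available processors). -->
<property>
  <name>mapreduce.shuffle.max.threads</name>
  <value>128</value>
</property>
```

The NodeManagers would need a restart for the auxiliary service to pick the new value up.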
This is an auxiliary service that runs inside the NodeManager and serves the intermediate data.

Cheers,
Ravi

On Tue, Jun 6, 2017 at 8:06 PM, Seonyoung Park <render...@gmail.com> wrote:
> Hi all,
>
> We've run a Hadoop cluster (Apache Hadoop 2.7.1) with 40 datanodes.
> Currently, we're using the Fair Scheduler in our cluster,
> and there are no limits on the number of concurrently running jobs.
> 30 ~ 50 I/O-heavy jobs have been running concurrently at dawn.
>
> Recently we got shuffle errors as follows when we ran the HDFS Balancer or
> Spark Streaming jobs:
>
> Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError:
> error in shuffle in fetcher#2
>         at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES;
> bailing-out.
>         at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:366)
>         at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:288)
>         at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:354)
>         at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
>
> I also noticed that SocketTimeoutException had occurred in some tasks in
> the same job, but there is no network problem.
>
> Someone said that we need to increase the value of the
> "mapreduce.tasktracker.http.threads" property.
> However, no code uses that property after the commit starting with hash
> 80a05764be5c4f517.
>
> Here are my questions:
>
> 1. Is that property currently being used?
> 2. If so, is it really helpful for solving our problem?
> 3. Do we need to fine-tune the settings of the NodeManagers and DataNodes?
> 4. Is there any better solution?
>
> Thanks,
> Pak