I am getting an issue on a production cluster where Mapreduce Job taking
long time in reduce phase while running a Hive query, approximately 2 hrs
for one reduce task.
Hardware specification of the cluster :
Number of nodes:22
Number of Master nodes4
Number of data nodes:18
Core per nodes:24
Memory per nodes:512 GB
Minimum yarn container size is 2048MB.
This issue does not occur all the time. So I am not sure what might be the
root cause.
Please help me with your suggestions if any one experiencing similar issue
Application Log:
8:02,616 INFO [main]
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: MergerManager:
memoryLimit=6370046464, maxSingleShuffleLimit=1592511616,
mergeThreshold=4204230912, ioSortFactor=100,
memToMemMergeOutputsThreshold=100
2017-06-04 18:58:02,619 INFO [EventFetcher for fetching Map Completion
Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher:
attempt_1496358601894_1431_r_00_0 Thread started: EventFetcher for
fetching Map Completion Events
2017-06-04 18:58:02,628 INFO [fetcher#5]
org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning
am1dlccmrhdd08.r1-core.r1.aig.net:13562 with 1 to fetcher#5
2017-06-04 18:58:02,628 INFO [fetcher#5]
org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 1 of
1 to am1dlccmrhdd08.r1-core.r1.aig.net:13562 to fetcher#5
2017-06-04 18:58:02,628 INFO [EventFetcher for fetching Map Completion
Events] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl:
Ignoring obsolete output of OBSOLETE map-task:
'attempt_1496358601894_1431_m_03_0'
2017-06-04 18:58:02,629 INFO [EventFetcher for fetching Map Completion
Events] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl:
Ignoring obsolete output of FAILED map-task:
'attempt_1496358601894_1431_m_03_0'
2017-06-04 18:58:02,629 INFO [fetcher#8]
org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning
am1dlccmrhdd01.r1-core.r1.aig.net:13562 with 1 to fetcher#8
2017-06-04 18:58:02,631 INFO [fetcher#8]
org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 1 of
1 to am1dlccmrhdd01.r1-core.r1.aig.net:13562 to fetcher#8
2017-06-04 18:58:02,631 INFO [EventFetcher for fetching Map Completion
Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher:
attempt_1496358601894_1431_r_00_0: Got 3 new map-outputs
2017-06-04 18:58:02,747 INFO [fetcher#8]
org.apache.hadoop.mapreduce.task.reduce.Fetcher: for
url=13562/mapOutput?job=job_1496358601894_1431=0=attempt_1496358601894_1431_m_06_0
sent hash and received reply
2017-06-04 18:58:02,747 INFO [fetcher#5]
org.apache.hadoop.mapreduce.task.reduce.Fetcher: for
url=13562/mapOutput?job=job_1496358601894_1431=0=attempt_1496358601894_1431_m_09_0
sent hash and received reply
2017-06-04 18:58:02,752 INFO [fetcher#5]
org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#5 about to shuffle
output of map attempt_1496358601894_1431_m_09_0 decomp: 40205 len:
40209 to MEMORY
2017-06-04 18:58:02,752 INFO [fetcher#8]
org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#8 about to shuffle
output of map attempt_1496358601894_1431_m_06_0 decomp: 50946 len:
50950 to MEMORY
2017-06-04 18:58:02,755 INFO [fetcher#5]
org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 40205 bytes
from map-output for attempt_1496358601894_1431_m_09_0
2017-06-04 18:58:02,755 INFO [fetcher#8]
org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 50946 bytes
from map-output for attempt_1496358601894_1431_m_06_0
2017-06-04 18:58:02,758 INFO [fetcher#5]
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
-> map-output of size: 40205, inMemoryMapOutputs.size() -> 1, commitMemory
-> 0, usedMemory ->91151
2017-06-04 18:58:02,759 INFO [fetcher#8]
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
-> map-output of size: 50946, inMemoryMapOutputs.size() -> 2, commitMemory
-> 40205, usedMemory ->91151
2017-06-04 18:58:02,760 INFO [fetcher#5]
org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl:
am1dlccmrhdd08.r1-core.r1.aig.net:13562 freed by fetcher#5 in 131ms
2017-06-04 18:58:02,760 INFO [fetcher#8]
org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl:
am1dlccmrhdd01.r1-core.r1.aig.net:13562 freed by fetcher#8 in 131ms
2017-06-04 18:58:02,760 INFO [fetcher#5]
org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning
am1dlccmrhdd01.r1-core.r1.aig.net:13562 with 1 to fetcher#5
2017-06-04 18:58:02,760 INFO [fetcher#5]
org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 1 of
1 to am1dlccmrhdd01.r1-core.r1.aig.net:13562 to fetcher#5
2017-06-04 18:58:02,762 INFO [fetcher#5]
org.apache.hadoop.mapreduce.task.reduce.Fetcher: for
url=13562/mapOutput?job=job_1496358601894_1431=0=attempt_1496358601894_1431_m_08_0
sent hash and received reply
2017-06-04 18:58:02,762 INFO [fetcher#5]
org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#5 about to shuffle
output of map