Hi all, I have a question. Below are the per-vertex statistics for a job I ran on Tez. One vertex, Map 2, is getting far more data than the other mappers (about 2.05 billion input records, while every other map vertex reads at most a few thousand), so the job takes very long to complete. Is there anything I can tune for a job like this?
----------------------------------------------------------------------------------
VERTICES    DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
----------------------------------------------------------------------------------
Map 1            5314.00         5,250           70          7,671             419
Map 10           5313.00         4,570           71             20              20
Map 11           4248.00         4,110           85             43              43
Map 12           4248.00         4,030           89             10              10
Map 13           4248.00         4,460           56             31              31
Map 14           2684.00         3,500           81             39              39
Map 15           4248.00         4,190           56              8               8
Map 16           3192.00         3,880           64             27              27
Map 17           3708.00         4,390           66             15              15
Map 18           4789.00         3,990           62            138             138
Map 19           4248.00         4,460           94             33              33
Map 2        16838453.00   467,381,280    5,706,963  2,048,182,505             313
Map 20           2684.00         3,850           48             14              14
Map 21           5313.00         4,520           55             21              21
Map 22           4248.00         4,300           39             84              84
Map 23           4773.00         4,360           48             94              94
Map 24           4248.00         4,570           50            294             294
Map 25           2684.00         3,860           96              5               5
Map 26           3718.00         3,990           58             16              16
Map 27           4249.00         4,490           92            266             266
Map 4            4248.00         4,560           73             45              45
Map 5            2684.00         4,000           83             26              26
Map 6            4248.00         4,400          115             14              14
Map 7            4248.00         4,320           34             13              13
Map 8            4248.00         4,220           60              1               1
Map 9            4248.00         4,340           64             48              48
Reducer 3      951723.00        28,750           16            313               0
----------------------------------------------------------------------------------
