Thank you Jonathan. I will get some details and get back if needed.

Thanks,
Silas

________________________________
From: Jonathan Eagles <[email protected]>
Sent: Thursday, September 24, 2020 4:02 AM
To: user <[email protected]>; Silas Ge (Shuangyin Ge) <[email protected]>
Subject: Re: A mapper task in Tez is getting a lot more data than other mappers

Without much to go on, I assume this is a Hive query. I am not a Hive expert, but I can try to give some help. The Hive mailing list will surely give a better answer.

What is the input format? What is the Hive split strategy used? If ORC is used, Hive can report statistics about the stripes involved:

    hive --orcfiledump <path to file>

    set hive.exec.orc.split.strategy=BI;

will only consider files when calculating splits. If there is skew, then

    set hive.exec.orc.split.strategy=ETL;

should be used instead. Stripes will then be combined until the split size falls between the min and max size, that is, between mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize. There is also a way in Tez to group splits, using tez.grouping.min-size.

    set hive.exec.orc.split.strategy=ETL; -- default usually HYBRID or BI
    set mapreduce.input.fileinputformat.split.maxsize=16777216; -- 16MB

Let me know what you think, or correct any of my assumptions.

Jon

On Wed, Sep 23, 2020 at 6:24 AM Silas Ge (Shuangyin Ge) <[email protected]> wrote:

Hi all,

I have a question. Below is a task I ran in Tez. The result shows that one Map, Map 2, is getting a lot more data than the other mappers, so the task takes very long to complete. Is there anything I can tune for such tasks?

------------------------------------------------------------------------------------
VERTICES      DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
------------------------------------------------------------------------------------
Map 1              5314.00         5,250           70          7,671             419
Map 10             5313.00         4,570           71             20              20
Map 11             4248.00         4,110           85             43              43
Map 12             4248.00         4,030           89             10              10
Map 13             4248.00         4,460           56             31              31
Map 14             2684.00         3,500           81             39              39
Map 15             4248.00         4,190           56              8               8
Map 16             3192.00         3,880           64             27              27
Map 17             3708.00         4,390           66             15              15
Map 18             4789.00         3,990           62            138             138
Map 19             4248.00         4,460           94             33              33
Map 2          16838453.00   467,381,280    5,706,963  2,048,182,505             313
Map 20             2684.00         3,850           48             14              14
Map 21             5313.00         4,520           55             21              21
Map 22             4248.00         4,300           39             84              84
Map 23             4773.00         4,360           48             94              94
Map 24             4248.00         4,570           50            294             294
Map 25             2684.00         3,860           96              5               5
Map 26             3718.00         3,990           58             16              16
Map 27             4249.00         4,490           92            266             266
Map 4              4248.00         4,560           73             45              45
Map 5              2684.00         4,000           83             26              26
Map 6              4248.00         4,400          115             14              14
Map 7              4248.00         4,320           34             13              13
Map 8              4248.00         4,220           60              1               1
Map 9              4248.00         4,340           64             48              48
Reducer 3         951723.00        28,750           16            312               0
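
For anyone reading this thread later, one way to put a number on the imbalance in the table above is to compare INPUT_RECORDS across the Map vertices. Below is a minimal Python sketch. The counts are copied from the table; the names `input_records` and `skew_report` are hypothetical helpers for illustration and are not part of Hive or Tez.

```python
import statistics

# INPUT_RECORDS per Map vertex, copied from the summary table in this thread.
# Reducer 3 is left out because only the map side is being compared.
input_records = {
    "Map 1": 7_671, "Map 10": 20, "Map 11": 43, "Map 12": 10,
    "Map 13": 31, "Map 14": 39, "Map 15": 8, "Map 16": 27,
    "Map 17": 15, "Map 18": 138, "Map 19": 33, "Map 2": 2_048_182_505,
    "Map 20": 14, "Map 21": 21, "Map 22": 84, "Map 23": 94,
    "Map 24": 294, "Map 25": 5, "Map 26": 16, "Map 27": 266,
    "Map 4": 45, "Map 5": 26, "Map 6": 14, "Map 7": 13,
    "Map 8": 1, "Map 9": 48,
}


def skew_report(records):
    """Return (largest vertex, its share of all records, its ratio to the median vertex)."""
    total = sum(records.values())
    top = max(records, key=records.get)
    median = statistics.median(records.values())
    return top, records[top] / total, records[top] / median


top, share, ratio = skew_report(input_records)
print(f"{top}: {share:.4%} of map input records, {ratio:,.0f}x the median vertex")
```

When nearly all of the input sits in a single vertex like this, the split-size settings Jon mentions are one thing to look at, since they influence how many tasks that vertex's input is divided into.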
