Btw, Rajesh, I set tez.task.generate.counters.per.io=true in my cluster but did not find the task counters per edge. Could you please give some examples of the counters that appear when this is enabled, so I can verify?
Thanks!
Xiaoyong

From: Rajesh Balamohan [mailto:[email protected]]
Sent: Friday, April 24, 2015 4:55 PM
To: [email protected]
Subject: Re: How to Tuning Tez Task Performance

Listing some details at a very high level:

- Set "tez.task.generate.counters.per.io=true" to get more details on the task counters. Basically this starts printing the counters per edge, which can be a lot more useful for debugging.
- In case you want to avoid container launches etc. when you analyze for the first time, try hive.prewarm.enabled=true & hive.prewarm.numcontainers=<no of containers you want in your session to be prewarmed>
- Container reuse is enabled by default in Tez. (tez.am.container.idle.release-timeout-min.millis and tez.am.container.idle.release-timeout-max.millis control the amount of time a container is held by the AM before it is released.)
- Set tez.runtime.io.sort.mb appropriately to avoid spills (you can check the task counters in the logs to find the spills and adjust it accordingly).
- Set tez.runtime.sort.threads=2 to enable PipelinedSorter, which is a lot more performant than DefaultSorter (this is the default in the master branch, but if you are using earlier releases, you can turn it on by setting tez.runtime.sort.threads=2).
- Set tez.runtime.compress=true and set tez.runtime.compress.codec (SnappyCodec is preferred, but it is up to you to choose).
- Set tez.runtime.shuffle.keep-alive.enabled=true in case you have a shuffle-heavy workload. This reduces the number of connections in shuffle.
- Adjust the memory allocated to different inputs/outputs based on tez.task.scale.memory.ratios (but this is more of an expert-level setting, which you might want to touch only after nailing down any memory pressure).
- Adjusting shuffle buffers is also possible, but I would advise it only when you have nailed down an issue related to the shuffle/merge codepath.
- Set "tez.runtime.optimize.local.fetch=true" to bypass HTTP fetches (when data is locally present).

Feel free to refer to https://github.com/t3rmin4t0r/tez-autobuild/blob/master/tez-site.xml for commonly used settings for benchmarks.

On Fri, Apr 24, 2015 at 1:52 PM, [email protected] wrote:

I want to tune Tez task performance. The Tez tasks are created by Hive. How can I tune them? Should I analyze performance using the Tez task counters in the Tez log? Any suggestions?

________________________________
[email protected]

--
~Rajesh.B
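
For trying out the settings from the list above in one go, here is a minimal sketch that collects them and renders each as a Hive "set key=value;" line. The helper name hive_set_statements is made up for illustration. The values for hive.prewarm.numcontainers and tez.runtime.io.sort.mb are placeholders (the thread gives no numbers for them), and the fully qualified codec class name is an assumption about what "SnappyCodec" refers to. The thread also does not say which properties take effect at session level and which have to go into tez-site.xml, so treat the output as a starting point only.

```python
# Settings named in the list above. Entries marked "placeholder" use
# illustrative values that are NOT recommendations from the thread.
SETTINGS = {
    "tez.task.generate.counters.per.io": "true",
    "hive.prewarm.enabled": "true",
    "hive.prewarm.numcontainers": "10",   # placeholder value
    "tez.runtime.io.sort.mb": "512",      # placeholder value
    "tez.runtime.sort.threads": "2",
    "tez.runtime.compress": "true",
    # Assumed fully qualified name for the SnappyCodec mentioned in the list.
    "tez.runtime.compress.codec": "org.apache.hadoop.io.compress.SnappyCodec",
    "tez.runtime.shuffle.keep-alive.enabled": "true",
    "tez.runtime.optimize.local.fetch": "true",
}


def hive_set_statements(settings):
    """Render each key/value pair as a Hive 'set key=value;' line."""
    return [f"set {key}={value};" for key, value in settings.items()]


if __name__ == "__main__":
    # Print one statement per line, in the order the settings are listed.
    print("\n".join(hive_set_statements(SETTINGS)))
```

The statements come out in the same order as the list, so it is easy to comment out one line at a time and compare task counters between runs.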
