I see, I installed TezUI and saw these values☺
Btw, this setting (tez.task.generate.counters.per.io) only affects the task 
counters, right? I.e., it does not affect the DAG/vertex-level counters (as 
the name itself indicates), right?

Xiaoyong

From: Rajesh Balamohan [mailto:[email protected]]
Sent: Tuesday, April 28, 2015 8:07 AM
To: [email protected]
Subject: Re: How to Tuning Tez Task Performance

Not sure if you have Tez-UI, which should render this info automatically. 
Otherwise you can verify it from the application logs. An example is given 
below.

2015-04-27 04:15:12,834 INFO [Dispatcher thread: Central] 
history.HistoryEventHandler: 
[HISTORY][DAG:dag_1429683757595_0452_1][Event:DAG_FINISHED]: 
dagId=dag_1429683757595_0452_1, startTime=1430133293306, 
finishTime=1430133312773, timeTaken=19467, status=SUCCEEDED, diagnostics=, 
counters=Counters: 225, org.apache.tez.common.counters.DAGCounter, 
NUM_SUCCEEDED_TASKS=43, TOTAL_LAUNCHED_TASKS=43,
.....
.....
 TaskCounter_Map_4_OUTPUT_Map_1, ADDITIONAL_SPILLS_BYTES_READ=0, 
ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILL_COUNT=0, 
OUTPUT_BYTES=137200, OUTPUT_BYTES_PHYSICAL=119705, 
OUTPUT_BYTES_WITH_OVERHEAD=548794, OUTPUT_LARGE_RECORDS=0, 
OUTPUT_RECORDS=27440, SPILLED_RECORDS=0, TaskCounter_Map_5_INPUT_date_dim, 
INPUT_RECORDS_PROCESSED=10, TaskCounter_Map_5_OUTPUT_Map_1, 
ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, 
ADDITIONAL_SPILL_COUNT=0, OUTPUT_BYTES=1825, OUTPUT_BYTES_PHYSICAL=1505, 
OUTPUT_BYTES_WITH_OVERHEAD=7297, OUTPUT_LARGE_RECORDS=0, OUTPUT_RECORDS=365, 
SPILLED_RECORDS=0,
.......
.......
 TaskCounter_Map_6_OUTPUT_Map_1, ADDITIONAL_SPILLS_BYTES_READ=0, 
ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILL_COUNT=0, OUTPUT_BYTES=909, 
OUTPUT_BYTES_PHYSICAL=464, OUTPUT_BYTES_WITH_OVERHEAD=2421, 
OUTPUT_LARGE_RECORDS=0, OUTPUT_RECORDS=101, SPILLED_RECORDS=0, 
TaskCounter_Map_7_INPUT_item, INPUT_RECORDS_PROCESSED=47,
 ....
 ....
 TaskCounter_Map_7_OUTPUT_Map_1, ADDITIONAL_SPILLS_BYTES_READ=0, 
ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILL_COUNT=0, 
OUTPUT_BYTES=1104000, OUTPUT_BYTES_PHYSICAL=341828, 
OUTPUT_BYTES_WITH_OVERHEAD=1727999, OUTPUT_LARGE_RECORDS=0, 
OUTPUT_RECORDS=48000, SPILLED_RECORDS=0, TaskCounter_Reducer_2_INPUT_Map_1, 
ADDITIONAL_SPILLS_BYTES_READ=821473, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, 
COMBINE_INPUT_RECORDS=0, FIRST_EVENT_RECEIVED=12, LAST_EVENT_RECEIVED=5049, 
MERGED_MAP_OUTPUTS=36, MERGE_PHASE_TIME=5070, NUM_DISK_TO_DISK_MERGES=0, 
NUM_FAILED_SHUFFLE_INPUTS=0, NUM_MEM_TO_DISK_MERGES=0, NUM_SHUFFLED_INPUTS=36, 
NUM_SKIPPED_INPUTS=0, REDUCE_INPUT_GROUPS=47999, REDUCE_INPUT_RECORDS=670353, 
SHUFFLE_BYTES=16402510, SHUFFLE_BYTES_DECOMPRESSED=52252736, 
SHUFFLE_BYTES_DISK_DIRECT=821473, SHUFFLE_BYTES_TO_DISK=0, 
SHUFFLE_BYTES_TO_MEM=15581037, SHUFFLE_PHASE_TIME=5056, SPILLED_RECORDS=33317,
....
....
  TaskCounter_Reducer_2_OUTPUT_Reducer_3, ADDITIONAL_SPILLS_BYTES_READ=0, 
ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILL_COUNT=0, OUTPUT_BYTES=5600, 
OUTPUT_BYTES_PHYSICAL=0, OUTPUT_BYTES_WITH_OVERHEAD=0, OUTPUT_RECORDS=100, 
SPILLED_RECORDS=100
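
As a rough illustration of reading such a DAG_FINISHED line (this is not part 
of Tez itself), here is a minimal Python sketch. It assumes the counters text 
has the comma-separated shape shown above: a token without "=" starts a new 
counter group, and a token with "=" is a counter in the current group. The 
function name parse_counters and the abbreviated sample string are made up for 
this example; the values are copied from the log excerpt.

```python
from collections import OrderedDict


def parse_counters(counter_text):
    """Turn 'Counters: N, group, NAME=value, ...' into {group: {name: value}}."""
    groups = OrderedDict()
    current = None
    # Log lines are wrapped, so join them before splitting on commas.
    tokens = [t.strip() for t in counter_text.replace("\n", " ").split(",")]
    for token in tokens:
        # Skip empty tokens, the "Counters: 225" summary and "....." elisions.
        if not token or token.startswith("Counters:") or set(token) == {"."}:
            continue
        if "=" in token:
            name, value = token.split("=", 1)
            if current is None:
                # Counter seen before any group name (assumed not to happen).
                current = "UNGROUPED"
                groups[current] = {}
            groups[current][name.strip()] = int(value)
        else:
            # A token without '=' is treated as the name of a new group.
            current = token
            groups.setdefault(current, {})
    return groups


# Abbreviated sample built from the log excerpt above (many counters omitted).
sample = (
    "Counters: 225, org.apache.tez.common.counters.DAGCounter, "
    "NUM_SUCCEEDED_TASKS=43, TOTAL_LAUNCHED_TASKS=43, "
    "TaskCounter_Map_5_INPUT_date_dim, INPUT_RECORDS_PROCESSED=10, "
    "TaskCounter_Map_5_OUTPUT_Map_1, ADDITIONAL_SPILL_COUNT=0, "
    "OUTPUT_BYTES=1825, OUTPUT_RECORDS=365, SPILLED_RECORDS=0, "
    "TaskCounter_Reducer_2_OUTPUT_Reducer_3, OUTPUT_BYTES=5600, "
    "OUTPUT_RECORDS=100, SPILLED_RECORDS=100"
)

groups = parse_counters(sample)

# List the per-edge groups that report spilled records.
spilling = [g for g, c in groups.items() if c.get("SPILLED_RECORDS", 0) > 0]
for group in spilling:
    print(group, "SPILLED_RECORDS=%d" % groups[group]["SPILLED_RECORDS"])
```

Group names such as TaskCounter_Map_5_OUTPUT_Map_1 are the per-edge groups, so 
filtering on them gives a quick view of which input or output is spilling.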

On Tue, Apr 28, 2015 at 4:41 AM, Xiaoyong Zhu <[email protected]> wrote:
Btw, Rajesh, I set tez.task.generate.counters.per.io=true in my cluster but did 
not find the per-edge task counters. Could you please give some examples of the 
counters produced when this is enabled, so I can verify?

Thanks!

Xiaoyong

From: Rajesh Balamohan [mailto:[email protected]]
Sent: Friday, April 24, 2015 4:55 PM
To: [email protected]
Subject: Re: How to Tuning Tez Task Performance

Listing some details at a very high level:

- Set "tez.task.generate.counters.per.io=true" to get more details on the task 
counters. Basically this starts printing the counters per edge, which can be a 
lot more useful for debugging.

- In case you want to avoid container launches etc. when you analyze for the 
first time, try hive.prewarm.enabled=true and hive.prewarm.numcontainers=<no. 
of containers you want prewarmed in your session>

- Container reuse is enabled by default in Tez 
(tez.am.container.idle.release-timeout-min.millis and 
tez.am.container.idle.release-timeout-max.millis control how long a container 
is held by the AM before it is released).

- Set tez.runtime.io.sort.mb appropriately to avoid spills (you can check task 
counters in the logs to find out the spills and adjust it accordingly)

- Set tez.runtime.sort.threads=2 to enable PipelinedSorter, which is a lot 
more performant than DefaultSorter (this is the default in the master branch, 
but if you are using earlier releases, you can turn it on with this setting).

- Set tez.runtime.compress=true and set tez.runtime.compress.codec (SnappyCodec 
is preferred, but it is up to you to choose)

- Set tez.runtime.shuffle.keep-alive.enabled=true in case you have a 
shuffle-heavy workload. This reduces the number of connections in shuffle.

- Adjust memory allocated to different inputs/outputs based on 
tez.task.scale.memory.ratios (but this is more of an expert-level setting, 
which you might want to touch only after nailing down any memory pressure)

- Adjusting shuffle buffers is also possible, but I would advise it only once 
you have nailed down an issue related to the shuffle/merge codepath.

- Set "tez.runtime.optimize.local.fetch=true" to bypass HTTP fetches (when the 
data is locally present)
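
To make the list above easier to try out, here is a small Python sketch that 
renders the settings as Hive "set key=value;" statements. It assumes the 
settings are applied per session from Hive; some of them may instead need to 
go into hive-site.xml or tez-site.xml. The helper name hive_set_statements is 
made up for this example, and the values for hive.prewarm.numcontainers and 
tez.runtime.io.sort.mb are placeholders, not recommendations.

```python
def hive_set_statements(settings):
    """Render a {key: value} mapping as Hive 'set key=value;' statements."""
    return ["set {}={};".format(key, value) for key, value in settings.items()]


# Settings named in the list above. Values marked "placeholder" are
# illustrative only and should be sized for your own cluster.
SETTINGS = {
    "tez.task.generate.counters.per.io": "true",
    "hive.prewarm.enabled": "true",
    "hive.prewarm.numcontainers": "10",  # placeholder
    "tez.runtime.io.sort.mb": "512",  # placeholder
    "tez.runtime.sort.threads": "2",
    "tez.runtime.compress": "true",
    "tez.runtime.compress.codec": "org.apache.hadoop.io.compress.SnappyCodec",
    "tez.runtime.shuffle.keep-alive.enabled": "true",
    "tez.runtime.optimize.local.fetch": "true",
}

statements = hive_set_statements(SETTINGS)
for statement in statements:
    print(statement)
```

The printed lines can be pasted at the top of a Hive script, which keeps the 
experiment local to one session instead of changing cluster-wide defaults.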


Feel free to refer to 
https://github.com/t3rmin4t0r/tez-autobuild/blob/master/tez-site.xml for any 
commonly used settings for benchmarks.

On Fri, Apr 24, 2015 at 1:52 PM, [email protected] <[email protected]> wrote:
I want to tune Tez task performance. The Tez tasks are created by Hive. How 
can I tune Tez task performance?
Should I analyze performance using the Tez task counters in the Tez log? Any 
suggestions?

________________________________
[email protected]



--
~Rajesh.B

