Hi Rajesh:

Thanks for your reply. I would like more detail; please see my questions inline.

Sorry that I did not explain why I care so much about these counters. I am 
trying to analyze a data skew issue in a Tez vertex. I can currently get several 
related counter values, including FILE_BYTES_READ, HDFS_BYTES_READ, SHUFFLE_BYTES, 
and so on. Which of these counter values is meaningful for analyzing 
data skew?
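
To make the question concrete, here is a minimal sketch of the kind of per-task comparison involved. It assumes the counter values for every task of one vertex have already been fetched and flattened into plain dictionaries. The `tasks` layout, the function names, the sample numbers, and the choice of max/median as the skew measure are all assumptions for illustration, not part of Tez or its REST API.

```python
from statistics import median


def skew_ratio(per_task_values):
    """Return max/median for one counter across the tasks of a vertex.

    Returns None when there are no values or the median is zero,
    since the ratio is not meaningful in those cases.
    """
    values = [v for v in per_task_values if v is not None]
    if not values:
        return None
    mid = median(values)
    if mid == 0:
        return None
    return max(values) / mid


def skew_report(tasks, counter_names):
    """Compute the skew ratio of each named counter over all tasks."""
    return {
        name: skew_ratio([counters.get(name) for counters in tasks.values()])
        for name in counter_names
    }


# Hypothetical per-task counter values for one vertex (made-up numbers).
tasks = {
    "task_0": {"FILE_BYTES_READ": 100, "HDFS_BYTES_READ": 0, "SHUFFLE_BYTES": 100},
    "task_1": {"FILE_BYTES_READ": 120, "HDFS_BYTES_READ": 0, "SHUFFLE_BYTES": 110},
    "task_2": {"FILE_BYTES_READ": 900, "HDFS_BYTES_READ": 0, "SHUFFLE_BYTES": 800},
}

report = skew_report(
    tasks, ["FILE_BYTES_READ", "HDFS_BYTES_READ", "SHUFFLE_BYTES"]
)
for name, ratio in report.items():
    print(name, ratio)
```

A ratio well above 1 for a counter means one task handled far more of that kind of data than a typical task in the same vertex. Which counter gives the most reliable signal is exactly the open question.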

Best wishes
Joe zhang

From: Rajesh Balamohan [mailto:[email protected]]
Sent: Wednesday, July 8, 2015 4:57 PM
To: [email protected]
Cc: Xiaoyong Zhu; Yifung Lin
Subject: Re: Tez Counter question

FILE_BYTES_READ - Represents the data read from local disk
>>>>>>>>>>Joezhang: When, or in which cases, does a mapper or reducer vertex 
>>>>>>>>>>need to read from or write to local disk? I am wondering why a reducer 
>>>>>>>>>>in Tez has data both read from local disk and shuffled from the parent 
>>>>>>>>>>node. As far as I know, the traditional reducer in MR1 only reads 
>>>>>>>>>>shuffle data (in memory and on the shuffle local disk). Does the Tez 
>>>>>>>>>>engine do some optimization here?

HDFS_BYTES_READ - Represents data read from HDFS (does not include data read 
from disk)
>>>>>>>>>>Joezhang: When, or in which cases, does a mapper or reducer vertex 
>>>>>>>>>>need to read from HDFS or write to HDFS?

SHUFFLE_BYTES - Represents the data that was transferred over the wire while 
doing shuffle. Downloaded data either gets into memory or disk (depending on 
memory availability). So, SHUFFLE_BYTES_TO_MEM and SHUFFLE_BYTES_TO_DISK would 
have correlation with SHUFFLE_BYTES.  This does not have direct relationship 
with FILE_BYTES_READ. However, in case of spills & merge, FILE_BYTES_READ can 
be incremented correspondingly.

~Rajesh.B

On Wed, Jul 8, 2015 at 1:25 PM, Joe Zhang (SDE) <[email protected]> wrote:
Hi Tez experts:

I am using the Tez REST API to get runtime information about Tez tasks, but I 
am confused about some of the counter concepts.

<1>  For File system counters:

counterName: FILE_BYTES_READ. Does it mean data read from local disk, or from 
somewhere else?

counterName: HDFS_BYTES_READ. Is it included in FILE_BYTES_READ?

<2>  For org.apache.tez.common.counters.TaskCounter:

counterName: SHUFFLE_BYTES. Does it have any relationship with FILE_BYTES_READ? 
Which data is included in it?

Best wishes
Joe zhang

