Thanks. However, I thought about it again and got a little more confused here. 
Basically, we want to get the total amount of data a certain task reads, so 
which counters should we add up? FILE_BYTES_READ + HDFS_BYTES_READ? Or 
HDFS_BYTES_READ + SHUFFLE_BYTES? Or the sum of all three counters 
(FILE_BYTES_READ + HDFS_BYTES_READ + SHUFFLE_BYTES)?

Thanks!
Xiaoyong

From: Rajesh Balamohan [mailto:[email protected]]
Sent: Wednesday, July 8, 2015 5:15 PM
To: [email protected]
Subject: Re: Tez Counter question

Correct. If the processor chooses to read some additional data from HDFS (as 
part of some processing), that would also be accounted for in HDFS_BYTES_READ.

~Rajesh.B

On Wed, Jul 8, 2015 at 2:34 PM, Xiaoyong Zhu <[email protected]> wrote:
Thanks Rajesh. Then, if we want to count the total amount of data that a 
certain task or vertex reads, is it HDFS_BYTES_READ (the data read from the 
“global file system”) + SHUFFLE_BYTES (the data read from the “upper 
task/vertex”)?

Xiaoyong

From: Rajesh Balamohan [mailto:[email protected]]
Sent: Wednesday, July 8, 2015 4:57 PM
To: [email protected]
Cc: Xiaoyong Zhu; Yifung Lin
Subject: Re: Tez Counter question

FILE_BYTES_READ - Represents the data read from local disk

HDFS_BYTES_READ - Represents the data read from HDFS (does not include data 
read from local disk)

SHUFFLE_BYTES - Represents the data that was transferred over the wire during 
shuffle. Downloaded data goes either into memory or onto disk (depending on 
memory availability), so SHUFFLE_BYTES_TO_MEM and SHUFFLE_BYTES_TO_DISK 
correlate with SHUFFLE_BYTES. SHUFFLE_BYTES has no direct relationship with 
FILE_BYTES_READ. However, in the case of spills and merges, FILE_BYTES_READ 
can be incremented correspondingly.
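
A minimal sketch of how the three counters described above could be reported 
side by side for one task. It assumes the counters have already been collected 
into a flat dict keyed by counter name; the function name `read_breakdown`, 
the output keys, and the sample values are hypothetical. It keeps the three 
sources separate rather than summing them, since spills and merges can 
increment FILE_BYTES_READ as well:

```python
def read_breakdown(counters):
    """Split a task's read counters by where the bytes came from.

    `counters` is assumed to be a flat dict of counter name -> value.
    Counters that are missing are reported as 0.
    """
    return {
        "local_disk": counters.get("FILE_BYTES_READ", 0),      # local disk reads
        "hdfs": counters.get("HDFS_BYTES_READ", 0),            # HDFS reads
        "shuffle_over_wire": counters.get("SHUFFLE_BYTES", 0),  # shuffle transfer
    }


# Made-up counter values for a single task (illustration only).
example = {
    "FILE_BYTES_READ": 4096,
    "HDFS_BYTES_READ": 1048576,
    "SHUFFLE_BYTES": 2048,
}
print(read_breakdown(example))
```

Which of these values to add together depends on whether locally re-read 
spill data should count as part of what the task "reads".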

~Rajesh.B

On Wed, Jul 8, 2015 at 1:25 PM, Joe Zhang (SDE) <[email protected]> wrote:
Hi Tez experts,

I am using the Tez REST API to get information about running Tez tasks, but I 
am confused about some of the counter concepts.

<1>  For File system counters:

counterName: FILE_BYTES_READ. Does it mean data read from local disk or from 
somewhere else?

counterName: HDFS_BYTES_READ. Is it included in FILE_BYTES_READ?

<2>  For org.apache.tez.common.counters.TaskCounter:

counterName: SHUFFLE_BYTES. Does it have some relationship with 
FILE_BYTES_READ? Which data is included in it?
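
For reference, a rough sketch of pulling these counter values out of a REST 
response once it has been fetched. The JSON layout used here (a 
`counterGroups` list whose entries carry a `counterGroupName` and a `counters` 
list of `counterName`/`counterValue` pairs) is an assumption about the 
response shape and is not confirmed in this thread. The helper name 
`flatten_counters`, the first group name, and all values in the sample payload 
are made up:

```python
import json


def flatten_counters(counters_json):
    """Flatten an assumed counterGroups structure into {counterName: value}."""
    flat = {}
    for group in counters_json.get("counterGroups", []):
        for counter in group.get("counters", []):
            flat[counter["counterName"]] = counter["counterValue"]
    return flat


# Made-up payload in the assumed shape (illustration only).
sample = json.loads("""
{
  "counterGroups": [
    {
      "counterGroupName": "FILE_SYSTEM_GROUP_PLACEHOLDER",
      "counters": [
        {"counterName": "FILE_BYTES_READ", "counterValue": 4096},
        {"counterName": "HDFS_BYTES_READ", "counterValue": 1048576}
      ]
    },
    {
      "counterGroupName": "org.apache.tez.common.counters.TaskCounter",
      "counters": [
        {"counterName": "SHUFFLE_BYTES", "counterValue": 2048}
      ]
    }
  ]
}
""")

print(flatten_counters(sample))
```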

Best wishes
Joe zhang


