Thanks Rajesh. Then, if we want to count the total amount of data that a given 
task or vertex reads, is it HDFS_BYTES_READ (the data read from the “global 
file system”) + SHUFFLE_BYTES (the data read from the “upstream task/vertex”)?

Xiaoyong

From: Rajesh Balamohan [mailto:[email protected]]
Sent: Wednesday, July 8, 2015 4:57 PM
To: [email protected]
Cc: Xiaoyong Zhu; Yifung Lin
Subject: Re: Tez Counter question

FILE_BYTES_READ - Represents the data read from local disk

HDFS_BYTES_READ - Represents the data read from HDFS (does not include data 
read from local disk)

SHUFFLE_BYTES - Represents the data that was transferred over the wire during 
shuffle. Downloaded data goes either to memory or to disk (depending on memory 
availability), so SHUFFLE_BYTES_TO_MEM and SHUFFLE_BYTES_TO_DISK correlate with 
SHUFFLE_BYTES. SHUFFLE_BYTES has no direct relationship with FILE_BYTES_READ. 
However, when spills and merges occur, FILE_BYTES_READ can be incremented 
correspondingly.
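
As a rough illustration of the three definitions above, here is a minimal 
Python sketch that labels each counter with the data source it reports on. The 
flat dict input, the function name bytes_by_source, and the sample values are 
all assumptions made for the example; the shape of the real REST response is 
not shown in this thread, so the counters would have to be extracted into such 
a dict first.

```python
# Data source behind each counter, as described in the reply above.
COUNTER_SOURCES = {
    "FILE_BYTES_READ": "local disk",
    "HDFS_BYTES_READ": "HDFS",
    "SHUFFLE_BYTES": "network (shuffle)",
}


def bytes_by_source(counters):
    """Map each data source to the bytes reported by its counter.

    `counters` is a hypothetical flat dict of counter name -> value.
    Counters that are absent from the dict are reported as 0.
    """
    return {
        source: int(counters.get(name, 0))
        for name, source in COUNTER_SOURCES.items()
    }


# Made-up values for a single task, for illustration only.
sample = {
    "FILE_BYTES_READ": 1024,
    "HDFS_BYTES_READ": 4096,
    "SHUFFLE_BYTES": 2048,
}

for source, nbytes in bytes_by_source(sample).items():
    print(f"{source}: {nbytes} bytes")
```

The sketch deliberately does not add the values together: FILE_BYTES_READ can 
grow because of spills and merges of shuffled data, so the sources are kept 
separate here.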

~Rajesh.B

On Wed, Jul 8, 2015 at 1:25 PM, Joe Zhang (SDE) <[email protected]> wrote:
Hi Tez experts,

I am using the Tez REST API to get information about running Tez tasks, but I 
am confused about some of the counter concepts.

<1>  For File system counters:

counterName FILE_BYTES_READ: does it mean data read from local disk or from 
somewhere else?

counterName HDFS_BYTES_READ: is it included in FILE_BYTES_READ?

<2>  For org.apache.tez.common.counters.TaskCounter:

counterName SHUFFLE_BYTES: does it have any relationship with FILE_BYTES_READ? 
Which data is included in it?

Best wishes
Joe Zhang

