Thanks, Rajesh. If we want to count the total amount of data that a given task or vertex reads, is it HDFS_BYTES_READ (the data read from the "global file system") + SHUFFLE_BYTES (the data read from the "upstream task/vertex")?
Xiaoyong

From: Rajesh Balamohan [mailto:[email protected]]
Sent: Wednesday, July 8, 2015 4:57 PM
To: [email protected]
Cc: Xiaoyong Zhu; Yifung Lin
Subject: Re: Tez Counter question

FILE_BYTES_READ - Represents the data read from local disk.
HDFS_BYTES_READ - Represents data read from HDFS (does not include data read from disk).
SHUFFLE_BYTES - Represents the data that was transferred over the wire while doing shuffle. Downloaded data goes either into memory or to disk (depending on memory availability), so SHUFFLE_BYTES_TO_MEM and SHUFFLE_BYTES_TO_DISK would correlate with SHUFFLE_BYTES. This does not have a direct relationship with FILE_BYTES_READ. However, in case of spills and merges, FILE_BYTES_READ can be incremented correspondingly.

~Rajesh.B

On Wed, Jul 8, 2015 at 1:25 PM, Joe Zhang (SDE) <[email protected]> wrote:

Hi Tez experts:

I am using the Tez REST API to get information about running Tez tasks, but I am confused about some of the counter concepts.

<1> For file system counters:
counterName: FILE_BYTES_READ - does it mean data read from local disk or from somewhere else?
HDFS_BYTES_READ - is it included in FILE_BYTES_READ?

<2> For org.apache.tez.common.counters.TaskCounter:
counterName: SHUFFLE_BYTES - does it have some relationship with FILE_BYTES_READ? Which data is included in it?

Best wishes,
Joe Zhang
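
For illustration only, the sum asked about at the top of this thread (HDFS_BYTES_READ + SHUFFLE_BYTES) could be computed as in the sketch below. The thread does not confirm that this sum is the right measure of total input. The function name, the flat dict of counter names to values, and the sample numbers are all hypothetical; this is not the actual shape of a Tez REST API response.

```python
# Minimal sketch, assuming the counters for one task or vertex have already
# been collected into a plain dict of counter name -> value (hypothetical shape).

def total_input_bytes(counters):
    """Return HDFS_BYTES_READ + SHUFFLE_BYTES, treating a missing counter as 0.

    FILE_BYTES_READ is left out on purpose: per Rajesh's reply it counts
    local-disk reads and has no direct relationship with SHUFFLE_BYTES,
    although spills and merges can increase it.
    """
    hdfs_bytes = counters.get("HDFS_BYTES_READ", 0)
    shuffle_bytes = counters.get("SHUFFLE_BYTES", 0)
    return hdfs_bytes + shuffle_bytes


# Made-up values, for illustration only.
sample_counters = {
    "FILE_BYTES_READ": 200,
    "HDFS_BYTES_READ": 1000,
    "SHUFFLE_BYTES": 500,
}

print(total_input_bytes(sample_counters))  # 1000 + 500 = 1500
```

A task that only reads from HDFS would simply have no SHUFFLE_BYTES entry, which the sketch treats as zero.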
