So, if we want to know whether a vertex has a data skew issue, which counter 
should we use?

Xiaoyong

-----Original Message-----
From: Hitesh Shah [mailto:[email protected]] 
Sent: Thursday, July 9, 2015 1:39 PM
To: [email protected]
Cc: Xiaoyong Zhu; Yifung Lin; Zhaomin Xu
Subject: Re: Tez Counter question

For data skew, you may also want to consider enabling 
"tez.task.generate.counters.per.io". This enables counters on a per-edge basis, 
which is more helpful for complex DAGs.
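
As a rough illustration of how per-task counter values could be compared to spot skew, here is a minimal Python sketch. It is not part of Tez: the function names, the task IDs, the sample numbers, and the 2x-median threshold are all hypothetical, and it assumes you have already collected one counter value per task of a vertex (for example, through the REST API mentioned later in this thread). Which counter to feed in is left open here.

```python
from statistics import median


def skew_ratio(values):
    """Return max/median of per-task counter values (1.0 means perfectly even)."""
    vals = list(values)
    if not vals:
        raise ValueError("no counter values supplied")
    med = median(vals)
    if med == 0:
        # Avoid division by zero: all-zero input is treated as "no skew".
        return float("inf") if max(vals) > 0 else 1.0
    return max(vals) / med


def skewed_tasks(per_task, threshold=2.0):
    """Return sorted task IDs whose value exceeds threshold * median.

    per_task maps a task ID to a single counter value for that task.
    The default threshold of 2.0 is an arbitrary illustrative choice.
    """
    med = median(per_task.values())
    if med == 0:
        return []
    return sorted(t for t, v in per_task.items() if v > threshold * med)


# Hypothetical per-task values of one counter for a single vertex.
sample = {"task_0": 100, "task_1": 110, "task_2": 90, "task_3": 1000}
print(skew_ratio(sample.values()))
print(skewed_tasks(sample))
```

The median is used rather than the mean so that one very large task does not hide itself by pulling the baseline up.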

- Hitesh

On Jul 8, 2015, at 10:29 PM, Joe Zhang (SDE) <[email protected]> wrote:

> Hi Rajesh:
>  
> Thanks for your reply. I would like more detail; see my comments inline.
>  
> Sorry that I did not explain why I care so much about these counters. I am 
> trying to analyze a data skew issue for a Tez vertex. I can now get several 
> related counter values, including FILE_BYTES_READ, HDFS_BYTES_READ, 
> SHUFFLE_BYTES, and so on. Which counter value is meaningful for analyzing 
> data skew?
>  
> Best wishes
> Joe zhang
>  
> From: Rajesh Balamohan [mailto:[email protected]]
> Sent: Wednesday, July 8, 2015 4:57 PM
> To: [email protected]
> Cc: Xiaoyong Zhu; Yifung Lin
> Subject: Re: Tez Counter question
>  
> FILE_BYTES_READ - Represents the data read from local disk
> >>>>>>>>>>Joezhang : when, or in which cases, does a mapper or reducer vertex 
> >>>>>>>>>>need to read from or write to local disk? I am wondering why a 
> >>>>>>>>>>reducer in Tez has data both read from local disk and shuffled from 
> >>>>>>>>>>the parent node. As far as I know, the traditional reducer in MR1 
> >>>>>>>>>>only reads shuffle data (in memory and on the shuffle local disk). 
> >>>>>>>>>>Does the Tez engine do some optimization for this?
>  
> HDFS_BYTES_READ - Represents data read from HDFS (does not include 
> data read from disk)
> >>>>>>>>>>Joezhang : when, or in which cases, does a mapper or reducer vertex 
> >>>>>>>>>>need to read from or write to HDFS?
>  
> SHUFFLE_BYTES - Represents the data that was transferred over the wire while 
> doing shuffle. Downloaded data either gets into memory or disk (depending on 
> memory availability). So, SHUFFLE_BYTES_TO_MEM and SHUFFLE_BYTES_TO_DISK 
> correlate with SHUFFLE_BYTES. SHUFFLE_BYTES has no direct relationship 
> with FILE_BYTES_READ. However, when spills and merges occur, 
> FILE_BYTES_READ can be incremented correspondingly.
>  
> ~Rajesh.B
>  
> On Wed, Jul 8, 2015 at 1:25 PM, Joe Zhang (SDE) <[email protected]> wrote:
> Hi Tez experts:
>  
> I am using the Tez REST API to get information about running Tez tasks, but 
> I am confused by some of the counter concepts.
>  
> <1>  For File system counters:
>  
> counterName : FILE_BYTES_READ ? Does it mean data read from local disk or 
> somewhere else?
>  
> counterName : HDFS_BYTES_READ ? Is it included in FILE_BYTES_READ?
>  
> <2>  For org.apache.tez.common.counters.TaskCounter:
>  
> counterName : SHUFFLE_BYTES ? Does it have some relationship with 
> FILE_BYTES_READ? Which data is included in it?
>  
> Best wishes
> Joe zhang
