Re: Tez Counter question

Rajesh Balamohan Mon, 13 Jul 2015 02:53:39 -0700

Correct. OUTPUT_BYTES_PHYSICAL takes care of ordered/unordered
outputs; HDFS_BYTES_WRITTEN covers any HDFS writes.
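
A minimal Python sketch of that sum, assuming the counters for a task or
vertex have already been fetched into a plain dict keyed by counter name.
The function name and the dict layout are illustrative assumptions, not
part of any Tez API.

```python
def vertex_output_bytes(counters):
    """Sum the two output counters named above.

    `counters` is assumed to be a dict of counter name -> value.
    A counter that is absent (e.g. no HDFS writes) counts as 0.
    """
    return (counters.get("OUTPUT_BYTES_PHYSICAL", 0)
            + counters.get("HDFS_BYTES_WRITTEN", 0))
```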


~Rajesh.B

On Mon, Jul 13, 2015 at 2:50 PM, Xiaoyong Zhu <[email protected]>
wrote:

>  Thanks!
>
>
>
> Btw, if we want to count task/vertex output, which counter should we take
> a look at? HDFS_BYTES_WRITTEN + OUTPUT_BYTES_PHYSICAL?
>
>
>
> Xiaoyong
>
>
>
> *From:* Rajesh Balamohan [mailto:[email protected]]
> *Sent:* Monday, July 13, 2015 1:46 PM
> *To:* [email protected]
> *Cc:* Hitesh Shah; Yifung Lin; Zhaomin Xu; Joe Zhang (SDE)
>
> *Subject:* Re: Tez Counter question
>
>
>
> For skew analysis, "SHUFFLE_BYTES (fetched from previous vertex) +
> HDFS_BYTES_READ (read from HDFS)" can be used.  Along with this,
> REDUCE_INPUT_GROUPS & REDUCE_INPUT_RECORDS could give details on data skew.
>
>
>
> For example, consider "Map 1" & "Map 7" sending output to "Reducer 2".
>
>
>
> TaskCounter_Reducer_2_INPUT_Map_1 (i.e., Reducer 2 getting input from Map 1)
>
> REDUCE_INPUT_GROUPS           271
>
> REDUCE_INPUT_RECORDS         16,084,685,867
>
> SHUFFLE_BYTES        60,903,100,935
>
>
>
> TaskCounter_Reducer_2_INPUT_Map_7 (i.e., Reducer 2 getting input from Map 7)
>
> REDUCE_INPUT_GROUPS           879
>
> REDUCE_INPUT_RECORDS         1,696
>
> SHUFFLE_BYTES       59,539
>
>
>
> In this case, it is clear that there is data skew in the input from Map 1
> to Reducer 2. One can now drill down into "Map 1" to find which task
> (or set of tasks) is generating most of the data sent to "Reducer 2".
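
To put numbers on the example above, here is a minimal Python sketch that
computes each input edge's share of SHUFFLE_BYTES and the records per
group for each edge. The function names and the dict layout are
illustrative assumptions; the values are the counters quoted above.

```python
def input_shares(shuffle_bytes_by_input):
    """Return each input edge's fraction of the total SHUFFLE_BYTES."""
    total = sum(shuffle_bytes_by_input.values())
    if total == 0:
        return {name: 0.0 for name in shuffle_bytes_by_input}
    return {name: nbytes / total
            for name, nbytes in shuffle_bytes_by_input.items()}


def records_per_group(groups, records):
    """REDUCE_INPUT_RECORDS divided by REDUCE_INPUT_GROUPS (0.0 if no groups)."""
    return records / groups if groups else 0.0


# Counters for "Reducer 2", split per input edge as in the mail.
reducer_2_shuffle_bytes = {"Map 1": 60_903_100_935, "Map 7": 59_539}
shares = input_shares(reducer_2_shuffle_bytes)

map_1_density = records_per_group(271, 16_084_685_867)
map_7_density = records_per_group(879, 1_696)
```

With these values, nearly all shuffled bytes arrive over the Map 1 edge,
and Map 1 delivers tens of millions of records per group versus about two
for Map 7.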
>
>
>
>
>
> Other points which might be useful for skew analysis
>
> 1. If the ratio of REDUCE_INPUT_GROUPS / REDUCE_INPUT_RECORDS is
> approximately 1.0, you can possibly increase the number of reducers for the
> vertex (if the vertex is slow).
>
>
>
> 2. If the ratio of REDUCE_INPUT_GROUPS / REDUCE_INPUT_RECORDS is well below
> 0.2 (~20%) and almost all the records are processed by this
> reducer, it could mean data skew.
>
>
>
> 3. In some cases, the REDUCE_INPUT_GROUPS / REDUCE_INPUT_RECORDS ratio might be
> in between (i.e., 0.3 - 0.8). In such cases, if most of the records are
> processed by this reducer (as compared to the overall number of records in
> the vertex), you might want to check the partition logic.
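
The three rules of thumb above can be sketched in Python as follows. The
mail gives only rough ranges, so the exact cut-offs used here (0.9 for
"approximately 1.0", 0.9 for "almost all the records", 0.5 for "most of
the records") are assumptions picked for illustration, as are the
function name and the returned labels.

```python
def classify_reducer(groups, records, vertex_records):
    """Apply the groups/records heuristics to one reducer.

    groups         -- REDUCE_INPUT_GROUPS for this reducer
    records        -- REDUCE_INPUT_RECORDS for this reducer
    vertex_records -- REDUCE_INPUT_RECORDS summed over the whole vertex
    """
    if records == 0:
        return "no input"

    ratio = groups / records
    # Fraction of the vertex's records handled by this reducer.
    share = records / vertex_records if vertex_records else 0.0

    if ratio >= 0.9:                      # rule 1: ratio close to 1.0 (assumed cut-off)
        return "consider more reducers"
    if ratio < 0.2 and share >= 0.9:      # rule 2: low ratio, almost all records (assumed)
        return "possible data skew"
    if 0.3 <= ratio <= 0.8 and share >= 0.5:  # rule 3: mid ratio, most records (assumed)
        return "check partition logic"
    return "no clear signal"
```

Rule 1 applies only if the vertex is actually slow, which this sketch
does not check; ratios that fall outside the ranges given in the mail
are reported as "no clear signal".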
>
>
>
> ~Rajesh.B
>
>
>
>
>
> On Mon, Jul 13, 2015 at 10:49 AM, Xiaoyong Zhu <[email protected]>
> wrote:
>
> So - if we want to know whether a vertex has a data skew issue or not, which
> counter should we use?
>
> Xiaoyong
>
> -----Original Message-----
> From: Hitesh Shah [mailto:[email protected]]
> Sent: Thursday, July 9, 2015 1:39 PM
> To: [email protected]
> Cc: Xiaoyong Zhu; Yifung Lin; Zhaomin Xu
> Subject: Re: Tez Counter question
>
> For data skew, you may also want to consider enabling
> "tez.task.generate.counters.per.io". This enables counters on a per-edge
> basis, which is more helpful for complex DAGs.
>
> - Hitesh
>
> On Jul 8, 2015, at 10:29 PM, Joe Zhang (SDE) <[email protected]> wrote:
>
> > Hi Rajesh:
> >
> > Thanks for your reply. I would like more detail; see inline.
> >
> > Sorry for not explaining why I care so much about these counters. I
> > am trying to analyze a data skew issue for a Tez vertex. I can now get
> > several related counter values, including FILE_BYTES_READ, HDFS_BYTES_READ,
> > SHUFFLE_BYTES and so on. So I want to know which counter values are
> > meaningful for analyzing data skew.
> >
> > Best wishes
> > Joe zhang
> >
> > From: Rajesh Balamohan [mailto:[email protected]]
> > Sent: Wednesday, July 8, 2015 4:57 PM
> > To: [email protected]
> > Cc: Xiaoyong Zhu; Yifung Lin
> > Subject: Re: Tez Counter question
> >
> > FILE_BYTES_READ - Represents the data read from local disk
> > >>>>>>>>>>Joezhang : When, or in which cases, does a mapper or reducer vertex
> > need to read from or write to local disk? I am wondering why a reducer in
> > Tez has data both read from local disk and shuffled from the parent node. As
> > far as I know, the traditional reducer in MR1 only reads shuffle data (in
> > memory and on the shuffle local disk). Does the Tez engine do some
> > optimization for this?
> >
> > HDFS_BYTES_READ - Represents data read from HDFS (does not include
> > data read from disk) ;>>>>>>>>>>Joezhang : When, or in which cases, does a
> > mapper or reducer vertex need to read from HDFS or write to HDFS?
> >
> > SHUFFLE_BYTES - Represents the data that was transferred over the wire
> > while doing shuffle. Downloaded data goes either into memory or to disk
> > (depending on memory availability). So, SHUFFLE_BYTES_TO_MEM and
> > SHUFFLE_BYTES_TO_DISK would correlate with SHUFFLE_BYTES. SHUFFLE_BYTES does
> > not have a direct relationship with FILE_BYTES_READ. However, in case of
> > spills & merges, FILE_BYTES_READ can be incremented correspondingly.
> >
> > ~Rajesh.B
> >
> > On Wed, Jul 8, 2015 at 1:25 PM, Joe Zhang (SDE) <[email protected]>
> wrote:
> > Hi Tez experts:
> >
> > I am now using the Tez REST API to get run-time information about Tez
> > tasks, but I am confused about some counter concepts.
> >
> > <1>  For File system counters:
> >
> > counterName FILE_BYTES_READ: does it mean data read from local disk or
> > somewhere else?
> >
> > counterName HDFS_BYTES_READ: is it included
> > in FILE_BYTES_READ?
> >
> > <2>  For org.apache.tez.common.counters.TaskCounter:
> >
> > counterName SHUFFLE_BYTES: does it have some relationship with
> > FILE_BYTES_READ? Which data is included in it?
> >
> > Best wishes
> > Joe zhang
>
>
>
