There are multiple reasons why Tez runs a different no. of tasks than MR:
    - Hive itself behaves differently. With MR, it may have been 
processing data from 2 tables in the same map stage, which affects the no. of 
tasks. With Tez, it may end up processing each table in a separate vertex.
    - Tez does some level of grouping of input splits to run a smaller set of 
tasks, depending on the configured min/max size of data processed by a task.
    - Furthermore, Tez looks at the available cluster capacity to decide how 
many tasks to run for a single vertex. For example, if a cluster has capacity 
to run only 10 containers at a time, Tez will try to run at most 1.7 * 10 = 17 
tasks (1.7 is a configurable value). This holds true as long as the upper 
bound on the data size per task is not crossed.
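
To make the last two points concrete, here is a rough sketch of the arithmetic. 
It is a simplified model for illustration only, not the actual Tez grouping 
code. The function name and the default sizes (50 MB min, 1 GB max) are 
assumptions; the knobs should correspond to tez.grouping.split-waves, 
tez.grouping.min-size and tez.grouping.max-size, but check the values on your 
own cluster.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3


def estimate_task_count(total_bytes, num_splits, available_containers,
                        waves=1.7, min_size=50 * MB, max_size=1 * GB):
    """Rough estimate of how many tasks a vertex gets after split grouping.

    Hypothetical helper: a simplified model of the behaviour described
    above, not the real Tez implementation.
    """
    # Start from cluster capacity: aim for about `waves` rounds of tasks.
    # Rounding here is a choice made for this sketch.
    desired = max(1, round(available_containers * waves))

    # Keep the data handled by one task within the min/max bounds.
    per_task = total_bytes / desired
    if per_task > max_size:
        desired = math.ceil(total_bytes / max_size)
    elif per_task < min_size:
        desired = max(1, math.ceil(total_bytes / min_size))

    # Grouping only merges splits, so it never yields more tasks than splits.
    return min(desired, num_splits)


if __name__ == "__main__":
    # 10 GB of input, 160 splits, room for 10 containers: capacity decides.
    print(estimate_task_count(10 * GB, 160, 10))    # 17
    # 100 GB of input: the max size per task is crossed, so more tasks run.
    print(estimate_task_count(100 * GB, 1600, 10))  # 100
    # 100 MB of input in 16 splits: the min size per task takes over.
    print(estimate_task_count(100 * MB, 16, 10))    # 2
```

The last case shows how a query that needed 16 mappers under MR can end up 
with far fewer tasks in Tez when the input is small.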

thanks
— Hitesh


On Jul 31, 2014, at 8:19 PM, igotux igotux <igo...@gmail.com> wrote:

> Thanks Hitesh. That explains the DAG.
> 
> When you said completed vs total tasks for a given vertex, does it mean 
> there was a total of 0/2 + 0/8 = 0/10 ( 10 tasks ) for this Tez job?
> That would mean that when I ran the same query in Hive MR, it launched 16 
> tasks, and now it is launching only 10 tasks. Also, can you please explain 
> how the number of tasks got reduced here?
> 
> Thanks.
> 
> 
> On Thu, Jul 31, 2014 at 9:20 PM, Hitesh Shah <hit...@apache.org> wrote:
> Hi
> 
> This looks like a 3-vertex DAG. It could possibly be a linear DAG such as 
> Map1 -> Map2 -> Reduce3 or a Join DAG where
> Map1 -> Reduce3 and Map2 -> Reduce3.
> 
> If you can get the application logs from YARN ( using bin/yarn logs 
> -applicationId application_1404180111945_438880 ), you will be able to get a 
> .dot file from the logs which will allow you to
> visualize the DAG using a tool like graphviz.
> 
> As for the console output, 0/2 or 0/8 just implies the no. of completed vs 
> total tasks for a given vertex.
> 
> thanks
> — Hitesh
> 
> 
> On Jul 31, 2014, at 12:04 AM, igotux igotux <igo...@gmail.com> wrote:
> 
> > Hello Everyone,
> >
> > Can someone help me understand what the numbers next to Map 1 / Map 2 and 
> > Reducer 3 mean?
> >
> > ~~~~~~~~~~~~~~~
> > Status: Running (application id: application_1404180111945_438880)
> >
> > Map 1: -/-    Map 2: -/-      Reducer 3: 0/1
> > Map 1: 0/2    Map 2: -/-      Reducer 3: 0/1
> > Map 1: 0/2    Map 2: 0/8      Reducer 3: 0/1
> > Map 1: 0/2    Map 2: 0/8      Reducer 3: 0/1
> > Map 1: 0/2    Map 2: 0/8      Reducer 3: 0/1
> > Map 1: 1/2    Map 2: 0/8      Reducer 3: 0/1
> > Map 1: 2/2    Map 2: 0/8      Reducer 3: 0/1
> > Map 1: 2/2    Map 2: 2/8      Reducer 3: 0/1
> > Map 1: 2/2    Map 2: 3/8      Reducer 3: 0/1
> > Map 1: 2/2    Map 2: 4/8      Reducer 3: 0/1
> > Map 1: 2/2    Map 2: 6/8      Reducer 3: 0/1
> > Map 1: 2/2    Map 2: 8/8      Reducer 3: 0/1
> > Map 1: 2/2    Map 2: 8/8      Reducer 3: 1/1
> > Status: Finished successfully
> > OK
> > ~~~~~~~~~~~~~~~
> >
> > The MR hive job runs with 16 mappers and one reducer.
> 
> 
