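If you look closely at the plan, the first stage executes your transform
(together with a partial aggregation: note "mode: hash" and "mode: partials"),
and the second stage completes your sum operation ("mode: final").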
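
To make the two aggregations concrete, here is a minimal Python sketch of what the quoted plan appears to do. It is an illustration, not Hive's implementation. Stage-1 partitions rows by rand(), so rows with the same key can reach different reducers and each reducer can only emit partial sums. Stage-2 partitions by the key and merges those partials. The function names, the sample rows, and the reducer count are all hypothetical. The rand() partitioning looks like the plan Hive generates when hive.groupby.skewindata is enabled, but whether that setting is on in your session is an assumption on my part.

```python
import random
from collections import defaultdict

def stage1(rows, num_reducers=3, seed=0):
    """Stage-1 sketch: scatter rows to reducers at random, sum per key on each reducer."""
    rng = random.Random(seed)
    buckets = [defaultdict(lambda: [0.0, 0.0]) for _ in range(num_reducers)]
    for c1, c2, k1 in rows:
        # Random partitioning: rows sharing a key may land on different reducers.
        acc = buckets[rng.randrange(num_reducers)][k1]
        acc[0] += c1
        acc[1] += c2
    # Every reducer emits its own partial sums, so a key can appear more than once.
    return [(k, s1, s2) for b in buckets for k, (s1, s2) in b.items()]

def stage2(partials):
    """Stage-2 sketch: partition by key and merge the partial sums into final sums."""
    final = defaultdict(lambda: [0.0, 0.0])
    for k, s1, s2 in partials:
        final[k][0] += s1
        final[k][1] += s2
    return {k: (s1, s2) for k, (s1, s2) in final.items()}

# Hypothetical (c1, c2, k1) rows, standing in for the output of 'mymapper'.
rows = [(1.0, 10.0, "a"), (2.0, 20.0, "a"), (3.0, 30.0, "b"), (4.0, 40.0, "a")]
partials = stage1(rows)
totals = stage2(partials)
print(totals)
```

With a single reducer the partials are already final, which is the one-job plan you expected. With random partitioning across several reducers, the second pass is what makes the sums correct.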


On Wed, Jan 23, 2013 at 11:24 AM, Richard <codemon...@163.com> wrote:

> Thanks. I ran the explain command and got the plan, but I am still confused.
> Below is the description of the two map-reduce stages.
>
> It seems that the aggregation has already been done in stage-1, so why
> does stage-2 aggregate again?
>
>
> ==========================
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Alias -> Map Operator Tree:
>         a:t1
>           TableScan
>             alias: t1
>             Select Operator
>               expressions:
>                     expr: f
>                     type: string
>               outputColumnNames: _col0
>               Transform Operator
>                 command: mymapper
>                 output info:
>                     input format: org.apache.hadoop.mapred.TextInputFormat
>                     output format:
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                  Select Operator
>                   expressions:
>                         expr: _col0
>                         type: string
>                         expr: _col1
>                         type: string
>                         expr: _col2
>                         type: string
>                   outputColumnNames: _col0, _col1, _col2
>                   Group By Operator
>                     aggregations:
>                           expr: sum(_col0)
>                           expr: sum(_col1)
>                     bucketGroup: false
>                     keys:
>                           expr: _col2
>                           type: string
>                     mode: hash
>                     outputColumnNames: _col0, _col1, _col2
>                     Reduce Output Operator
>                       key expressions:
>                             expr: _col0
>                             type: string
>                       sort order: +
>                       Map-reduce partition columns:
>                             expr: rand()
>                             type: double
>                       tag: -1
>                       value expressions:
>                             expr: _col1
>                             type: double
>                             expr: _col2
>                             type: double
>       Reduce Operator Tree:
>         Group By Operator
>           aggregations:
>                 expr: sum(VALUE._col0)
>                 expr: sum(VALUE._col1)
>           bucketGroup: false
>           keys:
>                 expr: KEY._col0
>                 type: string
>           mode: partials
>           outputColumnNames: _col0, _col1, _col2
>           File Output Operator
>             compressed: false
>             GlobalTableId: 0
>             table:
>                 input format:
> org.apache.hadoop.mapred.SequenceFileInputFormat
>                 output format:
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>
>   Stage: Stage-2
>     Map Reduce
>       Alias -> Map Operator Tree:
>
> hdfs://hdpnn:9000/mydata/hive/hive_2013-01-23_13-46-09_628_5487089660360786955/10002
>             Reduce Output Operator
>               key expressions:
>                     expr: _col0
>                     type: string
>               sort order: +
>               Map-reduce partition columns:
>                     expr: _col0
>                     type: string
>               tag: -1
>               value expressions:
>                     expr: _col1
>                     type: double
>                     expr: _col2
>                     type: double
>       Reduce Operator Tree:
>         Group By Operator
>           aggregations:
>                 expr: sum(VALUE._col0)
>                 expr: sum(VALUE._col1)
>           bucketGroup: false
>           keys:
>                 expr: KEY._col0
>                 type: string
>           mode: final
>           outputColumnNames: _col0, _col1, _col2
>           Select Operator
>             expressions:
>                   expr: _col1
>                   type: double
>                   expr: _col2
>                   type: double
>                   expr: _col0
>                   type: string
>             outputColumnNames: _col0, _col1, _col2
>             File Output Operator
>               compressed: false
>               GlobalTableId: 0
>               table:
>                   input format:
> org.apache.hadoop.mapred.TextInputFormat
>                   output format:
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> ============================
>
>
>
>
>
>
> At 2013-01-23 11:45:13,Richard <codemon...@163.com> wrote:
>
> I am wondering how to determine the number of map-reduce jobs for a Hive query.
>
> For example, take the following query:
>
> select
> sum(c1),
> sum(c2),
> k1
> from
> (
> select transform(*) using 'mymapper' as c1, c2, k1
> from t1
> ) a group by k1;
>
> When I run this query, it takes two map-reduce jobs, but I expected it to
> take only one: the map stage would use 'mymapper' as the mapper, and the
> mapper output would then be shuffled by k1 and summed in the reducer.
>
> So why does Hive take two map-reduce jobs?
>
>
>
>
>


-- 
Nitin Pawar
