Gopal V created HIVE-16852:
------------------------------

             Summary: PTF: RANK() re-evaluates order predicates on the reducer
                 Key: HIVE-16852
                 URL: https://issues.apache.org/jira/browse/HIVE-16852
             Project: Hive
          Issue Type: Bug
          Components: Physical Optimizer
    Affects Versions: 2.1.1, 3.0.0
            Reporter: Gopal V


{code}
explain select ss_item_sk, rank() over(order by cast(ss_list_price as decimal(38,10))) as r, ss_list_price from store_sales;

STAGE PLANS:
  Stage: Stage-1
    Tez
      DagId: root_20170608015140_7b0debb9-b14b-4150-b004-9743c6127392:3
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      DagName:
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: store_sales
                  Statistics: Num rows: 28800426268 Data size: 450435120648 Basic stats: COMPLETE Column stats: COMPLETE
                  Reduce Output Operator
                    key expressions: 0 (type: int), CAST( ss_list_price AS decimal(38,10)) (type: decimal(38,10))
                    sort order: ++
                    Map-reduce partition columns: 0 (type: int)
                    Statistics: Num rows: 28800426268 Data size: 450435120648 Basic stats: COMPLETE Column stats: COMPLETE
                    value expressions: ss_item_sk (type: bigint), ss_list_price (type: double)
            Execution mode: vectorized, llap
            LLAP IO: all inputs
        Reducer 2 
            Execution mode: llap
            Reduce Operator Tree:
              Select Operator
                expressions: VALUE._col1 (type: bigint), VALUE._col11 (type: double)
                outputColumnNames: _col1, _col11
                Statistics: Num rows: 28800426268 Data size: 8399352770616 Basic stats: COMPLETE Column stats: COMPLETE
                PTF Operator
                  Function definitions:
                      Input definition
                        input alias: ptf_0
                        output shape: _col1: bigint, _col11: double
                        type: WINDOWING
                      Windowing table definition
                        input alias: ptf_1
                        name: windowingtablefunction
                        order by: CAST( _col11 AS decimal(38,10)) ASC NULLS FIRST
                        partition by: 0
                        raw input shape:
                        window functions:
                            window function definition
                              alias: rank_window_0
                              arguments: CAST( _col11 AS decimal(38,10))
                              name: rank
                              window function: GenericUDAFRankEvaluator
                              window frame: PRECEDING(MAX)~FOLLOWING(MAX)
                              isPivotResult: true
                  Statistics: Num rows: 28800426268 Data size: 8399352770616 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: _col1 (type: bigint), rank_window_0 (type: int), _col11 (type: double)
                    outputColumnNames: _col0, _col1, _col2
                    Statistics: Num rows: 28800426268 Data size: 565636825720 Basic stats: COMPLETE Column stats: COMPLETE
                    File Output Operator
                      compressed: false
                      Statistics: Num rows: 28800426268 Data size: 565636825720 Basic stats: COMPLETE Column stats: COMPLETE
                      table:
                          input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
{code}

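This forces the decimal cast to be evaluated about twice per row: once in Map 1 
to produce the KEY expression, and again in Reducer 2, where the PTF operator 
re-applies CAST( _col11 AS decimal(38,10)) to the value column for the window 
function instead of reusing the key that was already computed.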
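
The cost can be illustrated with a small stand-alone model. This is only a sketch, not Hive code: run_rank, make_cast and the reuse_key flag are hypothetical names invented for the illustration, and Python's Decimal stands in for decimal(38,10). It counts how often the cast runs when the reducer re-casts the value column (as in the plan above) versus when it reads the key that was already computed.

```python
from decimal import Decimal

SCALE = Decimal("1.0000000000")  # ten fractional digits, as in decimal(38,10)


def make_cast(counter):
    """Return a cast function that counts its own invocations (hypothetical helper)."""
    def cast(value):
        counter[0] += 1
        return Decimal(str(value)).quantize(SCALE)
    return cast


def run_rank(rows, reuse_key):
    """Model of the map -> shuffle -> rank pipeline; returns (ranked rows, cast calls)."""
    counter = [0]
    cast = make_cast(counter)

    # Map side: the cast is evaluated once per row to build the sort key.
    shuffled = sorted(
        ((cast(price), item, price) for item, price in rows),
        key=lambda rec: rec[0],
    )

    # Reduce side: rank() needs the order-by value of every row.
    ranked, prev, rank = [], None, 0
    for pos, (key, item, price) in enumerate(shuffled, start=1):
        # reuse_key=False mirrors the plan above: the value column is cast again.
        order_value = key if reuse_key else cast(price)
        if order_value != prev:
            rank, prev = pos, order_value
        ranked.append((item, rank, price))
    return ranked, counter[0]


rows = [(1, 9.99), (2, 1.5), (3, 9.99), (4, 3.25)]
print(run_rank(rows, reuse_key=False)[1])  # 8 casts for 4 rows
print(run_rank(rows, reuse_key=True)[1])   # 4 casts for 4 rows
```

Both variants produce the same ranks; only the number of cast evaluations differs, which is the overhead described above.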



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)