Mostafa Mokhtar created HIVE-10446:
--------------------------------------

             Summary: Hybrid Hybrid Grace Hash Join : java.lang.IllegalArgumentException in Kryo while spilling big table
                 Key: HIVE-10446
                 URL: https://issues.apache.org/jira/browse/HIVE-10446
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 1.2.0
            Reporter: Mostafa Mokhtar
            Assignee: Wei Zheng
             Fix For: 1.2.0


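TPC-DS Q85 fails with a Kryo java.lang.IllegalArgumentException ("output cannot be null.") thrown from MapJoinOperator.spillBigTableRow while spilling big table data.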

Query 
{code}
select  substr(r_reason_desc,1,20) as r
       ,avg(wr_return_ship_cost) wq
       ,avg(wr_refunded_cash) ref
       ,avg(wr_fee) fee
 from web_returns, customer_demographics cd1,
      customer_demographics cd2, customer_address, date_dim, reason 
 where 
   cd1.cd_demo_sk = web_returns.wr_refunded_cdemo_sk 
   and cd2.cd_demo_sk = web_returns.wr_returning_cdemo_sk
   and customer_address.ca_address_sk = web_returns.wr_refunded_addr_sk
   and reason.r_reason_sk = web_returns.wr_reason_sk
   and cd1.cd_marital_status = cd2.cd_marital_status
   and cd1.cd_education_status = cd2.cd_education_status
group by r_reason_desc
order by r, wq, ref, fee
limit 100
{code}

Plan 
{code}
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 1 <- Map 4 (BROADCAST_EDGE), Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE), Map 7 (BROADCAST_EDGE)
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
      DagName: mmokhtar_20150422165209_d8eb5634-c19f-4576-9525-cad248c7ca37:5
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: web_returns
                  filterExpr: (((wr_refunded_addr_sk is not null and wr_reason_sk is not null) and wr_refunded_cdemo_sk is not null) and wr_returning_cdemo_sk is not null) (type: boolean)
                  Statistics: Num rows: 2062802370 Data size: 185695406284 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: (((wr_refunded_addr_sk is not null and wr_reason_sk is not null) and wr_refunded_cdemo_sk is not null) and wr_returning_cdemo_sk is not null) (type: boolean)
                    Statistics: Num rows: 1875154723 Data size: 51267313780 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: wr_refunded_cdemo_sk (type: int), wr_refunded_addr_sk (type: int), wr_returning_cdemo_sk (type: int), wr_reason_sk (type: int), wr_fee (type: float), wr_return_ship_cost (type: float), wr_refunded_cash (type: float)
                      outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6
                      Statistics: Num rows: 1875154723 Data size: 51267313780 Basic stats: COMPLETE Column stats: COMPLETE
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        keys:
                          0 _col1 (type: int)
                          1 _col0 (type: int)
                        outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6
                        input vertices:
                          1 Map 4
                        Statistics: Num rows: 1875154688 Data size: 45003712512 Basic stats: COMPLETE Column stats: COMPLETE
                        HybridGraceHashJoin: true
                        Map Join Operator
                          condition map:
                               Inner Join 0 to 1
                          keys:
                            0 _col3 (type: int)
                            1 _col0 (type: int)
                          outputColumnNames: _col0, _col2, _col4, _col5, _col6, _col9
                          input vertices:
                            1 Map 5
                          Statistics: Num rows: 1875154688 Data size: 219393098496 Basic stats: COMPLETE Column stats: COMPLETE
                          HybridGraceHashJoin: true
                          Map Join Operator
                            condition map:
                                 Inner Join 0 to 1
                            keys:
                              0 _col0 (type: int)
                              1 _col0 (type: int)
                            outputColumnNames: _col2, _col4, _col5, _col6, _col9, _col11, _col12
                            input vertices:
                              1 Map 6
                            Statistics: Num rows: 1875154688 Data size: 547545168896 Basic stats: COMPLETE Column stats: COMPLETE
                            HybridGraceHashJoin: true
                            Map Join Operator
                              condition map:
                                   Inner Join 0 to 1
                              keys:
                                0 _col2 (type: int), _col11 (type: string), _col12 (type: string)
                                1 _col0 (type: int), _col1 (type: string), _col2 (type: string)
                              outputColumnNames: _col4, _col5, _col6, _col9
                              input vertices:
                                1 Map 7
                              Statistics: Num rows: 402058172 Data size: 43824340748 Basic stats: COMPLETE Column stats: COMPLETE
                              HybridGraceHashJoin: true
                              Select Operator
                                expressions: _col9 (type: string), _col5 (type: float), _col6 (type: float), _col4 (type: float)
                                outputColumnNames: _col0, _col1, _col2, _col3
                                Statistics: Num rows: 402058172 Data size: 43824340748 Basic stats: COMPLETE Column stats: COMPLETE
                                Group By Operator
                                  aggregations: avg(_col1), avg(_col2), avg(_col3)
                                  keys: _col0 (type: string)
                                  mode: hash
                                  outputColumnNames: _col0, _col1, _col2, _col3
                                  Statistics: Num rows: 10975 Data size: 1064575 Basic stats: COMPLETE Column stats: COMPLETE
                                  Reduce Output Operator
                                    key expressions: _col0 (type: string)
                                    sort order: +
                                    Map-reduce partition columns: _col0 (type: string)
                                    Statistics: Num rows: 10975 Data size: 1064575 Basic stats: COMPLETE Column stats: COMPLETE
                                    value expressions: _col1 (type: struct<count:bigint,sum:double,input:float>), _col2 (type: struct<count:bigint,sum:double,input:float>), _col3 (type: struct<count:bigint,sum:double,input:float>)
            Execution mode: vectorized
        Map 4
            Map Operator Tree:
                TableScan
                  alias: customer_address
                  filterExpr: ca_address_sk is not null (type: boolean)
                  Statistics: Num rows: 40000000 Data size: 40595195284 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: ca_address_sk is not null (type: boolean)
                    Statistics: Num rows: 40000000 Data size: 160000000 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: ca_address_sk (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 40000000 Data size: 160000000 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 40000000 Data size: 160000000 Basic stats: COMPLETE Column stats: COMPLETE
            Execution mode: vectorized
        Map 5
            Map Operator Tree:
                TableScan
                  alias: reason
                  filterExpr: r_reason_sk is not null (type: boolean)
                  Statistics: Num rows: 72 Data size: 14400 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: r_reason_sk is not null (type: boolean)
                    Statistics: Num rows: 72 Data size: 7272 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: r_reason_sk (type: int), r_reason_desc (type: string)
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 72 Data size: 7272 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 72 Data size: 7272 Basic stats: COMPLETE Column stats: COMPLETE
                        value expressions: _col1 (type: string)
            Execution mode: vectorized
        Map 6
            Map Operator Tree:
                TableScan
                  alias: cd1
                  filterExpr: ((cd_demo_sk is not null and cd_marital_status is not null) and cd_education_status is not null) (type: boolean)
                  Statistics: Num rows: 1920800 Data size: 718379200 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: ((cd_demo_sk is not null and cd_marital_status is not null) and cd_education_status is not null) (type: boolean)
                    Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: cd_demo_sk (type: int), cd_marital_status (type: string), cd_education_status (type: string)
                      outputColumnNames: _col0, _col1, _col2
                      Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
                        value expressions: _col1 (type: string), _col2 (type: string)
            Execution mode: vectorized
        Map 7
            Map Operator Tree:
                TableScan
                  alias: cd1
                  filterExpr: ((cd_demo_sk is not null and cd_marital_status is not null) and cd_education_status is not null) (type: boolean)
                  Statistics: Num rows: 1920800 Data size: 718379200 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: ((cd_demo_sk is not null and cd_marital_status is not null) and cd_education_status is not null) (type: boolean)
                    Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: cd_demo_sk (type: int), cd_marital_status (type: string), cd_education_status (type: string)
                      outputColumnNames: _col0, _col1, _col2
                      Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int), _col1 (type: string), _col2 (type: string)
                        sort order: +++
                        Map-reduce partition columns: _col0 (type: int), _col1 (type: string), _col2 (type: string)
                        Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
            Execution mode: vectorized
        Reducer 2
            Reduce Operator Tree:
              Group By Operator
                aggregations: avg(VALUE._col0), avg(VALUE._col1), avg(VALUE._col2)
                keys: KEY._col0 (type: string)
                mode: mergepartial
                outputColumnNames: _col0, _col1, _col2, _col3
                Statistics: Num rows: 25 Data size: 3025 Basic stats: COMPLETE Column stats: COMPLETE
                Select Operator
                  expressions: substr(_col0, 1, 20) (type: string), _col1 (type: double), _col2 (type: double), _col3 (type: double)
                  outputColumnNames: _col0, _col1, _col2, _col3
                  Statistics: Num rows: 25 Data size: 5200 Basic stats: COMPLETE Column stats: COMPLETE
                  Reduce Output Operator
                    key expressions: _col0 (type: string), _col1 (type: double), _col2 (type: double), _col3 (type: double)
                    sort order: ++++
                    Statistics: Num rows: 25 Data size: 5200 Basic stats: COMPLETE Column stats: COMPLETE
                    TopN Hash Memory Usage: 0.04
        Reducer 3
            Reduce Operator Tree:
              Select Operator
                expressions: KEY.reducesinkkey0 (type: string), KEY.reducesinkkey1 (type: double), KEY.reducesinkkey2 (type: double), KEY.reducesinkkey3 (type: double)
                outputColumnNames: _col0, _col1, _col2, _col3
                Statistics: Num rows: 25 Data size: 5200 Basic stats: COMPLETE Column stats: COMPLETE
                Limit
                  Number of rows: 100
                  Statistics: Num rows: 25 Data size: 5200 Basic stats: COMPLETE Column stats: COMPLETE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 25 Data size: 5200 Basic stats: COMPLETE Column stats: COMPLETE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: 100
      Processor Tree:
        ListSink
{code}

Exception 
{code}
], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
        ... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:52)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
        ... 17 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: output cannot be null.
        at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:411)
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.process(VectorMapJoinOperator.java:287)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
        at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:138)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
        at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:114)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
        at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
        ... 18 more
Caused by: java.lang.IllegalArgumentException: output cannot be null.
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:601)
        at org.apache.hadoop.hive.ql.exec.persistence.ObjectContainer.add(ObjectContainer.java:101)
        at org.apache.hadoop.hive.ql.exec.MapJoinOperator.spillBigTableRow(MapJoinOperator.java:425)
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.spillBigTableRow(VectorMapJoinOperator.java:307)
        at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:390)
        ... 27 more
]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1426707664723_3652_3_04 [Map 1] killed/failed due to:null]Vertex killed, vertexName=Reducer 3, vertexId=vertex_1426707664723_3652_3_06, diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as other vertex failed. failedTasks:0, Vertex vertex_1426707664723_3652_3_06 [Reducer 3] killed/failed due to:null]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1426707664
{code}
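
To check whether the failure is tied to the hybrid grace spill path, the query could be rerun with that path disabled. A minimal sketch, assuming the standard Hive session settings hive.mapjoin.hybridgrace.hashtable and hive.vectorized.execution.enabled apply to this build; it has not been verified against this failure, so treat it as a diagnostic step rather than a confirmed workaround:

{code}
-- Assumption: turns off Hybrid Hybrid Grace Hash Join, so the map join should not spill big table rows
set hive.mapjoin.hybridgrace.hashtable=false;
-- Optional assumption: also bypass VectorMapJoinOperator, which appears in the stack trace
-- set hive.vectorized.execution.enabled=false;
{code}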



