Seonggon Namgung created HIVE-27006:
---------------------------------------

             Summary: ParallelEdgeFixer inserts misconfigured operator and does not connect it in Tez DAG
                 Key: HIVE-27006
                 URL: https://issues.apache.org/jira/browse/HIVE-27006
             Project: Hive
          Issue Type: Bug
            Reporter: Seonggon Namgung
            Assignee: Seonggon Namgung
         Attachments: after.PEF.png, tez-dag.png


Hive fails to run the query below on a 1 TB ORC-formatted TPC-DS dataset because of a runtime error in one operator. I found that the problematic operator is inserted by ParallelEdgeFixer. I also observed that the corresponding vertex has no descendant vertex, although its ReduceSinkOperator has a SemiJoin edge connected to a TableScanOperator.

(I attached figures of the Tez DAG and the OperatorGraph. They show that Cluster6 and Cluster7 are connected, while Reducer4 and Map7 are not.)

Query
{code:java}
set hive.optimize.shared.work=true;
set hive.optimize.shared.work.parallel.edge.support=true;

with
inv00 as (
  select inv_item_sk, inv_warehouse_sk
  from inventory, date_dim
  where inv_date_sk = d_date_sk and d_year = 2000),
inv01 as (
  select inv_item_sk, inv_warehouse_sk
  from inventory, date_dim
  where inv_date_sk = d_date_sk and d_year = 2001),
inv02 as (
  select inv_item_sk, inv_warehouse_sk
  from inventory, date_dim
  where inv_date_sk = d_date_sk and d_year = 2002),
sd00 as (
  select inv_item_sk id, w_zip zip
  from inv00 full outer join warehouse on inv_warehouse_sk = w_warehouse_sk
  where w_state = 'SD'),
sd01 as (
  select inv_item_sk id, w_zip zip
  from inv01 full outer join warehouse on inv_warehouse_sk = w_warehouse_sk
  where w_state = 'SD'),
sd02 as (
  select inv_item_sk id, w_zip zip
  from inv02 full outer join warehouse on inv_warehouse_sk = w_warehouse_sk
  where w_state = 'SD')
select *
from sd00, sd01, sd02
where sd00.id = sd01.id and sd00.id = sd02.id;
{code}

Error message
{code:java}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:385)
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:301)
    ... 18 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: cannot find field _col0 from []
    at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:384)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:370)
    ... 19 more
Caused by: java.lang.RuntimeException: cannot find field _col0 from []
    at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:550)
    at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:153)
    at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:56)
    at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:1073)
    at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:1099)
    at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:305)
    ... 22 more
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)