This looks like a real bug. Matthew might know whether there's already a fix or 
a ticket; otherwise, you should open a JIRA.

From: Ted Xu <frank...@gmail.com>
Reply-To: user@hive.apache.org
Date: Friday, August 14, 2015 at 03:56
To: user@hive.apache.org
Subject: MapJoin bug?

Hi all,

I was running the TPC-H benchmark on Hive recently and found that some queries 
went wrong.

The following two cases both use MapJoin with a bigint join key. After 
disabling auto convert join, the errors are gone.
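
For readers who want to try the same workaround, here is a minimal sketch of the session setting. It assumes that "auto convert join" refers to the standard hive.auto.convert.join property; the original message does not name the property explicitly.

```sql
-- Assumption: "auto convert join" means the hive.auto.convert.join property.
-- With it off, Hive should plan a regular shuffle join instead of a MapJoin.
set hive.auto.convert.join=false;

-- Then rerun the affected query (q4 or q9 below) in the same session.
```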

Case 1:
Query (TPC-H query4):

create table q4_result as
select
o_orderpriority,
count(*) as order_count
from
orders o
join
(
select
distinct l_orderkey
from
(
select
*
from
lineitem
where
l_commitdate < l_receiptdate
) tab1
) tab2
on tab2.l_orderkey = o.o_orderkey
where
o.o_orderdate >= '1993-07-01' and o.o_orderdate < '1993-10-01'
group by
o_orderpriority
order by
o_orderpriority;

With MapJoin enabled, this query loses data. Both sides of the join produce the 
expected output, but some rows that should match are not joined. (Note that 
l_orderkey and o_orderkey are both bigint.)

Case 2:
Query (TPC-H query9):

create table q9_result as
select
nation,
o_year,
sum(amount) as sum_profit
from
(
select
n_name as nation,
substr(o_orderdate,1,4) as o_year,
l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
from
supplier s
join lineitem l on s.s_suppkey = l.l_suppkey
join partsupp ps on ps.ps_suppkey = l.l_suppkey and ps.ps_partkey = l.l_partkey
join part p on p.p_partkey = l.l_partkey
join orders o on o.o_orderkey = l.l_orderkey
join nation n on s.s_nationkey = n.n_nationkey
where
p_name like '%green%'
) profit
group by
nation,
o_year
order by
nation,
o_year desc;


The error occurs when joining tables s and n; we get the following exception:

Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
        ... 15 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:52)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
        ... 18 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: -1
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:368)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
        at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:138)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
        at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:117)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
        at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
        ... 19 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: -1
        at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:403)
        at org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.process(VectorReduceSinkOperator.java:98)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:603)
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:362)
        ... 27 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
        at org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.write(UnorderedPartitionedKVWriter.java:301)
        at org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.write(UnorderedPartitionedKVWriter.java:244)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor$TezKVOutputCollector.collect(TezProcessor.java:196)
        at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:542)
        at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:386)
        ... 31 more


Both cases use MapJoin with a bigint join key, on Tez with vectorized execution 
enabled, so I'm wondering whether there is a bug in 
VectorMapJoinInnerLongOperator.
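
One possible way to test that suspicion is to toggle the vectorization settings one at a time and rerun the failing queries. The sketch below is an assumption, not something verified in this thread: it presumes the standard Hive property names, and whether either setting avoids the failures has not been checked.

```sql
-- Step 1 (assumption): turn off only the native vectorized MapJoin operators,
-- which should keep VectorMapJoinInnerLongOperator out of the plan.
set hive.vectorized.execution.mapjoin.native.enabled=false;
-- rerun q4 / q9 and compare the results

-- Step 2 (assumption): if the problem persists, turn off vectorized
-- execution entirely and rerun.
set hive.vectorized.execution.enabled=false;
```

If the queries succeed after step 1 alone, that would point at the native vector MapJoin path rather than at MapJoin in general.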

Does anyone have ideas?
