Hi all,
We encountered a strange behavior after upgrading to HIVE 2.0.0 + TEZ 0.8.2.
I simplified our query to this:
SELECT
a.key,
a.a_one,
b.b_one,
a.a_zero,
b.b_zero
FROM
(
SELECT
11 key,
0 confuse_you,
1 a_one,
0 a_zero
) a
LEFT JOIN
(
SELECT
11 key,
0 confuse_you,
1 b_one,
0 b_zero
) b
ON a.key = b.key
;
Above query generates this unexpected result:
INFO : Status: Running (Executing on YARN cluster with App id
application_1456723490535_3653)
INFO : Map 1: 0/1 Map 2: 0/1
INFO : Map 1: 0/1 Map 2: 0(+1)/1
INFO : Map 1: 0(+1)/1 Map 2: 0(+1)/1
INFO : Map 1: 0(+1)/1 Map 2: 1/1
INFO : Map 1: 1/1 Map 2: 1/1
INFO : Completed executing
command(queryId=hive_20160301115630_0a0dbee5-ba4b-45e7-b027-085f655640fd);
Time taken: 10.225 seconds
INFO : OK
+--------+----------+----------+-----------+-----------+--+
| a.key | a.a_one | b.b_one | a.a_zero | b.b_zero |
+--------+----------+----------+-----------+-----------+--+
| 11 | 1 | 0 | 0 | 1 |
+--------+----------+----------+-----------+-----------+--+
If you change the constant value of subquery-b's confuse_you column from 0
to 2, the problem disappears. The plan returned from EXPLAIN shows the
incorrect one is picking _col1 and _col2, while the correct one is picking
_col2 and _col3 form sub query b.
Seems it cannot distinguish 2 columns with same constant value?
Anyone encountered similar problem?
Thanks!
Chi