Hi all,

 

We encountered a strange behavior after upgrading to HIVE 2.0.0 + TEZ 0.8.2.


 

I simplified our query to this:

 

SELECT

  a.key,

  a.a_one,

  b.b_one,

  a.a_zero,

  b.b_zero

FROM

(

    SELECT

      11 key,

      0 confuse_you,

      1 a_one,

      0 a_zero

) a

LEFT JOIN 

(

    SELECT

      11 key,

      0 confuse_you,

      1 b_one,

      0 b_zero

) b 

ON a.key = b.key

;

 

 

Above query generates this unexpected result:

 

INFO  : Status: Running (Executing on YARN cluster with App id
application_1456723490535_3653)

 

INFO  : Map 1: 0/1      Map 2: 0/1

INFO  : Map 1: 0/1      Map 2: 0(+1)/1

INFO  : Map 1: 0(+1)/1  Map 2: 0(+1)/1

INFO  : Map 1: 0(+1)/1  Map 2: 1/1

INFO  : Map 1: 1/1      Map 2: 1/1

INFO  : Completed executing
command(queryId=hive_20160301115630_0a0dbee5-ba4b-45e7-b027-085f655640fd);
Time taken: 10.225 seconds

INFO  : OK

+--------+----------+----------+-----------+-----------+--+

| a.key  | a.a_one  | b.b_one  | a.a_zero  | b.b_zero  |

+--------+----------+----------+-----------+-----------+--+

| 11     | 1        | 0        | 0         | 1         |

+--------+----------+----------+-----------+-----------+--+

 

If you change the constant value of subquery-b's confuse_you column from 0
to 2, the problem disappears. The plan returned from EXPLAIN shows the
incorrect one is picking _col1 and _col2, while the correct one is picking
_col2 and _col3 form sub query b.

 

Seems it cannot distinguish 2 columns with same constant value?

 

 

Anyone encountered similar problem?

 

 

Thanks!

 

Chi

 

Reply via email to