+ hive mail list, more likely hive issue.
---------- Forwarded message ---------- From: GAO Chi <chi....@microfun.com> Date: Tue, Mar 1, 2016 at 12:24 PM Subject: Wrong column is picked in HIVE 2.0.0 + TEZ 0.8.2 left join To: u...@tez.apache.org Hi all, We encountered a strange behavior after upgrading to HIVE 2.0.0 + TEZ 0.8.2. I simplified our query to this: SELECT a.key, a.a_one, b.b_one, a.a_zero, b.b_zero FROM ( SELECT 11 key, 0 confuse_you, 1 a_one, 0 a_zero ) a LEFT JOIN ( SELECT 11 key, 0 confuse_you, 1 b_one, 0 b_zero ) b ON a.key = b.key ; Above query generates this unexpected result: INFO : Status: Running (Executing on YARN cluster with App id application_1456723490535_3653) INFO : Map 1: 0/1 Map 2: 0/1 INFO : Map 1: 0/1 Map 2: 0(+1)/1 INFO : Map 1: 0(+1)/1 Map 2: 0(+1)/1 INFO : Map 1: 0(+1)/1 Map 2: 1/1 INFO : Map 1: 1/1 Map 2: 1/1 INFO : Completed executing command(queryId=hive_20160301115630_0a0dbee5-ba4b-45e7-b027-085f655640fd); Time taken: 10.225 seconds INFO : OK +--------+----------+----------+-----------+-----------+--+ | a.key | a.a_one | b.b_one | a.a_zero | b.b_zero | +--------+----------+----------+-----------+-----------+--+ | 11 | 1 | 0 | 0 | 1 | +--------+----------+----------+-----------+-----------+--+ If you change the constant value of subquery-b’s confuse_you column from 0 to 2, the problem disappears. The plan returned from EXPLAIN shows the incorrect one is picking _col1 and _col2, while the correct one is picking _col2 and _col3 form sub query b. Seems it cannot distinguish 2 columns with same constant value? Anyone encountered similar problem? Thanks! Chi -- Best Regards Jeff Zhang