[ https://issues.apache.org/jira/browse/HIVE-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashutosh Chauhan resolved HIVE-9146. ------------------------------------ Resolution: Fixed Fix Version/s: 0.14.0 Assignee: Ashutosh Chauhan Fixed via HIVE-8298 > Query with left joins produces wrong result when join condition is written in > different order > --------------------------------------------------------------------------------------------- > > Key: HIVE-9146 > URL: https://issues.apache.org/jira/browse/HIVE-9146 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer > Affects Versions: 0.13.1 > Reporter: Kamil Gorlo > Assignee: Ashutosh Chauhan > Fix For: 0.14.0 > > > I have two queries which should be equal (I only swap two join conditions) > but they are not. They are simplest queries I could produce to reproduce bug. > I have two simple tables: > desc kgorlo_comm; > | col_name | data_type | comment | > | id | bigint | | > | dest_id | bigint | | > desc kgorlo_log; > | col_name | data_type | comment | > | id | bigint | | > | dest_id | bigint | | > | tstamp | bigint | | > With data: > select * from kgorlo_comm; > | kgorlo_comm.id | kgorlo_comm.dest_id | > | 1 | 2 | > | 2 | 1 | > | 1 | 3 | > | 2 | 3 | > | 3 | 5 | > | 4 | 5 | > select * from kgorlo_log; > | kgorlo_log.id | kgorlo_log.dest_id | kgorlo_log.tstamp | > | 1 | 2 | 0 | > | 1 | 3 | 0 | > | 1 | 5 | 0 | > | 3 | 1 | 0 | > And when I run this query (query no. 1): > {quote} > select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log > left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm > group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id > left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm > group by id, dest_id)com2 on com2.dest_id=log.id and com2.id=log.dest_id; > {quote} > I get result (which is correct): > | log.id | log.dest_id | com1.msgs | com2.msgs | > | 1 | 2 | 1 | 1 | > | 1 | 3 | 1 | NULL | > | 1 | 5 | NULL | NULL | > | 3 | 1 | NULL | 1 | > But when I run second query (query no. 2): > {quote} > select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log > left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm > group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id > left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm > group by id, dest_id)com2 on com2.id=log.dest_id and com2.dest_id=log.id; > {quote} > I get different (and bad, in my opinion) result: > |log.id | log.dest_id | com1.msgs | com2.msgs| > |1|2|1|1| > |1|3|1|1| > |1|5|NULL|NULL| > |3|1|NULL|NULL| > Query no. 1 and query no. 2 are different in only one place, it is second > join condition: > bf. com2.dest_id=log.id and com2.id=log.dest_id > vs > bf. com2.id=log.dest_id and com2.dest_id=log.id > which in my opinion are equal. > Explains for both queries are of course slightly different (columns are > swapped) and they are here: > https://gist.github.com/kgs/399ad7ca2c481bd2c018 (query no. 1, good) > https://gist.github.com/kgs/bfb3216f0f1fbc28037e (query no. 2, bad) -- This message was sent by Atlassian JIRA (v6.3.4#6332)