I created https://issues.apache.org/jira/browse/HIVE-13693 to track the issue.

I have already submitted a fix.

Thanks,
Jesús


From: Frank Luo <j...@merkleinc.com<mailto:j...@merkleinc.com>>
Reply-To: "user@hive.apache.org<mailto:user@hive.apache.org>" 
<user@hive.apache.org<mailto:user@hive.apache.org>>
Date: Wednesday, May 4, 2016 at 1:04 AM
To: "user@hive.apache.org<mailto:user@hive.apache.org>" 
<user@hive.apache.org<mailto:user@hive.apache.org>>
Cc: Rebecca Yang <yiy...@merkleinc.com<mailto:yiy...@merkleinc.com>>
Subject: RE: multiple selects on a left join give incorrect result

Hdp 2.2 also works fine.

From: Markovitz, Dudu [mailto:dmarkov...@paypal.com]
Sent: Tuesday, May 03, 2016 6:56 PM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Cc: Rebecca Yang <yiy...@merkleinc.com<mailto:yiy...@merkleinc.com>>
Subject: RE: multiple selects on a left join give incorrect result

There is no issue on Cloudera VM

Dudu


[cloudera@quickstart ~]$ hadoop version
Hadoop 2.6.0-cdh5.5.0
Subversion http://github.com/cloudera/hadoop -r 
fd21232cef7b8c1f536965897ce20f50b83ee7b2
Compiled by jenkins on 2015-11-09T20:37Z
Compiled with protoc 2.5.0
From source with checksum 98e07176d1787150a6a9c087627562c
This command was run using /usr/jars/hadoop-common-2.6.0-cdh5.5.0.jar

[cloudera@quickstart ~]$ hive --version
Hive 1.1.0-cdh5.5.0
Subversion 
file:///data/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hive-1.1.0-cdh5.5.0<file:///\\data\jenkins\workspace\generic-package-rhel64-6-0\topdir\BUILD\hive-1.1.0-cdh5.5.0>
 -r Unknown
Compiled by jenkins on Mon Nov 9 12:37:34 PST 2015
From source with checksum 8dfc2aac3731e4e5f0e8bd1b442be0e2

From: Frank Luo [mailto:j...@merkleinc.com]
Sent: Wednesday, May 04, 2016 1:58 AM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Cc: Rebecca Yang <yiy...@merkleinc.com<mailto:yiy...@merkleinc.com>>
Subject: multiple selects on a left join give incorrect result

All,

I have found that when doing a multiple selects on a left join, the “on” clause 
seems to be ignored!!! (It is hard to believe).

Below is a very simple test case and please tell me I am crazy. I am on hdp 
2.3.4.7.


CREATE TABLE T_A ( id    STRING, val   STRING );
CREATE TABLE T_B ( id    STRING, val   STRING );
CREATE TABLE join_result_1 ( ida    STRING, vala   STRING, idb    STRING, valb  
 STRING );
CREATE TABLE join_result_2 ( ida    STRING, vala   STRING, idb    STRING, valb  
 STRING );
CREATE TABLE join_result_3 ( ida    STRING, vala   STRING, idb    STRING, valb  
 STRING );

INSERT INTO TABLE T_A
VALUES ('Id_1', 'val_101'), ('Id_2', 'val_102'), ('Id_3', 'val_103');

INSERT INTO TABLE T_B
VALUES ('Id_1', 'val_103'), ('Id_2', 'val_104');

FROM T_A a LEFT JOIN T_B b ON a.id = b.id
INSERT OVERWRITE TABLE join_result_1
   SELECT a.*, b.*
    WHERE b.id = 'Id_1' AND b.val = 'val_103'
INSERT OVERWRITE TABLE join_result_2
   SELECT a.*, b.*
    WHERE b.val IS NULL OR (b.id = 'Id_3' AND b.val = 'val_101')
INSERT OVERWRITE TABLE join_result_3
   SELECT a.*, b.*
    WHERE b.val = 'val_104' AND b.id = 'Id_2' AND a.val <> b.val;


And here is the result:

0: jdbc:hive2 > select * from join_result_1;
+--------------------+---------------------+--------------------+---------------------+--+
| join_result_1.ida  | join_result_1.vala  | join_result_1.idb  | 
join_result_1.valb  |
+--------------------+---------------------+--------------------+---------------------+--+
| Id_1               | val_101             | Id_1               | val_103       
      |
| Id_2               | val_102             | Id_1               | val_103       
      |
| Id_3               | val_103             | Id_1               | val_103       
      |
+--------------------+---------------------+--------------------+---------------------+--+
3 rows selected (0.057 seconds)




I am expecting join_result_1 to have one row, but got three!!!

Has other people run into the same thing?

Join us at Merkle’s 2016 annual Performance Marketer Executive Summit – June 7 
– 9 in Memphis, TN
<http://www2.merkleinc.com/l/47252/2016-04-26/3lbfdc>

Download the latest installment of our annual Marketing Imperatives, “Winning 
with People-Based 
Marketing”<http://www2.merkleinc.com/l/47252/2016-04-26/3lbfd1>

This email and any attachments transmitted with it are intended for use by the 
intended recipient(s) only. If you have received this email in error, please 
notify the sender immediately and then delete it. If you are not the intended 
recipient, you must not keep, use, disclose, copy or distribute this email 
without the author’s prior permission. We take precautions to minimize the risk 
of transmitting software viruses, but we advise you to perform your own virus 
checks on any attachment to this message. We cannot accept liability for any 
loss or damage caused by software viruses. The information contained in this 
communication may be confidential and may be subject to the attorney-client 
privilege.

Join us at Merkle’s 2016 annual Performance Marketer Executive Summit – June 7 
– 9 in Memphis, TN
<http://www2.merkleinc.com/l/47252/2016-04-26/3lbfdc>

Download the latest installment of our annual Marketing Imperatives, “Winning 
with People-Based 
Marketing”<http://www2.merkleinc.com/l/47252/2016-04-26/3lbfd1>

This email and any attachments transmitted with it are intended for use by the 
intended recipient(s) only. If you have received this email in error, please 
notify the sender immediately and then delete it. If you are not the intended 
recipient, you must not keep, use, disclose, copy or distribute this email 
without the author’s prior permission. We take precautions to minimize the risk 
of transmitting software viruses, but we advise you to perform your own virus 
checks on any attachment to this message. We cannot accept liability for any 
loss or damage caused by software viruses. The information contained in this 
communication may be confidential and may be subject to the attorney-client 
privilege.

Reply via email to