Hi Team,

Below is the minimum reproduce of wrong results in Hive 0.13:

*1. Create 4 tables*
CREATE EXTERNAL TABLE testjoin1( joincol string );
CREATE EXTERNAL TABLE testjoin2(
   anothercol string ,
   joincol string);

CREATE EXTERNAL TABLE testjoin3( anothercol string);

CREATE EXTERNAL TABLE testjoin4(
  joincol string,
  wherecol string ,
  wherecol2 string);

*2. Insert sample data *
(Note: Make sure you firstly create the dual table which only contains 1
row)

insert into table testjoin1 select '1' from dual;
insert into table testjoin2 select 'another','1' from dual;
insert into table testjoin3 select 'another' from dual;
insert into table testjoin4 select '1','I_AM_MISSING','201501' from dual;
insert into table testjoin4 select '1','I_Shouldnot_be_in_output','201501'
from
dual;

hive> select * from testjoin1;
OK
1
Time taken: 0.04 seconds, Fetched: 1 row(s)

hive> select * from testjoin2;
OK
another    1
Time taken: 0.039 seconds, Fetched: 1 row(s)

hive> select * from testjoin3;
OK
another
Time taken: 0.038 seconds, Fetched: 1 row(s)

hive> select * from testjoin4;
OK
1    I_AM_MISSING    201501
1    I_Shouldnot_be_in_output    201501
Time taken: 0.04 seconds, Fetched: 2 row(s)

*3. SQL1 is returning wrong results.*

Select testjoin4.* From
testjoin1
JOIN testjoin2
  ON (testjoin2.joincol = testjoin1.joincol)
JOIN testjoin3
  ON (testjoin3.anothercol= testjoin2.anothercol)
JOIN testjoin4
  ON (testjoin4.joincol = testjoin1.joincol AND
testjoin4.wherecol2='201501')
WHERE (testjoin4.wherecol='I_AM_MISSING');

1    I_AM_MISSING    201501
1    I_Shouldnot_be_in_output    201501
Time taken: 21.702 seconds, Fetched: 2 row(s)


*4. SQL2 is returning good result(If we move the both filters to WHERE
clause )*

Select testjoin4.* From
testjoin1
JOIN testjoin2
  ON (testjoin2.joincol = testjoin1.joincol)
JOIN testjoin3
  ON (testjoin3.anothercol= testjoin2.anothercol)
JOIN testjoin4
  ON (testjoin4.joincol = testjoin1.joincol)
WHERE (testjoin4.wherecol='I_AM_MISSING' and testjoin4.wherecol2='201501');

1    I_AM_MISSING    201501
Time taken: 20.393 seconds, Fetched: 1 row(s)
—————
*Another test is done in Hive 1.0 and found both SQL1 and SQL2 are
returning wrong results….*

1 I_AM_MISSING 201501
1 I_AM_MISSING 201501
Time taken: 13.983 seconds, Fetched: 2 row(s)

*Anybody knows any related JIRAs?*

-- 
Thanks,
www.openkb.info
(Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)

Reply via email to