Hi Team, Below is the minimum reproduce of wrong results in Hive 0.13:
*1. Create 4 tables* CREATE EXTERNAL TABLE testjoin1( joincol string ); CREATE EXTERNAL TABLE testjoin2( anothercol string , joincol string); CREATE EXTERNAL TABLE testjoin3( anothercol string); CREATE EXTERNAL TABLE testjoin4( joincol string, wherecol string , wherecol2 string); *2. Insert sample data * (Note: Make sure you firstly create the dual table which only contains 1 row) insert into table testjoin1 select '1' from dual; insert into table testjoin2 select 'another','1' from dual; insert into table testjoin3 select 'another' from dual; insert into table testjoin4 select '1','I_AM_MISSING','201501' from dual; insert into table testjoin4 select '1','I_Shouldnot_be_in_output','201501' from dual; hive> select * from testjoin1; OK 1 Time taken: 0.04 seconds, Fetched: 1 row(s) hive> select * from testjoin2; OK another 1 Time taken: 0.039 seconds, Fetched: 1 row(s) hive> select * from testjoin3; OK another Time taken: 0.038 seconds, Fetched: 1 row(s) hive> select * from testjoin4; OK 1 I_AM_MISSING 201501 1 I_Shouldnot_be_in_output 201501 Time taken: 0.04 seconds, Fetched: 2 row(s) *3. SQL1 is returning wrong results.* Select testjoin4.* From testjoin1 JOIN testjoin2 ON (testjoin2.joincol = testjoin1.joincol) JOIN testjoin3 ON (testjoin3.anothercol= testjoin2.anothercol) JOIN testjoin4 ON (testjoin4.joincol = testjoin1.joincol AND testjoin4.wherecol2='201501') WHERE (testjoin4.wherecol='I_AM_MISSING'); 1 I_AM_MISSING 201501 1 I_Shouldnot_be_in_output 201501 Time taken: 21.702 seconds, Fetched: 2 row(s) *4. SQL2 is returning good result(If we move the both filters to WHERE clause )* Select testjoin4.* From testjoin1 JOIN testjoin2 ON (testjoin2.joincol = testjoin1.joincol) JOIN testjoin3 ON (testjoin3.anothercol= testjoin2.anothercol) JOIN testjoin4 ON (testjoin4.joincol = testjoin1.joincol) WHERE (testjoin4.wherecol='I_AM_MISSING' and testjoin4.wherecol2='201501'); 1 I_AM_MISSING 201501 Time taken: 20.393 seconds, Fetched: 1 row(s) ————— *Another test is done in Hive 1.0 and found both SQL1 and SQL2 are returning wrong results….* 1 I_AM_MISSING 201501 1 I_AM_MISSING 201501 Time taken: 13.983 seconds, Fetched: 2 row(s) *Anybody knows any related JIRAs?* -- Thanks, www.openkb.info (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)