The issue seems to be because of the incorrect query plan. In the plan we can see: predicate:(a1 is not null and b1 is not null, see queries below). This predicate is created with HiveJoinAddNotNull, which is adding predicate that is checking that all fields that take part in filter operator are not null. I'm curious what is the purpose of this rule, after creating such predicate we can't use functions that can process NULLs and give valid result as filters. See steps to reproduce. I would really appreciate any help in order to understand this case.
STEPS TO REPRODUCE: Step 1: Create a table ct1 create table ct1 (a1 string,b1 string); Step 2: Create a table ct2 create table ct2 (a2 string); Step 3 : Insert following data into table ct1 insert into table ct1 (a1) values ('1'); Step 4 : Insert following data into table ct2 insert into table ct2 (a2) values ('1'); Step 5 : Execute the following query select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2; ACTUAL RESULT: The query returns nothing; EXPECTED RESULT: 1 NULL 1 The issue seems to be because of the incorrect query plan. In the plan we can see: predicate:(a1 is not null and b1 is not null) which does not look correct. As a result, it is filtering out all the rows is any column mentioned in the COALESCE has null value. Please find the query plan below: Plan optimized by CBO. Vertex dependency in root stage Map 1 <- Map 2 (BROADCAST_EDGE) Stage-0 Fetch Operator limit:-1 Stage-1 Map 1 File Output Operator [FS_10] Map Join Operator [MAPJOIN_15] (rows=1 width=4) Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"] <-Map 2 [BROADCAST_EDGE] BROADCAST [RS_7] PartitionCols:_col0 Select Operator [SEL_5] (rows=1 width=1) Output:["_col0"] Filter Operator [FIL_14] (rows=1 width=1) predicate:a2 is not null TableScan [TS_3] (rows=1 width=1) default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"] <-Select Operator [SEL_2] (rows=1 width=4) Output:["_col0","_col1"] Filter Operator [FIL_13] (rows=1 width=4) predicate:{color:red}(a1 is not null and b1 is not null){color} TableScan [TS_0] (rows=1 width=4) default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"] This happens only if join is inner type, otherwise HiveJoinAddNotRule which creates this problem is skipped.