The issue seems to be because of the incorrect query plan. In the plan we
can see:
predicate:(a1 is not null and b1 is not null, see queries below). This
predicate is created with HiveJoinAddNotNull, which is adding predicate
that is checking that all fields that take part in filter operator are not
null. I'm curious what is the purpose of this rule, after creating such
predicate we can't use functions that can process NULLs and give valid
result as filters. See steps to reproduce. I would really appreciate any
help in order to understand this case.

STEPS TO REPRODUCE:

Step 1: Create a table ct1
create table ct1 (a1 string,b1 string);

Step 2: Create a table ct2
create table ct2 (a2 string);

Step 3 : Insert following data into table ct1
insert into table ct1 (a1) values ('1');

Step 4 : Insert following data into table ct2
insert into table ct2 (a2) values ('1');

Step 5 : Execute the following query
select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;

ACTUAL RESULT:

The query returns nothing;

EXPECTED RESULT:

1       NULL    1

The issue seems to be because of the incorrect query plan. In the plan we
can see:
predicate:(a1 is not null and b1 is not null)
which does not look correct. As a result, it is filtering out all the rows
is any column mentioned in the COALESCE has null value.
Please find the query plan below:

Plan optimized by CBO.

Vertex dependency in root stage
Map 1 <- Map 2 (BROADCAST_EDGE)

Stage-0
  Fetch Operator
    limit:-1
    Stage-1
      Map 1
      File Output Operator [FS_10]
        Map Join Operator [MAPJOIN_15] (rows=1 width=4)
          
Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
        <-Map 2 [BROADCAST_EDGE]
          BROADCAST [RS_7]
            PartitionCols:_col0
            Select Operator [SEL_5] (rows=1 width=1)
              Output:["_col0"]
              Filter Operator [FIL_14] (rows=1 width=1)
                predicate:a2 is not null
                TableScan [TS_3] (rows=1 width=1)
                  default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
        <-Select Operator [SEL_2] (rows=1 width=4)
            Output:["_col0","_col1"]
            Filter Operator [FIL_13] (rows=1 width=4)
              predicate:{color:red}(a1 is not null and b1 is not null){color}
              TableScan [TS_0] (rows=1 width=4)
                default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]

This happens only if join is inner type, otherwise HiveJoinAddNotRule which
creates this problem is skipped.

Reply via email to