Quanlong Huang created IMPALA-13193:
---------------------------------------
Summary: RuntimeFilter on parquet dictionary should evaluate null values
Key: IMPALA-13193
URL: https://issues.apache.org/jira/browse/IMPALA-13193
Project: IMPALA
Issue Type: Bug
Components: Backend
Reporter: Quanlong Huang

IMPALA-10910 and IMPALA-5509 introduced an optimization that evaluates runtime filters on Parquet dictionary values. If none of the values passes the check, the whole row group is skipped. However, NULL values are not stored in the Parquet dictionary. A runtime filter that accepts NULL values can therefore incorrectly reject a row group that contains NULLs when none of its dictionary values passes the check.

Here are the steps to reproduce the bug:
{code:sql}
create table parq_tbl (id bigint, name string) stored as parquet;
insert into parq_tbl values (0, "abc"), (1, NULL), (2, NULL), (3, "abc");
create table dim_tbl (name string);
insert into dim_tbl values (NULL);
select * from parq_tbl p join dim_tbl d on COALESCE(p.name, '') = COALESCE(d.name, '');
{code}

The SELECT query should return 2 rows (ids 1 and 2, whose NULL names become '' under COALESCE and match the NULL in dim_tbl), but it currently returns 0 rows. The runtime filter built from dim_tbl only admits '', so the single dictionary value "abc" fails it and the row group is skipped, even though its NULL values would pass.

A workaround is to disable this optimization:
{code:sql}
set PARQUET_DICTIONARY_RUNTIME_FILTER_ENTRY_LIMIT=0;
{code}
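To make the failure mode concrete, here is a minimal Python model of the row-group pruning decision, comparing the dictionary-only check with one that also evaluates the filter on NULL. It is a sketch under stated assumptions, not Impala's C++ backend code: all names (`ColumnChunk`, `make_filter`, `can_skip_buggy`, `can_skip_fixed`) are hypothetical, the runtime filter is modeled as an exact set of build-side keys (real runtime filters are approximate structures such as Bloom filters), and `may_contain_nulls` is assumed to come from something like the chunk's null-count statistic, defaulting to True when unknown.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Sequence

# Hypothetical model; not Impala's actual implementation.
Value = Optional[str]


@dataclass
class ColumnChunk:
    # Distinct non-NULL values; Parquet never stores NULLs in the dictionary.
    dictionary: Sequence[str]
    # Assumption: known from statistics; treat as True when unknown.
    may_contain_nulls: bool = True


def coalesce_empty(v: Value) -> str:
    """Models COALESCE(name, '') from the join condition."""
    return v if v is not None else ""


def make_filter(build_values: Iterable[Value],
                expr: Callable[[Value], str]) -> Callable[[Value], bool]:
    """Builds an exact-set runtime filter on expr(build value)."""
    keys = {expr(v) for v in build_values}
    return lambda v: expr(v) in keys


def can_skip_buggy(chunk: ColumnChunk, passes: Callable[[Value], bool]) -> bool:
    # Only dictionary entries are checked, so NULLs are silently ignored.
    return not any(passes(v) for v in chunk.dictionary)


def can_skip_fixed(chunk: ColumnChunk, passes: Callable[[Value], bool]) -> bool:
    if any(passes(v) for v in chunk.dictionary):
        return False
    # NULL is not in the dictionary, so evaluate the filter on it separately.
    if chunk.may_contain_nulls and passes(None):
        return False
    return True


# Repro data: dim_tbl holds a single NULL; parq_tbl.name has "abc" and NULLs.
passes = make_filter([None], coalesce_empty)
chunk = ColumnChunk(dictionary=["abc"], may_contain_nulls=True)
print(can_skip_buggy(chunk, passes))  # True: row group wrongly skipped
print(can_skip_fixed(chunk, passes))  # False: NULL rows are kept
```

Evaluating the filter once on NULL is cheap compared with scanning the row group, and skipping that step only when the chunk is known to contain no NULLs keeps the optimization intact for NULL-free data.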