Key-Based Scans and Joins

Kim, Min-Seok Mon, 17 Jul 2023 11:11:08 -0700

Hello All,

I'm creating a query system using Apache Calcite. My storage method can
only look up and scan by key, and doesn't support full scans without a key.
I've made the storage table using FilterableTable, which requires a
specific key in the filter.


I can do single queries like

*SELECT * FROM emps as e WHERE e.empno = 100*

where '*emps*' is stored with the key 'empno'. But, joins are more
difficult. For example, in

*SELECT * FROM emps as e JOIN depts as d ON e.deptno = d.deptno WHERE
e.empno = 100*

I can use 'e.empno = 100' to scan '*emps*'. Ideally, I'd use the result of
that scan when scanning '*depts*'

But, the current plan is:
EnumerableHashJoin
    ...
    EnumerableInterpreter
        BindableTableScan(table=[[SALES, EMPS]], filters=[[=($1, 100)]]) <-
*FilterableTable* is Used, it is feasible.
    EnumerableCalc
        EnumerableTableScan(table=[[SALES, EMPS]]) <- FilterableTable is
used but the filters in scan are empty so it is infeasible.

Instead, I want to do:

*EnumerableCustomHashJoin* <- custom HashJoin
    EnumerableCalc
        *EnumerableCustomTableScan*(table=[[SALES, EMPS]])  <- custom Scan
               EnumerableInterpreter
                    BindableTableScan(table=[[SALES, EMPS]],
filters=[[=($1, 100)]])

I need advice on how to make this happen. Any advice or insights you could
provide would be greatly appreciated. Looking forward to hearing from you.

Thanks
Minseok

Key-Based Scans and Joins

Reply via email to