Jesus Camacho Rodriguez created CALCITE-1682:
------------------------------------------------

             Summary: New metadata providers for expression column origin and all predicates in plan
                 Key: CALCITE-1682
                 URL: https://issues.apache.org/jira/browse/CALCITE-1682
             Project: Calcite
          Issue Type: New Feature
          Components: core
    Affects Versions: 1.12.0
            Reporter: Jesus Camacho Rodriguez
            Assignee: Jesus Camacho Rodriguez


I am working on the integration of materialized view rewriting within Hive.

Once a view matches an operator plan, rewriting is split roughly into two steps. 
The first step will verify that the input to the root operator of the matched 
plan is equivalent or contained within the input to the root operator of the 
query representing the view. The second step will trigger a _unify_ rule, which 
tries to rewrite the matched operator tree into a scan on the view and possibly 
some additional operators to compute the exact results needed by the query 
(think of a Project that alters the column order, an additional Filter on the 
view, an additional Join operation, etc.).

If we focus on step 1, checking equivalence/containment, I would like to extend 
the metadata providers in Calcite to give us more information about the matched 
(sub)plan. In particular, I am thinking of:
- Expression column origin. Currently Calcite can provide the column origins 
for a certain column and whether it is derived or not. However, we would need 
to obtain the expression that generated a certain column. This expression 
should contain references to the input tables. For instance, given an 
expression column _c_, the new metadata provider would return that it was 
generated by the expression _A.a + B.b_.
- All predicates. Currently Calcite can extract the predicates that hold on a 
RelNode's output (we can think of them as constraints on the output). However, 
I would like to extract all predicates that have been applied anywhere in a 
given RelNode (sub)plan. Since the columns they reference might not be part of 
the output, the expressions should contain references to the input tables. For 
instance, the new metadata provider might return the expression 
_A.a + B.b > C.c AND D.d = 100_.
- PK-FK relationship. I do not plan to implement this one immediately. However, 
exposing this information (when it is provided) can help us trigger more 
rewritings that contain join operators. Thus, I was wondering whether it is 
worth adding.
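
To make the first two items concrete, here is a minimal sketch over a toy plan model. None of these types are Calcite classes: `PlanMetadataSketch`, `origin`, `allPredicates` and the node and expression classes are hypothetical names used only for illustration, and the join is a plain cross join with no condition. The sketch traces an output column back to an expression over table columns, and collects every filter condition in the subplan in terms of table columns, including one on a column that is not part of the output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model; none of these types are Calcite classes.
final class PlanMetadataSketch {
    interface Expr {}
    static final class InputRef implements Expr {   // position in the input row
        final int index;
        InputRef(int index) { this.index = index; }
        @Override public String toString() { return "$" + index; }
    }
    static final class TableCol implements Expr {   // reference to a base table column
        final String table, column;
        TableCol(String table, String column) { this.table = table; this.column = column; }
        @Override public String toString() { return table + "." + column; }
    }
    static final class Literal implements Expr {
        final Object value;
        Literal(Object value) { this.value = value; }
        @Override public String toString() { return String.valueOf(value); }
    }
    static final class Call implements Expr {       // binary operator call
        final String op; final Expr left, right;
        Call(String op, Expr left, Expr right) { this.op = op; this.left = left; this.right = right; }
        @Override public String toString() { return left + " " + op + " " + right; }
    }

    interface Node {}
    static final class Scan implements Node {
        final String table; final List<String> columns;
        Scan(String table, List<String> columns) { this.table = table; this.columns = columns; }
    }
    static final class Join implements Node {       // cross join: left columns, then right columns
        final Node left, right;
        Join(Node left, Node right) { this.left = left; this.right = right; }
    }
    static final class Filter implements Node {
        final Node input; final Expr condition;
        Filter(Node input, Expr condition) { this.input = input; this.condition = condition; }
    }
    static final class Project implements Node {
        final Node input; final List<Expr> exprs;
        Project(Node input, List<Expr> exprs) { this.input = input; this.exprs = exprs; }
    }

    // Number of output columns of a node.
    static int width(Node n) {
        if (n instanceof Scan) return ((Scan) n).columns.size();
        if (n instanceof Join) return width(((Join) n).left) + width(((Join) n).right);
        if (n instanceof Filter) return width(((Filter) n).input);
        return ((Project) n).exprs.size();
    }

    // Expression over table columns that produces output column `col` of `n`.
    static Expr origin(Node n, int col) {
        if (n instanceof Scan) {
            Scan s = (Scan) n;
            return new TableCol(s.table, s.columns.get(col));
        }
        if (n instanceof Join) {
            Join j = (Join) n;
            int lw = width(j.left);
            return col < lw ? origin(j.left, col) : origin(j.right, col - lw);
        }
        if (n instanceof Filter) return origin(((Filter) n).input, col);
        Project p = (Project) n;
        return resolve(p.exprs.get(col), p.input);
    }

    // Rewrites input references in `e` (relative to `input`) into table columns.
    static Expr resolve(Expr e, Node input) {
        if (e instanceof InputRef) return origin(input, ((InputRef) e).index);
        if (e instanceof Call) {
            Call c = (Call) e;
            return new Call(c.op, resolve(c.left, input), resolve(c.right, input));
        }
        return e;
    }

    // Every filter condition in the subplan, expressed over table columns.
    static List<Expr> allPredicates(Node n) {
        List<Expr> out = new ArrayList<>();
        if (n instanceof Join) {
            out.addAll(allPredicates(((Join) n).left));
            out.addAll(allPredicates(((Join) n).right));
        } else if (n instanceof Filter) {
            Filter f = (Filter) n;
            out.addAll(allPredicates(f.input));
            out.add(resolve(f.condition, f.input));
        } else if (n instanceof Project) {
            out.addAll(allPredicates(((Project) n).input));
        }
        return out;
    }

    public static void main(String[] args) {
        Node join = new Join(new Scan("A", Arrays.asList("a")), new Scan("B", Arrays.asList("b", "d")));
        Node filter = new Filter(join, new Call("=", new InputRef(2), new Literal(100)));
        Node project = new Project(filter,
            Arrays.<Expr>asList(new Call("+", new InputRef(0), new InputRef(1))));
        System.out.println(origin(project, 0));      // prints "A.a + B.b"
        System.out.println(allPredicates(project));  // prints "[B.d = 100]"
    }
}
```

Note that _B.d_ is filtered on but projected away, which is why the predicate has to be expressed against the input tables rather than the output row.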

Once this information is available, we can rely on it to implement logic 
similar to [1] to check whether a given (sub)plan is equivalent/contained 
within a given view.
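
As a rough illustration of how such a check could use the new metadata, the sketch below applies a deliberately conservative, purely syntactic test. It is not the algorithm from [1]. It assumes both plans read the same set of tables and that predicates have already been normalized into canonical strings over table columns; `ContainmentSketch`, `isContained` and `residual` are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper; a conservative syntactic check, not the algorithm from [1].
final class ContainmentSketch {
    // The query is contained in the view if both read the same tables and
    // every conjunct applied by the view is also applied by the query.
    static boolean isContained(Set<String> queryTables, Set<String> queryConjuncts,
                               Set<String> viewTables, Set<String> viewConjuncts) {
        return queryTables.equals(viewTables) && queryConjuncts.containsAll(viewConjuncts);
    }

    // Conjuncts the rewriting still has to apply on top of the view scan.
    static Set<String> residual(Set<String> queryConjuncts, Set<String> viewConjuncts) {
        Set<String> rest = new LinkedHashSet<>(queryConjuncts);
        rest.removeAll(viewConjuncts);
        return rest;
    }

    public static void main(String[] args) {
        Set<String> tables = new LinkedHashSet<>(Arrays.asList("A", "B"));
        Set<String> view = new LinkedHashSet<>(Arrays.asList("A.a = B.b"));
        Set<String> query = new LinkedHashSet<>(Arrays.asList("A.a = B.b", "B.d = 100"));
        System.out.println(isContained(tables, query, tables, view)); // prints "true"
        System.out.println(residual(query, view));                    // prints "[B.d = 100]"
    }
}
```

A real check would also have to confirm that each residual conjunct can be computed from the columns the view exposes, and would reason about implication rather than exact matches.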

One question I have is about representing the table columns as a RexNode, as I 
think that is the easiest form for the new metadata providers to return. I 
checked _RexPatternFieldRef_ and I think it will meet our requirements: alpha 
would be the qualified table name, while the index would be the column index 
within the table. Thoughts?
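
Whatever class ends up being used, the property that matters is that two references to the same table column compare equal, so that expressions coming from the query and from the view can be matched. A minimal sketch of such a value type follows; `TableColumnRef` is a hypothetical stand-in written in plain Java, not a Calcite class and not the _RexPatternFieldRef_ API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a table column reference: qualified table name plus column index.
final class TableColumnRef {
    final List<String> qualifiedTableName;
    final int columnIndex;

    TableColumnRef(List<String> qualifiedTableName, int columnIndex) {
        // Defensive copy so the reference stays immutable.
        this.qualifiedTableName = Collections.unmodifiableList(new ArrayList<>(qualifiedTableName));
        this.columnIndex = columnIndex;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof TableColumnRef)) return false;
        TableColumnRef other = (TableColumnRef) o;
        return columnIndex == other.columnIndex
            && qualifiedTableName.equals(other.qualifiedTableName);
    }

    @Override public int hashCode() {
        return Objects.hash(qualifiedTableName, columnIndex);
    }

    @Override public String toString() {
        return String.join(".", qualifiedTableName) + ".$" + columnIndex;
    }

    public static void main(String[] args) {
        TableColumnRef fromQuery = new TableColumnRef(Arrays.asList("hive", "sales", "orders"), 2);
        TableColumnRef fromView = new TableColumnRef(Arrays.asList("hive", "sales", "orders"), 2);
        System.out.println(fromQuery.equals(fromView)); // prints "true"
        System.out.println(fromQuery);                  // prints "hive.sales.orders.$2"
    }
}
```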

I have started working on this and will provide a patch shortly; feedback is 
greatly appreciated.

[1] ftp://ftp10.us.freebsd.org/users/azhang/disc/SIGMOD/pdf-files/331/202-optimizing.pdf
