Claus Stadler created CALCITE-1887:
--------------------------------------

             Summary: Detect transitive join conditions via expressions
                 Key: CALCITE-1887
                 URL: https://issues.apache.org/jira/browse/CALCITE-1887
             Project: Calcite
          Issue Type: Improvement
          Components: core
    Affects Versions: 1.13.0
            Reporter: Claus Stadler
            Assignee: Julian Hyde

Given table aliases ta and tb, column names ca and cb, and an arbitrary (deterministic) expression expr, Calcite should be able to infer join conditions by transitivity:

{noformat}
ta.ca = expr AND tb.cb = expr -> ta.ca = tb.cb
{noformat}

Our use case stems from SPARQL-to-SQL rewriting, where SPARQL queries such as

{code}
SELECT * { dbr:Leipzig a ?type . dbr:Leipzig dbo:mayor ?mayor }
{code}

result in an SQL query similar to

{noformat}
SELECT * FROM s.rdf a, s.rdf b WHERE a.s = 'dbr:Leipzig' AND b.s = 'dbr:Leipzig'
{noformat}

Because the join condition is not recognized, Apache Flink does not find an executable plan for the query.

Self-contained example:

{code:java}
// 'package' is a reserved word, so "my.package" would not compile.
package my.pkg;

import org.apache.calcite.adapter.java.ReflectiveSchema;
import org.apache.calcite.plan.RelOptUtil;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.RelRoot;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.parser.SqlParser;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.Planner;
import org.junit.Test;

public class TestCalciteJoin {

    public static class Triple {
        public String s;
        public String p;
        public String o;

        public Triple(String s, String p, String o) {
            super();
            this.s = s;
            this.p = p;
            this.o = o;
        }
    }

    public static class TestSchema {
        public final Triple[] rdf = {new Triple("s", "p", "o")};
    }

    @Test
    public void testCalciteJoin() throws Exception {
        SchemaPlus rootSchema = Frameworks.createRootSchema(true);
        rootSchema.add("s", new ReflectiveSchema(new TestSchema()));

        Frameworks.ConfigBuilder configBuilder = Frameworks.newConfigBuilder();
        configBuilder.defaultSchema(rootSchema);
        FrameworkConfig frameworkConfig = configBuilder.build();

        // Note: this parser config is derived after frameworkConfig was built
        // and is never passed to the planner; the query below quotes all identifiers.
        SqlParser.ConfigBuilder parserConfig =
                SqlParser.configBuilder(frameworkConfig.getParserConfig());
        parserConfig
                .setCaseSensitive(false)
                .setConfig(parserConfig.build());

        Planner planner = Frameworks.getPlanner(frameworkConfig);

        // Unquoted form: SELECT * FROM s.rdf a, s.rdf b WHERE a.s = 5 AND b.s = 5
        SqlNode sqlNode = planner.parse(
                "SELECT * FROM \"s\".\"rdf\" \"a\", \"s\".\"rdf\" \"b\" "
                + "WHERE \"a\".\"s\" = 5 AND \"b\".\"s\" = 5");
        planner.validate(sqlNode);

        RelRoot relRoot = planner.rel(sqlNode);
        RelNode relNode = relRoot.project();
        System.out.println(RelOptUtil.toString(relNode));
    }
}
{code}

Actual plan:

{code:java}
LogicalProject(s=[$0], p=[$1], o=[$2], s0=[$3], p0=[$4], o0=[$5])
  LogicalFilter(condition=[AND(=($0, 5), =($3, 5))])
    LogicalJoin(condition=[true], joinType=[inner])
      EnumerableTableScan(table=[[s, rdf]])
      EnumerableTableScan(table=[[s, rdf]])
{code}

Expected plan fragment:

{code:java}
LogicalJoin(condition=[=($0, $3)], joinType=[inner])
{code}
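
To make the requested inference concrete, here is a minimal, library-free sketch of the idea. It is not Calcite code and does not use Calcite's API: the names TransitiveEquality, Eq and infer are hypothetical, and columns and expressions are represented as plain strings purely for illustration. It groups conjuncts of the form "column = expr" by their expression and emits a column-to-column equality for every group with more than one column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TransitiveEquality {

    /** Hypothetical stand-in for one conjunct "column = expr". */
    public static final class Eq {
        public final String column;
        public final String expr;

        public Eq(String column, String expr) {
            this.column = column;
            this.expr = expr;
        }
    }

    /**
     * Groups columns by the expression they are compared with and returns
     * "first = other" for every further column that shares the expression.
     */
    public static List<String> infer(List<Eq> conjuncts) {
        // Insertion-ordered so the output order is deterministic.
        Map<String, Set<String>> byExpr = new LinkedHashMap<>();
        for (Eq eq : conjuncts) {
            byExpr.computeIfAbsent(eq.expr, k -> new LinkedHashSet<>()).add(eq.column);
        }

        List<String> inferred = new ArrayList<>();
        for (Set<String> columns : byExpr.values()) {
            Iterator<String> it = columns.iterator();
            String first = it.next(); // every set holds at least one column
            while (it.hasNext()) {
                inferred.add(first + " = " + it.next());
            }
        }
        return inferred;
    }

    public static void main(String[] args) {
        // Mirrors the filter AND(=($0, 5), =($3, 5)) from the actual plan.
        List<Eq> where = Arrays.asList(new Eq("$0", "5"), new Eq("$3", "5"));
        System.out.println(infer(where)); // prints [$0 = $3]
    }
}
```

A real implementation would have to compare expressions structurally rather than by string, and would have to check that the shared expression is deterministic, as the description above requires.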