[ https://issues.apache.org/jira/browse/PHOENIX-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maryann Xue reassigned PHOENIX-1997: ------------------------------------ Assignee: Maryann Xue > Join optimization: Apply one table's where condition to the others > ------------------------------------------------------------------ > > Key: PHOENIX-1997 > URL: https://issues.apache.org/jira/browse/PHOENIX-1997 > Project: Phoenix > Issue Type: Improvement > Affects Versions: 4.3.1 > Reporter: Taeyun Kim > Assignee: Maryann Xue > Priority: Minor > Labels: calcite > > Joined tables are as follows: > {noformat} > create table table_a > ( > time_id integer not null, > depth tinyint not null, > id0 integer not null, > id1 integer not null, > id2 integer not null, > id3 integer not null, > id integer not null, > record_type smallint not null, > c varbinary > constraint pk primary key(time_id, depth, id0, id1, id2, id3, id, > record_type) > ) > salt_buckets=4, > compression='SNAPPY', > create table table_b > ( > depth tinyint not null, > id0 integer not null, > id1 integer not null, > id integer not null, > c varbinary, > p varbinary > constraint pk primary key(depth, id0, id1, id) > ) > salt_buckets=2, > compression='SNAPPY'; > create index table_b_index on table_b (id, depth, id0, id1) > compression='SNAPPY'; > {noformat} > The query is as follows: > {noformat} > select a.*, b.c > from table_a a inner join table_b b on (a.depth = b.depth and a.id0 = b.id0 > and a.id1 = b.id1 and a.id2 = b.id) > where a.time_id = 23796900 and a.depth = 1; > {noformat} > It is obvious that b.depth must also be 1 since it's on the join condition. > And since the depth column is the first primary key column of table_b, > table_b should be range scanned before join. > But the query explanation is as follows: > {noformat} > CLIENT PARALLEL 4-WAY RANGE SCAN OVER TABLE_A [0,23796900,1] > CLIENT MERGE SORT > PARALLEL INNER-JOIN TABLE 0 > CLIENT PARALLEL 2-WAY FULL SCAN OVER TABLE_B > CLIENT MERGE SORT > {noformat} > But when (b.depth = 1) condition is explicitly added to the query, the > explanation is changed as the expected one: > {noformat} > CLIENT PARALLEL 4-WAY RANGE SCAN OVER TABLE_A [0,23796900,1] > CLIENT MERGE SORT > PARALLEL INNER-JOIN TABLE 0 > CLIENT PARALLEL 2-WAY RANGE SCAN OVER TABLE_B [0,1] > CLIENT MERGE SORT > {noformat} > It would be nice if the optimizer could find this condition dependency and > apply it to the query plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)