sounds to me that what you want to achieve is similar to view expansion, but there is one big difference.
You want to internally rewrite : SELECT * FROM EVENTS ==> SELECT * FROM dfs.`/drill/table1.json` AS t1 JOIN dfs.`/drill/table2.json`AS t2 ON t1.id = t2.id This is similar to CREATE VIEW EVENTS as SELECT * FROM dfs.`/drill/table1.json` AS t1 JOIN dfs.`/drill/table2.json`AS t2 ON t1.id = t2.id; MyTable.toRel() is kind of doing the work of view expansion in [1] However, for view, the row type of view is cleanly defined, before view expansion. For the above "EVENT" view, it would have t1.* and t2.*. However, in your case, you do not know the row type of "EVENT" table, until after "EVENT" is expanded. As such, the top project has only *, in stead of expanding into t1.* and t2.*. It's quite challenging to make * column query work against dynamic table (btw. some of those planner logic is in CALCITE-1150 [2] ). In your case, it's even more challenging, since the dynamic table would be expanded into a list of tables. Why can't you use view / view expansion? If the list of tables is kind of dynamic, you may use some script to re-generate the view DDL. This way, you do not have to solve a bunch of unsolved problems. [1] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillViewTable.java#L60 [2] https://issues.apache.org/jira/browse/CALCITE-1150 On Thu, Jun 2, 2016 at 12:19 AM, Jesse Yates <jesse.k.ya...@gmail.com> wrote: > Its definitely getting closer, Thanks Jinfeng! > > I end up with this set of plans > <https://gist.github.com/jyates/f11eb44a44af715b483859f497b9ea89#file-expanded-table-planning>. > However, the only column that is returned is * and it contains just the > joined id column, rather than the full column expansion. > > This is in contrast to the raw file join physical plan > <https://gist.github.com/jyates/f11eb44a44af715b483859f497b9ea89#file-raw-sql-physical-plan> > which has an expanded projection condition. > > I'm not sure how I can force the top projection to select correctly. Do i > inject it as a wrapper scan around the created join or create a new rule > that takes (project, join) and replaces it with the correct project > condition? Naturally, I'd have to type that rule/join to not just be the > logical so it can convert only this generated base case. > > Thanks! > --Jesse > > ps. Updated the gist > <https://gist.github.com/jyates/f11eb44a44af715b483859f497b9ea89#file-drill-to-rel-table-example-java> > to cover the offset changes for the degenerate 2 table case. > > On Wed, Jun 1, 2016 at 10:26 PM Jinfeng Ni <jinfengn...@gmail.com> wrote: > >> Seems to me that the reason you run into problem when build the join >> condition is that you are trying to compose the condition while adding >> the fields on-the-fly in [1]. This is different from what Drill is >> doing in Calcite library. For Drill, before construct the join >> condition (RexNode) t1.id = t2.id, SqlValidator will make sure those >> fields exists in the table. For dynamic table, such check will lead >> to one additional field in table's rowType. After validation, when >> Calcite builds join condition in SqlToRelConverter, it already knows >> the field list for each input table (at that time, table's rowType >> should be immutable), and hence it can adjust the reference to the >> right table field, by adding the # of LHS fields to the index. That >> is, =($1, $1) would become =($1, $3). >> >> I feel you may follow such logic, by calling field(table1, fieldName) >> on each side of join, before construct the join condition. >> >> >> [1] >> https://gist.github.com/jyates/f11eb44a44af715b483859f497b9ea89#file-drill-to-rel-table-example-java-L94-L95 >> >> On Wed, Jun 1, 2016 at 9:50 PM, Jesse Yates <jesse.k.ya...@gmail.com> >> wrote: >> > I'm building a layer "on top"[1] that hides the details of accessing >> > underlying "partitions" (stored as individual tables) and picks the right >> > tables based on the query (time partitioned, so prune tables which won't >> > fulfill the query). >> > >> > More concretely, lets take a set of files, each one fulfilling the role >> of >> > "table": >> > /drill >> > /table1.json >> > /table2.json >> > or in drill parlance, *dfs.`/drill/table1.json`* and >> *dfs.`/drill/table2.json`. >> > T*he query that I want to make externally is:* SELECT * FROM EVENTS *and >> I >> > want to translate that internally to >> > >> > *SELECT * FROM dfs.`/drill/table1.json` AS t1 JOIN >> > dfs.`/drill/table2.json`AS t2 ON t1.id <http://t1.id> = t2.id >> > <http://t2.id>.*Executing that expanded (second) query from the >> top-level >> > java.sql.Connection executes fine, hence my thoughts about the sub-table >> > not actually being found. >> > >> > Here is what I am trying to do in my table handler - the 'EVENTS' table ( >> > gist <https://gist.github.com/jyates/f11eb44a44af715b483859f497b9ea89>). >> > >> > Does that help at all? >> > >> > Thanks, >> > Jesse >> > >> > [1] Where on top in this case includes components inside Drill. >> > >> > On Wed, Jun 1, 2016 at 9:27 PM Jinfeng Ni <jinfengn...@gmail.com> wrote: >> > >> >> I'm not sure if I understand your problem correctly. Are you trying to >> >> build some non-SQL interface on top of Drill, to join a set of dynamic >> >> tables? Can you give more concrete example? >> >> >> >> When Drill handles join over two dynamic tables, except for * column >> >> query, the dynamic tables have a list of fields defined, since those >> >> fields are referred in the query (even though the planner does not >> >> know each field's type). Therefore, the join condition will never be >> >> =($1,$1); it would be resolved to reference to left/right tables's >> >> fields. >> >> >> >> >> >> >> >> On Wed, Jun 1, 2016 at 7:43 PM, Jesse Yates <jya...@apache.org> wrote: >> >> > Hi all, >> >> > >> >> > I'm trying to rewrite a query of a table (ala Table#toRel) to join a >> set >> >> of >> >> > dynamic (sub-)tables on a couple of known columns but am getting stuck >> >> > building the condition. The sub-tables are not part of the original >> >> query, >> >> > but rather selected on-the-fly at logical query time. >> >> > >> >> > I can't use UNION-ALL because, outside of a couple of known columns, >> the >> >> > remainder are completely dynamic. >> >> > >> >> > Using RelBuilder I can construct the join via a series of scans and >> then >> >> > joins on the known fields[1]. However, this only creates RelInputRefs >> >> which >> >> > are not at all associated with the current ref numbering because the >> the >> >> > sub-tables are not present in the original query. Thus, we get >> conditions >> >> > like: >> >> > (=($1,$1), which looks appears TRUE but actually should reference the >> >> > left/right tables' fields. >> >> > >> >> > I tried playing around with RelRangeRef and manually managing the >> field >> >> > offsets in query (similar to BlackBoard), but that call gets >> translated >> >> > into an actually ALWAYS-TRUE condition and also fails the >> >> > JoinUtil#checkCartesianJoin case. >> >> > >> >> > If I construct the query via standard SQL at the top level (using >> known >> >> > tables), everything works fine, I think because of the ref-numbering >> to >> >> > which I cannot get access in #toRel(). >> >> > >> >> > Any thoughts on the right way to go about this? >> >> > >> >> > Thanks much, >> >> > Jesse Yates >> >> > >> >> > >> >> > [1] Actually, this meant digging into RelBuilder >> >> > < >> >> >> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L309 >> >> > >> >> > since dynamic tables require the field names to already be set and >> then >> >> > uses the column name from the list to get the field index >> >> >>