Hi All,

TL;DR: Drill already provides a number of powerful features that give us 80-90% of what we need for the DICT type. We could save much time by using them and focusing our efforts on adding the remaining bits specific to DICT.

We can divide the DICT problem into two categories:

1. Internal representation, the topic of the previous note, which suggested that a DICT is really just a repeated MAP.
2. DICT semantics, which is the topic here.

Item 2, semantics, can itself be divided into two groups:

3. Functionality already in Drill that can be extended or repurposed for the DICT type, if DICT is implemented as a repeated MAP.
4. New functionality which must be added.

Existing functionality (item 3) includes things like:

* The flatten() function which, essentially, joins a DICT with its containing row.
* The powerful nested table functionality (added by Parth, Aman and others over the last year) that lets users treat a map array (hence a DICT) as a nested table and allows sorting, filtering, aggregation and many other SQL operations.

For item 4, Igor probably has a list of new functionality. Some items might include:

* A DICT data type, which is a repeated map with the addition of identifying the key column. (Add a column property in ColumnMetadata and a field in MaterializedField.)
* Using the implied uniqueness constraint on the key column to plan nested table operations. (Some operations might be simpler if we know the key is unique within each map array.)
* Providing DICT functions such as extracting a value by key (noting that this can already be done via a SELECT on the nested table).
* And so on.

Leveraging functionality Drill already has should reduce the cost of implementation and should avoid the compatibility issues that started this discussion.

Thanks,
- Paul
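
P.S. To make the "DICT is a repeated MAP" idea concrete, here is a minimal sketch in plain Java. It does not use Drill's vector, metadata, or function APIs; the names DictSketch, Entry, FlatRow, getByKey and flatten are hypothetical and exist only for this illustration. It models a DICT as a list of (key, value) maps, shows value-by-key lookup as a filter over that nested table, and shows a flatten-style join of each entry with its containing row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Illustration only: plain Java collections, not Drill's internal classes.
public class DictSketch {

    /** One element of the repeated map: a key column plus a value column. */
    public static final class Entry {
        public final String key;
        public final int value;

        public Entry(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    /** One flattened output row: the parent row id joined with one entry. */
    public static final class FlatRow {
        public final int rowId;
        public final String key;
        public final int value;

        public FlatRow(int rowId, String key, int value) {
            this.rowId = rowId;
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Value lookup by key, written as a filter over the nested table.
     * Assuming keys are unique within one DICT, the scan can stop at
     * the first match.
     */
    public static Optional<Integer> getByKey(List<Entry> dict, String key) {
        for (Entry e : dict) {
            if (e.key.equals(key)) {
                return Optional.of(e.value);
            }
        }
        return Optional.empty();
    }

    /** Flatten-style join: one output row per entry, tagged with the parent row. */
    public static List<FlatRow> flatten(int rowId, List<Entry> dict) {
        List<FlatRow> out = new ArrayList<>();
        for (Entry e : dict) {
            out.add(new FlatRow(rowId, e.key, e.value));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry> dict = List.of(new Entry("a", 1), new Entry("b", 2));
        System.out.println(getByKey(dict, "b").isPresent());
        System.out.println(flatten(7, dict).size());
    }
}
```

The only DICT-specific piece here is the early return in getByKey, which is valid only under the assumed key-uniqueness constraint; the rest is ordinary repeated-map handling.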