Drill version: 1.5.0. I found that Drill will optimize an IN predicate with many values(over 20) into HashJoin, like that:
0: jdbc:drill:drillbit=localhost> explain plan for select campaign_id from campaign where campaign_id in(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20) limit 10; +------+------+ | text | json | +------+------+ | 00-00 Screen 00-01 Project(campaign_id=[$0]) 00-02 SelectionVectorRemover 00-03 Limit(fetch=[10]) 00-04 UnionExchange 01-01 SelectionVectorRemover 01-02 Limit(fetch=[10]) 01-03 Project(campaign_id=[$0]) 01-04 HashJoin(condition=[=($1, $2)], joinType=[inner]) 01-06 Project($f0=[$0], $f66=[$0]) 01-07 Scan(groupscan=[IndexRGroupScan [IndexRScanSpec={tableName='campaign', rsFilter=null}, columns=[`campaign_id`]]]) 01-05 BroadcastExchange 02-01 HashAgg(group=[{0}]) 02-02 Values And I’m implementing a new storage plugin which is especially designed to efficiently handled the long IN condition. So I really need to get the IN condition. Just like when the IN values count less than 20: 0: jdbc:drill:drillbit=localhost> explain plan for select campaign_id from campaign where campaign_id in(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19) limit 10; +------+------+ | text | json | +------+------+ | 00-00 Screen 00-01 Project(campaign_id=[$0]) 00-02 SelectionVectorRemover 00-03 Limit(fetch=[10]) 00-04 UnionExchange 01-01 SelectionVectorRemover 01-02 Limit(fetch=[10]) 01-03 Filter(condition=[OR(=($0, 1), =($0, 2), =($0, 3), =($0, 4), =($0, 5), =($0, 6), =($0, 7), =($0, 8), =($0, 9), =($0, 10), =($0, 11), =($0, 12), =($0, 13), =($0, 14), =($0, 15), =($0, 16), =($0, 17), =($0, 18), =($0, 19))]) 01-04 Scan(groupscan=[IndexRGroupScan [IndexRScanSpec={tableName='campaign', rsFilter=In($0: (19)[1,2,3,4,5 ...])}, columns=[`campaign_id`]]]) Any suggest that I can fetch the whole filter condition, especially the IN condition?