Drill version: 1.5.0.

I found that Drill will optimize an IN predicate with many values(over 20) into 
HashJoin, like that:

0: jdbc:drill:drillbit=localhost> explain plan for select campaign_id from 
campaign where campaign_id 
in(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20) limit 10;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(campaign_id=[$0])
00-02        SelectionVectorRemover
00-03          Limit(fetch=[10])
00-04            UnionExchange
01-01              SelectionVectorRemover
01-02                Limit(fetch=[10])
01-03                  Project(campaign_id=[$0])
01-04                    HashJoin(condition=[=($1, $2)], joinType=[inner])
01-06                      Project($f0=[$0], $f66=[$0])
01-07                        Scan(groupscan=[IndexRGroupScan 
[IndexRScanSpec={tableName='campaign', rsFilter=null}, 
columns=[`campaign_id`]]])
01-05                      BroadcastExchange
02-01                        HashAgg(group=[{0}])
02-02                          Values

And I’m implementing a new storage plugin which is especially designed to 
efficiently handled the long IN condition. So I really need to get the IN 
condition.  Just like when the IN values count less than 20:

0: jdbc:drill:drillbit=localhost> explain plan for select campaign_id from 
campaign where campaign_id in(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19) 
limit 10;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(campaign_id=[$0])
00-02        SelectionVectorRemover
00-03          Limit(fetch=[10])
00-04            UnionExchange
01-01              SelectionVectorRemover
01-02                Limit(fetch=[10])
01-03                  Filter(condition=[OR(=($0, 1), =($0, 2), =($0, 3), =($0, 
4), =($0, 5), =($0, 6), =($0, 7), =($0, 8), =($0, 9), =($0, 10), =($0, 11), 
=($0, 12), =($0, 13), =($0, 14), =($0, 15), =($0, 16), =($0, 17), =($0, 18), 
=($0, 19))])
01-04                    Scan(groupscan=[IndexRGroupScan 
[IndexRScanSpec={tableName='campaign', rsFilter=In($0: (19)[1,2,3,4,5 ...])}, 
columns=[`campaign_id`]]])

Any suggest that I can fetch the whole filter condition, especially the  IN 
condition?



Reply via email to