On Thu, Dec 8, 2016 at 11:45 PM, Jinfeng Ni <j...@apache.org> wrote:
> Can you please check the Explain plan output for the original query
> and the query against view, and see if there is any difference in the
> two query plans? The difference might be caused by UNION ALL operator,
> which might lead to different parallelization mode.
Hi,

Here are the outputs.

All data in one file:

0: jdbc:drill:zk=local> explain plan for select action['login'], count(*) from dfs.datastore.events_parquest group by action['login'];
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(EXPR$0=[$0], EXPR$1=[$1])
00-02        UnionExchange
01-01          Project(EXPR$0=[$0], EXPR$1=[$1])
01-02            HashAgg(group=[{0}], EXPR$1=[$SUM0($1)])
01-03              Project(EXPR$0=[$0], EXPR$1=[$1])
01-04                HashToRandomExchange(dist0=[[$0]])
02-01                  UnorderedMuxExchange
03-01                    Project(EXPR$0=[$0], EXPR$1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
03-02                      HashAgg(group=[{0}], EXPR$1=[COUNT()])
03-03                        Project(EXPR$0=[ITEM($0, 'login')])
03-04                          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/mnt/data/events_parquest]], selectionRoot=file:/mnt/data/events_parquest, numFiles=1, usedMetadataFile=false, columns=[`action`.`login`]]])

Data split into 4 files and combined with UNION ALL:

0: jdbc:drill:zk=local> explain plan for select action['login'], count(*) from dfs.datastore.parquet_synthetic_events_large_partition_all group by action['login'];
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(EXPR$0=[$0], EXPR$1=[$1])
00-02        UnionExchange
01-01          Project(EXPR$0=[$0], EXPR$1=[$1])
01-02            HashAgg(group=[{0}], EXPR$1=[$SUM0($1)])
01-03              Project(EXPR$0=[$0], EXPR$1=[$1])
01-04                HashToRandomExchange(dist0=[[$0]])
02-01                  UnorderedMuxExchange
03-01                    Project(EXPR$0=[$0], EXPR$1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
03-02                      HashAgg(group=[{0}], EXPR$1=[COUNT()])
03-03                        Project(EXPR$0=[ITEM($2, 'login')])
03-04                          UnionAll(all=[true])
03-06                            Project(timestamp=[$0], client_id=[$1], action=[$2])
03-08                              UnionAll(all=[true])
03-10                                Project(timestamp=[$0], client_id=[$1], action=[$2])
03-12                                  UnionAll(all=[true])
03-14                                    Project(timestamp=[$0], client_id=[$1], action=[$2])
03-16                                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/mnt/data/parquet_synthetic_events_large_partition_0]], selectionRoot=file:/mnt/data/parquet_synthetic_events_large_partition_0, numFiles=1, usedMetadataFile=false, columns=[`timestamp`, `client_id`, `action`]]])
03-13                                    Project(timestamp=[$0], client_id=[$1], action=[$2])
03-15                                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/mnt/data/parquet_synthetic_events_large_partition_1]], selectionRoot=file:/mnt/data/parquet_synthetic_events_large_partition_1, numFiles=1, usedMetadataFile=false, columns=[`timestamp`, `client_id`, `action`]]])
03-09                                Project(timestamp=[$0], client_id=[$1], action=[$2])
03-11                                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/mnt/data/parquet_synthetic_events_large_partition_2]], selectionRoot=file:/mnt/data/parquet_synthetic_events_large_partition_2, numFiles=1, usedMetadataFile=false, columns=[`timestamp`, `client_id`, `action`]]])
03-05                            Project(timestamp=[$0], client_id=[$1], action=[$2])
03-07                              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/mnt/data/parquet_synthetic_events_large_partition_3]], selectionRoot=file:/mnt/data/parquet_synthetic_events_large_partition_3, numFiles=1, usedMetadataFile=false, columns=[`timestamp`, `client_id`, `action`]]])