Hi all,

I've recently been trying out Drill built from the master branch, and I have deployed it on a cluster of three physical servers.
The input data, `lineitem`, is in Parquet format: 150GB in total, spread across 1516 files of about 101MB each. Each server has two Intel(R) Xeon(R) E5645 CPUs @ 2.40GHz (24 cores in total) and 32GB of memory.

While executing TPC-H Q1:

SELECT L_RETURNFLAG, L_LINESTATUS, SUM(L_QUANTITY), SUM(L_EXTENDEDPRICE),
       SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)), SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)*(1+L_TAX)),
       AVG(L_QUANTITY), AVG(L_EXTENDEDPRICE), AVG(L_DISCOUNT), COUNT(1)
FROM dfs.tpch.`lineitem`
WHERE L_SHIPDATE <= '1998-09-02'
GROUP BY L_RETURNFLAG, L_LINESTATUS
ORDER BY L_RETURNFLAG, L_LINESTATUS

I noticed that the parallelism of Major Fragment 03 (Scan, Filter, Project, HashAgg and Project) was 51 in my case (planner.width.max_per_node = 17), and each minor fragment lasted about 8 to 9 minutes. One record, for example:

03-00-xx hw080 7.309s 42.358s 9m35s 118,758,489 14,540 22:31:32 22:31:32 33MB FINISHED

Is this a normal speed (more than 10 minutes in total) for Drill on a cluster like mine? Did I miss something important in the configuration that would speed up execution?

Thanks very much!
Yijie
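A quick back-of-envelope check of what the reported numbers imply per minor fragment may help frame the question. This is a minimal sketch, assuming that every one of the 51 fragments handles roughly the same share of files and records as the sample row (118,758,489 records in 9m35s), and that Drill's default for planner.width.max_per_node is about 70% of the cores on a node (an assumption here, rounded up). The constant and function names are hypothetical, chosen only for illustration.

```python
import math

# Figures taken from the post above.
TOTAL_FILES = 1516
FILE_SIZE_MB = 101
NODES = 3
CORES_PER_NODE = 24
MAX_WIDTH_PER_NODE = 17             # planner.width.max_per_node as reported
RECORDS_ONE_FRAGMENT = 118_758_489  # record count from the sample 03-00-xx row
RUNTIME_S = 9 * 60 + 35             # 9m35s from the sample row


def assumed_default_width(cores: int, fraction: float = 0.7) -> int:
    """Hypothetical helper: width if the default were ceil(70% of cores)."""
    return math.ceil(cores * fraction)


def summarize() -> dict:
    """Derive per-fragment throughput, assuming work is spread evenly."""
    fragments = NODES * MAX_WIDTH_PER_NODE
    total_mb = TOTAL_FILES * FILE_SIZE_MB
    mb_per_fragment = total_mb / fragments
    return {
        "fragments": fragments,
        "total_mb": total_mb,
        "files_per_fragment": TOTAL_FILES / fragments,
        "mb_per_s_per_fragment": mb_per_fragment / RUNTIME_S,
        "records_per_s_per_fragment": RECORDS_ONE_FRAGMENT / RUNTIME_S,
        # Extrapolated total row count if all fragments match the sample row.
        "estimated_total_records": RECORDS_ONE_FRAGMENT * fragments,
    }


if __name__ == "__main__":
    print("assumed default width:", assumed_default_width(CORES_PER_NODE))
    for key, value in summarize().items():
        print(f"{key}: {value:,.2f}")
```

Under these assumptions each fragment reads roughly 30 files (about 3GB) at around 5MB/s, or roughly 200,000 records per second, and the extrapolated total of about 6 billion rows is in line with the expected size of TPC-H `lineitem` at scale factor 1000.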