Hi Yijie,
This is Dechang at MapR. I work on Drill performance.

From what you described, it looks like the scan took most of the time.
How are the files distributed across the disks? Is there any skew?
How many disks are there?
If possible, can you provide the profile for the run?

Thanks,
Dechang

On Sun, May 22, 2016 at 9:06 AM, Yijie Shen <henry.yijies...@gmail.com>
wrote:

> Hi all,
>
> I've been trying out Drill on the master branch lately and have deployed a
> cluster on three physical servers.
>
> The input data `lineitem` is in Parquet format, 150GB in total: 1516 files
> of 101MB each.
>
> The server has two Intel(R) Xeon(R) CPU E5645 @2.40GHz CPUs and 24 cores in
> total, 32GB memory.
>
> While executing Q1 using:
>
>  SELECT
>   L_RETURNFLAG, L_LINESTATUS, SUM(L_QUANTITY), SUM(L_EXTENDEDPRICE),
> SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)),
> SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)*(1+L_TAX)), AVG(L_QUANTITY),
> AVG(L_EXTENDEDPRICE), AVG(L_DISCOUNT), COUNT(1)
> FROM
>   dfs.tpch.`lineitem`
> WHERE
>   L_SHIPDATE<='1998-09-02'
> GROUP BY L_RETURNFLAG, L_LINESTATUS
> ORDER BY L_RETURNFLAG, L_LINESTATUS
>
> I've noticed the parallelism was 51 (planner.width.max_per_node = 17) in my
> case for Major Fragment 03 (Scan, Filter, Project, HashAgg and Project), and
> each minor fragment lasts about 8 to 9 minutes. One record, for example:
>
> 03-00-xx hw080 7.309s 42.358s 9m35s 118,758,489 14,540 22:31:32 22:31:32
> 33MB FINISHED
>
> Is this a normal speed (more than 10 minutes) for Drill for my current
> cluster? Did I miss something important in conf to accelerate the
> execution?
>
> Thanks very much!
>
> Yijie
>
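
As a rough back-of-the-envelope check on the numbers quoted above, the sketch below estimates the scan throughput implied by the thread. It assumes (none of this is confirmed by the thread) that the files are spread evenly over the three nodes, that all 51 minor fragments run concurrently for about as long as the 9m35s sample record, and that every byte of every file is read. Parquet readers normally skip unneeded columns, so the real figures are likely lower; treat the result as an upper bound, not a measurement.

```python
# Figures quoted in the thread.
num_files = 1516
file_size_mb = 101
num_nodes = 3
minor_fragments = 51             # 3 nodes x planner.width.max_per_node = 17
fragment_seconds = 9 * 60 + 35   # the 9m35s sample record

# Assumption: the whole data set is read within one fragment's run time.
total_mb = num_files * file_size_mb
aggregate_mb_s = total_mb / fragment_seconds

# Assumption: data and work are spread evenly (no skew).
per_node_mb_s = aggregate_mb_s / num_nodes
per_fragment_mb_s = aggregate_mb_s / minor_fragments

print(f"total data:        {total_mb} MB")
print(f"cluster-wide rate: {aggregate_mb_s:.1f} MB/s")
print(f"per-node rate:     {per_node_mb_s:.1f} MB/s")
print(f"per-fragment rate: {per_fragment_mb_s:.1f} MB/s")
```

Comparing the per-node rate with what the disks on each server can sustain is one way to judge whether the scan is I/O bound, which is why the number of disks and any skew in file placement matter.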
