On Tue, May 24, 2016 at 7:07 PM, Yijie Shen wrote:
> Hi Dechang,
>
> Thanks very much for your help!
>
> I'm a little confused here: why would there be skew?
>
> After gathering some statistics, I found 1516 files, 102.54MB on average,
> with a max of 104MB and a min of 95MB.
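Statistics like these can be gathered with a short Python sketch. It assumes the Parquet files are reachable on a local or mounted filesystem. The directory walk, the `.parquet` suffix filter, and the helper names `collect_parquet_sizes` and `summarize_sizes_mb` are illustrative assumptions and are not part of Drill. The `max_over_mean` ratio is one simple way to express size skew. A value of 1.0 means every file has the same size.

```python
import os
import statistics

MIB = 1024 * 1024  # assumption: "MB" in this thread is treated as MiB

def collect_parquet_sizes(root):
    """Return the byte sizes of all *.parquet files under root (recursively)."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".parquet"):
                sizes.append(os.path.getsize(os.path.join(dirpath, name)))
    return sizes

def summarize_sizes_mb(sizes_bytes):
    """Count, mean, max, min (in MiB) and the max/mean ratio of the file sizes."""
    if not sizes_bytes:
        raise ValueError("no files found")
    mb = [s / MIB for s in sizes_bytes]
    mean = statistics.mean(mb)
    return {
        "count": len(mb),
        "mean": mean,
        "max": max(mb),
        "min": min(mb),
        # 1.0 means perfectly uniform sizes; guard against all-empty files
        "max_over_mean": max(mb) / mean if mean else 1.0,
    }

# Example (hypothetical path):
# print(summarize_sizes_mb(collect_parquet_sizes("/data/lineitem")))
```

Applied to the numbers above, 104 / 102.54 ≈ 1.01, so the file sizes are nearly uniform. Any skew is therefore more likely to come from how the files are placed across disks or nodes than from the files themselves.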
Hi Yijie,
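Thanks for the profile. From the Operator Profiles overview, it looks like 03-xx-02
HASH_AGGREGATE and 03-xx-06 PARQUET_ROW_GROUP_SCAN took most of the time:

Operator ID  Type            Setup time (min/avg/max)  Process time (min/avg/max)  Wait time (min/avg/max)   Peak memory (avg/max)
03-xx-02     HASH_AGGREGATE  0.020s / 0.083s / 0.213s  1m06s / 1m55s / 3m12s       0.000s / 0.000s / 0.000s  16MB / 16MB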
03-xx-03
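In Drill's Operator Profiles overview, the min/avg/max columns summarize an operator's times across its minor fragments. Comparing max to avg is therefore a quick, informal way to check for skew. The Python sketch below computes those ratios for the HASH_AGGREGATE process times quoted above. The helpers `to_seconds` and `skew_ratios` are hypothetical, and they handle only the `XmYYs` and `N.NNNs` duration formats that appear in this profile.

```python
import re

# Matches durations such as "1m06s", "3m12s", or "0.213s" (minutes optional).
_DURATION = re.compile(r"^(?:(\d+)m)?(?:(\d+(?:\.\d+)?)s)?$")

def to_seconds(text):
    """Convert a profile duration string like '1m06s' to seconds."""
    m = _DURATION.match(text.strip())
    if not m or not any(m.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = m.groups()
    return int(minutes or 0) * 60 + float(seconds or 0)

def skew_ratios(min_t, avg_t, max_t):
    """Return (max/avg, max/min); larger values suggest uneven work per fragment."""
    lo, avg, hi = (to_seconds(t) for t in (min_t, avg_t, max_t))
    max_over_avg = hi / avg if avg else float("inf")
    max_over_min = hi / lo if lo else float("inf")
    return max_over_avg, max_over_min

# HASH_AGGREGATE process times from the row above
print(skew_ratios("1m06s", "1m55s", "3m12s"))
```

For 03-xx-02, this gives about 192 / 115 ≈ 1.67 for max versus avg and 192 / 66 ≈ 2.9 for max versus min. In other words, the slowest fragment took almost three times as long as the fastest, which hints at uneven work across fragments.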
Hi Yijie,
This is Dechang at MapR. I work on Drill performance.
From what you described, it looks like the scan took most of the time.
How are the files distributed across the disks? Is there any skew?
How many disks are there?
If possible, can you provide the profile for the run?
Thanks,
Dechang
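The disk-distribution question can be answered by grouping files by the device ID that `os.stat` reports. This assumes the Parquet files sit on locally mounted disks. On HDFS or MapR-FS, you would use that filesystem's own block-location tools instead. `files_per_device` and `imbalance` are hypothetical helper names. An `imbalance` of 1.0 means every device holds the same number of files.

```python
import os
from collections import Counter

def files_per_device(root, suffix=".parquet"):
    """Count matching files per device ID (st_dev) under root.

    os.walk descends into mounted subdirectories, so files on different
    disks mounted below root show up under different device IDs.
    """
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                counts[os.stat(os.path.join(dirpath, name)).st_dev] += 1
    return counts

def imbalance(counts):
    """Max files on one device divided by the mean per device (1.0 = even)."""
    if not counts:
        return 0.0
    values = list(counts.values())
    return max(values) / (sum(values) / len(values))

# Example (hypothetical path):
# c = files_per_device("/data/lineitem"); print(dict(c), imbalance(c))
```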
Hi all,
I've been trying out Drill from the master branch lately and have deployed a cluster on
three physical servers.
The input data, `lineitem`, is in Parquet format: 150GB in total, 1516 files of
about 101MB each.
The server has two Intel(R) Xeon(R) CPU E5645 @2.40GHz CPUs and 24 cores in