Drill join performance

2016-02-22 Thread Dmitry Krivov
Hello, I have loaded (via CTAS) Star Schema Benchmark generated CSV data (scale factor 50) into Parquet files. For one of the benchmark queries, like: select d.d_year, c.c_region, sum(l.lo_extendedprice*l.lo_discount) as revenue from dfs.tpch.lineorder_part l, dfs.tpch.dates d, dfs.tpch.custom

Re: Drill join performance

2016-02-22 Thread Abdel Hakim Deneche
Hello Dmitry, Welcome to Drill's community :) What version of Drill are you using? Also, can you share the query profile of your query? It helps show what is taking most of the time. Thanks On Mon, Feb 22, 2016 at 10:54 AM, Dmitry Krivov wrote: > Hello > > I have load (as CTAS) into parquet-

Re: Drill join performance

2016-03-18 Thread Abdel Hakim Deneche
One quick note here: I don't think partitioning the LINEORDER table on LO_ORDERDATE would help this query. If you look at the query profile, you will see that Drill is reading everything from LINEORDER. On Fri, Mar 18, 2016 at 7:57 AM, Dmitry Krivov wrote: > Just for info : > > After recreating table

RE: Drill join performance

2016-03-19 Thread Dmitry Krivov
Just for info: After recreating the tables with explicit column CASTs, the performance of this query nearly doubled (from 60 to 35 sec.) Best regards, Dmitry > Hello > > I have load (as CTAS) into parquet-files StarShema Benchmark generated > csv-data (scale factor 50) > > For one of bencmark query's l