Improving Performance of SELECT * FROM hive.table LIMIT 0

2015-09-24 Thread Sudheesh Katkam
Hey y'all,

### Short Question: How do we improve the performance of SELECT * FROM plugin.table LIMIT 0?

### Extended Question: While investigating DRILL-3623, I did an analysis to see where we spend time for a SELECT * FROM hive.table LIMIT 0 query…

Re: Improving Performance of SELECT * FROM hive.table LIMIT 0

2015-09-24 Thread Jinfeng Ni
"FragmentExecutor took 1,070,926 ms to create RecordBatch tree." 1,070,926 ms ≈ 17.8 minutes. In other words, the majority of the 18 minutes of execution in the Hive case is spent on the initialization of the Hive readers. If we want to improve "limit n", we probably should make the initialization of the Hive readers "lazy".
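The lazy-initialization idea can be sketched as below. This is an illustrative pattern only: the `RecordReader` interface and the `LazyRecordReader` wrapper are simplified assumptions, not Drill's actual reader API.

```java
import java.util.function.Supplier;

// Hypothetical, simplified reader interface for illustration only.
interface RecordReader {
    int next(); // returns the number of records read; 0 when exhausted
}

// Defers the (expensive) construction of the real reader until the first
// call to next(), so a "LIMIT 0" query that never pulls a record never
// pays the initialization cost.
class LazyRecordReader implements RecordReader {
    private final Supplier<RecordReader> factory;
    private RecordReader delegate;

    LazyRecordReader(Supplier<RecordReader> factory) {
        this.factory = factory;
    }

    @Override
    public int next() {
        if (delegate == null) {
            delegate = factory.get(); // expensive init happens here, lazily
        }
        return delegate.next();
    }
}
```

The tree of record batches can then be built cheaply; the heavy reader setup only runs if a record is actually requested.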

Re: Improving Performance of SELECT * FROM hive.table LIMIT 0

2015-09-24 Thread Sudheesh Katkam
For the table below: 33 seconds for execution (which includes Parquet reader initialization) and 60 seconds for planning.

Re: Improving Performance of SELECT * FROM hive.table LIMIT 0

2015-09-24 Thread Jinfeng Ni
The query itself is quite simple; it normally should not take 60 seconds for planning. I guess most of the planning time is spent on reading Parquet metadata. The metadata caching that Steven worked on should help in this case.

Re: Improving Performance of SELECT * FROM hive.table LIMIT 0

2015-09-25 Thread Jacques Nadeau
Limit zero shouldn't use any readers if we know the schema. Look at the upstream constant-reduction rule. We should be able to go straight from the Calcite algebra to a result without hitting any execution code. Think direct response, same as explain.
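Jacques's suggestion can be illustrated with a toy plan-time short circuit. The class and method names here are invented for the sketch and are not Drill's actual planner rule API; Drill's real planner works on Calcite RelNodes.

```java
import java.util.List;

class Limit0Sketch {
    // Stand-in for a plan node carrying a LIMIT value and a (possibly
    // unknown) schema.
    static final class LimitNode {
        final long fetch;          // the LIMIT value
        final List<String> schema; // known column names, or null if unknown
        LimitNode(long fetch, List<String> schema) {
            this.fetch = fetch;
            this.schema = schema;
        }
    }

    // If fetch == 0 and the schema is already known, the result is just an
    // empty batch carrying that schema: answer directly from the planner
    // (a direct response, same as explain), creating no readers at all.
    static List<String> tryShortCircuit(LimitNode node) {
        if (node.fetch == 0 && node.schema != null) {
            return node.schema;
        }
        return null; // unknown schema or nonzero limit: execute normally
    }
}
```

The key design point is that the decision happens entirely at plan time, so no fragment, reader, or record batch tree is ever constructed for the known-schema LIMIT 0 case.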

Re: Improving Performance of SELECT * FROM hive.table LIMIT 0

2015-09-25 Thread Jacques Nadeau
Another thought: record batch tree creation time should be short. If any substantial work needs to be done, we should move it to setup.

Re: Improving Performance of SELECT * FROM hive.table LIMIT 0

2015-09-25 Thread Venki Korukanti
One issue with moving RecordReader creation to setup is chained impersonation support. The fragment thread can be running within the query user's doAs block, but the setup is in the doAs block of the user (who may not be the query user) whom we want to impersonate when reading the underlying data. Maybe we should…
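The two-context problem Venki describes can be sketched with a simplified stand-in for Hadoop's `UserGroupInformation.doAs()`. The `UserContext` class below is an illustration of the nesting pattern only, not Drill's or Hadoop's actual code: reader setup must run under the impersonated (data-access) user's context, while the rest of the fragment runs under the query user's context.

```java
import java.util.concurrent.Callable;

// Simplified stand-in for Hadoop's UserGroupInformation.doAs(), used only
// to illustrate chained impersonation: contexts nest, and the innermost
// doAs block determines which user's credentials are in effect.
class UserContext {
    private static final ThreadLocal<String> CURRENT =
        ThreadLocal.withInitial(() -> "anonymous");

    // Runs the action with `user` as the effective user, restoring the
    // previous effective user afterwards (even on failure).
    static <T> T doAs(String user, Callable<T> action) throws Exception {
        String previous = CURRENT.get();
        CURRENT.set(user);
        try {
            return action.call();
        } finally {
            CURRENT.set(previous);
        }
    }

    static String currentUser() {
        return CURRENT.get();
    }
}
```

In this picture, moving reader creation out of the fragment thread means the setup code can no longer simply inherit the enclosing doAs context; it would have to re-establish the impersonated user's context explicitly, which is the wrinkle being discussed.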

Re: Improving Performance of SELECT * FROM hive.table LIMIT 0

2015-09-29 Thread Sudheesh Katkam
My initial work on this brought down the execution time from 1,203 seconds to ~20 seconds (most of which is planning time). As Jinfeng pointed out, the planning time can be reduced using the Parquet metadata cache.

Re: Improving Performance of SELECT * FROM hive.table LIMIT 0

2015-10-02 Thread Sudheesh Katkam
Hey y’all, I see that DRILL-1617 disables producer-consumer because of correctness issues. Should we enable this visitor (as Venki suggested) and resolve the issues? Thank you, Sudheesh

Re: Improving Performance of SELECT * FROM hive.table LIMIT 0

2015-10-03 Thread Jacques Nadeau
It doesn't seem like there is any reason to use producer/consumer behavior to work around the doAs behavior. If we need to have a two-stage setup (with two different contexts), let's just enhance the readers with this behavior. The producer/consumer was disabled because it didn't show a performance…

Re: Improving Performance of SELECT * FROM hive.table LIMIT 0

2015-10-07 Thread Sudheesh Katkam
See below.