Re: Query Planning and Directory Pruning

2016-02-09 Thread Jinfeng Ni
Hi John, I think the patch for DRILL-2517 [1] would help a bit if you use Parquet files. DRILL-2517 would save the overhead of reading Parquet metadata from the Parquet footer by pruning the directory first. (It lists some preliminary performance results, similar to the setup you had.) However, the

Re: Query Planning and Directory Pruning

2016-02-09 Thread John Omernik
I am sorry that wasn't clearer; I am not sure why I used those examples. But if I have a table: table table/2016-02-04 table/2016-02-03 ... table/2015-07-01 (so a subdirectory for each day for half a year), then I make a new empty directory of table1 table1/2016-02-04 table1/2016-02-03 (only two

Re: Query Planning and Directory Pruning

2016-02-09 Thread Aman Sinha
At a glance, John's query does not have a WHERE clause; it is querying the subdirectory directly in the FROM clause. In this case Drill will only look at the files within that subdirectory. Directory pruning only comes into the picture when there is a WHERE condition on dir0, dir1, etc. On Tue, F
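To make the distinction above concrete, here is a minimal sketch of the two query shapes in Drill SQL. The `dfs.root` workspace and the `table` directory layout are assumptions for illustration only; `dir0` is the implicit column Drill exposes for the first level of subdirectories.

```sql
-- Assumption: a hypothetical workspace dfs.root that contains
-- table/2016-02-04, table/2016-02-03, ... (one subdirectory per day).

-- Subdirectory named directly in FROM: there is no WHERE clause,
-- so directory pruning is not involved.
SELECT COUNT(1) FROM dfs.root.`table/2016-02-04`;

-- Parent directory in FROM with a filter on dir0: this is the case
-- where directory pruning comes into the picture.
SELECT COUNT(1) FROM dfs.root.`table` WHERE dir0 = '2016-02-04';
```

Both queries should count the same rows; they differ only in how the planner arrives at the set of files to scan.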

Re: Query Planning and Directory Pruning

2016-02-09 Thread Neeraja Rentachintala
Yes, DRILL-3759 covers it. This is a high priority enhancement that we are trying to get to in the next couple of releases. -Neeraja On Tue, Feb 9, 2016 at 7:32 AM, John Omernik wrote: > This one seems to cover it: > > https://issues.apache.org/jira/browse/DRILL-3759 > > > > On Tue, Feb 9, 2016

Re: Query Planning and Directory Pruning

2016-02-09 Thread John Omernik
This one seems to cover it: https://issues.apache.org/jira/browse/DRILL-3759 On Tue, Feb 9, 2016 at 9:25 AM, Abdel Hakim Deneche wrote: > Hi John, > > Sorry I didn't get back to you (I thought I did). > > No, I don't need the plan, I just wanted to confirm what was taking most of > the time a

Re: Query Planning and Directory Pruning

2016-02-09 Thread Abdel Hakim Deneche
Hi John, Sorry I didn't get back to you (I thought I did). No, I don't need the plan; I just wanted to confirm what was taking most of the time, and you already confirmed it's the planning. Can you open a JIRA for this? This may be a known issue, but I'm not sure. Thanks On Tue, Feb 9, 2016 at

Re: Query Planning and Directory Pruning

2016-02-09 Thread John Omernik
Abdel, do you still need the plans? As I said, if your table has any decent number of directories and files, it looks like the planning is touching all the directories even though you are pruning. I can post plans; however, I think in this case you'll find they are exactly the same, and the only d

Re: Query Planning and Directory Pruning

2016-02-04 Thread John Omernik
I can package up both plans for you if you need them (let me know if you still want them), but I can tell you the plans were EXACTLY the same. However, the data-sum table took 0.932 seconds to plan the query, and the data table (the one with all the extra data) took 11.379 seconds to plan the que

Re: Query Planning and Directory Pruning

2016-02-04 Thread Abdel Hakim Deneche
Hey John, can you try an explain plan for both queries and see how much time it takes? For example, for the first query you would run: *explain plan for* select count(1) from `data/2016-02-03`; It would also be helpful if you could share the query profiles for both queries. Thanks On Thu, Feb

Query Planning and Directory Pruning

2016-02-04 Thread John Omernik
Hey all, I think I am seeing an issue related to https://issues.apache.org/jira/browse/DRILL-3759 but I want to describe it here, see if it's really the case, and then determine what the blockers to resolution may be. I am using the MapR Developer Release 1.4, and I have a directory with subdi