Yes, DRILL-3759 covers it. This is a high-priority enhancement that we are trying to get to in the next couple of releases.
-Neeraja

On Tue, Feb 9, 2016 at 7:32 AM, John Omernik <j...@omernik.com> wrote:
> This one seems to cover it:
>
> https://issues.apache.org/jira/browse/DRILL-3759
>
> On Tue, Feb 9, 2016 at 9:25 AM, Abdel Hakim Deneche <adene...@maprtech.com> wrote:
>
> > Hi John,
> >
> > Sorry I didn't get back to you (I thought I did).
> >
> > No, I don't need the plan. I just wanted to confirm what was taking most of the time, and you already confirmed it's the planning.
> >
> > Can you open a JIRA for this? This may be a known issue, but I'm not sure.
> >
> > Thanks
> >
> > On Tue, Feb 9, 2016 at 6:08 AM, John Omernik <j...@omernik.com> wrote:
> >
> > > Abdel, do you still need the plans? As I said, if your table has any decent number of directories and files, it looks like the planning is touching all the directories even though you are pruning. I can post plans; however, I think in this case you'll find they are exactly the same, and the only difference is that the longer query is planning much more because it has more files to read.
> > >
> > > On Thu, Feb 4, 2016 at 10:46 AM, John Omernik <j...@omernik.com> wrote:
> > >
> > > > I can package up both plans for you if you need them (let me know if you still want them), but I can tell you the plans were EXACTLY the same. However, the data_sum table took 0.932 seconds to plan the query, and the data table (the one with all the extra data) took 11.379 seconds to plan the query. That indicates to me that the issue isn't in the plan that was created, but in the actual planning process. (Let me know if you disagree or still need to see the plan; like I said, the actual plans were exactly the same.)
> > > >
> > > > John.
> > > >
> > > > On Thu, Feb 4, 2016 at 10:31 AM, Abdel Hakim Deneche <adene...@maprtech.com> wrote:
> > > >
> > > >> Hey John, can you try an explain plan for both queries and see how much time it takes?
> > > >>
> > > >> For example, for the first query you would run:
> > > >>
> > > >> *explain plan for* select count(1) from `data/2016-02-03`;
> > > >>
> > > >> It can also be helpful if you could share the query profiles for both queries.
> > > >>
> > > >> Thanks
> > > >>
> > > >> On Thu, Feb 4, 2016 at 8:15 AM, John Omernik <j...@omernik.com> wrote:
> > > >>
> > > >> > Hey all, I think I am seeing an issue related to https://issues.apache.org/jira/browse/DRILL-3759, but I want to describe it here, see if it's really the case, and then determine what the blockers to resolution may be.
> > > >> >
> > > >> > I am using the MapR Developer Release 1.4, and I have a directory with subdirectories by date:
> > > >> >
> > > >> > data/2015-01-01
> > > >> > data/2015-01-02
> > > >> > data/2015-01-03
> > > >> >
> > > >> > These are stored as Parquet files. At this point each date averages about 1 GB of data and has roughly 75 Parquet files in it.
> > > >> >
> > > >> > When I run
> > > >> >
> > > >> > select count(1) from `data/2016-02-03`
> > > >> >
> > > >> > it takes roughly 11 seconds.
> > > >> >
> > > >> > If I copy the 2016-02-03 directory to a new base (data_sum) and run
> > > >> >
> > > >> > select count(1) from `data_sum/2016-02-03`
> > > >> >
> > > >> > it runs in 0.874 seconds.
> > > >> >
> > > >> > Same data, same structure; the only difference is that the data_sum directory has only a few directories, while data has dates going back to Nov 2015. It seems like it is getting file names for all files in each directory prior to pruning, which seems to me to be adding a lot of latency to queries that doesn't need to be there (thus I think I am seeing DRILL-3759). But I wanted to confirm, and then see how we can address this, since the directory prune should be fast, and on large data sets it's just going to get worse and worse.
> > > >> >
> > > >> > John
> > > >>
> > > >> --
> > > >> Abdelhakim Deneche
> > > >> Software Engineer
> > > >> <http://www.mapr.com/>
> > > >> Now Available - Free Hadoop On-Demand Training
> > > >> <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
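The cost John describes, where planning appears to enumerate every file under `data/` before narrowing to the one requested date directory, can be illustrated with a minimal Python sketch. It is not Drill's planner code and makes no claim about how DRILL-3759 is or will be implemented. The in-memory tree, the function names `build_tree`, `list_then_prune` and `prune_then_list`, and the start date of 2015-11-01 are all hypothetical, chosen only to mirror the figures in the thread (about 75 Parquet files per date, dates going back to Nov 2015).

```python
from __future__ import annotations

from datetime import date, timedelta

# Assumption: ~75 Parquet files per date directory, as reported in the thread.
FILES_PER_DIR = 75


def build_tree(start: date, end: date) -> dict[str, list[str]]:
    """Hypothetical in-memory stand-in for a `data/` table with one subdirectory per day."""
    tree: dict[str, list[str]] = {}
    day = start
    while day <= end:
        name = day.isoformat()
        tree[name] = [f"{name}/part-{i:04d}.parquet" for i in range(FILES_PER_DIR)]
        day += timedelta(days=1)
    return tree


def list_then_prune(tree: dict[str, list[str]], wanted: str) -> tuple[list[str], int]:
    """Eager strategy: enumerate every file in every directory, then keep the wanted one."""
    listed = [f for files in tree.values() for f in files]
    selected = [f for f in listed if f.split("/", 1)[0] == wanted]
    return selected, len(listed)  # cost = number of files enumerated


def prune_then_list(tree: dict[str, list[str]], wanted: str) -> tuple[list[str], int]:
    """Prune-first strategy: filter on directory names, then enumerate only the survivors."""
    survivors = [d for d in tree if d == wanted]
    selected = [f for d in survivors for f in tree[d]]
    return selected, len(selected)  # cost = number of files enumerated


if __name__ == "__main__":
    tree = build_tree(date(2015, 11, 1), date(2016, 2, 3))
    eager, eager_cost = list_then_prune(tree, "2016-02-03")
    lazy, lazy_cost = prune_then_list(tree, "2016-02-03")
    assert eager == lazy  # both strategies select the same 75 files
    print(eager_cost, lazy_cost)  # prints: 7125 75
```

With one directory per day from 2015-11-01 through 2016-02-03 (95 directories), the eager strategy enumerates 7,125 files to select the same 75 that the prune-first strategy enumerates, and the gap grows linearly with every day added to `data/`. That is consistent with John's observation that the query plans are identical while planning time tracks the size of the whole table rather than the size of the queried directory.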