Re: Query Planning and Directory Pruning

Neeraja Rentachintala Tue, 09 Feb 2016 08:26:59 -0800

Yes, DRILL-3759 covers it.
This is a high priority enhancement that we are trying to get to in the
next couple of releases.


-Neeraja

On Tue, Feb 9, 2016 at 7:32 AM, John Omernik <j...@omernik.com> wrote:

> This one seems to cover it:
>
> https://issues.apache.org/jira/browse/DRILL-3759
>
>
>
> On Tue, Feb 9, 2016 at 9:25 AM, Abdel Hakim Deneche <adene...@maprtech.com> wrote:
>
> > Hi John,
> >
> > Sorry I didn't get back to you (I thought I did).
> >
> > No, I don't need the plan. I just wanted to confirm what was taking most of
> > the time, and you already confirmed it's the planning.
> >
> > Can you open a JIRA for this? This may be a known issue, but I'm not sure.
> >
> > Thanks
> >
> > On Tue, Feb 9, 2016 at 6:08 AM, John Omernik <j...@omernik.com> wrote:
> >
> > > Abdel, do you still need the plans? As I said, if your table has any
> > > decent number of directories and files, it looks like the planning is
> > > touching all the directories even though you are pruning. I can post the
> > > plans; however, I think in this case you'll find they are exactly the
> > > same, and the only difference is that the longer query spends much more
> > > time planning because its table has more files to read.
> > >
> > >
> > > On Thu, Feb 4, 2016 at 10:46 AM, John Omernik <j...@omernik.com> wrote:
> > >
> > > > I can package up both plans for you if you need them (let me know if
> > > > you still want them), but I can tell you the plans were EXACTLY the
> > > > same. However, the data_sum table took 0.932 seconds to plan the query,
> > > > and the data table (the one with all the extra data) took 11.379
> > > > seconds to plan the query. That indicates to me the issue isn't in the
> > > > plan that was created, but in the actual planning process. (Let me know
> > > > if you disagree or still need to see the plan; like I said, the actual
> > > > plans were exactly the same.)
> > > >
> > > >
> > > > John.
> > > >
> > > >
> > > > On Thu, Feb 4, 2016 at 10:31 AM, Abdel Hakim Deneche <adene...@maprtech.com> wrote:
> > > >
> > > >> Hey John, can you try an explain plan for both queries and see how
> > > >> much time it takes?
> > > >>
> > > >> For example, for the first query you would run:
> > > >>
> > > >> explain plan for select count(1) from `data/2016-02-03`;
> > > >>
> > > >> It would also be helpful if you could share the query profiles for
> > > >> both queries.
> > > >>
> > > >> Thanks
> > > >>
> > > >> On Thu, Feb 4, 2016 at 8:15 AM, John Omernik <j...@omernik.com> wrote:
> > > >>
> > > >> > Hey all, I think I am seeing an issue related to
> > > >> > https://issues.apache.org/jira/browse/DRILL-3759 but I want to
> > > >> > describe it here, see if that's really the case, and then determine
> > > >> > what the blockers to resolution may be.
> > > >> >
> > > >> > I am using the MapR Developer Release 1.4, and I have a directory
> > > >> > with subdirectories by date.
> > > >> >
> > > >> > data/2015-01-01
> > > >> > data/2015-01-02
> > > >> > data/2015-01-03
> > > >> >
> > > >> > These are stored as Parquet files. At this point each date averages
> > > >> > about 1 GB of data and has roughly 75 Parquet files in it.
> > > >> >
> > > >> > When I run
> > > >> >
> > > >> > select count(1) from `data/2016-02-03` it takes roughly 11 seconds.
> > > >> >
> > > >> > If I copy the 2016-02-03 directory to a new base (data_sum) and run
> > > >> >
> > > >> > select count(1) from `data_sum/2016-02-03` it runs in 0.874 seconds.
> > > >> >
> > > >> > Same data, same structure; the only difference is that the data_sum
> > > >> > directory has only a few directories, and data has dates going back
> > > >> > to Nov 2015. It seems like it is getting the file names for all files
> > > >> > in each directory prior to pruning, which seems to me to be adding a
> > > >> > lot of latency to queries that doesn't need to be there. (Thus I think
> > > >> > I am seeing 3759.) I wanted to confirm, and then I wanted to see how
> > > >> > we can address this, in that the directory prune should be fast, and
> > > >> > on large data sets it's just going to get worse and worse.
> > > >> >
> > > >> >
> > > >> >
> > > >> > John
> > > >> >
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >>
> > > >> Abdelhakim Deneche
> > > >>
> > > >> Software Engineer
> > > >>
> > > >>   <http://www.mapr.com/>
> > > >>
> > > >>
> > > >>
> > > >
> > > >
> > >
> >
> >
> >
> > --
> >
> > Abdelhakim Deneche
> >
> > Software Engineer
> >
> >   <http://www.mapr.com/>
> >
> >
> >
>
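
To illustrate the behavior described in this thread, here is a minimal Python sketch. It is not Drill's planner code, and all function names and the directory layout it creates are hypothetical. It only contrasts two strategies under the assumption stated in the original message: listing every file under the table root before pruning, versus resolving the requested date directory first and listing only that directory. The count of files touched stands in for planning cost.

```python
import os
import tempfile


def make_table(root, num_dates, files_per_date):
    """Create root/<date>/<n>.parquet as empty placeholder files (illustrative only)."""
    for d in range(num_dates):
        date_dir = os.path.join(root, f"2016-01-{d + 1:02d}")
        os.makedirs(date_dir)
        for f in range(files_per_date):
            open(os.path.join(date_dir, f"{f}.parquet"), "w").close()


def list_then_prune(root, wanted_dir):
    """List every file under root, then keep only those in wanted_dir."""
    touched = 0
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            touched += 1  # every file in the table is visited
            if os.path.basename(dirpath) == wanted_dir:
                selected.append(os.path.join(dirpath, name))
    return selected, touched


def prune_then_list(root, wanted_dir):
    """Resolve the requested directory first and list only its files."""
    target = os.path.join(root, wanted_dir)
    selected = [os.path.join(target, n) for n in os.listdir(target)]
    return selected, len(selected)


def compare(num_dates=30, files_per_date=75):
    """Return (same_result, files_touched_list_first, files_touched_prune_first)."""
    with tempfile.TemporaryDirectory() as root:
        make_table(root, num_dates, files_per_date)
        a, touched_a = list_then_prune(root, "2016-01-03")
        b, touched_b = prune_then_list(root, "2016-01-03")
        return sorted(a) == sorted(b), touched_a, touched_b
```

Calling `compare()` returns whether both strategies selected the same files, followed by the number of files each one touched. In the list-first strategy that number grows with the total number of files in the table, which matches the observation above that the same query plans more slowly against the larger `data` directory than against `data_sum`.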
