Re: what's the difference between drill and optiq

Ted Dunning Wed, 27 May 2015 22:13:41 -0700

Andrew,

Sorry for being cryptic.  Hanifi was clearer.  My point was directed at
the differences between where Hive may ultimately go and where Drill is
now.  Hanifi was providing a good summary of where Drill is now.

As he said, Calcite does query parsing and planning.  Ultimately, it will
do the same for Hive.  Even so, Drill has extended Calcite's planning
capabilities in ways that Hive does not use.  These extensions allow
Calcite to produce plans for the Drill execution engine.  That execution
engine is what Hanifi meant by flexible distributed columnar execution with
late binding.

SQL is not normally a late-binding language.  Instead, it shows its long
heritage by being a very statically typed language.  That static typing is
a problem in the modern world of flexible data, and dealing with this
problem is a key goal of Drill.

The key technological advance in Drill that enables it to address late
typing problems is something called the ANY type.  This is essentially a
way for the parser to punt the problem of resolving the type of some value
until the query is actually running.  At that point, Drill has an empirical
schema available for each record batch which can be used to do final code
generation and optimization.  If the empirical schema changes due to
changes in the data being processed, that code can be regenerated as
needed.

This is a huge philosophical and design change that is hard to just paste
onto an existing engine.  Just as it would be next to impossible to modify
a Pascal or Fortran execution environment to do the type inference and
lazy execution that Scala or Haskell do, it is going to be hard to extend
Hive's entire execution environment to deal with type dynamism.  Simply
passing around dynamic types will not give performance anywhere near what
Drill does because of the inevitable cost of type tag dispatching.

To give just the simplest example, suppose you have data that used a column
named X to hold an integer for a long while and then switched to using a
column named Y to hold a floating point number.  To deal with this, you
might create a view which has a case statement that uses the value of X or
Y, whichever is non-null.  In conventional SQL engines, the query parser
and planner would generate code for this case statement and it would
execute for every record.  With Drill, almost all record batches would have
*either* X or Y.  Drill would generate different code for those two
different patterns of data and that code would be generated with the
knowledge that X is null, or that Y is null.  As such, the optimizer in the
code generator would actually just completely remove the case statement by
evaluating it at code generation time.  By pushing that code generation
time very late in the execution, Drill would have no perceptible penalty
relative to uniformly typed code, but it would have the ability to deal
with non-uniform data.

My original comment was an indefensible shorthand for all of this.  Things
should be made as simple as possible, but no simpler, as the great man said.

On Wed, May 27, 2015 at 8:32 PM, Andrew Brust <
andrew.br...@bluebadgeinsights.com> wrote:

> That makes sense.  Just having trouble mapping that back on Ted's
> comment.  But I tend to think that's me and my ignorance.
>
> -----Original Message-----
> From: Hanifi Gunes [mailto:hgu...@maprtech.com]
> Sent: Wednesday, May 27, 2015 4:48 PM
> To: user
> Subject: Re: what's the differenct between drill and optiq
>
> Calcite does parsing & planning of queries. Drill executes in a very
> flexible distributed columnar fashion with late binding.
>
> On Wed, May 27, 2015 at 8:34 AM, Ted Dunning <ted.dunn...@gmail.com>
> wrote:
>
> > Andrew,
> >
> > What Hive does not have is the extensions that Drill has that allow
> > SQL to be type flexible.  The ANY type and all of the implications
> > both in terms of implementation and user impact it has are a really big
> > deal.
> >
> >
> >
> > On Wed, May 27, 2015 at 6:08 AM, Andrew Brust <
> > andrew.br...@bluebadgeinsights.com> wrote:
> >
> > > Thanks!
> > >
> > > Sent from my phone
> > > <insert witty apology for typos here>
> > >
> > > ----- Reply message -----
> > > From: "PHANI KUMAR YADAVILLI" <phanikumaryadavi...@gmail.com>
> > > To: "user@drill.apache.org" <user@drill.apache.org>
> > > Subject: what's the differenct between drill and optiq
> > > Date: Wed, May 27, 2015 8:33 AM
> > >
> > > Yes, Hive uses Calcite. You can refer to the Hive documentation.
> > > On May 27, 2015 6:01 PM, "Andrew Brust" <
> > > andrew.br...@bluebadgeinsights.com>
> > > wrote:
> > >
> > > > Folks at Hortonworks told me that Hive now uses Calcite as well.
> > > > Can anyone here confirm or deny that?
> > > >
> > > > -----Original Message-----
> > > > From: Rajkumar Singh [mailto:rsi...@maprtech.com]
> > > > Sent: Wednesday, May 27, 2015 6:52 AM
> > > > To: user@drill.apache.org
> > > > Subject: Re: what's the differenct between drill and optiq
> > > >
> > > > Optiq (now known as Calcite) is an API for query parsing, planning,
> > > > and optimization; Drill uses it for SQL parsing, validation, and
> > > > optimization. The Drill query planner applies its own custom planner
> > > > rules to build the query logical plan.
> > > >
> > > > Rajkumar Singh
> > > >
> > > >
> > > >
> > > > > On May 27, 2015, at 12:04 PM, 陈礼剑 <chenlij...@togeek.cn> wrote:
> > > > >
> > > > > Hi:
> > > > >
> > > > > I just want to know the difference between drill and optiq.
> > > > >
> > > > >
> > > > > Does Drill just 'extend' Optiq to support many other
> > > > > 'stores' (Hadoop, MongoDB, ...)?
> > > > >
> > > > >
> > > > > ---from davy
> > > > > Thanks.
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> >
>
