Re: what's the difference between drill and optiq

Jacques Nadeau Wed, 27 May 2015 22:21:08 -0700

Andrew,

As others have pointed out, there are definitely differences in how each
community project leverages Calcite (Apache Kylin, Phoenix and, I believe,
Flink also use it).  Remember, Calcite--at its core--is a developer's
toolkit that other applications and systems incorporate.  While an end user
could use Calcite directly, the most common use is as an embedded library in
a broader system.


The great news is that the community is working together on an amazing
shared library and framework.

-Jacques



On Wed, May 27, 2015 at 10:10 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> Andrew,
>
> Sorry for being cryptic.  Hanifi is more clear.  My point was directed at
> the differences between where Hive may ultimately go and where Drill is
> now.  Hanifi was providing a good summary of where Drill is now.
>
> As he said, Calcite does query parsing and planning.  Ultimately, it will
> do the same for Hive.  Even so, Drill has extended Calcite's planning
> capabilities in ways which are not used by Hive.  These extensions allow
> Calcite to produce plans for the Drill execution engine.  That execution
> engine is what Hanifi meant by flexible distributed columnar execution with
> late binding.
>
> SQL is not normally a late binding language.  Instead, it shows its long
> heritage by being a very statically typed language.  That static typing is
> a problem in the modern world of flexible data and dealing with this
> problem is a key goal of Drill.
>
> The key technological advance in Drill that enables it to address late
> typing problems is something called the ANY type.  This is essentially a
> way for the parser to punt the problem of resolving the type of some value
> until the query is actually running.  At that point, Drill has an empirical
> schema available for each record batch which can be used to do final code
> generation and optimization.  If the empirical schema changes due to
> changes in the data being processed, that code can be regenerated as
> needed.
>
> This is a huge philosophical and design change that is hard to just paste
> onto an existing engine.  Just as it would be next to impossible to modify
> a Pascal or Fortran execution environment to do the type inferencing and
> lazy execution that Scala or Haskell do, it is going to be hard to extend
> Hive's entire execution environment to deal with type dynamism.  Simply
> passing around dynamic types will not give performance anywhere near what
> Drill does because of the inevitable cost of type tag dispatching.
>
> To give just the simplest example, suppose you have data that used a column
> named X to hold an integer for a long while and then switched to using a
> column named Y to hold a floating point number.  To deal with this, you
> might create a view which has a case statement that uses the value of X or
> Y, whichever is non-null.  In conventional SQL engines, the query parser
> and planner would generate code for this case statement and it would
> execute for every record.  With Drill, almost all record batches would have
> *either* X or Y.  Drill would generate different code for those two
> different patterns of data and that code would be generated with the
> knowledge that X is null, or that Y is null.  As such, the optimizer in the
> code generator would actually just completely remove the case statement by
> evaluating it at code generation time.  By pushing that code generation
> time very late in the execution, Drill would have no perceptible penalty
> relative to uniformly typed code, but it would have the ability to deal
> with non-uniform data.
>
>
> My original comment was an indefensible shorthand for all of this.  Things
> should be made as simple as possible, but no simpler, as the great man
> said.
>
>
> On Wed, May 27, 2015 at 8:32 PM, Andrew Brust <
> andrew.br...@bluebadgeinsights.com> wrote:
>
> > That makes sense.  Just having trouble mapping that back on Ted's
> > comment.  But I tend to think that's me and my ignorance.
> >
> > -----Original Message-----
> > From: Hanifi Gunes [mailto:hgu...@maprtech.com]
> > Sent: Wednesday, May 27, 2015 4:48 PM
> > To: user
> > Subject: Re: what's the differenct between drill and optiq
> >
> > Calcite does parsing & planning of queries. Drill executes in a very
> > flexible distributed columnar fashion with late binding.
> >
> > On Wed, May 27, 2015 at 8:34 AM, Ted Dunning <ted.dunn...@gmail.com>
> > wrote:
> >
> > > Andrew,
> > >
> > > > What Hive does not have is the extensions that Drill has that allow
> > > > SQL to be type flexible.  The ANY type, and all of its implications
> > > > for both implementation and user impact, are a really big deal.
> > >
> > >
> > >
> > > On Wed, May 27, 2015 at 6:08 AM, Andrew Brust <
> > > andrew.br...@bluebadgeinsights.com> wrote:
> > >
> > > > Thanks!
> > > >
> > > > Sent from my phone
> > > > <insert witty apology for typos here>
> > > >
> > > > ----- Reply message -----
> > > > From: "PHANI KUMAR YADAVILLI" <phanikumaryadavi...@gmail.com>
> > > > To: "user@drill.apache.org" <user@drill.apache.org>
> > > > Subject: what's the differenct between drill and optiq
> > > > Date: Wed, May 27, 2015 8:33 AM
> > > >
> > > > > Yes, Hive uses Calcite. You can refer to the Hive documentation.
> > > > On May 27, 2015 6:01 PM, "Andrew Brust" <
> > > > andrew.br...@bluebadgeinsights.com>
> > > > wrote:
> > > >
> > > > > Folks at Hortonworks told me that Hive now uses Calcite as well.
> > > > > Can anyone here confirm or deny that?
> > > > >
> > > > > -----Original Message-----
> > > > > From: Rajkumar Singh [mailto:rsi...@maprtech.com]
> > > > > Sent: Wednesday, May 27, 2015 6:52 AM
> > > > > To: user@drill.apache.org
> > > > > Subject: Re: what's the differenct between drill and optiq
> > > > >
> > > > > > Optiq (now known as Calcite) is an API for query parsing, planning
> > > > > > and optimization. Drill uses it for SQL parsing, validation and
> > > > > > optimization. Drill's query planner applies its own custom planner
> > > > > > rules to build the logical query plan.
> > > > >
> > > > > Rajkumar Singh
> > > > >
> > > > >
> > > > >
> > > > > > On May 27, 2015, at 12:04 PM, 陈礼剑 <chenlij...@togeek.cn> wrote:
> > > > > >
> > > > > > Hi:
> > > > > >
> > > > > > I just want to know the difference between drill and optiq.
> > > > > >
> > > > > >
> > > > > > > Does drill just 'extend' optiq to support many other 'stores'
> > > > > > > (hadoop, mongodb, ...)?
> > > > > >
> > > > > >
> > > > > > ---from davy
> > > > > > Thanks.
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>
