Re: what's the differenct between drill and optiq

Jinfeng Ni Thu, 28 May 2015 09:15:05 -0700

AFAIK,  traditional RDBMSes do not do later binding functionality for SQL
query.  But their XQuery/XPath extension may do late binding.  This is
because an XML is simply a semi structured doc, just like what JSON looks
like. The query planner normally does not have knowledge of the schema of
XML doc.  (XML doc could be associated with an optional XML schema, which
is mainly used for verification purpose).    Therefore, for XPath /a/b/c/d,
it could be an integer, or string, or date, and query planner does not bind
it with any concrete type; it is deferred to execution time, when the XML
doc is explored.  If you look at XML data model, you will see concept of
xs:anyAtomicType etc. In some sense, this is similar to SQL ANY type Drill
used.





On Thu, May 28, 2015 at 8:08 AM, Andrew Brust <
andrew.br...@bluebadgeinsights.com> wrote:

> Absolutely nothing to apologize for, and the below explanation is very
> helpful.
>
> FWIW, I certainly understood that Hive's use of Calcite offered relatively
> little in the way of type flexibility/late binding, compare to Drill.  I
> get that Drill's entire raison d'etre is around this and never thought that
> Hive "had it too."  It was more a question of my being surprised that the
> query planners had any common technology at all.  I have never coded in
> Scala or Haskell, but I have coded plenty in C#, Pascal and VB, and I can
> apprecaiute the analogy just by having experience with one half of it.
>
> It's part of the reason I think Drill is so cool, and part of the reason
> why MapR did so well in one of Gigaom's last Sector Roadmaps.
>
> My "ponder question" is whether mainstream RDBMSes like Oracle and SQL
> Server will one day add Drill-like late binding functionality.
>
> -----Original Message-----
> From: Ted Dunning [mailto:ted.dunn...@gmail.com]
> Sent: Thursday, May 28, 2015 1:10 AM
> To: user@drill.apache.org
> Subject: Re: what's the differenct between drill and optiq
>
> Andrew,
>
> Sorry for being cryptic.  Hanifi is more clear.  My point was directed at
> the differences between where Hive may ultimately go and where Drill is
> now.  Hanifi was providing a good summary of where Drill is now.
>
> As he said, Calcite does query parsing and planning.  Ultimately, it will
> do the same for Hive.  Even so, Drill has extended Calcite's planning
> capabilities in ways which are not used by Hive.  These extensions allow
> Calcite to produce plans for the Drill execution engine.  That execution
> engine is what Hanifi meant by flexible distributed columnar execution with
> late binding.
>
> SQL is not normally a late binding language.  Instead, it shows its long
> heritage by being a very statically typed language.  That static typing is
> a problem in the modern world of flexible data and dealing with this
> problem is a key goal of Drill.
>
> The key technological advance in Drill that enables it to address late
> typing problems is something called the ANY type.  This is essentially a
> way for the parser to punt the problem of resolving the type of some value
> until the query is actually running.  At that point, Drill has an empirical
> schema available for each record batch which can be used to do final code
> generation and optimization.  If the empirical schema changes due to
> changes in the data being processed, that code can be regenerated as needed.
>
> This is a huge philosophical and design change that is hard to just paste
> onto an existing engine.  Just as it would be next to impossible to modify
> a Pascal or Fortran execution environment to do the type inferencing and
> lazy execution that Scala or Haskell do, it is going to be hard to extend
> Hive's entire execution environment to deal with type dynamism.  Simply
> passing around dynamic types will not give performance anywhere near what
> Drill does because of the inevitable cost of type tag dispatching.
>
> To give just the simplest example, suppose you have data that used a
> column named X to hold an integer for a long while and then switched to
> using a column named Y to hold a floating point number.  To deal with this,
> you might create a view which has a case statement that uses the value of X
> or Y, whichever is non-null.  In conventional SQL engines, the query parser
> and planner would generate code for this case statement and it would
> execute for every record.  With Drill, almost all record batches would have
> *either* X or Y.  Drill would generate different code for those two
> different patterns of data and that code would be generated with the
> knowledge that X is null, or that Y is null.  As such, the optimizer in the
> code generator would actually just completely remove the case statement by
> evaluating it at code generation time.  By pushing that code generation
> time very late in the execution, Drill would have no perceptible penalty
> relative to uniformly typed code, but it would have the ability to deal
> with non-uniform data.
>
>
> My original comment was an indefensible shorthand for all of this.  Things
> should be made as simple as possible, but no simpler, as the great man said.
>
>
> On Wed, May 27, 2015 at 8:32 PM, Andrew Brust <
> andrew.br...@bluebadgeinsights.com> wrote:
>
> > That makes sense.  Just having trouble mapping that back on Ted's
> > comment.  But I tend to think that's me and my ignorance.
> >
> > -----Original Message-----
> > From: Hanifi Gunes [mailto:hgu...@maprtech.com]
> > Sent: Wednesday, May 27, 2015 4:48 PM
> > To: user
> > Subject: Re: what's the differenct between drill and optiq
> >
> > Calcite does parsing & planning of queries. Drill executes in a very
> > flexible distributed columnar fashion with late binding.
> >
> > On Wed, May 27, 2015 at 8:34 AM, Ted Dunning <ted.dunn...@gmail.com>
> > wrote:
> >
> > > Andrew,
> > >
> > > What Hive does not have is the extensions that Drill has that allow
> > > SQL to be type flexible.  The ALL type and all of the implications
> > > both in terms of implementation and user impact it has are a really
> > > big
> > deal.
> > >
> > >
> > >
> > > On Wed, May 27, 2015 at 6:08 AM, Andrew Brust <
> > > andrew.br...@bluebadgeinsights.com> wrote:
> > >
> > > > Thanks!
> > > >
> > > > Sent from my phone
> > > > <insert witty apology for typos here>
> > > >
> > > > ----- Reply message -----
> > > > From: "PHANI KUMAR YADAVILLI" <phanikumaryadavi...@gmail.com>
> > > > To: "user@drill.apache.org" <user@drill.apache.org>
> > > > Subject: what's the differenct between drill and optiq
> > > > Date: Wed, May 27, 2015 8:33 AM
> > > >
> > > > Yes hive uses calcite. You can refer hive documentation.
> > > > On May 27, 2015 6:01 PM, "Andrew Brust" <
> > > > andrew.br...@bluebadgeinsights.com>
> > > > wrote:
> > > >
> > > > > Folks at Hortonworks told me that Hive now uses Calcite as well.
> > > > > Can anyone here confirm or deny that?
> > > > >
> > > > > -----Original Message-----
> > > > > From: Rajkumar Singh [mailto:rsi...@maprtech.com]
> > > > > Sent: Wednesday, May 27, 2015 6:52 AM
> > > > > To: user@drill.apache.org
> > > > > Subject: Re: what's the differenct between drill and optiq
> > > > >
> > > > > Optiq(now known as calcite) is an api for query parser,planner
> > > > > and optimization, drill uses it for the SQL parsing,validation
> > > > > and optimization.Drill query planner applies its own custom
> > > > > planner rules
> > > to
> > > > > build the query logical plan.
> > > > >
> > > > > Rajkumar Singh
> > > > >
> > > > >
> > > > >
> > > > > > On May 27, 2015, at 12:04 PM, 陈礼剑 <chenlij...@togeek.cn> wrote:
> > > > > >
> > > > > > Hi:
> > > > > >
> > > > > > I just want to know the difference between drill and optiq.
> > > > > >
> > > > > >
> > > > > > Is drill just 'extend' optiq to support many other
> > > > > > 'stores'(hadoop,
> > > > > mongodb, ...)?
> > > > > >
> > > > > >
> > > > > > ---from davy
> > > > > > Thanks.
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: what's the differenct between drill and optiq

Reply via email to