Andrew,

As others have pointed out, there are definitely differences in how each community project leverages Calcite (remember, Apache Kylin, Phoenix and, I believe, Flink also use it). Remember, Calcite, at its core, is a developer's toolkit that other applications and systems incorporate. While an end user could use Calcite directly, it is most commonly used as a library embedded in a broader system.
The great news is that the community is collaborating on an amazing shared library and framework.

-Jacques

On Wed, May 27, 2015 at 10:10 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> Andrew,
>
> Sorry for being cryptic. Hanifi was clearer. My point was directed at the differences between where Hive may ultimately go and where Drill is now. Hanifi was providing a good summary of where Drill is now.
>
> As he said, Calcite does query parsing and planning. Ultimately, it will do the same for Hive. Even so, Drill has extended Calcite's planning capabilities in ways that are not used by Hive. These extensions allow Calcite to produce plans for the Drill execution engine. That execution engine is what Hanifi meant by flexible distributed columnar execution with late binding.
>
> SQL is not normally a late-binding language. Instead, it shows its long heritage by being a very statically typed language. That static typing is a problem in the modern world of flexible data, and dealing with this problem is a key goal of Drill.
>
> The key technological advance in Drill that enables it to address late-typing problems is something called the ANY type. This is essentially a way for the parser to punt the problem of resolving the type of some value until the query is actually running. At that point, Drill has an empirical schema available for each record batch, which can be used to do final code generation and optimization. If the empirical schema changes because the data being processed changes, that code can be regenerated as needed.
>
> This is a huge philosophical and design change that is hard to just paste onto an existing engine. Just as it would be next to impossible to modify a Pascal or Fortran execution environment to do the type inferencing and lazy execution that Scala or Haskell do, it is going to be hard to extend Hive's entire execution environment to deal with type dynamism.
> Simply passing around dynamic types will not give performance anywhere near what Drill does because of the inevitable cost of type tag dispatching.
>
> To give just the simplest example, suppose you have data that used a column named X to hold an integer for a long while and then switched to using a column named Y to hold a floating-point number. To deal with this, you might create a view that has a case statement that uses the value of X or Y, whichever is non-null. In conventional SQL engines, the query parser and planner would generate code for this case statement, and it would execute for every record. With Drill, almost all record batches would have *either* X or Y. Drill would generate different code for those two patterns of data, and that code would be generated with the knowledge that X is null, or that Y is null. As such, the optimizer in the code generator would remove the case statement entirely by evaluating it at code-generation time. By pushing code generation very late in the execution, Drill would have no perceptible penalty relative to uniformly typed code, but it would still be able to deal with non-uniform data.
>
> My original comment was an indefensible shorthand for all of this. Things should be made as simple as possible, but no simpler, as the great man said.
>
> On Wed, May 27, 2015 at 8:32 PM, Andrew Brust <andrew.br...@bluebadgeinsights.com> wrote:
>
> > That makes sense. I'm just having trouble mapping that back onto Ted's comment. But I tend to think that's me and my ignorance.
> >
> > -----Original Message-----
> > From: Hanifi Gunes [mailto:hgu...@maprtech.com]
> > Sent: Wednesday, May 27, 2015 4:48 PM
> > To: user
> > Subject: Re: what's the differenct between drill and optiq
> >
> > Calcite does parsing & planning of queries. Drill executes in a very flexible distributed columnar fashion with late binding.
> >
> > On Wed, May 27, 2015 at 8:34 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> >
> > > Andrew,
> > >
> > > What Hive does not have are the extensions Drill has that allow SQL to be type-flexible. The ANY type, and all of its implications for both implementation and user impact, are a really big deal.
> > >
> > > On Wed, May 27, 2015 at 6:08 AM, Andrew Brust <andrew.br...@bluebadgeinsights.com> wrote:
> > >
> > > > Thanks!
> > > >
> > > > ----- Reply message -----
> > > > From: "PHANI KUMAR YADAVILLI" <phanikumaryadavi...@gmail.com>
> > > > To: "user@drill.apache.org" <user@drill.apache.org>
> > > > Subject: what's the differenct between drill and optiq
> > > > Date: Wed, May 27, 2015 8:33 AM
> > > >
> > > > Yes, Hive uses Calcite. You can refer to the Hive documentation.
> > > >
> > > > On May 27, 2015 6:01 PM, "Andrew Brust" <andrew.br...@bluebadgeinsights.com> wrote:
> > > >
> > > > > Folks at Hortonworks told me that Hive now uses Calcite as well. Can anyone here confirm or deny that?
> > > > >
> > > > > -----Original Message-----
> > > > > From: Rajkumar Singh [mailto:rsi...@maprtech.com]
> > > > > Sent: Wednesday, May 27, 2015 6:52 AM
> > > > > To: user@drill.apache.org
> > > > > Subject: Re: what's the differenct between drill and optiq
> > > > >
> > > > > Optiq (now known as Calcite) is an API for query parsing, planning and optimization. Drill uses it for SQL parsing, validation and optimization. The Drill query planner applies its own custom planner rules to build the query's logical plan.
> > > > >
> > > > > Rajkumar Singh
> > > > >
> > > > > > On May 27, 2015, at 12:04 PM, 陈礼剑 <chenlij...@togeek.cn> wrote:
> > > > > >
> > > > > > Hi:
> > > > > >
> > > > > > I just want to know the difference between Drill and Optiq.
> > > > > >
> > > > > > Does Drill just 'extend' Optiq to support many other 'stores' (Hadoop, MongoDB, ...)?
> > > > > >
> > > > > > ---from davy
> > > > > > Thanks.
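Ted's X/Y example from the thread can be illustrated with a toy sketch. The Python below is hypothetical and heavily simplified: it is not Drill's code generator and uses no Drill or Calcite API. It assumes a record batch is a dict of equal-length column lists with `None` standing in for NULL, and the names `LateBindingEvaluator`, `generate_source`, `empirical_schema` and `normalize` are invented for illustration. For each batch it observes which of X and Y are entirely NULL (a stand-in for the "empirical schema"), generates source for `CASE WHEN x IS NOT NULL THEN x ELSE y END` with the CASE folded away when that schema already decides the answer, and caches the generated function per schema so code is regenerated only when the shape of the data changes.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical, simplified model (not Drill's API): a record batch maps column
# names to equal-length lists, and None stands in for SQL NULL.
Batch = Dict[str, List[Optional[float]]]
Schema = Tuple[Tuple[str, bool], ...]  # (column name, column is entirely NULL)
Kernel = Callable[[Batch], List[Optional[float]]]

COLUMNS = ("x", "y")


def normalize(batch: Batch) -> Batch:
    """Treat a column missing from the batch as an all-NULL column."""
    n = max((len(values) for values in batch.values()), default=0)
    return {c: list(batch.get(c, [None] * n)) for c in COLUMNS}


def empirical_schema(batch: Batch) -> Schema:
    """Observe, for this batch only, which columns are entirely NULL."""
    return tuple((c, all(v is None for v in batch[c])) for c in COLUMNS)


def generate_source(schema: Schema) -> str:
    """Emit source for CASE WHEN x IS NOT NULL THEN x ELSE y END,
    folding the CASE away whenever the empirical schema decides it."""
    all_null = dict(schema)
    if all_null["x"]:
        body = "return list(batch['y'])"  # x is NULL in every row: result is y
    elif all_null["y"]:
        body = "return list(batch['x'])"  # y is NULL in every row: result is x
    else:
        # Mixed batch: keep the per-record CASE.
        body = ("return [x if x is not None else y "
                "for x, y in zip(batch['x'], batch['y'])]")
    return "def evaluate(batch):\n    " + body + "\n"


class LateBindingEvaluator:
    """Generates one kernel per empirical schema; regenerates when it changes."""

    def __init__(self) -> None:
        self.kernels: Dict[Schema, Kernel] = {}
        self.generated_sources: List[str] = []

    def _kernel_for(self, schema: Schema) -> Kernel:
        if schema not in self.kernels:
            source = generate_source(schema)
            namespace: dict = {}
            exec(source, namespace)  # "compile" the generated code
            self.kernels[schema] = namespace["evaluate"]
            self.generated_sources.append(source)
        return self.kernels[schema]

    def evaluate(self, batch: Batch) -> List[Optional[float]]:
        batch = normalize(batch)
        return self._kernel_for(empirical_schema(batch))(batch)


if __name__ == "__main__":
    ev = LateBindingEvaluator()
    print(ev.evaluate({"x": [1, 2, 3]}))    # early data: only X populated
    print(ev.evaluate({"y": [1.5, 2.5]}))   # later data: only Y populated
    print(ev.evaluate({"x": [4]}))          # same shape as the first batch: cached kernel
    print(len(ev.generated_sources))        # 2 kernels generated so far
```

Feeding it a batch with only X, then one with only Y, then X again generates just two kernels, and neither contains a per-record NULL check; only a batch in which both columns carry values falls back to evaluating the CASE row by row. A real engine would also have to reconcile the result type (integer versus floating point), which this sketch deliberately ignores.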