Re: SQL language vs DataFrame API

2015-12-09 Thread Stephen Boesch
Is this a candidate for the version 1.X/2.0 split?

2015-12-09 16:29 GMT-08:00 Michael Armbrust:
> Yeah, I would like to address any actual gaps in functionality that are
> present.
>
> On Wed, Dec 9, 2015 at 4:24 PM, Cristian Opris wrote:
>
>> The reason I'm asking is because it's important i

Re: SQL language vs DataFrame API

2015-12-09 Thread Michael Armbrust
Yeah, I would like to address any actual gaps in functionality that are present.

On Wed, Dec 9, 2015 at 4:24 PM, Cristian Opris wrote:
> The reason I'm asking is because it's important in larger projects to be
> able to stick to a particular programming style. Some people are more
> comfortable

Re: SQL language vs DataFrame API

2015-12-09 Thread Xiao Li
That sounds great! When it is decided, please let us know and we can add more features and make it ANSI SQL compliant.

Thank you!

Xiao Li

2015-12-09 11:31 GMT-08:00 Michael Armbrust:
> I don't plan to abandon HiveQL compatibility, but I'd like to see us move
> towards something with more SQL

Re: SQL language vs DataFrame API

2015-12-09 Thread Michael Armbrust
I don't plan to abandon HiveQL compatibility, but I'd like to see us move towards something with more SQL compliance (perhaps just newer versions of the HiveQL parser). Exactly which parser will do that for us is under investigation.

On Wed, Dec 9, 2015 at 11:02 AM, Xiao Li wrote:
> Hi, Michael

Re: SQL language vs DataFrame API

2015-12-09 Thread Xiao Li
Hi, Michael,

Does that mean SQLContext will be built on HiveQL in the near future?

Thanks,

Xiao Li

2015-12-09 10:36 GMT-08:00 Michael Armbrust:
> I think that it is generally good to have parity when the functionality is
> useful. However, in some cases various features are there just to ma

Re: SQL language vs DataFrame API

2015-12-09 Thread Michael Armbrust
I think that it is generally good to have parity when the functionality is useful. However, in some cases various features are there just to maintain compatibility with other systems. For example, CACHE TABLE is eager because Shark's cache table was. df.cache() is lazy because Spark's cache is. Do
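
To make the eager/lazy distinction above concrete, here is a minimal toy sketch in plain Python. It is not Spark code: `ToyTable`, `cache_eager`, `cache_lazy`, and `compute_calls` are hypothetical names invented for illustration. The assumption is only that "eager" means the data is materialized when the cache call is made (as with SQL `CACHE TABLE`), while "lazy" means the data is merely marked for caching and is materialized by the first action that reads it (as with `df.cache()`).

```python
class ToyTable:
    """Hypothetical stand-in for a table or DataFrame; not a Spark API."""

    def __init__(self, compute):
        self._compute = compute        # function that produces the rows
        self._cached = None            # materialized rows, once cached
        self._cache_requested = False  # set by the lazy variant
        self.compute_calls = 0         # how many times rows were computed

    def _materialize(self):
        self.compute_calls += 1
        return list(self._compute())

    def cache_eager(self):
        # Modeled on SQL `CACHE TABLE t`: materialize right away.
        self._cached = self._materialize()
        return self

    def cache_lazy(self):
        # Modeled on `df.cache()`: only mark the data for caching.
        self._cache_requested = True
        return self

    def count(self):
        # An "action": reads the cache if present, fills it if requested.
        if self._cached is not None:
            return len(self._cached)
        rows = self._materialize()
        if self._cache_requested:
            self._cached = rows
        return len(rows)


eager = ToyTable(lambda: range(5)).cache_eager()
lazy = ToyTable(lambda: range(5)).cache_lazy()

# Work done before any action runs.
eager_calls_before_action = eager.compute_calls
lazy_calls_before_action = lazy.compute_calls

# The first action on the lazy table fills its cache; the second reuses it.
eager_count = eager.count()
lazy_first_count = lazy.count()
lazy_second_count = lazy.count()
```

In this sketch the eager table pays its computation cost at the cache call, while the lazy table pays it at the first `count()`. Either way, later actions read the cached rows instead of recomputing them.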

SQL language vs DataFrame API

2015-12-09 Thread Cristian O
Hi, I was wondering what the "official" view is on feature parity between the SQL and DF APIs. Docs are pretty sparse on the SQL front, and it seems that some features have, at various times, been supported in only one of the Spark SQL dialect, the HiveQL dialect, and the DF API. DF.cube(), DISTRIBUTE BY, CACHE LAZY