On Fri, Mar 28, 2014 at 9:53 PM, Rohit Rai <ro...@tuplejump.com> wrote:
>
> Upon discussion with a couple of our clients, it seems the reason they would
> prefer using Hive is that they have already invested a lot in it, mostly in
> UDFs and HiveQL.
> 1. Are there any plans to develop the SQL parser to handle more complex
> queries like HiveQL? Can we just plug in a custom parser instead of bringing
> in the whole Hive deps?
>

We definitely want to have a more complete SQL parser without having to
pull in all of Hive.  I think there are a couple of ways to do this:

1. Use a SQL-92 parser from something like Optiq, or write our own.
2. I haven't fully investigated the published Hive artifacts, but if there
is some way to depend on only the parser, that would be great.  If someone
has the resources to investigate using the Hive parser without needing to
depend on all of Hive, this is a place where we would certainly welcome
contributions.  We could then consider making HiveQL an option in a
standard SQLContext.
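
To make the "plug in a custom parser" idea concrete, here is a minimal
sketch of what a pluggable parser hook could look like.  All names here
(SqlParser, LogicalPlan, Project, UnresolvedRelation, SimpleParser) are
hypothetical stand-ins, not the actual Catalyst APIs; the point is only
that a parser is just a function from a SQL string to a logical plan, so
a HiveQL-capable implementation could be swapped in behind the same trait:

```scala
// Hypothetical sketch -- these are NOT real Spark/Catalyst class names.
// A parser is anything that turns a SQL string into a logical plan tree.
trait SqlParser {
  def parse(sql: String): LogicalPlan
}

sealed trait LogicalPlan
case class UnresolvedRelation(table: String) extends LogicalPlan
case class Project(columns: Seq[String], child: LogicalPlan) extends LogicalPlan

// A deliberately naive implementation that only understands
// "SELECT <cols> FROM <table>"; a HiveQL parser would implement the
// same trait with a richer grammar.
object SimpleParser extends SqlParser {
  private val Pattern = """(?i)SELECT\s+(.+)\s+FROM\s+(\w+)""".r

  def parse(sql: String): LogicalPlan = sql.trim match {
    case Pattern(cols, table) =>
      Project(cols.split(",").map(_.trim).toSeq, UnresolvedRelation(table))
    case _ =>
      throw new IllegalArgumentException(s"Cannot parse: $sql")
  }
}
```

With this shape, a SQLContext could accept any SqlParser instance,
keeping the Hive dependency confined to whichever module provides the
HiveQL implementation.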


> 2. Is there any way we can support UDFs in Catalyst without using Hive? It
> will be fine if we don't support Hive UDFs as is and need minor porting
> effort.
>

All of the execution support for native Scala UDFs is already there; in
fact, when you use the DSL where clause
(http://people.apache.org/~pwendell/catalyst-docs/api/sql/core/index.html#org.apache.spark.sql.SchemaRDD)
you are using this machinery.  For Spark 1.1 we will find a more general
way to expose this to users.
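
The underlying idea -- that an ordinary Scala function can serve as a UDF
inside a where-style predicate -- can be sketched without any Spark
dependency.  This is a minimal illustration of the concept, not Spark's
actual internals; Row, isAdult, and where are made-up names:

```scala
// A minimal sketch (not the real SchemaRDD machinery): a plain Scala
// function acting as a row-level UDF inside a where clause.
object NativeUdfSketch {
  case class Row(name: String, age: Int)

  // the "UDF": just an ordinary Scala function over a column value
  val isAdult: Int => Boolean = _ >= 21

  // a where clause that evaluates an arbitrary Scala predicate per row
  def where(rows: Seq[Row])(pred: Row => Boolean): Seq[Row] = rows.filter(pred)

  def main(args: Array[String]): Unit = {
    val people = Seq(Row("alice", 34), Row("bob", 7))
    val adults = where(people)(r => isAdult(r.age))
    println(adults.map(_.name).mkString(","))  // prints "alice"
  }
}
```

Because the predicate is a closure, any Scala code (including a port of an
existing Hive UDF's logic) can run inside it without Hive on the classpath.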
