Hello,

First of all, thank you to everyone working on Spark. I've only been using
it for a few weeks now, but so far I'm really enjoying it. You saved me from
a big, scary elephant! :-)

I was wondering if anyone might be able to offer some advice about working
with the Thrift JDBC server. I'm trying to enable members of my team to
connect and run some basic SQL queries on a Spark cluster using their
favourite JDBC tools. Following the docs [1], I've managed to get something
simple up and running, but I'd really appreciate it if someone could validate
my understanding here, as the docs don't go deeply into the details.
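
For reference, this is what "something simple" looks like at the moment: I
start the server with sbin/start-thriftserver.sh (pointed at my master) and
then connect over JDBC with the standard Hive driver. A minimal smoke test
along these lines works for me; the host, port, user and driver coordinates
below are just placeholders for my setup, so please shout if this isn't the
recommended way to connect.

    // Minimal JDBC smoke test against the Thrift server from a plain JVM
    // client. Assumes the Hive JDBC driver (org.apache.hive:hive-jdbc) and
    // its dependencies are on the classpath; host/port/user are placeholders.
    import java.sql.DriverManager

    object ThriftServerSmokeTest {
      def main(args: Array[String]): Unit = {
        Class.forName("org.apache.hive.jdbc.HiveDriver")
        val conn = DriverManager.getConnection(
          "jdbc:hive2://localhost:10000/default", "james", "")
        try {
          val stmt = conn.createStatement()
          val rs = stmt.executeQuery("SHOW TABLES")
          while (rs.next()) {
            println(rs.getString(1))
          }
        } finally {
          conn.close()
        }
      }
    }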

Here are a few questions I've not been able to find answers to myself:

1) What exactly is the relationship between the thrift server and Hive? I'm
guessing Spark is just making use of the Hive metastore to access table
definitions, and maybe some other things. Is that the case?

2) Am I therefore right in thinking that SQL queries sent to the thrift
server are still executed on the Spark cluster, using Spark SQL, and that
Hive plays no active part in the computation of results?

3) What SQL flavour is actually supported by the Thrift Server? Is it Spark
SQL, HiveQL, or both? I'm confused because I've seen it accept Hive
CREATE TABLE syntax, but Spark SQL queries seem to work too.

4) When I run SQL queries from the Scala or Python shells, Spark figures out
the schema by itself from my Parquet files very well, as long as I register
the DataFrame as a temporary view (createOrReplaceTempView). It seems that,
when going through the thrift server, I need to create a Hive table
definition first. Is that the case, or did I miss something? If it is, is
there some sensible way to automate this? (There's a short sketch of what
I'm doing just after these questions.)
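
In case it's useful, here's roughly what I'm doing in the Scala shell at the
moment, plus the workaround I'm currently guessing at for (4): writing the
data out as a metastore table with saveAsTable so that the thrift server can
see it. I'm assuming a recent Spark where the shell provides a SparkSession
called "spark"; the paths and table names are placeholders, and I have no
idea whether this is the intended approach.

    // In spark-shell. The schema is inferred from the Parquet files and the
    // temporary view is immediately queryable in this session...
    val df = spark.read.parquet("/data/events")
    df.createOrReplaceTempView("events")
    spark.sql("SELECT count(*) FROM events").show()

    // ...but, as far as I can tell, the temporary view is scoped to this
    // session, so the thrift server can't see it. My guess at a workaround
    // is to persist the data as a table in the Hive metastore instead:
    df.write.mode("overwrite").saveAsTable("events")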


Many thanks!

James

[1]
https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbcodbc-server
