Hello,

First of all, thank you to everyone working on Spark. I've only been using it for a few weeks, but so far I'm really enjoying it. You saved me from a big, scary elephant! :-)
I was wondering if anyone might be able to offer some advice about working with the Thrift JDBC server? I'm trying to let members of my team connect to a Spark cluster and run basic SQL queries from their favourite JDBC tools. Following the docs [1], I've managed to get something simple up and running, but I'd really appreciate it if someone could validate my understanding, as the docs don't go deeply into the details. Here are a few questions I've not been able to answer myself:

1) What exactly is the relationship between the Thrift server and Hive? My guess is that Spark is just using the Hive metastore to access table definitions, and perhaps a few other things. Is that the case?

2) Am I therefore right in thinking that SQL queries sent to the Thrift server are still executed on the Spark cluster, using Spark SQL, and that Hive plays no active part in computing the results?

3) Which SQL flavour does the Thrift server actually support: Spark SQL, HiveQL, or both? I'm confused, because I've seen it accept Hive CREATE TABLE syntax, but Spark SQL queries seem to work too. (There's a DDL snippet at the end of this mail.)

4) When I run SQL queries from the Scala or Python shells, Spark figures out the schema from my Parquet files very well if I call registerTempTable on the DataFrame. When running the Thrift server, it seems I need to create a Hive table definition first. Is that the case, or did I miss something? If it is, is there a sensible way to automate it? (I've put a snippet of what I currently do at the end of this mail.)

Many thanks!

James

[1] https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbcodbc-server
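P.S. For reference, here's roughly what I do in the shell today. The path, table name, and columns are placeholders, and I'm assuming the spark-shell's built-in sqlContext (which, as I understand it, is a HiveContext when Spark is built with Hive support) and the 1.4-style reader API:

    // In spark-shell: sqlContext is provided by the shell.
    // Read Parquet files; Spark infers the schema automatically.
    val df = sqlContext.read.parquet("hdfs:///data/events")

    // Register a temporary table for this session only. As far as I can
    // tell, a separately started Thrift server cannot see this table.
    df.registerTempTable("events")

    sqlContext.sql("SELECT COUNT(*) FROM events").show()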
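And this is the kind of Hive-style table definition I've found myself writing by hand so that the same data shows up through the Thrift server (again, the names and columns are made up; it's the pattern I'm asking about, not the specifics):

    // Hive DDL issued through sqlContext; the same statement also seems
    // to be accepted when sent over JDBC, e.g. from beeline.
    sqlContext.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS events_hive (
        id BIGINT,
        ts TIMESTAMP,
        payload STRING
      )
      STORED AS PARQUET
      LOCATION 'hdfs:///data/events'
    """)

Is something like df.write.saveAsTable("events_hive") the intended way to automate this, or is hand-written DDL the expected route?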