I need a little help figuring out how some pieces fit together. I have some 
tables in Parquet files, and I want to access them using SQL over JDBC. I 
gather that I need to run the Thrift server, but how do I configure it to load 
my files into datasets and expose them as views?
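
For what it's worth, the direction I've gathered so far, done inside 
spark-shell, is something like the sketch below; the path and view name are 
just placeholders, and I'm not at all sure this is the intended way to wire 
it up:

  // inside spark-shell: load the Parquet data, register it as a view, then
  // start the Thrift/JDBC server against this same session so the view is
  // reachable over JDBC (path and view name are placeholders)
  import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

  val historical = spark.read.parquet("/data/historical.parquet")

  // a global temp view is visible across sessions of this SparkContext,
  // so JDBC clients would query it as global_temp.historical
  historical.createOrReplaceGlobalTempView("historical")

  // start the Thrift server in-process, sharing this shell's context
  // (listens on the usual port 10000 unless configured otherwise)
  HiveThriftServer2.startWithContext(spark.sqlContext)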

The context is this: we're trying to figure out whether we want to use Spark 
for historical data, and so far I've just been using spark-shell for some 
experiments:

- I have established that we can easily export to Parquet, and it stores this 
data very efficiently
- Spark SQL queries the data with reasonable performance (rough sketch of both 
below)
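
Roughly what those experiments look like in the shell, with the paths, names, 
and the source format just stand-ins for illustration:

  // export an extract to Parquet, then read it back and query it with Spark SQL
  // (CSV as the source here is only a stand-in; all paths/names are placeholders)
  val source = spark.read.option("header", "true").csv("/data/historical_export.csv")
  source.write.mode("overwrite").parquet("/data/historical.parquet")

  val historical = spark.read.parquet("/data/historical.parquet")
  historical.createOrReplaceTempView("historical")
  spark.sql("SELECT count(*) FROM historical").show()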

Now I am at the step of testing whether the client-side tool we are considering 
can deal effectively with querying this volume of data.
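
For a first sanity check of that JDBC path, before pointing the real client at 
it, I figure something like this from a separate JVM should do; it assumes the 
Hive JDBC driver is on the classpath, the Thrift server on its default port 
10000, and the placeholder view name from above:

  import java.sql.DriverManager

  // connect to the Thrift server over JDBC and run a trivial query;
  // driver class, port, and view name are the defaults/placeholders assumed above
  Class.forName("org.apache.hive.jdbc.HiveDriver")
  val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000", "", "")
  val stmt = conn.createStatement()
  val rs = stmt.executeQuery("SELECT count(*) FROM global_temp.historical")
  while (rs.next()) println(rs.getLong(1))
  rs.close(); stmt.close(); conn.close()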

Which is why I'm looking for the simplest setup. If the client integration 
works, then yes, we move on to configuring a proper cluster. (And it is a real 
question; I've already had one potential client-side piece prove totally 
incompetent at handling a decent volume of data...)

(The environment I am working in is just the straight download of 
spark-3.0.1-bin-hadoop3.2)

--
Scott Ribe
scott_r...@elevated-dev.com
https://www.linkedin.com/in/scottribe/



