If the data is already in Parquet files, I don't see any reason to involve JDBC at all. You can read Parquet files directly into a DataFrame. https://spark.apache.org/docs/latest/sql-data-sources-parquet.html
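For instance, in spark-shell (a minimal sketch; the path and view name here are placeholders):

    // Read the existing Parquet files straight into a DataFrame.
    val df = spark.read.parquet("/path/to/your/table")

    // Register it as a temporary view so it can be queried with SQL.
    df.createOrReplaceTempView("my_table")

    // Run an ad hoc query against the view.
    spark.sql("SELECT count(*) FROM my_table").show()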
On Thu, Feb 18, 2021 at 1:42 PM Scott Ribe <scott_r...@elevated-dev.com> wrote:
> I need a little help figuring out how some pieces fit together. I have
> some tables in parquet files, and I want to access them using SQL over
> JDBC. I gather that I need to run the thrift server, but how do I configure
> it to load my files into datasets and expose views?
>
> The context is this: trying to figure out if we want to use Spark for
> historical data, and so far, just using spark shell for some experiments:
>
> - I have established that we can easily export to Parquet and it is very
>   efficient at storing this data
> - Spark SQL queries the data with reasonable performance
>
> Now I am at the step of testing whether the client-side that we are
> considering can deal effectively with querying the volume of data.
>
> Which is why I'm looking for the simplest setup. If the client integration
> works, then yes we move on to configuring a proper cluster. (And it is a
> real question, I've already had one potential client-side piece be totally
> incompetent at handling a decent volume of data...)
>
> (The environment I am working in is just the straight download of
> spark-3.0.1-bin-hadoop3.2)
>
> --
> Scott Ribe
> scott_r...@elevated-dev.com
> https://www.linkedin.com/in/scottribe/