If the data is already in Parquet files, I don't see any reason to involve
JDBC at all.  You can read Parquet files directly into a DataFrame.
https://spark.apache.org/docs/latest/sql-data-sources-parquet.html
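
For example, from spark-shell (the path and view name below are just placeholders for a sketch, not anything from your setup):

  // Read the Parquet files straight into a DataFrame; no JDBC or Thrift server needed
  val df = spark.read.parquet("/path/to/historical_data")

  // Register a temporary view so the data can be queried with Spark SQL
  df.createOrReplaceTempView("historical_data")

  spark.sql("SELECT count(*) FROM historical_data").show()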

On Thu, Feb 18, 2021 at 1:42 PM Scott Ribe <scott_r...@elevated-dev.com>
wrote:

> I need a little help figuring out how some pieces fit together. I have
> some tables in parquet files, and I want to access them using SQL over
> JDBC. I gather that I need to run the thrift server, but how do I configure
> it to load my files into datasets and expose views?
>
> The context is this: trying to figure out if we want to use Spark for
> historical data, and so far, just using spark shell for some experiments:
>
> - I have established that we can easily export to Parquet and it is very
> efficient at storing this data
> - Spark SQL queries the data with reasonable performance
>
> Now I am at the step of testing whether the client-side that we are
> considering can deal effectively with querying the volume of data.
>
> Which is why I'm looking for the simplest setup. If the client integration
> works, then yes we move on to configuring a proper cluster. (And it is a
> real question; I've already had one potential client-side piece be totally
> incompetent at handling a decent volume of data...)
>
> (The environment I am working in is just the straight download of
> spark-3.0.1-bin-hadoop3.2)
>
> --
> Scott Ribe
> scott_r...@elevated-dev.com
> https://www.linkedin.com/in/scottribe/
>
