SparkSQL: intra-SparkSQL-application table registration

Mohamed Nadjib Mami Mon, 14 Nov 2016 02:04:52 -0800

Hello,

I've asked the following question [1] on Stack Overflow but haven't received an answer yet. I'm now using this channel to give it more visibility and hopefully find someone who can help.

"*Context.* I have tens of SQL queries stored in separate files. For benchmarking purposes, I created an application that iterates through each of those query files and passes it to a standalone Spark application. The latter /first/ parses the query, extracts the tables it uses, registers them (using registerTempTable() in Spark < 2 and createOrReplaceTempView() in Spark 2), and then executes the query (spark.sql()).

*Challenge.* Since registering the tables can sometimes be time consuming, I would like to register each table only once, when it is first used, and keep that registration as metadata that subsequent queries can readily use without re-registering the table. It's a sort of intra-job caching, but as far as I know it is not any of the caching Spark offers (table caching).

Is that possible? If not, can anyone suggest another approach to accomplish the same goal (i.e., iterating through separate query files and running a querying Spark application without registering the tables that have already been registered)?"
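
To make the idea concrete, here is a minimal sketch of the bookkeeping I have in mind, assuming all query files are processed by one long-lived application (temporary views are scoped to the session, so they only survive if the session does). Everything named here is hypothetical: extract_tables() is a naive regex stand-in for my real query parser, and the register/execute callbacks stand in for the actual Spark calls (e.g. loading a DataFrame and calling createOrReplaceTempView(), then spark.sql()).

```python
import re
from typing import Callable, Iterable, List, Set

# Hypothetical, naive table extractor: picks the identifier that follows
# FROM or JOIN. A real application would rely on a proper SQL parser.
TABLE_PATTERN = re.compile(
    r"\b(?:from|join)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE
)


def extract_tables(query: str) -> Set[str]:
    """Return the lower-cased table names referenced by the query."""
    return {name.lower() for name in TABLE_PATTERN.findall(query)}


class TableRegistry:
    """Remembers which tables were registered during this application run."""

    def __init__(self, register: Callable[[str], None]) -> None:
        # `register` is an assumed callback; with Spark it might load the
        # data and call createOrReplaceTempView(name) on the DataFrame.
        self._register = register
        self._registered: Set[str] = set()

    def ensure(self, tables: Iterable[str]) -> None:
        """Register only the tables that have not been seen before."""
        for name in sorted(tables):
            if name not in self._registered:
                self._register(name)
                self._registered.add(name)


def run_queries(
    queries: List[str],
    registry: TableRegistry,
    execute: Callable[[str], None],
) -> None:
    """Register missing tables, then execute each query in order."""
    for query in queries:
        registry.ensure(extract_tables(query))
        # With Spark, `execute` might wrap spark.sql(query).
        execute(query)
```

With two queries that both read table "a", the register callback would be invoked for "a" only once; the second query reuses the existing registration. Whether this is enough depends on the driver looping over the query files inside a single Spark application rather than submitting a new one per file.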

[1]: http://stackoverflow.com/questions/40549924/sparksql-intra-sparksql-application-table-registration


Cheers,
Mohamed
