Hi, just to give some context: we are using the Hive metastore with CSV and Parquet files as part of our ETL pipeline. We query these with SparkSQL to do some downstream work.
I'm curious what's the best way to go about testing Hive and SparkSQL. I'm using 1.1.0, and I see that LocalHiveContext has been deprecated: https://issues.apache.org/jira/browse/SPARK-2397

My testing strategy is: as part of my Before block I create the HiveContext, then create the databases/tables and map them to some sample data files in my test resources directory. LocalHiveContext was useful because I could inject it as part of the test setup and it would take care of creating the metastore and warehouse directories for Hive for me (local to my project).

If I just create a HiveContext, it does create the metastore_db folder locally, but the warehouse directory is not created! Thus running a command like hc.sql("CREATE DATABASE myDb") results in a Hive error. I also can't supply a test hive-site.xml, because it won't allow relative paths, which means there would be some shared directory that everyone needs to have. The only other option is to call the setConf method the way LocalHiveContext does.

Since LocalHiveContext is on the way out, I'm wondering if I'm doing something stupid. Is there a better way to mock this out and test Hive/metastore with SparkSQL?

Cheers,
~N
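For reference, the setConf workaround I'm describing looks roughly like the sketch below. It replicates what LocalHiveContext did internally: point the embedded Derby metastore and the Hive warehouse at throwaway directories local to the build. The suite name and the use of ScalaTest's BeforeAndAfterAll are my own assumptions, not anything prescribed by Spark:

```scala
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.scalatest.{BeforeAndAfterAll, FunSuite}

// Hypothetical test suite; the setup mirrors what the deprecated
// LocalHiveContext used to do for you via setConf.
class HiveQuerySuite extends FunSuite with BeforeAndAfterAll {

  private var sc: SparkContext = _
  private var hc: HiveContext = _

  override def beforeAll(): Unit = {
    // Fresh temp dirs per run, so no shared directory is required.
    val metastoreDir = Files.createTempDirectory("metastore").toFile
    val warehouseDir = Files.createTempDirectory("warehouse").toFile

    sc = new SparkContext(
      new SparkConf().setMaster("local").setAppName("hive-test"))
    hc = new HiveContext(sc)

    // The two properties LocalHiveContext set on your behalf.
    hc.setConf("javax.jdo.option.ConnectionURL",
      s"jdbc:derby:;databaseName=${metastoreDir.getAbsolutePath};create=true")
    hc.setConf("hive.metastore.warehouse.dir", warehouseDir.getAbsolutePath)

    // With the warehouse dir set, DDL like this no longer errors out.
    hc.sql("CREATE DATABASE myDb")
  }

  override def afterAll(): Unit = {
    sc.stop()
  }
}
```

This works, but it means every test project re-implements the same boilerplate that LocalHiveContext provided, which is what prompts my question.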