I have installed Spark 1.6.1 and am trying to connect to a Hive metastore, version 0.14.0. This worked fine on Spark 1.4.1. I am pointing 1.6.1 at the same metastore, and now I get connectivity issues.
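In both versions, the code that touches the metastore is essentially the same. A minimal sketch of what I run (the `SHOW DATABASES` query is just a placeholder to exercise the connection; in the interactive `pyspark` shell, `sc` and `sqlContext` already exist):

```python
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="hive-metastore-test")
sqlContext = HiveContext(sc)  # first metastore access happens here
sqlContext.sql("SHOW DATABASES").show()
```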
I read over some online threads and added the following two lines to spark-defaults.conf:

```
spark.sql.hive.metastore.version 0.14.0
spark.sql.hive.metastore.jars maven
```

With those set, I get this error:

```
pyspark.sql.utils.IllegalArgumentException: u'Builtin jars can only be used when hive execution version == hive metastore version. Execution: 1.2.1 != Metastore: 0.14.0. Specify a vaild path to the correct hive jars using $HIVE_METASTORE_JARS or change spark.sql.hive.metastore.version to 1.2.1.'
```

Without those lines, I get this error instead:

```
Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "BONECP" plugin to create a ConnectionPool gave an error : The specified datastore driver ("org.mariadb.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
```

So I commented out the two new lines (metastore version and jars) in spark-defaults.conf and instead launched pyspark with the JDBC driver as an additional jar:

```
pyspark --jars /usr/lib/hive/lib/mariadb-connector-java.jar
```

After starting pyspark, I work with a HiveContext, and everything now behaves as expected: it connects to the Hive metastore, notebooks work, etc. *But this is updating my Hive metastore version to 1.2.0 (it should stay at 0.14.0).* My big concern is: why does running a PySpark HiveContext update the Hive metastore version?
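For reference, this is how I see the version change, read straight from the metastore's backing database (a hypothetical sketch: the standard Hive metastore schema keeps the version in the `VERSION` table, but the host, credentials, and database name here are placeholders for my setup):

```python
# Hypothetical check of the Hive metastore schema version in the backing
# MariaDB database; assumes mysql-connector-python is installed and that
# the metastore schema lives in a database named "hive".
import mysql.connector

conn = mysql.connector.connect(host="metastore-db-host", user="hive",
                               password="***", database="hive")
cur = conn.cursor()
cur.execute("SELECT SCHEMA_VERSION FROM VERSION")
print(cur.fetchone())  # shows 0.14.0 before, 1.2.0 after the HiveContext runs
cur.close()
conn.close()
```

Thanks!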