I have installed Spark 1.6.1 and am trying to connect to a Hive metastore,
version 0.14.0. This was working fine on Spark 1.4.1; when I point 1.6.1 at
the same metastore, I get connectivity issues.

I read through some online threads and added the following 2 lines to
spark-defaults.conf:

*spark.sql.hive.metastore.version 0.14.0*
*spark.sql.hive.metastore.jars maven*

With those lines in place, I get this error:

"pyspark.sql.utils.IllegalArgumentException: u'Builtin jars can only be
used when hive execution version == hive metastore version. Execution:
1.2.1 != Metastore: 0.14.0. Specify a vaild path to the correct hive jars
using $HIVE_METASTORE_JARS or change spark.sql.hive.metastore.version to
1.2.1.'"
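
The error itself says to specify a path to the correct Hive jars, so I assume
the intended fix is something like the line below in spark-defaults.conf,
with the classpath replaced by wherever the actual Hive 0.14.0 jars live (my
path here is just a guess):

*spark.sql.hive.metastore.jars /usr/lib/hive-0.14.0/lib/*:/usr/lib/hadoop/lib/**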

Without these lines, I get the error below:

"Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke
the "BONECP" plugin to create a ConnectionPool gave an error : The
specified datastore driver ("org.mariadb.jdbc.Driver") was not found in the
CLASSPATH. Please check your CLASSPATH specification, and the name of the
driver."

Then I commented out the 2 newly added lines (metastore version and jars) in
spark-defaults.conf and, since the second error pointed to the MariaDB JDBC
driver being missing from the classpath, ran the pyspark command with the
driver jar added:

*pyspark --jars /usr/lib/hive/lib/mariadb-connector-java.jar*
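
(I believe the permanent equivalent would be to put the same driver jar on
the driver classpath in spark-defaults.conf, e.g.:

*spark.driver.extraClassPath /usr/lib/hive/lib/mariadb-connector-java.jar*)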

After pyspark starts, I work with a HiveContext (a minimal sketch of what I
run is below).
*Now this is updating my Hive metastore version to 1.2.0 (it should stay at 0.14.0).*
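
For reference, this is roughly all I run in the shell (the SHOW DATABASES
query is just a placeholder):

*from pyspark.sql import HiveContext*
*hc = HiveContext(sc)  # sc is the SparkContext the pyspark shell provides*
*hc.sql("SHOW DATABASES").show()*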

It now works as expected: it connects to the Hive metastore, notebooks run,
etc. But my big concern is: why does creating a PySpark HiveContext update
the Hive metastore version?

Thanks!
