Hi,
We have a shared CDH 5.3.3 cluster and are trying to use Spark 1.5.1 on it in
yarn-client mode with Hive.

I have compiled Spark 1.5.1 with SPARK_HIVE=true, but it seems I am not
able to make Spark SQL pick up hive-site.xml when running pyspark.

hive-site.xml is present both in $SPARK_HOME/hadoop-conf/hive-site.xml and
in $SPARK_HOME/conf/hive-site.xml.
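
To check whether the driver JVM actually sees that file on its classpath, I
assume something like the following (run from the same pyspark shell, via the
py4j gateway the shell already exposes as sc._jvm) should print the resolved
URL, or None if the file is not visible -- just a sketch:

    # run inside the pyspark shell; sc is the SparkContext the shell creates
    url = (sc._jvm.java.lang.Thread.currentThread()
           .getContextClassLoader()
           .getResource("hive-site.xml"))
    print(url.toString() if url is not None else None)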

When I start pyspark with the command below and then run some simple
Spark SQL, it fails; it seems it did not pick up the settings in hive-site.xml:

$ HADOOP_CONF_DIR=$SPARK_HOME/hadoop-conf \
  YARN_CONF_DIR=$SPARK_HOME/yarn-conf \
  HADOOP_USER_NAME=biapp \
  MASTER=yarn \
  $SPARK_HOME/bin/pyspark --deploy-mode client

Python 2.6.6 (r266:84292, Jul 23 2015, 05:13:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/usr/lib/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.5.0-cdh5.3.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/10/28 10:22:33 WARN MetricsSystem: Using default name DAGScheduler for
source because spark.app.id is not set.
15/10/28 10:22:35 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
15/10/28 10:22:59 WARN HiveConf: HiveConf of name hive.metastore.local does
not exist
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.5.1
      /_/

Using Python version 2.6.6 (r266:84292, Jul 23 2015 05:13:40)
SparkContext available as sc, HiveContext available as sqlContext.
>>> sqlContext2 = HiveContext(sc)
>>> sqlContext2.sql("show databases").first()
15/10/28 10:23:12 WARN HiveConf: HiveConf of name hive.metastore.local does
not exist
15/10/28 10:23:13 WARN ShellBasedUnixGroupsMapping: got exception trying to
get groups for user biapp: id: biapp: No such user

15/10/28 10:23:13 WARN UserGroupInformation: No groups available for user
biapp
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File
"/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py",
line 552, in sql
    return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
  File
"/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py",
line 660, in _ssql_ctx
    "build/sbt assembly", e)
Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and
run build/sbt assembly", Py4JJavaError(u'An error occurred while calling
None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o20))
>>>


Note the warning in the output above, "WARN HiveConf: HiveConf of name
hive.metastore.local does not exist", even though there actually is a
hive.metastore.local property set in the hive-site.xml.

Any idea how to submit hive-site.xml properly in yarn-client mode?
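
In case it helps with diagnosing this: the "You must build Spark with Hive"
message is just the pyspark wrapper around a Py4JJavaError, so I assume a
small sketch like the one below (not yet verified on this cluster) would print
the underlying Java exception from the same shell:

    from py4j.protocol import Py4JJavaError
    from pyspark.sql import HiveContext

    try:
        # sc is the SparkContext created by the pyspark shell
        HiveContext(sc).sql("show databases").first()
    except Exception as e:
        # pyspark wraps the real failure; if a Py4JJavaError is attached,
        # its java_exception carries the actual cause from the JVM side
        for arg in e.args:
            if isinstance(arg, Py4JJavaError):
                print(arg.java_exception.toString())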

Thanks
