Yes, I am. It was compiled with the following:

export SPARK_HADOOP_VERSION=2.5.0-cdh5.3.3
export SPARK_YARN=true
export SPARK_HIVE=true
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -Pyarn -Phadoop-2.5 -Dhadoop.version=2.5.0-cdh5.3.3 -Phive -Phive-thriftserver -DskipTests clean package
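A quick way to confirm the Hive classes actually made it into the assembly (a sketch; the jar path is taken from the SLF4J lines quoted below and may differ on your install):

$ jar tf /usr/lib/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.5.0-cdh5.3.3.jar | grep -c 'org/apache/spark/sql/hive/HiveContext'

A non-zero count means -Phive took effect; a zero would genuinely explain the "You must build Spark with Hive" error further down.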
On Thu, Oct 29, 2015 at 10:16 AM, Deenar Toraskar <deenar.toras...@gmail.com> wrote:

> Are you using Spark built with Hive?
>
> # Apache Hadoop 2.6.X with Hive 13 support
> mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests clean package
>
>
> On 29 October 2015 at 13:08, Zoltan Fedor <zoltan.0.fe...@gmail.com> wrote:
>
>> Hi Deenar,
>> As suggested, I have moved the hive-site.xml from HADOOP_CONF_DIR
>> ($SPARK_HOME/hadoop-conf) to YARN_CONF_DIR ($SPARK_HOME/conf/yarn-conf) and
>> used the below to start pyspark, but the error is exactly the same as before.
>>
>> $ HADOOP_CONF_DIR=$SPARK_HOME/hadoop-conf YARN_CONF_DIR=$SPARK_HOME/conf/yarn-conf HADOOP_USER_NAME=biapp MASTER=yarn $SPARK_HOME/bin/pyspark --deploy-mode client
>>
>> Python 2.6.6 (r266:84292, Jul 23 2015, 05:13:40)
>> [GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>> SLF4J: Class path contains multiple SLF4J bindings.
>> SLF4J: Found binding in [jar:file:/usr/lib/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.5.0-cdh5.3.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>> 15/10/29 09:06:36 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
>> 15/10/29 09:06:38 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> 15/10/29 09:07:03 WARN HiveConf: HiveConf of name hive.metastore.local does not exist
>> Welcome to
>>       ____              __
>>      / __/__  ___ _____/ /__
>>     _\ \/ _ \/ _ `/ __/  '_/
>>    /__ / .__/\_,_/_/ /_/\_\   version 1.5.1
>>       /_/
>>
>> Using Python version 2.6.6 (r266:84292, Jul 23 2015 05:13:40)
>> SparkContext available as sc, HiveContext available as sqlContext.
>> >>> sqlContext2 = HiveContext(sc)
>> >>> sqlContext2 = HiveContext(sc)
>> >>> sqlContext2.sql("show databases").first()
>> 15/10/29 09:07:34 WARN HiveConf: HiveConf of name hive.metastore.local does not exist
>> 15/10/29 09:07:35 WARN ShellBasedUnixGroupsMapping: got exception trying to get groups for user biapp: id: biapp: No such user
>>
>> 15/10/29 09:07:35 WARN UserGroupInformation: No groups available for user biapp
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py", line 552, in sql
>>     return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
>>   File "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py", line 660, in _ssql_ctx
>>     "build/sbt assembly", e)
>> Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o20))
>> >>>
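The "You must build Spark with Hive" text above is misleading, by the way: judging from the traceback, pyspark raises that message whenever the Java-side HiveContext fails to construct for any reason, so the wrapped Py4JJavaError is the real clue, not the build. One more thing worth trying, as a sketch I have not verified on CDH 5.3.3: pass hive-site.xml to the job explicitly, so both the driver and the YARN containers get it on their classpath:

$ HADOOP_CONF_DIR=$SPARK_HOME/hadoop-conf \
  YARN_CONF_DIR=$SPARK_HOME/conf/yarn-conf \
  HADOOP_USER_NAME=biapp MASTER=yarn \
  $SPARK_HOME/bin/pyspark --deploy-mode client \
  --files $SPARK_HOME/conf/hive-site.xml \
  --driver-class-path $SPARK_HOME/conf

--files ships the file to the YARN containers, while --driver-class-path makes the local copy in $SPARK_HOME/conf (where you already have one) visible to the yarn-client driver.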
>> On Thu, Oct 29, 2015 at 7:20 AM, Deenar Toraskar <deenar.toras...@gmail.com> wrote:
>>
>>> Hi Zoltan
>>>
>>> Add hive-site.xml to your YARN_CONF_DIR, i.e. $SPARK_HOME/conf/yarn-conf
>>>
>>> Deenar
>>>
>>> Think Reactive Ltd
>>> deenar.toras...@thinkreactive.co.uk
>>> 07714140812
>>>
>>> On 28 October 2015 at 14:28, Zoltan Fedor <zoltan.0.fe...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> We have a shared CDH 5.3.3 cluster and are trying to use Spark 1.5.1 on it
>>>> in yarn client mode with Hive.
>>>>
>>>> I have compiled Spark 1.5.1 with SPARK_HIVE=true, but it seems I am not
>>>> able to make SparkSQL pick up the hive-site.xml when running pyspark.
>>>>
>>>> hive-site.xml is located in $SPARK_HOME/hadoop-conf/hive-site.xml and
>>>> also in $SPARK_HOME/conf/hive-site.xml
>>>>
>>>> When I start pyspark with the below command and then run some simple
>>>> SparkSQL, it fails; it seems it didn't pick up the settings in hive-site.xml.
>>>>
>>>> $ HADOOP_CONF_DIR=$SPARK_HOME/hadoop-conf YARN_CONF_DIR=$SPARK_HOME/yarn-conf HADOOP_USER_NAME=biapp MASTER=yarn $SPARK_HOME/bin/pyspark --deploy-mode client
>>>>
>>>> Python 2.6.6 (r266:84292, Jul 23 2015, 05:13:40)
>>>> [GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2
>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>> SLF4J: Class path contains multiple SLF4J bindings.
>>>> SLF4J: Found binding in [jar:file:/usr/lib/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.5.0-cdh5.3.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>> SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>>> 15/10/28 10:22:33 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
>>>> 15/10/28 10:22:35 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>> 15/10/28 10:22:59 WARN HiveConf: HiveConf of name hive.metastore.local does not exist
>>>> Welcome to
>>>>       ____              __
>>>>      / __/__  ___ _____/ /__
>>>>     _\ \/ _ \/ _ `/ __/  '_/
>>>>    /__ / .__/\_,_/_/ /_/\_\   version 1.5.1
>>>>       /_/
>>>>
>>>> Using Python version 2.6.6 (r266:84292, Jul 23 2015 05:13:40)
>>>> SparkContext available as sc, HiveContext available as sqlContext.
>>>> >>> sqlContext2 = HiveContext(sc)
>>>> >>> sqlContext2.sql("show databases").first()
>>>> 15/10/28 10:23:12 WARN HiveConf: HiveConf of name hive.metastore.local does not exist
>>>> 15/10/28 10:23:13 WARN ShellBasedUnixGroupsMapping: got exception trying to get groups for user biapp: id: biapp: No such user
>>>>
>>>> 15/10/28 10:23:13 WARN UserGroupInformation: No groups available for user biapp
>>>> Traceback (most recent call last):
>>>>   File "<stdin>", line 1, in <module>
>>>>   File "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py", line 552, in sql
>>>>     return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
>>>>   File "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py", line 660, in _ssql_ctx
>>>>     "build/sbt assembly", e)
>>>> Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o20))
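The ShellBasedUnixGroupsMapping warning in both sessions looks like a separate issue: HADOOP_USER_NAME=biapp makes the client act as user biapp, but no such local account exists on the gateway host, which is easy to confirm (the expected output below is copied from the warning itself):

$ id biapp
id: biapp: No such user

It is probably unrelated to the HiveContext failure, but creating the account locally, or dropping HADOOP_USER_NAME, would rule it out.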
>>>>
>>>> See in the above the warning "WARN HiveConf: HiveConf of name
>>>> hive.metastore.local does not exist", even though there actually is a
>>>> hive.metastore.local attribute in the hive-site.xml.
>>>>
>>>> Any idea how to submit hive-site.xml in yarn client mode?
>>>>
>>>> Thanks
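A closing note on that last warning: hive.metastore.local was removed from Hive some releases back (Spark 1.5.1 bundles Hive 1.2.1), so HiveConf complaining that the name "does not exist" is expected and harmless. If anything, the warning suggests hive-site.xml is being read, since HiveConf only complains about keys it actually found in the file. The key Spark needs from hive-site.xml to reach the cluster metastore is hive.metastore.uris; a minimal sketch, with a placeholder host for your environment:

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://your-metastore-host:9083</value>
</property>

If that key is missing, Spark falls back to a local Derby metastore instead of the shared one, which would also make "show databases" come back nearly empty rather than fail.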