Yes, I am. It was compiled with the following:

export SPARK_HADOOP_VERSION=2.5.0-cdh5.3.3
export SPARK_YARN=true
export SPARK_HIVE=true
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -Pyarn -Phadoop-2.5 -Dhadoop.version=2.5.0-cdh5.3.3 -Phive -Phive-thriftserver -DskipTests clean package
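A quick way to confirm the Hive classes actually made it into the assembly (a sketch; the jar path is taken from the SLF4J lines quoted below and may differ on your install):

$ jar tf /usr/lib/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.5.0-cdh5.3.3.jar | grep -c 'org/apache/spark/sql/hive/HiveContext'

A non-zero count means -Phive took effect; a zero would genuinely explain the "You must build Spark with Hive" error further down.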
On Thu, Oct 29, 2015 at 10:16 AM, Deenar Toraskar <deenar.toras...@gmail.com> wrote:

> Are you using Spark built with Hive?
>
> # Apache Hadoop 2.6.X with Hive 13 support
> mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests clean package
>
>
> On 29 October 2015 at 13:08, Zoltan Fedor <zoltan.0.fe...@gmail.com> wrote:
>
>> Hi Deenar,
>> As suggested, I have moved the hive-site.xml from HADOOP_CONF_DIR
>> ($SPARK_HOME/hadoop-conf) to YARN_CONF_DIR ($SPARK_HOME/conf/yarn-conf) and
>> used the below to start pyspark, but the error is exactly the same as before.
>>
>> $ HADOOP_CONF_DIR=$SPARK_HOME/hadoop-conf YARN_CONF_DIR=$SPARK_HOME/conf/yarn-conf HADOOP_USER_NAME=biapp MASTER=yarn $SPARK_HOME/bin/pyspark --deploy-mode client
>>
>> Python 2.6.6 (r266:84292, Jul 23 2015, 05:13:40)
>> [GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>> SLF4J: Class path contains multiple SLF4J bindings.
>> SLF4J: Found binding in [jar:file:/usr/lib/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.5.0-cdh5.3.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>> 15/10/29 09:06:36 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
>> 15/10/29 09:06:38 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> 15/10/29 09:07:03 WARN HiveConf: HiveConf of name hive.metastore.local does not exist
>> Welcome to
>>       ____              __
>>      / __/__  ___ _____/ /__
>>     _\ \/ _ \/ _ `/ __/  '_/
>>    /__ / .__/\_,_/_/ /_/\_\   version 1.5.1
>>       /_/
>>
>> Using Python version 2.6.6 (r266:84292, Jul 23 2015 05:13:40)
>> SparkContext available as sc, HiveContext available as sqlContext.
>> >>> sqlContext2 = HiveContext(sc)
>> >>> sqlContext2 = HiveContext(sc)
>> >>> sqlContext2.sql("show databases").first()
>> 15/10/29 09:07:34 WARN HiveConf: HiveConf of name hive.metastore.local does not exist
>> 15/10/29 09:07:35 WARN ShellBasedUnixGroupsMapping: got exception trying to get groups for user biapp: id: biapp: No such user
>>
>> 15/10/29 09:07:35 WARN UserGroupInformation: No groups available for user biapp
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py", line 552, in sql
>>     return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
>>   File "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py", line 660, in _ssql_ctx
>>     "build/sbt assembly", e)
>> Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o20))
>> >>>
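The "You must build Spark with Hive" text above is misleading, by the way: judging from the traceback, pyspark raises that message whenever the Java-side HiveContext fails to construct for any reason, so the wrapped Py4JJavaError is the real clue, not the build. One more thing worth trying, as a sketch I have not verified on CDH 5.3.3: pass hive-site.xml to the job explicitly, so both the driver and the YARN containers get it on their classpath:

$ HADOOP_CONF_DIR=$SPARK_HOME/hadoop-conf \
  YARN_CONF_DIR=$SPARK_HOME/conf/yarn-conf \
  HADOOP_USER_NAME=biapp MASTER=yarn \
  $SPARK_HOME/bin/pyspark --deploy-mode client \
  --files $SPARK_HOME/conf/hive-site.xml \
  --driver-class-path $SPARK_HOME/conf

--files ships the file to the YARN containers, while --driver-class-path makes the local copy in $SPARK_HOME/conf (where you already have one) visible to the yarn-client driver.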
>> On Thu, Oct 29, 2015 at 7:20 AM, Deenar Toraskar <deenar.toras...@gmail.com> wrote:
>>
>>> Hi Zoltan
>>>
>>> Add hive-site.xml to your YARN_CONF_DIR, i.e. $SPARK_HOME/conf/yarn-conf
>>>
>>> Deenar
>>>
>>> Think Reactive Ltd
>>> deenar.toras...@thinkreactive.co.uk
>>> 07714140812
>>>
>>> On 28 October 2015 at 14:28, Zoltan Fedor <zoltan.0.fe...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> We have a shared CDH 5.3.3 cluster and are trying to use Spark 1.5.1 on it
>>>> in yarn client mode with Hive.
>>>>
>>>> I have compiled Spark 1.5.1 with SPARK_HIVE=true, but it seems I am not
>>>> able to make SparkSQL pick up the hive-site.xml when running pyspark.
>>>>
>>>> hive-site.xml is located in $SPARK_HOME/hadoop-conf/hive-site.xml and
>>>> also in $SPARK_HOME/conf/hive-site.xml
>>>>
>>>> When I start pyspark with the below command and then run some simple
>>>> SparkSQL, it fails; it seems it didn't pick up the settings in hive-site.xml.
>>>>
>>>> $ HADOOP_CONF_DIR=$SPARK_HOME/hadoop-conf YARN_CONF_DIR=$SPARK_HOME/yarn-conf HADOOP_USER_NAME=biapp MASTER=yarn $SPARK_HOME/bin/pyspark --deploy-mode client
>>>>
>>>> Python 2.6.6 (r266:84292, Jul 23 2015, 05:13:40)
>>>> [GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2
>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>> SLF4J: Class path contains multiple SLF4J bindings.
>>>> SLF4J: Found binding in [jar:file:/usr/lib/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.5.0-cdh5.3.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>> SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>>> 15/10/28 10:22:33 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
>>>> 15/10/28 10:22:35 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>> 15/10/28 10:22:59 WARN HiveConf: HiveConf of name hive.metastore.local does not exist
>>>> Welcome to
>>>>       ____              __
>>>>      / __/__  ___ _____/ /__
>>>>     _\ \/ _ \/ _ `/ __/  '_/
>>>>    /__ / .__/\_,_/_/ /_/\_\   version 1.5.1
>>>>       /_/
>>>>
>>>> Using Python version 2.6.6 (r266:84292, Jul 23 2015 05:13:40)
>>>> SparkContext available as sc, HiveContext available as sqlContext.
>>>> >>> sqlContext2 = HiveContext(sc)
>>>> >>> sqlContext2.sql("show databases").first()
>>>> 15/10/28 10:23:12 WARN HiveConf: HiveConf of name hive.metastore.local does not exist
>>>> 15/10/28 10:23:13 WARN ShellBasedUnixGroupsMapping: got exception trying to get groups for user biapp: id: biapp: No such user
>>>>
>>>> 15/10/28 10:23:13 WARN UserGroupInformation: No groups available for user biapp
>>>> Traceback (most recent call last):
>>>>   File "<stdin>", line 1, in <module>
>>>>   File "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py", line 552, in sql
>>>>     return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
>>>>   File "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py", line 660, in _ssql_ctx
>>>>     "build/sbt assembly", e)
>>>> Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o20))
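The ShellBasedUnixGroupsMapping warning in both sessions looks like a separate issue: HADOOP_USER_NAME=biapp makes the client act as user biapp, but no such local account exists on the gateway host, which is easy to confirm (the expected output below is copied from the warning itself):

$ id biapp
id: biapp: No such user

It is probably unrelated to the HiveContext failure, but creating the account locally, or dropping HADOOP_USER_NAME, would rule it out.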
>>>>
>>>> See in the above the warning "WARN HiveConf: HiveConf of name
>>>> hive.metastore.local does not exist", even though there actually is a
>>>> hive.metastore.local attribute in the hive-site.xml.
>>>>
>>>> Any idea how to submit hive-site.xml in yarn client mode?
>>>>
>>>> Thanks
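A closing note on that last warning: hive.metastore.local was removed from Hive some releases back (Spark 1.5.1 bundles Hive 1.2.1), so HiveConf complaining that the name "does not exist" is expected and harmless. If anything, the warning suggests hive-site.xml is being read, since HiveConf only complains about keys it actually found in the file. The key Spark needs from hive-site.xml to reach the cluster metastore is hive.metastore.uris; a minimal sketch, with a placeholder host for your environment:

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://your-metastore-host:9083</value>
</property>

If that key is missing, Spark falls back to a local Derby metastore instead of the shared one, which would also make "show databases" come back nearly empty rather than fail.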