Thanks. I didn't have a spark-defaults.conf or a spark-env.sh, so I copied
yours and modified the references, but now I am back to where I started:
the exact same error as before.


$ HADOOP_USER_NAME=biapp MASTER=yarn $SPARK_HOME/bin/pyspark --deploy-mode client
Error: JAVA_HOME is not set and could not be found.
Python 2.6.6 (r266:84292, Jul 23 2015, 05:13:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/usr/lib/spark-1.5.1-bin-hadoop2.6/lib/spark-assembly-1.5.1-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/usr/lib/parquet/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/usr/lib/avro/avro-tools-1.7.6-cdh5.3.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/10/29 11:56:55 WARN MetricsSystem: Using default name DAGScheduler for
source because spark.app.id is not set.
15/10/29 11:57:26 WARN HiveConf: HiveConf of name hive.metastore.local does
not exist
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.1
      /_/

Using Python version 2.6.6 (r266:84292, Jul 23 2015 05:13:40)
SparkContext available as sc, HiveContext available as sqlContext.
>>> sqlContext2 = HiveContext(sc)
>>> sqlContext2.sql("show databases").first()
15/10/29 11:57:43 WARN HiveConf: HiveConf of name hive.metastore.local does
not exist
15/10/29 11:57:43 WARN ShellBasedUnixGroupsMapping: got exception trying to
get groups for user biapp: id: biapp: No such user

15/10/29 11:57:43 WARN UserGroupInformation: No groups available for user
biapp
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/spark-1.5.1-bin-hadoop2.6/python/pyspark/sql/context.py",
line 552, in sql
    return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
  File "/usr/lib/spark-1.5.1-bin-hadoop2.6/python/pyspark/sql/context.py",
line 660, in _ssql_ctx
    "build/sbt assembly", e)
Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and
run build/sbt assembly", Py4JJavaError(u'An error occurred while calling
None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o24))
>>>


On Thu, Oct 29, 2015 at 11:44 AM, Deenar Toraskar <deenar.toras...@gmail.com> wrote:

>
> Zoltan
>
> You should have these in your existing CDH 5.3 install; that's the best
> place to get them. Find where Spark is running from and it should have
> them there.
>
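> A quick way to locate them on a CDH parcel install is something like the
> following (the exact paths are an assumption, adjust for your cluster):
>
> # resolve where the spark-shell launcher actually lives
> $ readlink -f $(which spark-shell)
> # typical CDH config locations to check
> $ ls /opt/cloudera/parcels/CDH/lib/spark/conf/
> $ ls /etc/spark/conf/
>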
> My versions are here
>
> https://gist.github.com/deenar/08fc4ac0da3bdaff10fb
>
> Deenar
>
> On 29 October 2015 at 15:29, Zoltan Fedor <zoltan.0.fe...@gmail.com>
> wrote:
>
>> I don't have a spark-defaults.conf or spark-env.sh, so if you have a
>> working Spark 1.5.1 with Hive metastore access on CDH 5.3, could you
>> please send over the settings you have in your spark-defaults.conf and
>> spark-env.sh?
>> Thanks
>>
>> On Thu, Oct 29, 2015 at 11:14 AM, Deenar Toraskar <
>> deenar.toras...@gmail.com> wrote:
>>
>>> Here is what I did; maybe it will help you.
>>>
>>> 1) Downloaded spark-1.5.1 (with Hadoop 2.6.0), spark-1.5.1-bin-hadoop2.6,
>>> extracted it on the edge node, and set SPARK_HOME to this location
>>> 2) Copied the existing configuration (spark-defaults.conf and
>>> spark-env.sh) from your existing Spark install
>>> (/opt/cloudera/parcels/CDH/lib/spark/conf/yarn-conf on our environment) to
>>> $SPARK_HOME/conf
>>> 3) Updated spark.yarn.jar in spark-defaults.conf
>>> 4) Copied over all the configuration files from
>>> /opt/cloudera/parcels/CDH/lib/spark/conf/yarn-conf to
>>> $SPARK_HOME/conf/yarn-conf
>>>
>>> and it worked; rough equivalent commands are sketched below. You may be
>>> better off with a custom build against the CDH 5.3.3 Hadoop, which you
>>> have already done.
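>>>
>>> A minimal sketch of those steps (the source paths are from our
>>> environment and the spark.yarn.jar value is illustrative, adjust for
>>> yours):
>>>
>>> $ export SPARK_HOME=/usr/lib/spark-1.5.1-bin-hadoop2.6
>>> $ CDH_SPARK_CONF=/opt/cloudera/parcels/CDH/lib/spark/conf
>>> # step 2: reuse the existing spark-defaults.conf and spark-env.sh
>>> $ cp $CDH_SPARK_CONF/spark-defaults.conf $CDH_SPARK_CONF/spark-env.sh $SPARK_HOME/conf/
>>> # step 4: bring over the yarn-conf directory as well
>>> $ cp -r $CDH_SPARK_CONF/yarn-conf $SPARK_HOME/conf/yarn-conf
>>> # step 3: point spark.yarn.jar in spark-defaults.conf at the new assembly,
>>> # e.g. (an hdfs:// path works too if you upload the assembly jar):
>>> # spark.yarn.jar local:/usr/lib/spark-1.5.1-bin-hadoop2.6/lib/spark-assembly-1.5.1-hadoop2.6.0.jar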
>>>
>>> Deenar
>>>
>>> On 29 October 2015 at 14:35, Zoltan Fedor <zoltan.0.fe...@gmail.com>
>>> wrote:
>>>
>>>> Sure, I tried it with spark-shell, and it shows the same error: it is
>>>> not picking up the hive-site.xml.
>>>>
>>>>
>>>> $ HADOOP_CONF_DIR=$SPARK_HOME/hadoop-conf
>>>> YARN_CONF_DIR=$SPARK_HOME/yarn-conf HADOOP_USER_NAME=biapp MASTER=yarn
>>>> $SPARK_HOME/bin/pyspark --deploy-mode client --driver-class-path
>>>> $HIVE_CLASSPATH
>>>> Python 2.6.6 (r266:84292, Jul 23 2015, 05:13:40)
>>>> [GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2
>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>> SLF4J: Class path contains multiple SLF4J bindings.
>>>> SLF4J: Found binding in
>>>> [jar:file:/usr/lib/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.5.0-cdh5.3.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>> SLF4J: Found binding in
>>>> [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>>>> explanation.
>>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>>> 15/10/29 10:33:20 WARN MetricsSystem: Using default name DAGScheduler
>>>> for source because spark.app.id is not set.
>>>> 15/10/29 10:33:22 WARN NativeCodeLoader: Unable to load native-hadoop
>>>> library for your platform... using builtin-java classes where applicable
>>>> 15/10/29 10:33:50 WARN HiveConf: HiveConf of name hive.metastore.local
>>>> does not exist
>>>> Welcome to
>>>>       ____              __
>>>>      / __/__  ___ _____/ /__
>>>>     _\ \/ _ \/ _ `/ __/  '_/
>>>>    /___/ .__/\_,_/_/ /_/\_\   version 1.5.1
>>>>       /_/
>>>>
>>>> Using Python version 2.6.6 (r266:84292, Jul 23 2015 05:13:40)
>>>> SparkContext available as sc, HiveContext available as sqlContext.
>>>> >>>
>>>> biapps@biapps-qa01:~> HADOOP_CONF_DIR=$SPARK_HOME/hadoop-conf
>>>> YARN_CONF_DIR=$SPARK_HOME/yarn-conf HADOOP_USER_NAME=biapp MASTER=yarn
>>>> $SPARK_HOME/bin/spark-shell --deploy-mode client
>>>> SLF4J: Class path contains multiple SLF4J bindings.
>>>> SLF4J: Found binding in
>>>> [jar:file:/usr/lib/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.5.0-cdh5.3.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>> SLF4J: Found binding in
>>>> [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>>>> explanation.
>>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>>> Welcome to
>>>>       ____              __
>>>>      / __/__  ___ _____/ /__
>>>>     _\ \/ _ \/ _ `/ __/  '_/
>>>>    /___/ .__/\_,_/_/ /_/\_\   version 1.5.1
>>>>       /_/
>>>>
>>>> Using Scala version 2.10.4 (OpenJDK 64-Bit Server VM, Java 1.7.0_91)
>>>> Type in expressions to have them evaluated.
>>>> Type :help for more information.
>>>> 15/10/29 10:34:15 WARN MetricsSystem: Using default name DAGScheduler
>>>> for source because spark.app.id is not set.
>>>> 15/10/29 10:34:16 WARN NativeCodeLoader: Unable to load native-hadoop
>>>> library for your platform... using builtin-java classes where applicable
>>>> Spark context available as sc.
>>>> 15/10/29 10:34:46 WARN HiveConf: HiveConf of name hive.metastore.local
>>>> does not exist
>>>> 15/10/29 10:34:46 WARN ShellBasedUnixGroupsMapping: got exception
>>>> trying to get groups for user biapp: id: biapp: No such user
>>>>
>>>> 15/10/29 10:34:46 WARN UserGroupInformation: No groups available for
>>>> user biapp
>>>> java.lang.RuntimeException:
>>>> org.apache.hadoop.security.AccessControlException: Permission denied:
>>>> user=biapp, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:216)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:145)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6287)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6269)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6221)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4088)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4058)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4031)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:788)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:297)
>>>> at
>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:594)
>>>> at
>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>> at javax.security.auth.Subject.doAs(Subject.java:422)
>>>> at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>>>>
>>>> at
>>>> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
>>>> at
>>>> org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:171)
>>>> at
>>>> org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:162)
>>>> at
>>>> org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:160)
>>>> at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:167)
>>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>> at
>>>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>>> at
>>>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>>> at
>>>> org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1028)
>>>> at $iwC$$iwC.<init>(<console>:9)
>>>> at $iwC.<init>(<console>:18)
>>>> at <init>(<console>:20)
>>>> at .<init>(<console>:24)
>>>> at .<clinit>(<console>)
>>>> at .<init>(<console>:7)
>>>> at .<clinit>(<console>)
>>>> at $print(<console>)
>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> at
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>> at
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>> at java.lang.reflect.Method.invoke(Method.java:606)
>>>> at
>>>> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>>>> at
>>>> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
>>>> at
>>>> org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>>>> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>>>> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>>>> at
>>>> org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>>>> at
>>>> org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>>>> at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>>>> at
>>>> org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:132)
>>>> at
>>>> org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:124)
>>>> at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324)
>>>> at
>>>> org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:124)
>>>> at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64)
>>>> at
>>>> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:974)
>>>> at
>>>> org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:159)
>>>> at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64)
>>>> at
>>>> org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:108)
>>>> at
>>>> org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64)
>>>> at
>>>> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:991)
>>>> at
>>>> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>>>> at
>>>> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>>>> at
>>>> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>>>> at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>>>> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>>>> at org.apache.spark.repl.Main$.main(Main.scala:31)
>>>> at org.apache.spark.repl.Main.main(Main.scala)
>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> at
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>> at
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>> at java.lang.reflect.Method.invoke(Method.java:606)
>>>> at
>>>> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
>>>> at
>>>> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
>>>> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
>>>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
>>>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>> Caused by: org.apache.hadoop.security.AccessControlException:
>>>> Permission denied: user=biapp, access=WRITE,
>>>> inode="/user":hdfs:supergroup:drwxr-xr-x
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:216)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:145)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6287)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6269)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6221)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4088)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4058)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4031)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:788)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:297)
>>>> at
>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:594)
>>>> at
>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>> at javax.security.auth.Subject.doAs(Subject.java:422)
>>>> at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>>>>
>>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>> at
>>>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>>> at
>>>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>>> at
>>>> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>>>> at
>>>> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
>>>> at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2689)
>>>> at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2658)
>>>> at
>>>> org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:831)
>>>> at
>>>> org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:827)
>>>> at
>>>> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>>> at
>>>> org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:827)
>>>> at
>>>> org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:820)
>>>> at
>>>> org.apache.hadoop.hive.ql.exec.Utilities.createDirsWithPermission(Utilities.java:3679)
>>>> at
>>>> org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:597)
>>>> at
>>>> org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
>>>> at
>>>> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
>>>> ... 56 more
>>>> Caused by:
>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
>>>> Permission denied: user=biapp, access=WRITE,
>>>> inode="/user":hdfs:supergroup:drwxr-xr-x
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:216)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:145)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6287)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6269)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6221)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4088)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4058)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4031)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:788)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:297)
>>>> at
>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:594)
>>>> at
>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>> at javax.security.auth.Subject.doAs(Subject.java:422)
>>>> at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>>>>
>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1411)
>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1364)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>>> at com.sun.proxy.$Proxy14.mkdirs(Unknown Source)
>>>> at
>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:531)
>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> at
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>> at
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>> at java.lang.reflect.Method.invoke(Method.java:606)
>>>> at
>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>>>> at
>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>>> at com.sun.proxy.$Proxy15.mkdirs(Unknown Source)
>>>> at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2687)
>>>> ... 66 more
>>>>
>>>> <console>:10: error: not found: value sqlContext
>>>>        import sqlContext.implicits._
>>>>               ^
>>>> <console>:10: error: not found: value sqlContext
>>>>        import sqlContext.sql
>>>>               ^
>>>>
>>>> scala> sqlContext.sql("show databases").collect
>>>> <console>:14: error: not found: value sqlContext
>>>>               sqlContext.sql("show databases").collect
>>>>               ^
>>>>
>>>> scala>
>>>>
>>>> On Thu, Oct 29, 2015 at 10:26 AM, Deenar Toraskar <
>>>> deenar.toras...@gmail.com> wrote:
>>>>
>>>>> I don't know a lot about how pyspark works. Could you try running
>>>>> spark-shell and doing the same?
>>>>>
>>>>> sqlContext.sql("show databases").collect
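>>>>>
>>>>> launched roughly the same way as your pyspark session, e.g. (the flags
>>>>> here just mirror what you used for pyspark):
>>>>>
>>>>> $ HADOOP_USER_NAME=biapp MASTER=yarn $SPARK_HOME/bin/spark-shell --deploy-mode client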
>>>>>
>>>>> Deenar
>>>>>
>>>>> On 29 October 2015 at 14:18, Zoltan Fedor <zoltan.0.fe...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Yes, I am. It was compiled with the following:
>>>>>>
>>>>>> export SPARK_HADOOP_VERSION=2.5.0-cdh5.3.3
>>>>>> export SPARK_YARN=true
>>>>>> export SPARK_HIVE=true
>>>>>> export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M
>>>>>> -XX:ReservedCodeCacheSize=512m"
>>>>>> mvn -Pyarn -Phadoop-2.5 -Dhadoop.version=2.5.0-cdh5.3.3 -Phive
>>>>>> -Phive-thriftserver -DskipTests clean package
>>>>>>
>>>>>> On Thu, Oct 29, 2015 at 10:16 AM, Deenar Toraskar <
>>>>>> deenar.toras...@gmail.com> wrote:
>>>>>>
>>>>>>> Are you using Spark built with Hive?
>>>>>>>
>>>>>>> # Apache Hadoop 2.6.X with Hive 13 support
>>>>>>> mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests clean package
>>>>>>>
>>>>>>>
>>>>>>> On 29 October 2015 at 13:08, Zoltan Fedor <zoltan.0.fe...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Deenar,
>>>>>>>> As suggested, I have moved the hive-site.xml from HADOOP_CONF_DIR
>>>>>>>> ($SPARK_HOME/hadoop-conf) to YARN_CONF_DIR ($SPARK_HOME/conf/yarn-conf)
>>>>>>>> and used the command below to start pyspark, but the error is exactly
>>>>>>>> the same as before.
>>>>>>>>
>>>>>>>> $ HADOOP_CONF_DIR=$SPARK_HOME/hadoop-conf
>>>>>>>> YARN_CONF_DIR=$SPARK_HOME/conf/yarn-conf HADOOP_USER_NAME=biapp 
>>>>>>>> MASTER=yarn
>>>>>>>> $SPARK_HOME/bin/pyspark --deploy-mode client
>>>>>>>>
>>>>>>>> Python 2.6.6 (r266:84292, Jul 23 2015, 05:13:40)
>>>>>>>> [GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2
>>>>>>>> Type "help", "copyright", "credits" or "license" for more
>>>>>>>> information.
>>>>>>>> SLF4J: Class path contains multiple SLF4J bindings.
>>>>>>>> SLF4J: Found binding in
>>>>>>>> [jar:file:/usr/lib/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.5.0-cdh5.3.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>>>>> SLF4J: Found binding in
>>>>>>>> [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for
>>>>>>>> an explanation.
>>>>>>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>>>>>>> 15/10/29 09:06:36 WARN MetricsSystem: Using default name
>>>>>>>> DAGScheduler for source because spark.app.id is not set.
>>>>>>>> 15/10/29 09:06:38 WARN NativeCodeLoader: Unable to load
>>>>>>>> native-hadoop library for your platform... using builtin-java classes 
>>>>>>>> where
>>>>>>>> applicable
>>>>>>>> 15/10/29 09:07:03 WARN HiveConf: HiveConf of name
>>>>>>>> hive.metastore.local does not exist
>>>>>>>> Welcome to
>>>>>>>>       ____              __
>>>>>>>>      / __/__  ___ _____/ /__
>>>>>>>>     _\ \/ _ \/ _ `/ __/  '_/
>>>>>>>>    /___/ .__/\_,_/_/ /_/\_\   version 1.5.1
>>>>>>>>       /_/
>>>>>>>>
>>>>>>>> Using Python version 2.6.6 (r266:84292, Jul 23 2015 05:13:40)
>>>>>>>> SparkContext available as sc, HiveContext available as sqlContext.
>>>>>>>> >>> sqlContext2 = HiveContext(sc)
>>>>>>>> >>> sqlContext2 = HiveContext(sc)
>>>>>>>> >>> sqlContext2.sql("show databases").first()
>>>>>>>> 15/10/29 09:07:34 WARN HiveConf: HiveConf of name
>>>>>>>> hive.metastore.local does not exist
>>>>>>>> 15/10/29 09:07:35 WARN ShellBasedUnixGroupsMapping: got exception
>>>>>>>> trying to get groups for user biapp: id: biapp: No such user
>>>>>>>>
>>>>>>>> 15/10/29 09:07:35 WARN UserGroupInformation: No groups available
>>>>>>>> for user biapp
>>>>>>>> Traceback (most recent call last):
>>>>>>>>   File "<stdin>", line 1, in <module>
>>>>>>>>   File
>>>>>>>> "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py",
>>>>>>>> line 552, in sql
>>>>>>>>     return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
>>>>>>>>   File
>>>>>>>> "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py",
>>>>>>>> line 660, in _ssql_ctx
>>>>>>>>     "build/sbt assembly", e)
>>>>>>>> Exception: ("You must build Spark with Hive. Export
>>>>>>>> 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An error
>>>>>>>> occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n',
>>>>>>>> JavaObject id=o20))
>>>>>>>> >>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Oct 29, 2015 at 7:20 AM, Deenar Toraskar <
>>>>>>>> deenar.toras...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> *Hi Zoltan*
>>>>>>>>>
>>>>>>>>> Add hive-site.xml to your YARN_CONF_DIR, i.e.
>>>>>>>>> $SPARK_HOME/conf/yarn-conf.
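>>>>>>>>>
>>>>>>>>> For example, assuming the cluster's Hive client config lives in
>>>>>>>>> /etc/hive/conf (adjust the source path for your environment):
>>>>>>>>>
>>>>>>>>> $ cp /etc/hive/conf/hive-site.xml $SPARK_HOME/conf/yarn-conf/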
>>>>>>>>>
>>>>>>>>> Deenar
>>>>>>>>>
>>>>>>>>> *Think Reactive Ltd*
>>>>>>>>> deenar.toras...@thinkreactive.co.uk
>>>>>>>>> 07714140812
>>>>>>>>>
>>>>>>>>> On 28 October 2015 at 14:28, Zoltan Fedor <
>>>>>>>>> zoltan.0.fe...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>> We have a shared CDH 5.3.3 cluster and are trying to use Spark 1.5.1
>>>>>>>>>> on it in yarn client mode with Hive.
>>>>>>>>>>
>>>>>>>>>> I have compiled Spark 1.5.1 with SPARK_HIVE=true, but it seems I am
>>>>>>>>>> not able to make SparkSQL pick up the hive-site.xml when running
>>>>>>>>>> pyspark.
>>>>>>>>>>
>>>>>>>>>> hive-site.xml is located in $SPARK_HOME/hadoop-conf/hive-site.xml
>>>>>>>>>> and also in $SPARK_HOME/conf/hive-site.xml
>>>>>>>>>>
>>>>>>>>>> When I start pyspark with the command below and then run some
>>>>>>>>>> simple SparkSQL, it fails; it seems it didn't pick up the settings in
>>>>>>>>>> hive-site.xml.
>>>>>>>>>>
>>>>>>>>>> $ HADOOP_CONF_DIR=$SPARK_HOME/hadoop-conf
>>>>>>>>>> YARN_CONF_DIR=$SPARK_HOME/yarn-conf HADOOP_USER_NAME=biapp 
>>>>>>>>>> MASTER=yarn
>>>>>>>>>> $SPARK_HOME/bin/pyspark --deploy-mode client
>>>>>>>>>>
>>>>>>>>>> Python 2.6.6 (r266:84292, Jul 23 2015, 05:13:40)
>>>>>>>>>> [GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2
>>>>>>>>>> Type "help", "copyright", "credits" or "license" for more
>>>>>>>>>> information.
>>>>>>>>>> SLF4J: Class path contains multiple SLF4J bindings.
>>>>>>>>>> SLF4J: Found binding in
>>>>>>>>>> [jar:file:/usr/lib/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.5.0-cdh5.3.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>>>>>>> SLF4J: Found binding in
>>>>>>>>>> [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>>>>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for
>>>>>>>>>> an explanation.
>>>>>>>>>> SLF4J: Actual binding is of type
>>>>>>>>>> [org.slf4j.impl.Log4jLoggerFactory]
>>>>>>>>>> 15/10/28 10:22:33 WARN MetricsSystem: Using default name
>>>>>>>>>> DAGScheduler for source because spark.app.id is not set.
>>>>>>>>>> 15/10/28 10:22:35 WARN NativeCodeLoader: Unable to load
>>>>>>>>>> native-hadoop library for your platform... using builtin-java 
>>>>>>>>>> classes where
>>>>>>>>>> applicable
>>>>>>>>>> 15/10/28 10:22:59 WARN HiveConf: HiveConf of name
>>>>>>>>>> hive.metastore.local does not exist
>>>>>>>>>> Welcome to
>>>>>>>>>>       ____              __
>>>>>>>>>>      / __/__  ___ _____/ /__
>>>>>>>>>>     _\ \/ _ \/ _ `/ __/  '_/
>>>>>>>>>>    /___/ .__/\_,_/_/ /_/\_\   version 1.5.1
>>>>>>>>>>       /_/
>>>>>>>>>>
>>>>>>>>>> Using Python version 2.6.6 (r266:84292, Jul 23 2015 05:13:40)
>>>>>>>>>> SparkContext available as sc, HiveContext available as sqlContext.
>>>>>>>>>> >>> sqlContext2 = HiveContext(sc)
>>>>>>>>>> >>> sqlContext2.sql("show databases").first()
>>>>>>>>>> 15/10/28 10:23:12 WARN HiveConf: HiveConf of name
>>>>>>>>>> hive.metastore.local does not exist
>>>>>>>>>> 15/10/28 10:23:13 WARN ShellBasedUnixGroupsMapping: got exception
>>>>>>>>>> trying to get groups for user biapp: id: biapp: No such user
>>>>>>>>>>
>>>>>>>>>> 15/10/28 10:23:13 WARN UserGroupInformation: No groups available
>>>>>>>>>> for user biapp
>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>   File "<stdin>", line 1, in <module>
>>>>>>>>>>   File
>>>>>>>>>> "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py",
>>>>>>>>>> line 552, in sql
>>>>>>>>>>     return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
>>>>>>>>>>   File
>>>>>>>>>> "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py",
>>>>>>>>>> line 660, in _ssql_ctx
>>>>>>>>>>     "build/sbt assembly", e)
>>>>>>>>>> Exception: ("You must build Spark with Hive. Export
>>>>>>>>>> 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An 
>>>>>>>>>> error
>>>>>>>>>> occurred while calling 
>>>>>>>>>> None.org.apache.spark.sql.hive.HiveContext.\n',
>>>>>>>>>> JavaObject id=o20))
>>>>>>>>>> >>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Note the warning above, "WARN HiveConf: HiveConf of name
>>>>>>>>>> hive.metastore.local does not exist", even though there actually is a
>>>>>>>>>> hive.metastore.local attribute in the hive-site.xml.
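>>>>>>>>>>
>>>>>>>>>> For reference, a quick grep shows the property is present in the
>>>>>>>>>> file (the value shown here is only an illustration of the layout):
>>>>>>>>>>
>>>>>>>>>> $ grep -A 2 hive.metastore.local $SPARK_HOME/conf/hive-site.xml
>>>>>>>>>>   <name>hive.metastore.local</name>
>>>>>>>>>>   <value>false</value>
>>>>>>>>>> </property>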
>>>>>>>>>>
>>>>>>>>>> Any idea how to submit hive-site.xml in yarn client mode?
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
